Transform and enrich data seamlessly with AI functions (preview)

2025-05-20

Important

This feature is in preview.

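With Microsoft Fabric, all business professionals, from developers to analysts, can derive more value from their enterprise data through generative AI, using experiences like Copilot and Fabric data agents. Thanks to a new set of AI functions for data engineering, Fabric users can now harness industry-leading large language models (LLMs) to transform and enrich data seamlessly.

AI functions use GenAI for summarization, classification, text generation, and more, all with a single line of code:

Calculate similarity with ai.similarity: Compare the meaning of input text with a single common text value, or with corresponding text values in another column.
Categorize text with ai.classify: Classify input text values according to labels that you choose.
Detect sentiment with ai.analyze_sentiment: Identify the emotional state that input text expresses.
Extract entities with ai.extract: Find and extract specific types of information from input text, such as locations or names.
Fix grammar with ai.fix_grammar: Correct the grammar, spelling, and punctuation of input text.
Summarize text with ai.summarize: Get summaries of input text.
Translate text with ai.translate: Translate input text into another language.
Answer custom user prompts with ai.generate_response: Generate responses based on your own instructions.

Whether you work with pandas or Spark, you can incorporate these functions into data science and data engineering workflows. There is no detailed configuration and no complex infrastructure management, and no specific technical expertise is needed.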

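Prerequisites

To use AI functions with the built-in AI endpoint in Fabric, your administrator needs to enable the tenant switch for Copilot and other features that are powered by Azure OpenAI.
Depending on your location, you might need to enable a tenant setting for cross-geo processing.
You also need an F2 or later SKU, or a P SKU. If you use a trial SKU, you can bring your own Azure OpenAI resource.

Note

AI functions are supported in the Fabric 1.3 runtime and later.
By default, AI functions use the gpt-4o-mini (2024-07-18) model. To learn more about billing and consumption rates, see the dedicated article.
Most AI functions are optimized for use on English-language text.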

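Getting started with AI functions

Using AI functions in Fabric notebooks requires certain custom packages, which are preinstalled on the Fabric runtime. For the latest features and bug fixes, you can run the following code to install and import the most up-to-date packages. Afterward, you can use AI functions with pandas or PySpark, depending on your preference.

This code cell installs the AI functions library and its dependencies.

Warning

The PySpark configuration cell takes a few minutes to finish running. We appreciate your patience.

pandas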
PySpark

# Install fixed version of packages
%pip install -q --force-reinstall openai==1.30 httpx==0.27.0

# Install latest version of SynapseML-core
%pip install -q --force-reinstall https://mmlspark.blob.core.windows.net/pip/1.0.11-spark3.5/synapseml_core-1.0.11.dev1-py2.py3-none-any.whl

# Install SynapseML-Internal .whl with AI functions library from blob storage:
%pip install -q --force-reinstall https://mmlspark.blob.core.windows.net/pip/1.0.11.1-spark3.5/synapseml_internal-1.0.11.1.dev1-py2.py3-none-any.whl

%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.11-spark3.5,com.microsoft.azure:synapseml-internal_2.12:1.0.11.1-spark3.5",
        "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
        "spark.yarn.user.classpath.first": "true",
        "spark.sql.parquet.enableVectorizedReader": "false"
    }
}
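
After the pandas install cell runs, it can help to confirm that the pinned versions actually took effect. The following is a minimal sketch that uses only the Python standard library. The helper names `normalize` and `check_pins` are hypothetical and are not part of the AI functions library; the pins are copied from the install cell above.

```python
from importlib import metadata

# Pins copied from the pandas install cell above.
EXPECTED_PINS = {"openai": "1.30", "httpx": "0.27.0"}


def normalize(version):
    """Drop trailing ".0" segments so that "1.30" and "1.30.0" compare equal."""
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def check_pins(expected):
    """Map each package to (wanted, installed or None, whether they match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        matches = installed is not None and normalize(installed) == normalize(wanted)
        report[package] = (wanted, installed, matches)
    return report


# Print one status line per pinned package.
for package, (wanted, installed, matches) in check_pins(EXPECTED_PINS).items():
    status = "ok" if matches else "mismatch"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

A mismatch usually means the notebook session needs a restart, or that a later install replaced the pinned version.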

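The following code cell imports the AI functions library and its dependencies. The pandas cell also imports an optional Python library that displays progress bars to track the status of every AI function call.

pandas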
PySpark

# Required imports
import synapse.ml.aifunc as aifunc
import pandas as pd
import openai

# Optional import for progress bars
from tqdm.auto import tqdm
tqdm.pandas()

from synapse.ml.spark.aifunc.DataFrameExtensions import AIFunctions
from synapse.ml.services.openai import OpenAIDefaults
defaults = OpenAIDefaults()
defaults.set_deployment_name("gpt-35-turbo-0125")

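Applying AI functions

Each of the following functions lets you call the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. You can use AI functions to analyze pandas DataFrames or Spark DataFrames.

Tip

To learn how to customize the configuration of AI functions, see the dedicated article.

Calculate similarity with `ai.similarity`

The ai.similarity function invokes AI to compare input text values with a single common text value, or with pairwise text values in another column. The output similarity scores are relative, and they can range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. For more detailed instructions about the use of ai.similarity, see the dedicated article.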

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([ 
        ("Bill Gates", "Microsoft"), 
        ("Satya Nadella", "Toyota"), 
        ("Joan of Arc", "Nike") 
    ], columns=["names", "companies"])
    
df["similarity"] = df["names"].ai.similarity(df["companies"])
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Bill Gates", "Microsoft"), 
        ("Satya Nadella", "Toyota"), 
        ("Joan of Arc", "Nike")
    ], ["names", "companies"])

similarity = df.ai.similarity(input_col="names", other_col="companies", output_col="similarity")
display(similarity)
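
Because similarity scores are relative, ranking rows against each other is often more informative than reading a single score in isolation. The following is a minimal sketch in plain Python. The scores are hypothetical values made up for illustration, not real output, and `rank_by_similarity` is a hypothetical helper, not part of the AI functions library.

```python
def rank_by_similarity(rows):
    """Sort (name, company, score) rows from most similar to least similar."""
    for _, _, score in rows:
        # The documented range is -1 (opposites) to 1 (identical).
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside the range [-1, 1]")
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical scores for illustration only; real values come from ai.similarity.
rows = [
    ("Satya Nadella", "Toyota", 0.18),
    ("Bill Gates", "Microsoft", 0.62),
    ("Joan of Arc", "Nike", 0.05),
]

for name, company, score in rank_by_similarity(rows):
    print(f"{score:+.2f}  {name} / {company}")
```

In a notebook, the rows could come from the DataFrame itself, for example through `df.itertuples(index=False)`.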

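Categorize text with `ai.classify`

The ai.classify function invokes AI to categorize input text according to custom labels that you choose. For more information about the use of ai.classify, see the dedicated article.

Sample usage

pandas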
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

df["category"] = df['descriptions'].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])
    
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)
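
Since the code comments above advise reviewing AI output for mistakes, one simple review step is to check that every returned category is one of the labels you supplied. The following is a minimal sketch in plain Python. The sample output list is hypothetical, and `find_unexpected` is a hypothetical helper, not part of the AI functions library.

```python
LABELS = ["kitchen", "bedroom", "garage", "other"]


def find_unexpected(categories, allowed):
    """Return (index, value) pairs whose value is not one of the allowed labels."""
    allowed_set = set(allowed)
    return [
        (index, value)
        for index, value in enumerate(categories)
        if value not in allowed_set
    ]


# Hypothetical output for illustration; the comparison is exact, so a
# differently cased label ("Garage") and a missing value (None) are both flagged.
sample_output = ["bedroom", "kitchen", "Garage", None]

for index, value in find_unexpected(sample_output, LABELS):
    print(f"row {index}: unexpected category {value!r}")
```

In a notebook, the list could come from `df["category"].tolist()` after the pandas example above.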

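Detect sentiment with `ai.analyze_sentiment`

The ai.analyze_sentiment function invokes AI to identify whether the emotional state that input text expresses is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For more detailed instructions about the use of ai.analyze_sentiment, see the dedicated article.

Sample usage

pandas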
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        "The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
        "I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
        "I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
        "The umbrella is OK, I guess."
    ], columns=["reviews"])

df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("The cleaning spray permanently stained my beautiful kitchen counter. Never again!",),
        ("I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",),
        ("I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",),
        ("The umbrella is OK, I guess.",)
    ], ["reviews"])

sentiment = df.ai.analyze_sentiment(input_col="reviews", output_col="sentiment")
display(sentiment)
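
Because the output can be blank when no sentiment is determined, downstream code should handle missing values explicitly. The following is a minimal sketch in plain Python that tallies sentiment labels and groups blanks under a separate key. The lowercase label strings and the sample values are assumptions for illustration, and `tally_sentiment` is a hypothetical helper, not part of the AI functions library.

```python
from collections import Counter


def tally_sentiment(values):
    """Count sentiment labels, grouping blank results under "undetermined"."""
    counts = Counter()
    for value in values:
        # None, NaN (the only value not equal to itself), and empty strings
        # all count as blank.
        if value is None or value != value or not str(value).strip():
            counts["undetermined"] += 1
        else:
            counts[str(value).strip().lower()] += 1
    return counts


# Hypothetical results for illustration; real values come from ai.analyze_sentiment.
sample = ["negative", "positive", "mixed", "neutral", None, ""]

for label, count in sorted(tally_sentiment(sample).items()):
    print(f"{label}: {count}")
```

Normalizing case and whitespace before counting keeps the tally stable if label formatting varies between calls.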

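Extract entities with `ai.extract`

The ai.extract function invokes AI to scan input text and extract specific types of information designated by labels that you choose, such as locations or names. For more detailed instructions about the use of ai.extract, see the dedicated article.

Sample usage

pandas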
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        "MJ Lee lives in Tuscon, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("MJ Lee lives in Tuscon, AZ, and works as a software engineer for Microsoft.",),
        ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
    ], ["descriptions"])

df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)

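Fix grammar with `ai.fix_grammar`

The ai.fix_grammar function invokes AI to correct the spelling, grammar, and punctuation of input text. For more detailed instructions about the use of ai.fix_grammar, see the dedicated article.

Sample usage

pandas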
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        "There are an error here.",
        "She and me go weigh back. We used to hang out every weeks.",
        "The big picture are right, but you're details is all wrong."
    ], columns=["text"])

df["corrections"] = df["text"].ai.fix_grammar()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("There are an error here.",),
        ("She and me go weigh back. We used to hang out every weeks.",),
        ("The big picture are right, but you're details is all wrong.",)
    ], ["text"])

corrections = df.ai.fix_grammar(input_col="text", output_col="corrections")
display(corrections)

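Summarize text with `ai.summarize`

The ai.summarize function invokes AI to generate summaries of input text (either values from a single column of a DataFrame, or row values across all the columns). For more detailed instructions about the use of ai.summarize, see the dedicated article.

Sample usage

pandas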
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summary")
display(summaries)

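Translate text with `ai.translate`

The ai.translate function invokes AI to translate input text to a new language of your choice. For more detailed instructions about the use of ai.translate, see the dedicated article.

Sample usage

pandas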
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        "Hello! How are you doing today?", 
        "Tell me what you'd like to know, and I'll do my best to help.", 
        "The only thing we have to fear is fear itself."
    ], columns=["text"])

df["translations"] = df["text"].ai.translate("spanish")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Hello! How are you doing today?",),
        ("Tell me what you'd like to know, and I'll do my best to help.",),
        ("The only thing we have to fear is fear itself.",),
    ], ["text"])

translations = df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")
display(translations)

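Answer custom user prompts with `ai.generate_response`

The ai.generate_response function invokes AI to generate custom text based on your own instructions. For more detailed instructions about the use of ai.generate_response, see the dedicated article.

Sample usage

pandas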
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = pd.DataFrame([
        ("Scarves"),
        ("Snow pants"),
        ("Ski goggles")
    ], columns=["product"])

df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("Scarves",),
        ("Snow pants",),
        ("Ski goggles",)
    ], ["product"])

responses = df.ai.generate_response(prompt="Write a short, punchy email subject line for a winter sale.", output_col="response")
display(responses)

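Related content

Calculate similarity with ai.similarity.
Detect sentiment with ai.analyze_sentiment.
Categorize text with ai.classify.
Extract entities with ai.extract.
Fix grammar with ai.fix_grammar.
Summarize text with ai.summarize.
Translate text with ai.translate.
Answer custom user prompts with ai.generate_response.
Learn how to customize the configuration of AI functions.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.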
