Transform and enrich data with AI functions

Microsoft Fabric AI Functions enable all business professionals (from developers to analysts) to transform and enrich their enterprise data using generative AI.

AI functions use industry-leading large language models (LLMs) for summarization, classification, text generation, and more. With a single line of code, you can:

  • ai.analyze_sentiment: Detect the emotional state of input text.
  • ai.classify: Categorize input text according to your labels.
  • ai.embed: Generate vector embeddings for input text.
  • ai.extract: Extract specific types of information from input text (for example, locations or names).
  • ai.fix_grammar: Correct the spelling, grammar, and punctuation of input text.
  • ai.generate_response: Generate responses based on your own instructions.
  • ai.similarity: Compare the meaning of input text with a single text value, or with text in another column.
  • ai.summarize: Get summaries of input text.
  • ai.translate: Translate input text into another language.

You can incorporate these functions as part of data science and data engineering workflows, whether you're working with pandas or Spark. There's no detailed configuration and no complex infrastructure management. You don't need any specific technical expertise.

Prerequisites

Note

  • AI functions are supported in Fabric Runtime 1.3 and later.
  • Unless you configure a different model, AI functions default to gpt-4.1-mini. Learn more about billing and consumption rates.
  • Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.

Models and providers

AI functions now support a broader range of models and providers beyond the default Azure OpenAI models. You can configure AI functions to use:

  • Azure OpenAI models
  • Azure AI Foundry resources (including models such as Claude and LLaMA)

Model and provider selection is configurable through the AI functions configuration. For details on how to set up and configure different models and providers, see the configuration documentation for pandas and PySpark.

Getting started with AI functions

AI functions can be used with pandas (Python and PySpark runtimes) and with PySpark (PySpark runtime). The installation and import steps for each are outlined in the following sections, along with the corresponding commands.

Performance and concurrency

AI functions now run with a higher default concurrency of 200, which allows faster parallel processing of AI operations. You can tune concurrency settings per workload to optimize performance for your specific requirements. For more information on configuring concurrency and other performance-related settings, see the configuration documentation for pandas and PySpark.
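To picture what a concurrency limit does, the following generic asyncio sketch caps the number of requests in flight with a semaphore. It isn't how Fabric implements concurrency. `call_model` and `run_bounded` are hypothetical helpers, and you set the real limit through the AI functions configuration rather than in your own code.

```python
import asyncio


async def call_model(text: str, state: dict) -> str:
    """Hypothetical stand-in for one AI request; records how many calls overlap."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["in_flight"] -= 1
    return text.upper()


async def run_bounded(texts: list[str], concurrency: int = 200) -> tuple[list[str], int]:
    """Process all texts with at most `concurrency` requests in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)
    state = {"in_flight": 0, "peak": 0}

    async def worker(text: str) -> str:
        # Each worker waits here until one of the `concurrency` slots is free
        async with semaphore:
            return await call_model(text, state)

    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(worker(t) for t in texts))
    return list(results), state["peak"]
```

In a notebook an event loop is usually already running, so you would likely write `outputs, peak = await run_bounded(texts, concurrency=50)` in a cell instead of calling `asyncio.run`. A higher limit increases parallelism, but rate limits on the model endpoint may still apply.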

Install dependencies

  • Pandas (Python runtime)
    • synapseml_internal and synapseml_core .whl files must be installed
    • openai package installation required (command provided in the following code cell)
  • Pandas (PySpark runtime)
    • openai package installation required (command provided in the following code cell)
  • PySpark (PySpark runtime)
    • No installation required
# The pandas AI functions package requires openai 1.99.5 or later; this command pins that version
%pip install -q --force-reinstall openai==1.99.5 2>/dev/null
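To confirm that the installed openai package meets the minimum version stated in the cell above, you can query the standard library's importlib.metadata. This is a minimal sketch that assumes plain numeric `X.Y.Z` version strings. Pre-release tags such as `rc1` fall through to a manual-check message. The helper names are our own.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_OPENAI = "1.99.5"  # minimum version stated in the install cell


def parse_version(v: str) -> tuple[int, ...]:
    """Convert a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))


def meets_minimum(installed: str, minimum: str = MIN_OPENAI) -> bool:
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def check_openai() -> str:
    """Report the installed openai version, or explain why it can't be checked."""
    try:
        installed = version("openai")
    except PackageNotFoundError:
        return "openai is not installed"
    try:
        ok = meets_minimum(installed)
    except ValueError:
        # Non-numeric parts (for example '1.0.0rc1') aren't handled by this sketch
        return f"openai {installed}: unrecognized version format, check manually"
    return f"openai {installed}: {'OK' if ok else 'older than ' + MIN_OPENAI}"
```

After the install cell finishes, run `print(check_openai())` to see the result.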

Import required libraries

The following code cell imports the AI functions library and its dependencies.

# Required imports
import synapse.ml.aifunc as aifunc
import pandas as pd

Apply AI functions

Each of the following functions allows you to invoke the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. You can use AI functions to analyze pandas DataFrames or Spark DataFrames.

Tip

Learn how to customize the configuration of AI functions.

Advanced configuration: When using gpt-5 family models, you can configure advanced options such as reasoning_effort and verbosity. See the configuration pages for pandas and PySpark for details on how to set these options.

Detect sentiment with ai.analyze_sentiment

The ai.analyze_sentiment function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For more detailed instructions about the use of ai.analyze_sentiment with pandas, see this article. For ai.analyze_sentiment with PySpark, see this article.

Optional parameters

The ai.analyze_sentiment function now supports additional optional parameters that allow you to customize the sentiment analysis behavior. These parameters provide more control over how sentiment is detected and reported. For details on available parameters, their descriptions, and default values, see the function-specific documentation for pandas and PySpark.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
        "I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
        "I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
        "The umbrella is OK, I guess."
    ], columns=["reviews"])

df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)

Screenshot of a data frame with 'reviews' and 'sentiment' columns. The 'sentiment' column includes 'negative', 'positive', 'mixed', and 'neutral'.
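Because the output is left blank when AI can't determine sentiment, it helps to count labels and flag undetermined rows before you use the results downstream. The sketch below works on a plain Python list, such as `df["sentiment"].tolist()`. It assumes blank results arrive as an empty string or a non-string value such as `None` or NaN, so verify that against your actual output. `tally_sentiment` is a hypothetical helper, not part of AI functions.

```python
from collections import Counter

# The four labels this article documents for ai.analyze_sentiment
EXPECTED = {"positive", "negative", "mixed", "neutral"}


def tally_sentiment(labels):
    """Count each sentiment label, grouping blank results as 'undetermined'."""
    counts = Counter()
    for label in labels:
        if not isinstance(label, str) or not label.strip():
            # None, NaN, or an empty string: the model made no determination
            counts["undetermined"] += 1
        elif label in EXPECTED:
            counts[label] += 1
        else:
            # Anything outside the documented labels deserves a manual look
            counts["unexpected"] += 1
    return counts
```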

Categorize text with ai.classify

The ai.classify function invokes AI to categorize input text according to custom labels you choose. For more information about the use of ai.classify with pandas, see this article. For ai.classify with PySpark, see this article.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)

Screenshot of a data frame with 'descriptions' and 'category' columns. The 'category' column lists each description’s category name.

Generate vector embeddings with ai.embed

The ai.embed function invokes AI to generate vector embeddings for input text. Vector embeddings are numerical representations of text that capture semantic meaning, making them useful for similarity search, retrieval workflows, and other machine learning tasks. The dimensionality of the embedding vectors depends on the selected model. For more detailed instructions about the use of ai.embed with pandas, see this article. For ai.embed with PySpark, see this article.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])
    
df["embed"] = df["descriptions"].ai.embed()
display(df)

Screenshot of a data frame with columns 'descriptions' and 'embed'. The 'embed' column contains embedding vectors for the descriptions.
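After each row has an embedding, a common next step is similarity search: rank the stored vectors by their similarity to a query vector. Cosine similarity is one widely used measure for this. The following dependency-free sketch illustrates the idea. The helpers `cosine_similarity` and `top_k` are our own, and the short vectors in any example are made-up placeholders. Real vectors from ai.embed are much longer, with a dimensionality set by the model. For larger tables you would likely use NumPy or a vector store instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=1):
    """Return (index, score) pairs for the k stored vectors most similar to `query`."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With the DataFrame above, you might call `top_k(query_vector, df["embed"].tolist(), k=2)`, where `query_vector` is a hypothetical embedding of your search text produced by the same model.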

Extract entities with ai.extract

The ai.extract function invokes AI to scan input text and extract specific types of information that are designated by labels you choose (for example, locations or names). For more detailed instructions about the use of ai.extract with pandas, see this article. For ai.extract with PySpark, see this article.

Structured labels

The ai.extract function supports structured label definitions through the ExtractLabel schema. Instead of a bare label name, a structured label can also carry type information and attributes that guide the extraction more precisely. This structured approach improves extraction consistency and lets the function return correspondingly structured output columns. For examples of using structured labels, see the detailed documentation for pandas and PySpark.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)

Screenshot showing a new data frame with the columns 'name', 'profession', and 'city', containing the data extracted from the original data frame.

Fix grammar with ai.fix_grammar

The ai.fix_grammar function invokes AI to correct the spelling, grammar, and punctuation of input text. For more detailed instructions about the use of ai.fix_grammar with pandas, see this article. For ai.fix_grammar with PySpark, see this article.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "There are an error here.",
        "She and me go weigh back. We used to hang out every weeks.",
        "The big picture are right, but you're details is all wrong."
    ], columns=["text"])

df["corrections"] = df["text"].ai.fix_grammar()
display(df)

Screenshot showing a data frame with a 'text' column and a 'corrections' column, which has the text from the 'text' column with corrected grammar.
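Because you should review AI output for mistakes, a word-level diff makes each correction easy to scan. This minimal sketch uses the standard library's difflib. `changed_words` is our own helper, not part of the AI functions API.

```python
import difflib


def changed_words(original: str, corrected: str):
    """List (before, after) word spans that differ between two strings."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

To review the whole column, loop over the pairs: `for before, after in zip(df["text"], df["corrections"]): print(changed_words(before, after))`.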

Answer custom user prompts with ai.generate_response

The ai.generate_response function invokes AI to generate custom text based on your own instructions. For more detailed instructions about the use of ai.generate_response with pandas, see this article. For ai.generate_response with PySpark, see this article.

Optional parameters

The ai.generate_response function now supports a response_format parameter that allows you to request structured JSON output. You can specify response_format='json' to receive responses in JSON format. Additionally, you can provide a JSON schema to enforce a specific output structure, ensuring the generated response conforms to your expected data shape. This is particularly useful when you need predictable, machine-readable output from the AI function. For detailed examples and usage patterns, see the documentation for pandas and PySpark.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "Scarves",
        "Snow pants",
        "Ski goggles"
    ], columns=["product"])

df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)

Screenshot showing a data frame with columns 'product' and 'response'. The 'response' column contains a punchy subject line for the product.
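Even when you request JSON output with response_format, it's worth validating each response before loading it into downstream tables. The following sketch parses JSON strings and checks for required fields using only the standard library. The expected shape, an object with string fields `subject` and `tone`, is a hypothetical example. This article doesn't show how a schema is passed to ai.generate_response, so check the pandas and PySpark documentation for the exact argument format.

```python
import json

# Hypothetical expected shape, for example {"subject": "...", "tone": "..."}
REQUIRED_FIELDS = {"subject": str, "tone": str}


def parse_response(raw):
    """Parse one JSON response and return (record, error); exactly one is None."""
    try:
        record = json.loads(raw)
    except (TypeError, json.JSONDecodeError) as exc:
        # TypeError covers missing values such as None
        return None, f"invalid JSON: {exc}"
    if not isinstance(record, dict):
        return None, "expected a JSON object"
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            return None, f"missing field: {field}"
        if not isinstance(record[field], expected_type):
            return None, f"wrong type for field: {field}"
    return record, None
```

For example, `df["parsed"] = df["response"].apply(lambda r: parse_response(r)[0])` keeps valid records and leaves `None` where validation failed.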

Calculate similarity with ai.similarity

The ai.similarity function compares each input text value either to one common reference text or to the corresponding value in another column (pairwise mode). The output similarity score values are relative, and they can range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. For more detailed instructions about the use of ai.similarity with pandas, see this article. For ai.similarity with PySpark, see this article.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([ 
        ("Bill Gates", "Technology"), 
        ("Satya Nadella", "Healthcare"), 
        ("Joan of Arc", "Agriculture") 
    ], columns=["names", "industries"])
    
df["similarity"] = df["names"].ai.similarity(df["industries"])
display(df)

Screenshot of a data frame with columns 'names', 'industries', and 'similarity'. The 'similarity' column has similarity scores for the name and industry.
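Because the scores are relative, they're often more useful for ranking candidates than as absolute cutoffs. If you score each name against several industries (for example, by running ai.similarity once for each candidate value), a small helper can keep the best match per name. This hypothetical post-processing sketch works on plain Python tuples. `best_match_per_key` is our own helper, not part of AI functions.

```python
def best_match_per_key(rows):
    """Given (key, candidate, score) tuples, keep the highest-scoring candidate per key."""
    best = {}
    for key, candidate, score in rows:
        if key not in best or score > best[key][1]:
            best[key] = (candidate, score)
    return best
```

With the DataFrame above, the input could be built as `list(zip(df["names"], df["industries"], df["similarity"]))`.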

Summarize text with ai.summarize

The ai.summarize function invokes AI to generate summaries of input text (either values from a single column of a DataFrame, or row values across all the columns). For more detailed instructions about the use of ai.summarize with pandas, see this article. For ai.summarize with PySpark, see this article.

Customizing summaries with instructions

The ai.summarize function now supports an instructions parameter that allows you to steer the tone, length, and focus of the generated summaries. You can provide custom instructions to guide how the summary should be created, such as specifying a particular style, target audience, or level of detail. When instructions are not provided, the function uses default summarization behavior. For examples of using the instructions parameter, see the detailed documentation for pandas and PySpark.

# This code uses AI. Always review output for mistakes.

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)

Screenshot showing a data frame. The 'summaries' column has a summary of the 'description' column only, in the corresponding row.

Translate text with ai.translate

The ai.translate function invokes AI to translate input text to a new language of your choice. For more detailed instructions about the use of ai.translate with pandas, see this article. For ai.translate with PySpark, see this article.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "Hello! How are you doing today?", 
        "Tell me what you'd like to know, and I'll do my best to help.", 
        "The only thing we have to fear is fear itself."
    ], columns=["text"])

df["translations"] = df["text"].ai.translate("spanish")
display(df)

Screenshot of a data frame with columns 'text' and 'translations'. The 'translations' column contains the text translated to Spanish.

View usage statistics with ai.stats

Fabric AI functions provide a built-in way to inspect usage and execution statistics for any AI-generated Series or DataFrame. You can access these metrics by calling ai.stats on the result returned by an AI function.

ai.stats returns a DataFrame with the following columns:

  • num_successful – Number of rows processed successfully by the AI function.
  • num_exceptions – Number of rows that encountered an exception during execution. These rows are represented as instances of aifunc.ExceptionResult.
  • num_unevaluated – Number of rows that were not processed because an earlier exception made it impossible to continue evaluation. These rows are instances of aifunc.NotEvaluatedResult.
  • num_harmful – Number of rows blocked by the Azure OpenAI content filter. These rows are instances of aifunc.FilterResult.
  • prompt_tokens – Total number of input tokens used for the AI function call.
  • completion_tokens – Total number of output tokens generated by the model.

Tip

You can call ai.stats on any Series or DataFrame returned by an AI function. This can help you track usage, understand error patterns, and monitor token consumption.
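If you track many runs, you might condense the ai.stats columns into a few health metrics. The following sketch takes one row of stats as a plain dict. For example, with pandas you could use `stats_df.iloc[0].to_dict()`, assuming `stats_df` holds the single-row DataFrame returned by ai.stats. The sketch also assumes that the four row-count columns together account for every input row. Check both assumptions against your own output. `summarize_stats` is a hypothetical helper.

```python
def summarize_stats(stats: dict) -> dict:
    """Derive simple health metrics from one row of ai.stats output, passed as a dict."""
    # Assumption: these row-count columns together cover every input row
    needs_attention = (
        stats["num_exceptions"] + stats["num_unevaluated"] + stats["num_harmful"]
    )
    total_rows = stats["num_successful"] + needs_attention
    return {
        "total_rows": total_rows,
        "success_rate": stats["num_successful"] / total_rows if total_rows else 0.0,
        "rows_needing_attention": needs_attention,
        "total_tokens": stats["prompt_tokens"] + stats["completion_tokens"],
    }
```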