
AI Functions: Transform data at scale with AI

AI Functions in Microsoft Fabric apply one-line, LLM-powered transformations to large pandas or PySpark DataFrames. They run with high concurrency by default, so you can enrich, classify, summarize, and extract data quickly at scale.

Use this table to jump to examples in this overview or detailed pandas and PySpark documentation.

| Function | Description | Detailed documentation |
| --- | --- | --- |
| ai.analyze_sentiment | Detect the emotional state of input text. Example. | pandas, PySpark |
| ai.classify | Categorize input text according to your labels. Example. | pandas, PySpark |
| ai.embed | Generate vector embeddings for input text. Example. | pandas, PySpark |
| ai.extract | Extract fields such as locations, names, or custom entities. Example. | pandas, PySpark |
| ai.fix_grammar | Correct spelling, grammar, and punctuation. Example. | pandas, PySpark |
| ai.generate_response | Generate responses based on your instructions. Example. | pandas, PySpark |
| ai.similarity | Compare text meaning with one value or another column. Example. | pandas, PySpark |
| ai.summarize | Summarize text, files, or row data. Example. | pandas, PySpark |
| ai.translate | Translate input text into another language. Example. | pandas, PySpark |

You can use AI Functions in notebooks with pandas or PySpark, in SQL queries, and in Dataflow Gen2. Fabric handles the endpoint setup for the built-in model.

Use AI Functions across Fabric experiences

AI Functions are available in multiple Fabric experiences: notebooks with pandas or PySpark, SQL queries, and Dataflow Gen2.

Use multimodal AI Functions

Multimodal AI Functions process images, PDFs, and text files in addition to text values. Use them to summarize PDFs, classify images, extract document fields, or generate responses grounded in file content.

Supported file types include JPG/JPEG, PNG, static GIF, WebP, PDF, MD, TXT, CSV, TSV, JSON, XML, PY, and other text files. Set column_type="path" in pandas, or input_col_type or col_types in PySpark. For examples, see Use multimodal input with AI Functions.

Prerequisites

Note

  • AI Functions are supported in Fabric Runtime 1.3 and later.
  • Python AI Functions for pandas and PySpark now default to gpt-5-mini with reasoning_effort set to low. This model has a 400,000-token context window and a 128,000-token maximum output. For model limits and rates, see the language models table.
  • AI Functions in Dataflow Gen2 and warehouse will receive the same model upgrade by the end of June 2026.
  • Although the underlying model can handle several languages, most AI Functions are optimized for English-language text.
  • AI Functions don't log or store user prompts, input data, or outputs.

Models and providers

AI Functions use the built-in Fabric endpoint by default. You can also configure pandas and PySpark AI Functions to use any LLM that supports the chat_completions or responses API, including:

  • Azure OpenAI models.
  • Microsoft Foundry models such as Qwen, Kimi, Grok, LLaMA, Mistral, and more.

For configuration options, see Customize AI Functions with pandas and Customize AI Functions with PySpark.

Set up AI Functions

AI Functions support pandas in Python and PySpark runtimes, and PySpark in the PySpark runtime. Install only the packages your runtime needs.

Performance and concurrency

AI Functions process up to 200 rows concurrently by default. Tune concurrency for your workload in pandas or PySpark.
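To make the concurrency setting concrete, here is a minimal sketch in plain Python of what processing rows with a bounded number of concurrent workers looks like. It is not the library's implementation: `fake_model_call`, `transform_rows`, and `MAX_CONCURRENCY` are hypothetical stand-ins, and only the value 200 comes from the default described above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 200  # mirrors the documented default; the name is hypothetical


def fake_model_call(text: str) -> str:
    """Hypothetical stand-in for one per-row model request."""
    return text.upper()


def transform_rows(rows, max_workers=MAX_CONCURRENCY):
    # Never start more workers than there are rows (and always at least one).
    workers = max(1, min(max_workers, len(rows)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with rows.
        return list(pool.map(fake_model_call, rows))


results = transform_rows(["good", "bad", "fine"])
print(results)
```

A higher worker limit can shorten wall-clock time for I/O-bound calls, but it also raises the request rate, which may matter when you are close to a model's rate limits.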

Install dependencies

| Runtime | Dependencies |
| --- | --- |
| pandas (Python runtime) | Install the synapseml_internal and synapseml_core wheel files. Install openai version 1.99.5 or later only if you need SDK-native client behavior or Pydantic response-format examples. |
| pandas (PySpark runtime) | No installation is required for most usage. Install openai version 1.99.5 or later only if you need SDK-native client behavior or Pydantic response-format examples. |
| PySpark (PySpark runtime) | No installation is required. |
# Optional: install openai version 1.99.5 or later for SDK-native client behavior.
%pip install -q "openai>=1.99.5" 2>/dev/null
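If you want to confirm that an installed openai package meets the 1.99.5 minimum named in the table above, a small check like the following can help. This is a sketch using only the standard library. `parse_version` and `meets_minimum` are hypothetical helpers, and the comparison looks at numeric release segments only, so it treats a pre-release such as 1.99.5rc1 the same as 1.99.5.

```python
import re
from importlib import metadata

MINIMUM = "1.99.5"  # minimum openai version named in the dependency table


def parse_version(version: str) -> tuple:
    """Turn '1.99.5' into (1, 99, 5); stop at the first non-numeric piece."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare numeric release segments only."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = metadata.version("openai")
    print(installed, "ok" if meets_minimum(installed) else "too old")
except metadata.PackageNotFoundError:
    print("openai is not installed")
```

For strict version semantics, including pre-releases, the `packaging.version.Version` class is a more complete option when the packaging library is available.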

Import required libraries

Import the AI Functions library for your runtime.

# Required imports
import synapse.ml.aifunc as aifunc
import pandas as pd

Use helper functions for files and schemas

AI Functions include helpers for multimodal workflows:

  • aifunc.load: Ingest files from a folder into a structured table. You can provide a prompt or schema.
  • aifunc.list_file_paths: Enumerate file URLs and paths from a folder for use as input to any AI function.
  • ai.infer_schema: Infer an extraction schema from file contents for use with ai.extract.

For examples, see Use multimodal input with AI Functions.
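To illustrate the idea behind enumerating files for a path column, the following sketch lists files in a local folder and keeps those with an extension from the supported file types named earlier. It does not use or reproduce aifunc.list_file_paths, whose signature isn't shown here. `list_supported_files` and `SUPPORTED` are hypothetical, and the sketch assumes a local folder rather than a lakehouse location.

```python
import tempfile
from pathlib import Path

# Extensions named in the supported file types list. Other text formats may
# also work, and this check can't tell a static GIF from an animated one.
SUPPORTED = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".pdf", ".md",
             ".txt", ".csv", ".tsv", ".json", ".xml", ".py"}


def list_supported_files(folder):
    """Return sorted paths under folder whose extension is in SUPPORTED."""
    return sorted(
        str(p) for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["invoice.PDF", "photo.png", "notes.txt", "archive.zip"]:
        (Path(tmp) / name).write_bytes(b"")
    found = [Path(p).name for p in list_supported_files(tmp)]

print(found)
```

The resulting list of path strings is the kind of value you would place in a DataFrame column before marking that column as a path column.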

Apply AI Functions

The following examples show the core AI Functions for pandas and PySpark. PySpark AI Functions run as distributed Spark transformations across Fabric Spark clusters.

Note

Most AI Functions support file paths with column_type="path" in pandas or input_col_type/col_types="path" in PySpark. For examples, see Use multimodal input with AI Functions.

Tip

The default Python model is gpt-5-mini with reasoning_effort="low". To change models or tune settings, see pandas configuration or PySpark configuration.

ai.analyze_sentiment: Detect sentiment

The ai.analyze_sentiment function labels each input as positive, negative, mixed, or neutral. You can also provide custom labels.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
        "I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
        "I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
        "The umbrella is OK, I guess."
    ], columns=["reviews"])

df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)

Screenshot of a data frame with 'reviews' and 'sentiment' columns. The 'sentiment' column includes 'negative', 'positive', 'mixed', and 'neutral'.

ai.classify: Categorize text

The ai.classify function categorizes input text by using the labels you provide.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)

Screenshot of a data frame with 'descriptions' and 'category' columns. The 'category' column lists each description’s category name.
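Because model output should be reviewed, it can be useful to check that every returned category is one of the labels you supplied. The following sketch works on a plain list so that it runs without Fabric. The `predicted` values are hypothetical outputs written for this example, and `normalize` and `check_labels` are illustrative helpers, not part of the AI Functions library.

```python
from collections import Counter

ALLOWED = ("kitchen", "bedroom", "garage", "other")

# Hypothetical model outputs, including values outside the label set.
predicted = ["bedroom", "kitchen", "garage", "Kitchen ", None, "vehicle"]


def normalize(label):
    """Lowercase and trim string labels; anything else becomes None."""
    return label.strip().lower() if isinstance(label, str) else None


def check_labels(labels, allowed=ALLOWED):
    """Return (counts of valid labels, row indexes that need review)."""
    counts, needs_review = Counter(), []
    for i, raw in enumerate(labels):
        label = normalize(raw)
        if label in allowed:
            counts[label] += 1
        else:
            needs_review.append(i)
    return counts, needs_review


counts, needs_review = check_labels(predicted)
print(dict(counts), needs_review)
```

With a pandas column, you could pass `df["category"].tolist()` to the same helper and use the returned indexes to select rows for manual review.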

ai.embed: Generate vector embeddings

The ai.embed function converts text into numeric vectors that capture semantic meaning. Use embeddings for similarity search, retrieval, and machine learning workflows.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])
    
df["embed"] = df["descriptions"].ai.embed()
display(df)

Screenshot of a data frame with columns 'descriptions' and 'embed'. The 'embed' column contains embedding vectors for the descriptions.

ai.extract: Extract entities

The ai.extract function extracts fields such as names, locations, or custom entities from input text.

Structured labels

Use ExtractLabel when you need typed extraction. It supports JSON Schema constructs such as typed fields, enums, arrays, nested objects, nullable values, required properties, and additionalProperties=false. For examples, see pandas or PySpark.
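As a rough illustration of the JSON Schema constructs listed above, the following sketch builds a schema as a plain Python dictionary. The field names are hypothetical, and the sketch deliberately doesn't show how a schema is attached to ExtractLabel, because that signature isn't described here. See the linked pandas and PySpark documentation for the supported form.

```python
import json

# Hypothetical schema for illustration; field names are not from the library.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},                                # typed field
        "seniority": {"type": "string",
                      "enum": ["junior", "senior", "unknown"]},    # enum
        "skills": {"type": "array", "items": {"type": "string"}},  # array
        "employer": {                                              # nested object
            "type": "object",
            "properties": {
                "company": {"type": ["string", "null"]},           # nullable value
            },
            "required": ["company"],
            "additionalProperties": False,
        },
    },
    "required": ["name", "seniority", "skills", "employer"],
    "additionalProperties": False,
}

print(json.dumps(person_schema, indent=2))
```

Listing every property under `required` and setting `additionalProperties` to false keeps the output shape predictable, and the nullable type gives the model a way to report a missing value instead of guessing.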

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)

Screenshot showing a new data frame with the columns 'name', 'profession', and 'city', containing the data extracted from the original data frame.

ai.fix_grammar: Fix grammar

The ai.fix_grammar function corrects spelling, grammar, and punctuation.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "There are an error here.",
        "She and me go weigh back. We used to hang out every weeks.",
        "The big picture are right, but you're details is all wrong."
    ], columns=["text"])

df["corrections"] = df["text"].ai.fix_grammar()
display(df)

Screenshot showing a data frame with a 'text' column and a 'corrections' column, which has the text from the text column with corrected grammar.
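When you review corrections, a word-level diff makes it easier to see what changed in each row. The following sketch uses the standard library's difflib. `changed_words` is a hypothetical helper, and the corrected sentence is written by hand for illustration, not captured from the model.

```python
import difflib


def changed_words(original: str, corrected: str):
    """Return (before, after) word groups that differ between the two texts."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The corrected sentence is a hypothetical output, written by hand for this sketch.
changes = changed_words("There are an error here.", "There is an error here.")
print(changes)
```

An empty list means the text was returned unchanged, which can help you filter a large result set down to the rows that were edited.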

ai.generate_response: Apply custom user prompts

The ai.generate_response function creates custom text from your prompt and row data.

Optional parameters

Use response_format when you need structured output, including JSON objects, JSON Schema, or Pydantic models. For examples, see pandas or PySpark.
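When you request structured output, it's worth validating each response before you rely on it downstream. The following sketch assumes that responses arrive as JSON text, which might not match what the library returns for every response_format option. The sample strings and the `subject` and `tone` fields are hypothetical, and `parse_response` is an illustrative helper.

```python
import json

REQUIRED_KEYS = {"subject", "tone"}  # hypothetical fields for this sketch

# Hypothetical raw responses; the second lacks a key and the third isn't JSON.
raw_responses = [
    '{"subject": "Wrap up in style", "tone": "playful"}',
    '{"subject": "Snow pants, low prices"}',
    "Sure! Here is your subject line...",
]


def parse_response(text):
    """Return the parsed dict, or None if it is invalid or missing keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


parsed = [parse_response(r) for r in raw_responses]
print(parsed)
```

Rows that come back as None are candidates for a retry or for manual review.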

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "Scarves",
        "Snow pants",
        "Ski goggles"
    ], columns=["product"])

df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)

Screenshot showing a data frame with columns 'product' and 'response'. The 'response' column contains a punchy subject line for the product.

ai.similarity: Calculate similarity

The ai.similarity function compares each input value with one reference value or with a value in another column. Scores range from -1 for opposite meaning to 1 for identical meaning.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([ 
        ("Bill Gates", "Technology"), 
        ("Satya Nadella", "Healthcare"), 
        ("Joan of Arc", "Agriculture") 
    ], columns=["names", "industries"])
    
df["similarity"] = df["names"].ai.similarity(df["industries"])
display(df)

Screenshot of a data frame with columns 'names', 'industries', and 'similarity'. The 'similarity' column has similarity scores for the name and industry.
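To build intuition for the -1 to 1 range, the following sketch computes cosine similarity between small toy vectors. How ai.similarity calculates its score isn't stated here. This example assumes the score behaves like cosine similarity between embedding vectors, which has the same range. `cosine_similarity` is a hypothetical helper, not a library function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy 2-D vectors, not real embeddings.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite direction
```

Vectors that point the same way score near 1, unrelated (orthogonal) vectors score near 0, and opposite vectors score near -1.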

ai.summarize: Summarize text

The ai.summarize function summarizes text, file content, a single column, or all columns in each row.

Customizing summaries with instructions

Use instructions to control tone, length, audience, or focus. For examples, see pandas or PySpark.

# This code uses AI. Always review output for mistakes.

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)

Screenshot showing a data frame. The 'summaries' column has a summary of the 'description' column only, in the corresponding row.

ai.translate: Translate text

The ai.translate function translates text to another language.

# This code uses AI. Always review output for mistakes. 

df = pd.DataFrame([
        "Hello! How are you doing today?", 
        "Tell me what you'd like to know, and I'll do my best to help.", 
        "The only thing we have to fear is fear itself."
    ], columns=["text"])

df["translations"] = df["text"].ai.translate("spanish")
display(df)

Screenshot of a data frame with columns 'text' and 'translations'. The 'translations' column contains the text translated to Spanish.

Chain PySpark AI Functions

PySpark AI Functions return DataFrames that keep the df.ai accessor bound to the result schema. Chain transformations without materializing intermediate DataFrames.

# This code uses AI. Always review output for mistakes.

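# Sample input for illustration. `spark` is the session predefined in Fabric notebooks.
df = spark.createDataFrame([
        ("The room was spotless, but check-in took almost an hour.",),
        ("Great spot near the station, and the staff were friendly.",)
    ], ["review_text"])

output = (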
    df
    .ai.summarize(input_col="review_text", output_col="summary")
    .ai.classify(
        labels=["service", "cleanliness", "location", "other"],
        input_col="summary",
        output_col="category",
    )
)
display(output)

View usage statistics with ai.stats

Use ai.stats on an AI-generated Series or DataFrame to inspect usage and execution metrics.

ai.stats returns a DataFrame with statistics such as:

  • num_successful: Number of rows processed successfully by the AI function.
  • num_exceptions: Number of rows that encountered an exception during execution. These rows are represented as instances of aifunc.ExceptionResult.
  • num_unevaluated: Number of rows that weren't processed because an earlier exception made it impossible to continue evaluation. These rows are represented as instances of aifunc.NotEvaluatedResult.
  • num_harmful: Number of rows blocked by the Azure OpenAI content filter. These rows are represented as instances of aifunc.FilterResult.
  • cached_tokens: Total number of cached input tokens.
  • input_tokens: Total number of input tokens used for the AI function call.
  • output_tokens: Total number of output tokens generated by the model.
  • reasoning_tokens: Total number of reasoning tokens used by reasoning models.
  • model: Model deployment name used for the AI function call.

The output might look like this table:

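| num_successful | num_exceptions | num_unevaluated | num_harmful | cached_tokens | input_tokens | output_tokens | reasoning_tokens | client_type | input_types | model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 0 | 0 | 0 | 0 | 555 | 4 | 0 | fabric_llm_endpoint | {"text": 2} | gpt-5-mini |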
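A few derived numbers often make these statistics easier to act on. The following sketch works on one stats row represented as a plain dictionary, using the values from the sample table. `summarize_stats` is a hypothetical helper. The sketch assumes that the four row counters together cover every input row, and that reasoning tokens are already included in output_tokens, as in the OpenAI usage format. Verify both assumptions for your configuration.

```python
# One stats row as a plain dict, copied from the sample table above.
# With a pandas DataFrame, stats.to_dict("records") yields rows of this shape.
row = {
    "num_successful": 2, "num_exceptions": 0, "num_unevaluated": 0,
    "num_harmful": 0, "cached_tokens": 0, "input_tokens": 555,
    "output_tokens": 4, "reasoning_tokens": 0,
}


def summarize_stats(row):
    """Derive row totals, success rate, and token totals from one stats row."""
    # Assumption: these four counters together account for every input row.
    total_rows = (row["num_successful"] + row["num_exceptions"]
                  + row["num_unevaluated"] + row["num_harmful"])
    return {
        "total_rows": total_rows,
        "success_rate": row["num_successful"] / total_rows if total_rows else None,
        # Assumption: reasoning tokens are already part of output_tokens.
        "total_tokens": row["input_tokens"] + row["output_tokens"],
        "input_tokens_per_row": row["input_tokens"] / total_rows if total_rows else None,
    }


summary = summarize_stats(row)
print(summary)
```

Tracking the success rate and tokens per row on a small sample can help you estimate consumption before you run a transformation on a full dataset.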

Tip

Use ai.stats to track usage, error patterns, and token consumption.

Rows that hit capacity limits are surfaced as instances of aifunc.CapacityExceededResult. In pandas workflows, use aifunc.split_results to separate successful outputs from nonresults, so you can inspect capacity-limited rows and retry them after capacity is available or the limit is addressed.

Cost transparency

pandas AI Functions can show token counts and capacity unit estimates during execution with progress_bar_mode="stats". For PySpark, use df.ai.stats on the result DataFrame.

The Fabric Capacity Metrics app reports model-call consumption as the AI Functions operation. For details, see Billing for AI Functions.

Evaluate and accelerate

Use the AI Functions Starter Notebooks for end-to-end pandas and PySpark examples. Use the AI Functions Eval Notebooks to assess output quality before production.