Microsoft Fabric AI Functions enable all business professionals (from developers to analysts) to transform and enrich their enterprise data using generative AI.
AI functions use industry-leading large language models (LLMs) for summarization, classification, text generation, and more. With a single line of code, you can:
- ai.analyze_sentiment: Detect the emotional state of input text.
- ai.classify: Categorize input text according to your labels.
- ai.embed: Generate vector embeddings for input text.
- ai.extract: Extract specific types of information from input text (for example, locations or names).
- ai.fix_grammar: Correct the spelling, grammar, and punctuation of input text.
- ai.generate_response: Generate responses based on your own instructions.
- ai.similarity: Compare the meaning of input text with a single text value, or with text in another column.
- ai.summarize: Get summaries of input text.
- ai.translate: Translate input text into another language.
You can incorporate these functions as part of data science and data engineering workflows, whether you're working with pandas or Spark. There's no detailed configuration and no complex infrastructure management. You don't need any specific technical expertise.
Prerequisites
- To use AI functions with the built-in AI endpoint in Fabric, your administrator needs to enable the tenant switch for Copilot and other features that are powered by Azure OpenAI.
- Depending on your location, you might need to enable a tenant setting for cross-geo processing. Learn more about available regions for Azure OpenAI Service.
- You need a paid Fabric capacity (F2 or higher, or any P edition).
Note
- AI functions are supported in Fabric Runtime 1.3 and later.
- Unless you configure a different model, AI functions default to gpt-4.1-mini. Learn more about billing and consumption rates.
- Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.
Models and providers
AI functions now support a broader set of models and providers beyond the default Azure OpenAI models. You can configure AI functions to use:
- Azure OpenAI models
- Azure AI Foundry resources (including models such as Claude and LLaMA)
Model and provider selection is configurable through the AI functions configuration. For details on how to set up and configure different models and providers, see the configuration documentation for pandas and PySpark.
Getting started with AI functions
AI functions can be used with pandas (Python and PySpark runtimes) and with PySpark (PySpark runtime). The required installation and import steps for each are outlined in the following sections, followed by the corresponding commands.
Performance and concurrency
AI functions now execute with an increased default concurrency of 200, which allows faster parallel processing of AI operations. You can tune concurrency settings per workload to match your requirements. For more information on configuring concurrency and other performance-related settings, see the configuration documentation for pandas and PySpark.
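To make the idea of a concurrency limit concrete, here is a minimal, generic sketch in plain Python. It is not how Fabric implements AI functions: `run_with_limit` and `fake_ai_call` are hypothetical names invented for this illustration, and the limit of 2 is chosen only to keep the example small. The sketch caps how many calls are in flight at once and records the peak.

```python
import asyncio


async def run_with_limit(items, worker, limit):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(item):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(guarded(item) for item in items))
    return results, peak


async def fake_ai_call(text):
    # Hypothetical stand-in for a slow remote call; it only upper-cases the text.
    await asyncio.sleep(0.01)
    return text.upper()


results, peak = asyncio.run(
    run_with_limit(["a", "b", "c", "d", "e", "f"], fake_ai_call, limit=2)
)
print(results, peak)
```

A higher limit finishes sooner when the remote service can absorb the load, while a lower limit reduces pressure on rate limits. In a notebook that already runs an event loop, `await run_with_limit(...)` would likely be needed in place of `asyncio.run(...)`.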
Install dependencies
- Pandas (Python runtime)
  - synapseml_internal and synapseml_core whl files installation required (commands provided in the following code cell)
  - openai package installation required (command provided in the following code cell)
- Pandas (PySpark runtime)
  - openai package installation required (command provided in the following code cell)
- PySpark (PySpark runtime)
  - No installation required
# The pandas AI functions package requires OpenAI version 1.99.5 or later
%pip install -q --force-reinstall openai==1.99.5 2>/dev/null
Import required libraries
The following code cell imports the AI functions library and its dependencies.
Apply AI functions
Each of the following functions allows you to invoke the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. You can use AI functions to analyze pandas DataFrames or Spark DataFrames.
Tip
Learn how to customize the configuration of AI functions.
Advanced configuration: When using gpt-5 family models, you can configure advanced options such as reasoning_effort and verbosity. See the configuration pages for pandas and PySpark for details on how to set these options.
Detect sentiment with ai.analyze_sentiment
The ai.analyze_sentiment function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For more detailed instructions about the use of ai.analyze_sentiment with pandas, see this article. For ai.analyze_sentiment with PySpark, see this article.
Optional parameters
The ai.analyze_sentiment function now supports additional optional parameters that allow you to customize the sentiment analysis behavior. These parameters provide more control over how sentiment is detected and reported. For details on available parameters, their descriptions, and default values, see the function-specific documentation for pandas and PySpark.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
"The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
"I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
"I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
"The umbrella is OK, I guess."
], columns=["reviews"])
df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)
Categorize text with ai.classify
The ai.classify function invokes AI to categorize input text according to custom labels you choose. For more information about the use of ai.classify with pandas, go to this article. For ai.classify with PySpark, see this article.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
"This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
"Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
"Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
], columns=["descriptions"])
df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)
Generate vector embeddings with ai.embed
The ai.embed function invokes AI to generate vector embeddings for input text. Vector embeddings are numerical representations of text that capture semantic meaning, making them useful for similarity search, retrieval workflows, and other machine learning tasks. The dimensionality of the embedding vectors depends on the selected model. For more detailed instructions about the use of ai.embed with pandas, see this article. For ai.embed with PySpark, see this article.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
"This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
"Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
"Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
], columns=["descriptions"])
df["embed"] = df["descriptions"].ai.embed()
display(df)
Extract entities with ai.extract
The ai.extract function invokes AI to scan input text and extract specific types of information that are designated by labels you choose (for example, locations or names). For more detailed instructions about the use of ai.extract with pandas, see this article. For ai.extract with PySpark, see this article.
Structured labels
The ai.extract function supports structured label definitions through the ExtractLabel schema. You can provide labels with structured definitions that include not just the label name but also type information and attributes. This structured approach improves extraction consistency and allows the function to return correspondingly structured output columns. For example, you can specify labels with additional metadata to guide the extraction process more precisely. See the detailed documentation for pandas and PySpark for examples of using structured labels.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
"MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
"Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
], columns=["descriptions"])
df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)
Fix grammar with ai.fix_grammar
The ai.fix_grammar function invokes AI to correct the spelling, grammar, and punctuation of input text. For more detailed instructions about the use of ai.fix_grammar with pandas, see this article. For ai.fix_grammar with PySpark, see this article.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
"There are an error here.",
"She and me go weigh back. We used to hang out every weeks.",
"The big picture are right, but you're details is all wrong."
], columns=["text"])
df["corrections"] = df["text"].ai.fix_grammar()
display(df)
Answer custom user prompts with ai.generate_response
The ai.generate_response function invokes AI to generate custom text based on your own instructions. For more detailed instructions about the use of ai.generate_response with pandas, see this article. For ai.generate_response with PySpark, see this article.
Optional parameters
The ai.generate_response function now supports a response_format parameter that allows you to request structured JSON output. You can specify response_format='json' to receive responses in JSON format. Additionally, you can provide a JSON schema to enforce a specific output structure, ensuring the generated response conforms to your expected data shape. This is particularly useful when you need predictable, machine-readable output from the AI function. For detailed examples and usage patterns, see the documentation for pandas and PySpark.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
("Scarves"),
("Snow pants"),
("Ski goggles")
], columns=["product"])
df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)
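When you request JSON output through response_format, it helps to check the returned text before relying on it downstream. The following is a minimal sketch in plain Python under stated assumptions: the schema, the sample response string, and the helper `missing_required_keys` are all hypothetical and made up for illustration, and the exact form in which a schema is passed to response_format is described in the function-specific documentation, not here.

```python
import json

# Hypothetical JSON Schema describing the shape we would like back.
subject_line_schema = {
    "type": "object",
    "properties": {
        "subject": {"type": "string"},
        "word_count": {"type": "integer"},
    },
    "required": ["subject", "word_count"],
}

# Made-up example of a JSON response string (not real model output).
sample_response = '{"subject": "Winter deals on scarves", "word_count": 4}'


def missing_required_keys(payload, schema):
    """Return the required keys that are absent from the parsed payload."""
    return [key for key in schema["required"] if key not in payload]


parsed = json.loads(sample_response)  # raises json.JSONDecodeError on invalid JSON
missing = missing_required_keys(parsed, subject_line_schema)
print(parsed["subject"], missing)
```

This only checks that required keys are present. A full JSON Schema validator would also check types and nested structure.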
Calculate similarity with ai.similarity
The ai.similarity function compares each input text value either to one common reference text or to the corresponding value in another column (pairwise mode). The output similarity score values are relative, and they can range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. For more detailed instructions about the use of ai.similarity with pandas, see this article. For ai.similarity with PySpark, see this article.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
("Bill Gates", "Technology"),
("Satya Nadella", "Healthcare"),
("Joan of Arc", "Agriculture")
], columns=["names", "industries"])
df["similarity"] = df["names"].ai.similarity(df["industries"])
display(df)
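The -1 to 1 range described above matches the range of cosine similarity between two vectors. The text does not say how ai.similarity computes its scores, so treat the following as an illustrative sketch of cosine similarity on made-up toy vectors, not as the function's implementation. The helper name `cosine_similarity` is chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-D vectors standing in for embeddings (values invented for illustration).
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # orthogonal
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction

print(round(same, 6), round(unrelated, 6), round(opposite, 6))
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and vectors pointing in opposite directions score close to -1, which mirrors the "identical", "unrelated", and "opposites" reading of the score.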
Summarize text with ai.summarize
The ai.summarize function invokes AI to generate summaries of input text (either values from a single column of a DataFrame, or row values across all the columns). For more detailed instructions about the use of ai.summarize with pandas, see this article. For ai.summarize with PySpark, see this article.
Customizing summaries with instructions
The ai.summarize function now supports an instructions parameter that allows you to steer the tone, length, and focus of the generated summaries. You can provide custom instructions to guide how the summary should be created, such as specifying a particular style, target audience, or level of detail. When instructions are not provided, the function uses default summarization behavior. For examples of using the instructions parameter, see the detailed documentation for pandas and PySpark.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
("Microsoft Teams", "2017",
"""
The ultimate messaging app for your organization—a workspace for real-time
collaboration and communication, meetings, file and app sharing, and even the
occasional emoji! All in one place, all in the open, all accessible to everyone.
"""),
("Microsoft Fabric", "2023",
"""
An enterprise-ready, end-to-end analytics platform that unifies data movement,
data processing, ingestion, transformation, and report building into a seamless,
user-friendly SaaS experience. Transform raw data into actionable insights.
""")
], columns=["product", "release_year", "description"])
df["summaries"] = df["description"].ai.summarize()
display(df)
Translate text with ai.translate
The ai.translate function invokes AI to translate input text to a new language of your choice. For more detailed instructions about the use of ai.translate with pandas, see this article. For ai.translate with PySpark, see this article.
# This code uses AI. Always review output for mistakes.
df = pd.DataFrame([
"Hello! How are you doing today?",
"Tell me what you'd like to know, and I'll do my best to help.",
"The only thing we have to fear is fear itself."
], columns=["text"])
df["translations"] = df["text"].ai.translate("spanish")
display(df)
View usage statistics with ai.stats
Fabric AI functions provide a built-in way to inspect usage and execution statistics for any AI-generated Series or DataFrame. You can access these metrics by calling ai.stats on the result returned by an AI function.
ai.stats returns a DataFrame with the following columns:
- num_successful – Number of rows processed successfully by the AI function.
- num_exceptions – Number of rows that encountered an exception during execution. These rows are represented as instances of aifunc.ExceptionResult.
- num_unevaluated – Number of rows that were not processed because an earlier exception made it impossible to continue evaluation. These rows are instances of aifunc.NotEvaluatedResult.
- num_harmful – Number of rows blocked by the Azure OpenAI content filter. These rows are instances of aifunc.FilterResult.
- prompt_tokens – Total number of input tokens used for the AI function call.
- completion_tokens – Total number of output tokens generated by the model.
Tip
You can call ai.stats on any Series or DataFrame returned by an AI function. This can help you track usage, understand error patterns, and monitor token consumption.
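As an illustration of how these columns might be combined, the sketch below derives a success rate and a total token count from one row of statistics. It uses a plain dictionary with made-up values in place of a real ai.stats result, and it assumes that the four row counts together account for every input row, which the column descriptions suggest but do not state outright. The helper `summarize_stats` is a hypothetical name.

```python
# Made-up values for one row of statistics, keyed by the column names listed above.
stats_row = {
    "num_successful": 97,
    "num_exceptions": 2,
    "num_unevaluated": 0,
    "num_harmful": 1,
    "prompt_tokens": 4200,
    "completion_tokens": 900,
}


def summarize_stats(row):
    """Derive a success rate and a total token count from one stats row."""
    # Assumption: these four counts cover every input row exactly once.
    total_rows = (
        row["num_successful"]
        + row["num_exceptions"]
        + row["num_unevaluated"]
        + row["num_harmful"]
    )
    success_rate = row["num_successful"] / total_rows if total_rows else 0.0
    total_tokens = row["prompt_tokens"] + row["completion_tokens"]
    return {
        "total_rows": total_rows,
        "success_rate": success_rate,
        "total_tokens": total_tokens,
    }


summary = summarize_stats(stats_row)
print(summary)
```

Tracking these derived numbers per run can make it easier to notice a rising failure rate or unexpected token growth.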
Related content
- Detect sentiment with ai.analyze_sentiment in pandas or ai.analyze_sentiment in PySpark.
- Categorize text with ai.classify in pandas or ai.classify in PySpark.
- Generate vector embeddings with ai.embed in pandas or ai.embed in PySpark.
- Extract entities with ai.extract in pandas or ai.extract in PySpark.
- Fix grammar with ai.fix_grammar in pandas or ai.fix_grammar in PySpark.
- Answer custom user prompts with ai.generate_response in pandas or ai.generate_response in PySpark.
- Calculate similarity with ai.similarity in pandas or ai.similarity in PySpark.
- Summarize text with ai.summarize in pandas or ai.summarize in PySpark.
- Translate text with ai.translate in pandas or ai.translate in PySpark.
- Customize the configuration of AI functions in pandas or the configuration of AI functions in PySpark.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.