Use ai.summarize with PySpark

The ai.summarize function uses generative AI to produce summaries of input text, with a single line of code. The function can either summarize values from one column of a DataFrame or values across all the columns.

Note

This article covers using ai.summarize with PySpark. To use ai.summarize with pandas, see this article.
See other AI functions in this overview article.
Learn how to customize the configuration of AI functions.

Overview

The ai.summarize function is also available for Spark DataFrames. If you specify the name of an existing input column as a parameter, the function summarizes each value from that column alone. Otherwise, the function summarizes values across all columns of the DataFrame, row by row.

The function returns a new DataFrame with summaries for each input text row, from a single column or across all the columns, stored in an output column.

df.ai.summarize(input_col="text", output_col="summaries")

df.ai.summarize(output_col="summaries")

Parameters

Name	Description
`input_col` Optional	A string that contains the name of an existing column with input text values to summarize. If you don't set this parameter, the function summarizes values across all columns in the DataFrame, instead of values from a specific column.
`instructions` Optional	A string that contains more context for the AI model, such as specifying output length, tone, or more. More precise instructions will yield better results.
`error_col` Optional	A string that contains the name of a new column to store any OpenAI errors that result from processing each input text row. If you don't set this parameter, a default name generates for the error column. If an input row has no errors, the value in this column is `null`.
`output_col` Optional	A string that contains the name of a new column to store summaries for each input text row. If you don't set this parameter, a default name generates for the output column.

Returns

The function returns a Spark DataFrame that includes a new column that contains summarized text for each input text row. If the input text is null, the result is null. If no input column is specified, the function summarizes values across all columns in the DataFrame.

Example

Summarize values from a single column
Summarize values across all columns

# This code uses AI. Always review output for mistakes.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summaries")
display(summaries)

This example code cell provides the following output:

# This code uses AI. Always review output for mistakes.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(output_col="summaries")
display(summaries)

This example code cell provides the following output:

Use ai.summarize with pandas.
Detect sentiment with ai.analyze_sentiment.
Categorize text with ai.classify.
Generate vector embeddings with ai.embed.
Extract entities with ai_extract.
Fix grammar with ai.fix_grammar.
Answer custom user prompts with ai.generate_response.
Calculate similarity with ai.similarity.
Translate text with ai.translate.
Learn more about the full set of AI functions.
Customize the configuration of AI functions.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.

Feedback

Was this page helpful?

Last updated on 2025-11-21