The ai.summarize function summarizes text from one column or across all columns in each row.
Note
- This article covers ai.summarize with PySpark. For pandas, see Use ai.summarize with pandas.
- For all AI Functions and prerequisites, see AI Functions overview.
- Change default configuration for AI Functions with PySpark.
Overview
The ai.summarize function is also available for Spark DataFrames. If you specify the name of an existing input column as a parameter, the function summarizes each value from that column alone. Otherwise, the function summarizes values across all columns of the DataFrame, row by row.
The function returns a new DataFrame that stores one summary per input row in an output column. Each summary is generated either from a single column or from all columns of that row.
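The two modes can be pictured with a small pure-Python sketch. It is not the AI Functions implementation: stub_summarize is a hypothetical stand-in for the AI model, summarize_rows is an illustrative helper, and the way row values are joined in all-columns mode is an assumption made only for this example. The null handling mirrors the documented behavior that a null input yields a null result.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def stub_summarize(text: str) -> str:
    """Hypothetical stand-in for the AI model: keep only the first sentence."""
    first_sentence = text.strip().split(".")[0].strip()
    return first_sentence + "."


def summarize_rows(
    rows: List[Row],
    input_col: Optional[str] = None,
    output_col: str = "summaries",
    summarizer: Callable[[str], str] = stub_summarize,
) -> List[Row]:
    """Mimic the two modes on a list of dict rows (illustration only)."""
    results = []
    for row in rows:
        if input_col is not None:
            # Single-column mode: only the named column is summarized.
            text = row[input_col]
        else:
            # All-columns mode. Assumption: values are joined as "name: value".
            parts = [f"{name}: {value}" for name, value in row.items() if value is not None]
            text = " ".join(parts) if parts else None
        # A null input produces a null summary.
        summary = None if text is None else summarizer(text)
        # Return a new row so the input rows are left unchanged.
        results.append({**row, output_col: summary})
    return results


rows = [
    {"product": "Microsoft Fabric", "description": "An analytics platform. It unifies data movement."},
    {"product": "Unknown", "description": None},
]
single_column = summarize_rows(rows, input_col="description")
all_columns = summarize_rows(rows)
```

In single-column mode the second row has a null description, so its summary is null. In all-columns mode the same row still produces a summary because the product value is available.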
Syntax
df.ai.summarize(input_col="text", output_col="summaries")
Parameters
| Name | Description |
|---|---|
| input_col (Optional) | A string that contains the name of an existing column with input text values to summarize. If you don't set this parameter, the function summarizes values across all columns in the DataFrame instead of values from a specific column. |
| instructions (Optional) | A string that provides more context for the AI model, such as output length, tone, audience, or focus. More precise instructions produce better results. |
| error_col (Optional) | A string that contains the name of a new column to store any OpenAI errors that result from processing each input text row. If you don't set this parameter, a default name is generated for the error column. If an input row has no errors, the value in this column is null. |
| output_col (Optional) | A string that contains the name of a new column to store summaries for each input text row. If you don't set this parameter, a default name is generated for the output column. |
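Because the error column is null for rows that processed without errors, it can be used to separate successes from failures. The sketch below assumes the function was called with a hypothetical error_col="errors" and that the result rows were already collected into plain dictionaries (for example with `[row.asDict() for row in summaries.collect()]`). split_by_error is an illustrative helper, not part of AI Functions, and the sample records are hand-written.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_error(
    records: List[Record], error_col: str
) -> Tuple[List[Record], List[Record]]:
    """Separate rows that succeeded (null error) from rows that failed."""
    succeeded = [r for r in records if r.get(error_col) is None]
    failed = [r for r in records if r.get(error_col) is not None]
    return succeeded, failed


# Hand-written sample records; the error text is invented for illustration.
records = [
    {"description": "Long text A", "summaries": "Short A", "errors": None},
    {"description": "Long text B", "summaries": None, "errors": "example error message"},
]
succeeded, failed = split_by_error(records, error_col="errors")
print(len(succeeded), len(failed))  # prints: 1 1
```

Rows in the failed list can then be inspected or retried separately from the rows that produced summaries.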
Returns
The function returns a Spark DataFrame that includes a new column that contains summarized text for each input text row. If the input text is null, the result is null. If no input column is specified, the function summarizes values across all columns in the DataFrame.
Example
# This code uses AI. Always review output for mistakes.
df = spark.createDataFrame([
    ("Microsoft Teams", "2017",
    """
    The ultimate messaging app for your organization—a workspace for real-time
    collaboration and communication, meetings, file and app sharing, and even the
    occasional emoji! All in one place, all in the open, all accessible to everyone.
    """,),
    ("Microsoft Fabric", "2023",
    """
    An enterprise-ready, end-to-end analytics platform that unifies data movement,
    data processing, ingestion, transformation, and report building into a seamless,
    user-friendly SaaS experience. Transform raw data into actionable insights.
    """,)
], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summaries")
display(summaries)
Customize summaries with instructions
Use the instructions parameter to control the tone, length, audience, or focus of generated summaries without changing the source text.
# This code uses AI. Always review output for mistakes.
summaries = df.ai.summarize(
    input_col="description",
    output_col="executive_summary",
    instructions="Write one concise sentence for a business executive. Focus on product value and avoid marketing language.",
)
display(summaries)
Multimodal input
To summarize images, PDFs, or text files, set input_col_type="path" for single-column mode, or col_types for DataFrame-level mode. For setup, see Use multimodal input with AI Functions.
# This code uses AI. Always review output for mistakes.
# Summarize file content from a single column
results = custom_df.ai.summarize(
    instructions="Summarize this file in one sentence for a support analyst.",
    input_col="file_path",
    input_col_type="path",
    output_col="summary",
)
display(results)
You can also summarize values across all columns in a DataFrame by omitting the input column and using col_types:
# This code uses AI. Always review output for mistakes.
results = custom_df.ai.summarize(
    col_types={"file_path": "path"},
    output_col="summary",
)
display(results)
Related content
- Use ai.summarize with pandas.
- Learn more about AI Functions.
- Use multimodal input with AI Functions.
- Change default configuration for AI Functions with PySpark.
- Understand billing for AI Functions.