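Use ai.summarize with PySpark

The ai.summarize function uses generative AI to produce summaries of input text with a single line of code. The function can summarize the values in one column of a DataFrame, or it can summarize the values across all columns.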

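Note

This article covers how to use ai.summarize with PySpark. To use ai.summarize with pandas, see this article.
See other AI functions in this overview article.
Learn how to customize the configuration of AI functions.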

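Overview

The ai.summarize function is also available for Spark DataFrames. If you specify the name of an existing input column as a parameter, the function summarizes each value in that column individually. Otherwise, the function summarizes the values across all columns of the DataFrame, row by row.

The function returns a new DataFrame in which an output column stores one summary for each input row. Each summary is built either from the single input column or from all columns of that row.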

Syntax

# Summarize values from a single column
df.ai.summarize(input_col="text", output_col="summaries")

# Summarize values across all columns
df.ai.summarize(output_col="summaries")

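Parameters

Name	Description
`input_col` Optional	A string that contains the name of an existing column with input text values to summarize. If this parameter isn't set, the function summarizes the values across all columns in the DataFrame instead of the values in one specific column.
`instructions` Optional	A string that contains additional context for the AI model, such as the desired output length or tone. More precise instructions yield better results.
`error_col` Optional	A string that contains the name of a new column that stores any OpenAI errors raised while processing each input text row. If this parameter isn't set, a default name is generated for the error column. If an input row has no errors, the value in this column is `null`.
`output_col` Optional	A string that contains the name of a new column that stores the summary for each input text row. If this parameter isn't set, a default name is generated for the output column.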
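The parameters above can be combined in a single call. The following is a minimal sketch, assuming `df` is a Spark DataFrame in a notebook where the `ai.summarize` accessor is available. The helper name `summarize_with_options`, the column names `description`, `summaries`, and `summary_errors`, and the instruction text are illustrative choices, not part of the API.

```python
def summarize_with_options(df):
    """Summarize one column with custom instructions, then split by error status.

    Assumption: `df` exposes the `ai.summarize` accessor described in this
    article. All column names below are hypothetical.
    """
    summaries = df.ai.summarize(
        input_col="description",
        instructions="Summarize in one short sentence with a neutral tone.",
        output_col="summaries",
        error_col="summary_errors",
    )
    # Rows that were processed without an error hold null in the error column.
    succeeded = summaries.filter("summary_errors IS NULL")
    failed = summaries.filter("summary_errors IS NOT NULL")
    return succeeded, failed
```

Naming the error column explicitly, rather than relying on the generated default, makes it straightforward to filter failed rows afterward.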

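Returns

The function returns a Spark DataFrame that includes a new column with the summarized text for each input text row. If the input text is null, the result is null. If no input column is specified, the function summarizes the values across all columns in the DataFrame.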
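The row-by-row behavior described here can be modeled in plain Python. This sketch is not the library's implementation: `toy_summarizer` is a hypothetical stand-in for the AI model that keeps the first five words, and joining non-null column values with spaces in the all-columns case is an assumption made only for illustration.

```python
from typing import Callable, Optional


def toy_summarizer(text: str) -> str:
    """Hypothetical stand-in for the AI model: keep the first five words."""
    return " ".join(text.split()[:5])


def summarize_row(
    row: dict,
    input_col: Optional[str] = None,
    summarizer: Callable[[str], str] = toy_summarizer,
) -> Optional[str]:
    """Model one row of output: single-column mode or all-columns mode."""
    if input_col is not None:
        value = row[input_col]
        # A null input yields a null summary.
        return None if value is None else summarizer(str(value))
    # No input column given: use the values from every column in the row.
    # Skipping nulls and joining with spaces is an illustrative assumption.
    values = [str(v) for v in row.values() if v is not None]
    return summarizer(" ".join(values)) if values else None


rows = [
    {"product": "Microsoft Teams", "release_year": "2017",
     "description": "A workspace for real-time collaboration and communication."},
    {"product": "Microsoft Fabric", "release_year": "2023",
     "description": None},
]

single = [summarize_row(r, input_col="description") for r in rows]
all_cols = [summarize_row(r) for r in rows]
```

In this model, the second row produces a null summary in single-column mode because its `description` is null, while all-columns mode still has the other columns to work with.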

Example

Summarize values from a single column
Summarize values across all columns

# This code uses AI. Always review output for mistakes.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summaries")
display(summaries)

This example code cell provides the following output:

# This code uses AI. Always review output for mistakes.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(output_col="summaries")
display(summaries)

This example code cell provides the following output:

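Related content

Use ai.summarize with pandas.
Detect sentiment with ai.analyze_sentiment.
Categorize text with ai.classify.
Generate vector embeddings with ai.embed.
Extract entities with ai.extract.
Fix grammar with ai.fix_grammar.
Answer custom user prompts with ai.generate_response.
Calculate similarity with ai.similarity.
Translate text with ai.translate.
Learn more about the full set of AI functions.
Customize the configuration of AI functions.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.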


Last updated on 2025-11-21
