Use ai.extract with pandas

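The ai.extract function uses generative AI to scan input text and extract specific types of information designated by labels that you choose (for example, locations or names). It does so with a single line of code.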

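Note

This article covers using ai.extract with pandas. To use ai.extract with PySpark, see this article.
See other AI functions in this overview article.
Learn how to customize the configuration of AI functions.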

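Overview

The ai.extract function extends the pandas Series class. To extract custom entity types from each row of input, call the function on a text column of a pandas DataFrame.

Unlike other AI functions, ai.extract returns a pandas DataFrame rather than a Series. The DataFrame has a separate column for each specified entity type, containing the values extracted for each input row.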
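
To make the "extends the pandas Series class" idea concrete, here is a minimal sketch of how a Series accessor can return a DataFrame with one column per label. It is not how ai.extract is implemented: the accessor name `demo`, the class `DemoAccessor`, and the helper `_first_match` are hypothetical, and the matching uses regular expressions instead of AI. Only `pd.api.extensions.register_series_accessor` is a real pandas API.

```python
import re

import pandas as pd


def _first_match(pattern: str, text: str):
    """Return the first regex match in text, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Toy accessor registered under the hypothetical name "demo" so it cannot
# clash with the real "ai" accessor. It uses regular expressions, not AI.
@pd.api.extensions.register_series_accessor("demo")
class DemoAccessor:
    def __init__(self, series: pd.Series):
        self._series = series

    def extract(self, **patterns: str) -> pd.DataFrame:
        """Build one column per label, aligned with the input rows."""
        columns = {
            label: self._series.map(lambda text, p=pattern: _first_match(p, text))
            for label, pattern in patterns.items()
        }
        return pd.DataFrame(columns, index=self._series.index)


df = pd.DataFrame({"text": ["Order 42 shipped to Oslo", "No order number here"]})
df_entities = df["text"].demo.extract(number=r"\d+", city=r"Oslo|Paris")
print(df_entities)
```

The call shape mirrors the pattern described above: the method is invoked on a text column and the result is a DataFrame whose rows line up with the input rows, with a missing value where nothing was found.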

Syntax

df_entities = df["text"].ai.extract("entity1", "entity2", "entity3")

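Parameters

Name	Description
`labels` Required	One or more strings that represent the set of entity types to extract from the input text values.
`aifunc.ExtractLabel` Optional	One or more label definitions that describe the fields to extract. For more information, see the ExtractLabel parameters table.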

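ExtractLabel parameters

Name	Description
`label` Required	A string that represents the entity to extract from the input text values.
`description` Optional	A string that gives the AI model extra context. It can include requirements, background information, or instructions for the AI to consider while performing the extraction.
`max_items` Optional	An integer that specifies the maximum number of items to extract for this label.
`type` Optional	The JSON schema type of the extracted value. Supported types include `string`, `number`, `integer`, `boolean`, `object`, and `array`.
`properties` Optional	Additional JSON schema properties for the type, given as a dictionary. It can include supported properties such as "items" for arrays, "properties" for objects, "enum" for enum types, and more. See the example usage in this article.
`raw_col` Optional	A string that sets the column name for the raw LLM response. The raw response provides a list of dictionary pairs for each entity label, including "reason" and "extraction_text".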
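
The `type` and `properties` parameters are easier to picture with concrete values. The sketch below builds keyword arguments for two hypothetical labels ("position" and "clubs"). The argument names come from the table above, but the exact dictionary shapes passed to `properties` are an assumption based on ordinary JSON schema conventions, and the module path `synapse.ml.aifunc` is assumed to be available only inside a Fabric notebook.

```python
# Keyword arguments for two hypothetical labels. The argument names come from
# the ExtractLabel parameter table; the label names and values are made up.
position_kwargs = {
    "label": "position",
    "description": "the player's position on the field",
    "max_items": 1,
    "type": "string",
    # Assumed shape: restrict the value to a fixed set of strings.
    "properties": {"enum": ["goalkeeper", "defender", "midfielder", "striker"]},
}
clubs_kwargs = {
    "label": "clubs",
    "description": "every club the player has played for",
    "type": "array",
    # Assumed shape: each array element is a string.
    "properties": {"items": {"type": "string"}},
}

try:
    import synapse.ml.aifunc as aifunc  # assumed to exist only in Fabric
except ImportError:
    aifunc = None

if aifunc is not None:
    labels = [
        aifunc.ExtractLabel(**position_kwargs),
        aifunc.ExtractLabel(**clubs_kwargs),
    ]
    # In a notebook, the labels would then be passed to ai.extract, e.g.:
    # df_entities = df["bio"].ai.extract(*labels)
else:
    labels = []  # nothing to build outside Fabric
```

Keeping the arguments in plain dictionaries makes them easy to review or reuse before any AI call is made.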

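Returns

The function returns a pandas DataFrame with a column for each specified entity type. The column or columns contain the entities extracted from each row of input text. If the function identifies more than one match for an entity, it returns only one of those matches. If no match is found, the result is null.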
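
Because the result is a separate DataFrame, a common follow-up is to place it next to the source text and check for null results. The sketch below uses a hand-written placeholder in place of a real ai.extract result, so the values are illustrative only, and it assumes the returned DataFrame shares the index of the input column.

```python
import pandas as pd

df = pd.DataFrame({"descriptions": ["first input text", "second input text"]})

# Hand-written placeholder for the DataFrame that ai.extract would return;
# the values are illustrative, not real model output.
df_entities = pd.DataFrame(
    {"name": ["MJ Lee", None], "city": ["Tucson", "Jersey City"]},
    index=df.index,  # assumption: the result is aligned with the input rows
)

# Place the extracted columns next to the source text.
combined = pd.concat([df, df_entities], axis=1)

# Flag rows where at least one label came back null.
combined["has_missing"] = df_entities.isna().any(axis=1)
print(combined)
```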

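The default return type is a list of strings for each label. If you specify a different type in the aifunc.ExtractLabel configuration, such as type="integer", the output is a list of Python integers. If you specify max_items=1 in the aifunc.ExtractLabel configuration, only one element of that type is returned for that label.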
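
When cells hold lists, as described for the default return type, standard pandas operations can flatten them. The following sketch assumes each cell is a plain Python list and uses a hand-written placeholder rather than real ai.extract output.

```python
import pandas as pd

# Placeholder for a default-style result: one list of strings per row.
# The values are illustrative, not real model output.
df_entities = pd.DataFrame({"city": [["Tucson"], ["Jersey City", "New York"]]})

# One row per extracted item; the original row label is kept in the index.
exploded = df_entities.explode("city")

# Keep only the first item from each list.
first_city = df_entities["city"].str[0]

print(exploded)
print(first_city)
```

Setting max_items=1 on the label, as described above, would avoid the need for this step when only a single value per row is wanted.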

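Example

# This code uses AI. Always review output for mistakes.

# Imports assumed for a Fabric notebook session.
import pandas as pd
import synapse.ml.aifunc as aifunc  # assumed module path for AI functions

df = pd.DataFrame([
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Contoso.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

# Extract one column per label from each row of text.
df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)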

This example code cell provides the following output:

# This code uses AI. Always review output for mistakes.

df = pd.DataFrame([
        "Alex Rivera, a 24-year-old midfielder from Barcelona, scored 12 goals last season, with an impressive 5 goals in one game.",
        "Jordan Smith, a 29-year-old striker from Manchester, scored exactly 1 goal in every game, for a total of 34 goals."
    ], columns=["bio"])

df["goals"] = df["bio"].ai.extract(
    aifunc.ExtractLabel(
        label="goals",
        description="total goals only",
        max_items=1,
        type="integer"
    )
)
display(df)

This example code cell provides the following output:

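Related content

Use ai.extract with PySpark.
Detect sentiment with ai.analyze_sentiment.
Categorize text with ai.classify.
Generate vector embeddings with ai.embed.
Fix grammar with ai.fix_grammar.
Answer custom user prompts with ai.generate_response.
Calculate similarity with ai.similarity.
Summarize text with ai.summarize.
Translate text with ai.translate.
Learn more about the full set of AI functions.
Customize the configuration of AI functions.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.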


Last updated on 2025-11-21
