Use ai.extract with pandas

The ai.extract function extracts fields such as names, locations, or custom entities from each input row.

Note

This article covers ai.extract with pandas. For PySpark, see Use ai.extract with PySpark.
For all AI Functions and prerequisites, see AI Functions overview.
To change the default configuration for AI Functions with pandas, see Change default configuration for AI Functions with pandas.

Overview

The ai.extract function extends the pandas Series class. To extract custom entity types from each row of input, call the function on a pandas DataFrame text column.

Unlike other AI Functions, ai.extract returns a pandas DataFrame instead of a Series. The DataFrame has one column for each specified entity type, and each column holds the values extracted from the corresponding input row.
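Because the result is a DataFrame rather than a Series, a common next step is to place the extracted columns next to the source text with a pandas join. The sketch below does not call ai.extract: the hypothetical `fake_extract` function only mimics the documented output shape (one column per label) so the join runs anywhere. It also assumes, which this article does not state, that the result keeps the input's index.

```python
import pandas as pd


def fake_extract(texts: pd.Series, *labels: str) -> pd.DataFrame:
    """Hypothetical stand-in for texts.ai.extract(*labels).

    Returns one column per label, indexed like the input (an assumption),
    with an empty list in every cell. It mimics the shape, not the AI call.
    """
    return pd.DataFrame(
        {label: [[] for _ in range(len(texts))] for label in labels},
        index=texts.index,
    )


df = pd.DataFrame({"text": ["first document", "second document"]}, index=[10, 20])

# In a notebook with AI Functions, this line would instead be:
# df_entities = df["text"].ai.extract("name", "city")
df_entities = fake_extract(df["text"], "name", "city")

# Join on the shared index so each row keeps its source text.
df_joined = df.join(df_entities)
print(df_joined.columns.tolist())  # ['text', 'name', 'city']
```

If the real result does not share the input's index, `pd.concat([df, df_entities], axis=1)` after a `reset_index(drop=True)` on both frames is the usual positional alternative.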

Syntax

df_entities = df["text"].ai.extract("entity1", "entity2", "entity3")

Parameters

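Name	Description
`labels` Required	One or more strings that represent the entity types to extract from the input text values. At least one string label or aifunc.ExtractLabel is required.
`aifunc.ExtractLabel` Optional	One or more label definitions describing the fields to extract. You can pass them alongside plain string labels in the same call. See ExtractLabel parameters.
`column_type` Optional	Set to "path" when the input column contains plain string file paths to images, PDFs, or text files. See Multimodal input.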

ExtractLabel parameters

Name	Description
`label` Required	A string that represents the entity to extract from the input text values.
`description` Optional	A string that adds extra context for the AI model. It can include requirements, context, or instructions for the AI to consider while performing the extraction.
`max_items` Optional	An int that specifies the maximum number of items to extract for this label.
`type` Optional	JSON Schema type for the extracted value. Supported types include `string`, `number`, `integer`, `boolean`, `object`, and `array`.
`properties` Optional	Additional JSON Schema properties for the type, such as `items`, `properties`, and `enum`. See Structured Outputs: Supported schemas.
`raw_col` Optional	A string that sets the column name for the raw LLM response. For each entity label, the raw response contains a list of dictionaries whose keys include "reason" and "extraction_text".
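The `type` and `properties` parameters use JSON Schema vocabulary. To make that concrete, the sketch below defines a hypothetical helper, `label_schema`, that merges a type, extra JSON Schema keywords, and a description into one schema dictionary. It is not part of aifunc, and this article does not describe how the library builds its actual request schema; the helper only shows what such a label definition looks like as JSON Schema.

```python
import json
from typing import Any, Dict, Optional


def label_schema(
    type_: str = "string",
    properties: Optional[Dict[str, Any]] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of aifunc): merge an ExtractLabel-style
    type, extra JSON Schema keywords, and a description into one schema."""
    schema: Dict[str, Any] = {"type": type_}
    if description:
        schema["description"] = description
    # Extra keywords such as "enum", "items", or "properties" are merged as-is.
    schema.update(properties or {})
    return schema


# A string label restricted to a fixed set of values.
seniority = label_schema(
    "string",
    {"enum": ["junior", "mid", "senior"]},
    description="Seniority level of the role.",
)

# An array label whose items are objects with two typed fields.
jobs = label_schema(
    "array",
    {
        "items": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "years": {"type": "integer"},
            },
        }
    },
)

print(json.dumps(seniority, indent=2))
print(json.dumps(jobs, indent=2))
```

Assuming aifunc.ExtractLabel interprets `properties` as the parameter table describes, the first schema would correspond roughly to `aifunc.ExtractLabel("seniority", description="Seniority level of the role.", type="string", properties={"enum": ["junior", "mid", "senior"]})`. Check Structured Outputs: Supported schemas for the keywords the model honors.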

Tip

Use ai.infer_schema to infer a label schema from file contents and pass the returned aifunc.ExtractLabel objects directly to ai.extract. For examples, see Use multimodal input with AI Functions.

Returns

The function returns a pandas DataFrame with a column for each specified entity type. Each column contains the entities extracted from the corresponding row of input text. If no match is found, the result is null.

By default, each cell holds a list of strings, one for each match found for that label. If you set a type in aifunc.ExtractLabel, such as type="integer", the list contains values of the matching Python type, for example int. If you set max_items=1, the function returns a single value for that label instead of a list.
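Because each cell holds a list by default, pandas `Series.explode` is a convenient way to get one row per extracted value, and `dropna` then removes rows where nothing was found. The DataFrame below is hand-built placeholder data shaped like the documented output (a list per label, null when there is no match); it is not real ai.extract output.

```python
import pandas as pd

# Placeholder data shaped like ai.extract output: a list per cell,
# or None when no entity was found for that row.
df_entities = pd.DataFrame(
    {"city": [["Lyon", "Nice"], None, ["Lille"]]},
    index=[0, 1, 2],
)

# explode() gives each list element its own row and repeats the index label;
# the None cell stays a single missing row, which dropna() then removes.
cities = df_entities["city"].explode().dropna()

print(cities.tolist())        # ['Lyon', 'Nice', 'Lille']
print(cities.index.tolist())  # [0, 0, 2]
```

The repeated index labels let you join each extracted value back to the input row it came from.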

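# This code uses AI. Always review output for mistakes.

import pandas as pd
import synapse.ml.aifunc as aifunc  # AI Functions for pandas; see the AI Functions overview for setup

df = pd.DataFrame([
    "MJ Lee lives in Tucson, AZ, and works as a software engineer for Contoso.",
    "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
], columns=["descriptions"])

# Returns a DataFrame with one column per label: "name", "profession", and "city".
df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)  # display() is available in notebooks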

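To control the type and number of values returned for a label, pass an aifunc.ExtractLabel instead of a string. The following example extracts a single integer for each row: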

# This code uses AI. Always review output for mistakes.

import pandas as pd
import synapse.ml.aifunc as aifunc

df = pd.DataFrame([
    "Alex Rivera, a 24-year-old midfielder from Barcelona, scored 12 goals last season, with an impressive 5 goals in one game.",
    "Jordan Smith, a 29-year-old striker from Manchester, scored exactly 1 goal in every game, for a total of 34 goals."
], columns=["bio"])

# max_items=1 with type="integer" returns one int per row instead of a list.
# ai.extract returns a DataFrame, so select the "goals" column before assigning it.
df["goals"] = df["bio"].ai.extract(
    aifunc.ExtractLabel(
        label="goals",
        description="total goals only",
        max_items=1,
        type="integer",
    )
)["goals"]
display(df)
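With type="integer" and max_items=1, the column holds one Python int per row, or null where nothing was found. If you plan to aggregate the column, casting it to the pandas nullable `Int64` dtype keeps whole numbers and missing values side by side. The values below are placeholders standing in for a `goals` column; they are not output from the example above.

```python
import pandas as pd

# Placeholder values for a "goals" column: ints, plus None where nothing was extracted.
goals = pd.Series([3, None, 7], name="goals")

# pandas infers float64 (with NaN) for this mix; "Int64" restores whole
# numbers and represents the missing value as pd.NA.
goals_int = goals.astype("Int64")

print(goals_int.dtype)       # Int64
print(int(goals_int.sum()))  # 10, because sum() skips missing values by default
```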

Multimodal input

To extract fields from images, PDFs, or text files, set column_type="path" when the input column contains plain string file paths. File paths returned by aifunc.list_file_paths() are detected automatically. For setup, see Use multimodal input with AI Functions.
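For plain string paths, you build the input column yourself. The sketch below uses `pathlib` to collect PDF paths from a folder into a `file_path` column of Python strings, which is the kind of column column_type="path" is meant for. The helper name `build_path_frame` and the demo folder are hypothetical; in a real notebook you would point it at a folder your session can read.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_path_frame(folder: str, pattern: str = "*.pdf") -> pd.DataFrame:
    """Hypothetical helper: one row per matching file, as plain string paths."""
    paths = sorted(str(p) for p in Path(folder).glob(pattern))
    return pd.DataFrame({"file_path": paths})


# Demo with a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["resume_b.pdf", "resume_a.pdf", "notes.txt"]:
        (Path(tmp) / name).write_bytes(b"")
    custom_df = build_path_frame(tmp)

print([Path(p).name for p in custom_df["file_path"]])  # ['resume_a.pdf', 'resume_b.pdf']

# With AI Functions available, the column could then be passed as:
# custom_df["file_path"].ai.extract("name", column_type="path")
```

The paths stay as ordinary strings after the demo folder is deleted; only the files themselves are gone.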

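# This code uses AI. Always review output for mistakes.

# Assumes custom_df has a "file_path" column of plain string file paths
# (see Use multimodal input with AI Functions).
extracted = custom_df["file_path"].ai.extract(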
    aifunc.ExtractLabel(
        "name",
        description="The full name of the candidate, first letter capitalized.",
        max_items=1,
    ),
    "companies_worked_for",
    aifunc.ExtractLabel(
        "year_of_experience",
        description="The total years of professional work experience the candidate has, excluding internships.",
        type="integer",
        max_items=1,
    ),
    column_type="path",
)
display(extracted)

Related content

Use ai.extract with PySpark.
Learn more about AI Functions.
Use multimodal input with AI Functions.
Change default configuration for AI Functions with pandas.
Understand billing for AI Functions.

Last updated on 2026-06-15