Categorize text with the `ai.classify` function

2025-03-05

The ai.classify function uses generative AI to categorize input text according to custom labels that you choose, all with a single line of code.

AI functions turbocharge data engineering by putting the power of Fabric's built-in large language models into your hands. To learn more, visit this overview article.

Important

This feature is in preview, for use in the Fabric 1.3 runtime and higher.

Review the prerequisites in this overview article, including the library installations that are temporarily required to use AI functions.
By default, AI functions are currently powered by the gpt-3.5-turbo (0125) model. To learn more about billing and consumption rates, visit this article.
Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.
During the initial rollout of AI functions, users are temporarily limited to 1,000 requests per minute with Fabric's built-in AI endpoint.
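
To get a feel for what the request limit means for a large column, the arithmetic can be sketched as follows. This is a rough estimate, not a description of how Fabric schedules calls: it assumes each input row costs one request (the article doesn't state this) and that the rate limit is the only bottleneck. The `min_minutes` helper is hypothetical.

```python
import math

# Preview limit stated above: 1,000 requests per minute.
REQUESTS_PER_MINUTE = 1_000


def min_minutes(row_count: int, requests_per_row: int = 1) -> int:
    """Return a lower bound, in whole minutes, on the time needed to process a column.

    Assumption: every row costs `requests_per_row` requests and nothing
    other than the rate limit slows processing down.
    """
    if row_count < 0 or requests_per_row < 1:
        raise ValueError("row_count must be >= 0 and requests_per_row >= 1")
    total_requests = row_count * requests_per_row
    return math.ceil(total_requests / REQUESTS_PER_MINUTE)


# Hypothetical 25,000-row DataFrame
estimate = min_minutes(25_000)
print(f"At least {estimate} minutes")
```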

Tip

We recommend using the ai.classify function with at least two input labels.
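
One way to follow this recommendation is to tidy the label list before you pass it to the function. The sketch below is illustrative only: `clean_labels` is a hypothetical helper, not part of AI functions, and raising an error for fewer than two labels is this sketch's own policy rather than something the function enforces.

```python
def clean_labels(labels):
    """Strip whitespace, drop empty and duplicate labels (case-insensitive), keep order."""
    seen = set()
    cleaned = []
    for label in labels:
        text = label.strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    if len(cleaned) < 2:
        # Policy of this sketch, based on the tip above.
        raise ValueError("Provide at least two distinct labels.")
    return cleaned


labels = clean_labels(["kitchen", " bedroom", "Kitchen", "", "garage", "other"])
print(labels)
```

Because the pandas syntax takes labels as separate arguments, the cleaned list can be unpacked in the call, for example `df["text"].ai.classify(*labels)`.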

Use `ai.classify` with pandas

The ai.classify function extends the pandas Series class. Call the function on a text column of a pandas DataFrame to assign user-provided labels to each input row.

The function returns a pandas Series that contains classification labels, which can be stored in a new DataFrame column.

Syntax

df["classification"] = df["text"].ai.classify("category1", "category2", "category3")

Parameters

Name	Description
`labels` Required	One or more strings representing the set of classification labels to be matched to input text values.

Returns

The function returns a pandas Series that contains a classification label for each input text row. If a text value can't be classified, the corresponding label is null.
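
Because unclassified rows come back as null, it can help to check for them before you use the labels downstream. The following sketch uses plain pandas on a hand-written stand-in for the function's output, so it makes no AI call. The sample rows and the `unclassified` placeholder value are assumptions made for illustration.

```python
import pandas as pd

# Hand-written stand-in for a DataFrame after ai.classify has run.
# The None entry mimics a text value that couldn't be classified.
df = pd.DataFrame({
    "descriptions": ["duvet", "measuring cups", "compact SUV", "???"],
    "category": ["bedroom", "kitchen", "garage", None],
})

# Rows where no label was assigned
unclassified = df[df["category"].isna()]

# Label distribution, with null labels counted too
label_counts = df["category"].value_counts(dropna=False)

# Optionally replace nulls with an explicit placeholder for reporting
df["category_filled"] = df["category"].fillna("unclassified")

print(unclassified)
print(label_counts)
```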

Example

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

# Assumes the AI functions library from the prerequisites is already installed and imported.
import pandas as pd

# Sample product descriptions to classify
df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

# Assign one of the four labels to each description
df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)

Use `ai.classify` with PySpark

The ai.classify function is also available for Spark DataFrames. The name of an existing input column must be specified as a parameter, along with a list of classification labels.

The function returns a new DataFrame, with labels that match each row of input text stored in an output column.

Syntax

df.ai.classify(labels=["category1", "category2", "category3"], input_col="text", output_col="classification")

Parameters

Name	Description
`labels` Required	An array of strings that represents the set of classification labels to be matched to text values in the input column.
`input_col` Required	A string that contains the name of an existing column with input text values to be classified according to the custom labels.
`output_col` Optional	A string that contains the name of a new column to store a classification label for each input text row. If this parameter isn't set, a default name is generated for the output column.
`error_col` Optional	A string that contains the name of a new column. The new column stores any OpenAI errors that result from processing each row of input text. If this parameter isn't set, a default name is generated for the error column. If there are no errors for a row of input, the value in this column is `null`.

Returns

The function returns a Spark DataFrame with a new column that contains classification labels that match each input text row. If a text value can't be classified, the corresponding label is null.

Example

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

# Assumes a notebook where the `spark` session is available and the AI functions
# prerequisites are in place.
df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])

# Store a label for each description in a new "categories" column
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)

Related content

Calculate similarity with ai.similarity.
Detect sentiment with ai.analyze_sentiment.
Extract entities with ai.extract.
Fix grammar with ai.fix_grammar.
Summarize text with ai.summarize.
Translate text with ai.translate.
Answer custom user prompts with ai.generate_response.
Learn more about the full set of AI functions here.
Learn how to customize the configuration of AI functions here.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.
