Categorize text with the ai.classify function

The ai.classify function uses generative AI to categorize input text according to custom labels that you choose, all with a single line of code.

AI functions turbocharge data engineering by putting the power of Fabric's built-in large language models into your hands. To learn more, visit this overview article.

Important

This feature is in preview, for use in the Fabric 1.3 runtime and higher.

  • Review the prerequisites in this overview article, including the library installations that are temporarily required to use AI functions.
  • By default, AI functions are currently powered by the gpt-3.5-turbo (0125) model. To learn more about billing and consumption rates, visit this article.
  • Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.
  • During the initial rollout of AI functions, users are temporarily limited to 1,000 requests per minute with Fabric's built-in AI endpoint.
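The request limit above suggests pacing very large jobs. The following is a minimal sketch, not a documented pattern: it assumes that each input row consumes one request (this article doesn't state that), and `iter_chunks` is a hypothetical helper name. It splits a DataFrame into pieces of at most 1,000 rows so that each piece can be classified separately.

```python
import pandas as pd

# Limit quoted in this article for the initial rollout.
REQUESTS_PER_MINUTE = 1000


def iter_chunks(df, size=REQUESTS_PER_MINUTE):
    """Yield consecutive slices of df with at most `size` rows each.

    Assumption: one row equals one request. Hypothetical helper.
    """
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]


# Stand-in data: 2,500 short texts, invented for illustration.
texts = pd.DataFrame({"text": [f"item {i}" for i in range(2500)]})

chunk_sizes = [len(chunk) for chunk in iter_chunks(texts)]
print(chunk_sizes)  # [1000, 1000, 500]
```

Whether any pacing is needed at all depends on how the library batches requests internally, so treat the chunk size as a starting point.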

Tip

We recommend using the ai.classify function with at least two input labels.
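One way to follow this recommendation is to check the label set before you call the function. The sketch below uses a hypothetical helper, `check_labels`, that isn't part of the AI functions library. It trims whitespace, drops duplicates and empty strings, and raises an error if fewer than two labels remain.

```python
def check_labels(labels):
    """Return cleaned labels, or raise ValueError if fewer than two remain.

    Hypothetical helper for illustration; not part of the AI functions library.
    """
    cleaned = []
    for label in labels:
        label = label.strip()
        # Skip empty strings and labels that were already seen.
        if label and label not in cleaned:
            cleaned.append(label)
    if len(cleaned) < 2:
        raise ValueError("Provide at least two distinct, non-empty labels.")
    return cleaned


labels = check_labels(["kitchen", "bedroom ", "garage", "kitchen", "other"])
print(labels)  # ['kitchen', 'bedroom', 'garage', 'other']
```

Because the pandas syntax shown later in this article takes the labels as separate arguments, a cleaned list like this one could be unpacked into the call as `*labels`.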

Use ai.classify with pandas

The ai.classify function extends the pandas Series class. Call the function on a text column of a pandas DataFrame to assign user-provided labels to each input row.

The function returns a pandas Series that contains classification labels, which can be stored in a new DataFrame column.

Syntax

df["classification"] = df["text"].ai.classify("category1", "category2", "category3")

Parameters

Name    Description
labels  Required. One or more strings representing the set of classification labels to be matched to input text values.

Returns

The function returns a pandas Series that contains a classification label for each input text row. If a text value can't be classified, the corresponding label is null.
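Because unclassifiable text produces a null label, it can help to count and handle nulls before you use the results. The following sketch works on a hand-written Series that stands in for the function's return value; the values and the placeholder string "unclassified" are invented for illustration.

```python
import pandas as pd

# Stand-in for the Series that ai.classify returns.
# These values are invented for illustration, not real model output.
labels = pd.Series(["bedroom", None, "garage"], name="category")

# Flag rows that couldn't be classified.
unclassified = labels.isna()
print(int(unclassified.sum()))  # 1

# Replace nulls with an explicit placeholder for downstream grouping.
filled = labels.fillna("unclassified")
print(filled.tolist())  # ['bedroom', 'unclassified', 'garage']
```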

Example

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

# AI functions also need the library imports listed in the overview article's prerequisites.
import pandas as pd

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

# Assign one of the four labels to each product description.
df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)
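The header comment in the example asks you to review the output for mistakes. One possible review step, sketched here on a hand-written stand-in for the "category" column, is to list any non-null value that falls outside the label set you supplied and to look at the label distribution. The stand-in values, including the stray "Kitchen", are invented for illustration.

```python
import pandas as pd

allowed = ["kitchen", "bedroom", "garage", "other"]

# Hand-written stand-in for df["category"]; not real model output.
df = pd.DataFrame({"category": ["bedroom", "kitchen", "garage", "Kitchen"]})

# Rows whose label is present but isn't one of the supplied labels.
unexpected = df[~df["category"].isin(allowed) & df["category"].notna()]
print(unexpected["category"].tolist())  # ['Kitchen']

# How often each label was assigned.
counts = df["category"].value_counts()
print(int(counts["bedroom"]))  # 1
```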

Use ai.classify with PySpark

The ai.classify function is also available for Spark DataFrames. The name of an existing input column must be specified as a parameter, along with a list of classification labels.

The function returns a new DataFrame in which an output column stores the label that matches each row of input text.

Syntax

df.ai.classify(labels=["category1", "category2", "category3"], input_col="text", output_col="classification")

Parameters

Name        Description
labels      Required. An array of strings that represents the set of classification labels to be matched to text values in the input column.
input_col   Required. A string that contains the name of an existing column with input text values to be classified according to the custom labels.
output_col  Optional. A string that contains the name of a new column to store a classification label for each input text row. If this parameter isn't set, a default name is generated for the output column.
error_col   Optional. A string that contains the name of a new column. The new column stores any OpenAI errors that result from processing each row of input text. If this parameter isn't set, a default name is generated for the error column. If there are no errors for a row of input, the value in this column is null.
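To keep the required and optional parameters straight, you can assemble the keyword arguments in plain Python first. This is only a sketch: `build_classify_args` is a hypothetical helper and the column name "classify_errors" is invented. It includes the optional names only when you set them, so the defaults described above still apply otherwise.

```python
def build_classify_args(labels, input_col, output_col=None, error_col=None):
    """Build keyword arguments using the parameter names listed above.

    Hypothetical helper for illustration; not part of the AI functions library.
    """
    args = {"labels": list(labels), "input_col": input_col}
    # Leave optional names out entirely so default column names are generated.
    if output_col is not None:
        args["output_col"] = output_col
    if error_col is not None:
        args["error_col"] = error_col
    return args


args = build_classify_args(
    ["kitchen", "bedroom", "garage", "other"],
    input_col="descriptions",
    error_col="classify_errors",  # invented column name
)
print(sorted(args))  # ['error_col', 'input_col', 'labels']
```

In a Fabric notebook, a dictionary like this could then be passed to the Spark call as `df.ai.classify(**args)`, following the syntax shown above.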

Returns

The function returns a Spark DataFrame with a new column that contains classification labels that match each input text row. If a text value can't be classified, the corresponding label is null.

Example

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])

categories = df.ai.classify(
    labels=["kitchen", "bedroom", "garage", "other"],
    input_col="descriptions",
    output_col="categories",
)
display(categories)