The ai.classify function categorizes each input row by using the labels you provide.
Note
- This article covers ai.classify with PySpark. For pandas, see Use ai.classify with pandas.
- For all AI Functions and prerequisites, see AI Functions overview.
- Change default configuration for AI Functions with PySpark.
Overview
The ai.classify function is available for Spark DataFrames. You must specify the name of an existing input column as a parameter, along with a list of classification labels.
The function returns a new DataFrame with labels that match each row of input text, stored in an output column.
Syntax
df.ai.classify(labels=["category1", "category2", "category3"], input_col="text", output_col="classification")
Parameters
| Name | Description |
|---|---|
| labels (Required) | An array of strings that represents the set of classification labels to match to text values in the input column. |
| input_col (Required) | A string that contains the name of an existing column with input text values to classify according to the custom labels. |
| output_col (Optional) | A string that contains the name of a new column where you want to store a classification label for each input text row. If you don't set this parameter, a default name is generated for the output column. |
| error_col (Optional) | A string that contains the name of a new column. The new column stores any OpenAI errors that result from processing each row of input text. If you don't set this parameter, a default name is generated for the error column. If there are no errors for a row of input, the value in this column is null. |
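Because labels is required and input_col must name an existing column, it can help to check both before making the call. The following is a minimal sketch, not part of the AI Functions API: `classify_checked` is a hypothetical helper, and the default output and error column names are illustrative choices. It assumes `df` is a Spark DataFrame in an environment where AI Functions are available.

```python
def classify_checked(df, labels, input_col,
                     output_col="classification",
                     error_col="classification_error"):
    """Hypothetical wrapper: validate the arguments, then call df.ai.classify."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must contain at least one classification label")
    if not all(isinstance(label, str) for label in labels):
        raise TypeError("every label must be a string")
    if input_col not in df.columns:
        raise ValueError(f"input column {input_col!r} does not exist in the DataFrame")

    # Name both new columns explicitly (illustrative names) so that
    # downstream code does not depend on generated defaults.
    return df.ai.classify(
        labels=labels,
        input_col=input_col,
        output_col=output_col,
        error_col=error_col,
    )
```

Setting output_col and error_col explicitly is a design choice in this sketch; both are optional, and omitting them lets the function generate default names.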
Returns
The function returns a Spark DataFrame that includes a new column that contains classification labels that match each input text row. If a text value can't be classified, the corresponding label is null.
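Since rows that can't be classified come back with a null label, a quick tally can show how many rows need manual review. This is a minimal sketch that assumes the result is small enough to collect to the driver; `tally_labels` is a hypothetical helper, and `categories` in the usage comment stands for a DataFrame returned by ai.classify with output_col="categories".

```python
from collections import Counter

def tally_labels(rows, label_col):
    """Count rows per label; null (None) labels are counted as unclassified."""
    counts = Counter()
    unclassified = 0
    for row in rows:
        label = row[label_col]
        if label is None:
            unclassified += 1
        else:
            counts[label] += 1
    return dict(counts), unclassified

# Usage with a Spark result (assumes a small DataFrame named `categories`):
# rows = [r.asDict() for r in categories.collect()]
# counts, unclassified = tally_labels(rows, "categories")
```

For large results, the same count is better computed in Spark rather than on the driver.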
Example
# This code uses AI. Always review output for mistakes.
df = spark.createDataFrame([
    ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
    ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
    ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
], ["descriptions"])
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)
Multimodal input
To classify images, PDFs, or text files, set input_col_type="path". For setup, see Use multimodal input with AI Functions.
# This code uses AI. Always review output for mistakes.
results = custom_df.ai.classify(
    labels=["Master", "PhD", "Bachelor", "Other"],
    input_col="file_path",
    input_col_type="path",
    output_col="highest_degree",
)
display(results)
Related content
- Use ai.classify with pandas.
- Learn more about AI Functions.
- Use multimodal input with AI Functions.
- Change default configuration for AI Functions with PySpark.
- Understand billing for AI Functions.