The ai.classify function categorizes each input row by using the labels you provide.
Note
- This article covers ai.classify with pandas. For PySpark, see Use ai.classify with PySpark.
- For all AI Functions and prerequisites, see AI Functions overview.
- To change the default settings, see Change default configuration for AI Functions with pandas.
Overview
The ai.classify function extends the pandas Series class. To assign user-provided labels to each input row, call the function on a text column of a pandas DataFrame.
The function returns a pandas Series that contains classification labels, which can be stored in a new DataFrame column.
Tip
We recommend using the ai.classify function with at least two input labels.
Syntax
df["classification"] = df["input"].ai.classify("category1", "category2", "category3")
Parameters
| Name | Description |
|---|---|
| labels (Required) | One or more strings that represent the set of classification labels to match to input text values. |
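Because the labels are passed as individual string arguments, it can help to sanity-check a label list before calling the function. The following is a minimal sketch in plain Python. `check_labels` is a hypothetical helper, not part of AI Functions, and the rules it enforces (non-empty, distinct strings, at least two of them) are assumptions drawn from the tip above rather than documented requirements.

```python
def check_labels(*labels):
    """Return cleaned labels, or raise ValueError if the set looks unusable.

    Hypothetical helper: these checks are assumptions, not rules enforced
    by ai.classify itself.
    """
    cleaned = []
    for label in labels:
        if not isinstance(label, str):
            raise ValueError(f"label must be a string, got {type(label).__name__}")
        stripped = label.strip()
        if not stripped:
            raise ValueError("labels must not be empty or whitespace")
        if stripped in cleaned:
            raise ValueError(f"duplicate label: {stripped!r}")
        cleaned.append(stripped)
    if len(cleaned) < 2:
        raise ValueError("provide at least two labels")
    return cleaned


labels = check_labels("kitchen", "bedroom", "garage", "other")
# In a Fabric notebook, the checked labels could then be unpacked into the call:
# df["category"] = df["descriptions"].ai.classify(*labels)
```

Unpacking with `*labels` keeps the call shape identical to the syntax shown earlier, where each label is a separate positional argument.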
Returns
The function returns a pandas Series that contains a classification label for each input text row. If a text value can't be classified, the corresponding label is null.
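Since unclassifiable rows come back as null, downstream code usually needs to decide what to do with them. The sketch below uses only pandas and a hand-written `category` column that stands in for real ai.classify output. The values are illustrative, not actual model results, and the "unclassified" fallback label is an assumption chosen for the example.

```python
import pandas as pd

# Stand-in for a DataFrame after ai.classify has run; values are illustrative only.
df = pd.DataFrame({
    "descriptions": ["duvet", "measuring cups", "???"],
    "category": ["bedroom", "kitchen", None],
})

# Select rows where no label was assigned so they can be reviewed.
needs_review = df[df["category"].isna()]

# Optionally substitute a fallback label for the missing values.
df["category_filled"] = df["category"].fillna("unclassified")

print(len(needs_review))
print(df["category_filled"].tolist())
```

Keeping the original `category` column alongside the filled one preserves the information about which rows the function could not classify.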
Example
# This code uses AI. Always review output for mistakes.
# Assumes a Fabric notebook with the AI Functions prerequisites in place.
import pandas as pd
import synapse.ml.aifunc as aifunc  # import shown in the AI Functions overview

df = pd.DataFrame([
    "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
    "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
    "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
], columns=["descriptions"])

# Assign one of the four labels to each description.
df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)
Multimodal input
To classify images, PDFs, or text files, set column_type="path" when the input column contains file path strings. For supported file types and setup, see Use multimodal input with AI Functions.
# This code uses AI. Always review output for mistakes.
file_path_series = aifunc.list_file_paths("/lakehouse/default/Files")
custom_df = pd.DataFrame({"file_path": file_path_series})
custom_df["highest_degree"] = custom_df["file_path"].ai.classify(
    "Master", "PhD", "Bachelor", "Other",
)
display(custom_df)
Note
When you use aifunc.list_file_paths() to create your file path column, the returned yarl.URL objects are automatically detected as file paths. You only need to specify column_type="path" when your column contains plain string URLs.
You can also use aifunc.load to ingest files into a DataFrame, then classify the file-path column:
# This code uses AI. Always review output for mistakes.
df, schema = aifunc.load("/lakehouse/default/Files")
df["category"] = df["file_path"].ai.classify("Master", "PhD", "Bachelor", "Other")
display(df)
When you use aifunc.load, the file-path column contains yarl.URL objects that are automatically detected. For plain string URLs, set column_type="path".
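The rule above can be turned into a small check. This is a sketch in plain Python: `needs_path_hint` is a hypothetical helper (not part of aifunc) that reports whether every non-null value in a column is a plain `str`, in which case passing column_type="path" would be appropriate according to the text above. It assumes that missing values are represented as `None` and that anything other than a `str`, such as a yarl.URL object, is detected automatically. The file names and the `FakeURL` class are illustrative stand-ins.

```python
def needs_path_hint(values):
    """Return True if there is at least one value and all non-None values are str.

    Hypothetical helper: a True result suggests passing column_type="path",
    because plain string paths are not detected automatically.
    """
    non_null = [v for v in values if v is not None]
    return bool(non_null) and all(isinstance(v, str) for v in non_null)


class FakeURL:
    """Stand-in for a URL object such as yarl.URL, used only for this demo."""

    def __init__(self, raw):
        self.raw = raw


# Illustrative file names; replace with real paths from your lakehouse.
string_paths = ["/lakehouse/default/Files/a.pdf", "/lakehouse/default/Files/b.pdf"]
url_paths = [FakeURL(p) for p in string_paths]

print(needs_path_hint(string_paths))
print(needs_path_hint(url_paths))
```

In a Fabric notebook, the result could decide whether to add column_type="path" to the ai.classify call. The helper accepts any iterable, so a pandas Series can be passed directly.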
Related content
- Use ai.classify with PySpark.
- Learn more about AI Functions.
- Use multimodal input with AI Functions.
- Change default configuration for AI Functions with pandas.
- Understand billing for AI Functions.