Classification

Important

This feature is in Public Preview and is HIPAA compliant.

You can use Classification to classify your documents into predefined categories with AI.

Examples of classification include:

  • Classifying customer call transcripts by intent
  • Classifying documents by content type
  • Classifying product reviews by sentiment

Classification is built on the AI function ai_classify. The Agents page provides a UI for quickly classifying documents and unstructured text and for iterating on classification labels to get better results.

Requirements

Create a classification agent

Go to Agents in the left navigation pane of your workspace. Click Create Agent, and select Text Classification.

Step 1. Select your source data

Use Classification to classify your documents.

  1. Select your source data. You can select a volume with documents or a table with text data.

  2. Click Create Agent.

Step 2. Configure classification labels

After Classification processes your data, configure and refine your classification labels.

To add classification labels manually:

  1. Click + Add label to add a classification label.
  2. Enter the label name and optional description, then click Add label.
  3. Repeat steps 1-2 for each label you want to add.
  4. (Optional) Enable Multiple labels if you want the agent to return more than one label.

To import labels from a table:

  1. Click Import from table.
  2. Select the table and click Next.
  3. Select the label column, description column, and SQL warehouse.
  4. Click Preview to see the labels that are imported.
  5. If you're happy with the labels, click Import and replace labels.

You must have at least two labels defined to run classification. After you've finished adding your labels, click Save and Run Classification.
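If you keep your labels outside the UI, for example before loading them into a table for import, a quick check can catch problems early. The following is a minimal sketch in Python and is not part of the product: the `Label` class and `validate_labels` helper are hypothetical names. Only the two-label minimum comes from this page; rejecting blank and duplicate names is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical container for one classification label."""
    name: str
    description: str = ""  # the description is optional


def validate_labels(labels: list[Label]) -> list[Label]:
    """Return the labels with whitespace trimmed, or raise ValueError."""
    cleaned = [Label(l.name.strip(), l.description.strip()) for l in labels]
    names = [l.name for l in cleaned]
    if any(not n for n in names):
        # Assumption: a label without a name is never useful.
        raise ValueError("every label needs a non-empty name")
    if len(set(names)) != len(names):
        # Assumption: duplicate names would make results ambiguous.
        raise ValueError("label names must be unique")
    if len(cleaned) < 2:
        # Rule stated on this page: at least two labels are required.
        raise ValueError("at least two labels are required")
    return cleaned


labels = validate_labels([
    Label("billing", "Questions about invoices or charges"),
    Label(" cancellation "),
])
print([l.name for l in labels])  # ['billing', 'cancellation']
```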

Classification classifies up to 5 documents or rows and displays the results. You can also add more documents or rows.

Classification review UI showing classification labels and results

Step 3. Review and improve classification responses

Review the classification responses and give feedback to improve the agent.

  1. Review the document and classification response:

    • If the response is correct, give it a thumbs up.
    • If the response is incorrect, give it a thumbs down. Help Classification fix the response by selecting the correct label. Click Save.
  2. To improve the agent's responses, adjust the classification label descriptions.

  3. Compare versions to optimize the agent's performance. Click Versions. Next to a previous version, click Compare to compare the classification label descriptions of a previous version with the current version. Click Restore to restore a previous version.

Step 4. Use the classification agent

If you're happy with the responses, start using the agent for classification at scale.

Click Use Agent in the upper-right. You can choose either:

  • Run in SQL to use the agent to classify your entire volume or table. This opens a SQL query that calls ai_classify with the classification labels you defined. For more information on using ai_classify in SQL queries, see ai_classify function.
  • Create a Spark Declarative Pipeline to deploy an ETL pipeline that runs at scheduled intervals and invokes your agent on new data. This creates a Lakeflow Spark Declarative Pipelines pipeline that updates a streaming table with your classified data. You can configure the pipeline's schedule so that it runs when new data arrives. For more information, see Lakeflow Spark Declarative Pipelines.
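To give a sense of what the Run in SQL option produces, the sketch below builds a comparable query in Python for the table case. It is an illustration only: the query the agent generates may differ, and the table name `main.support.call_transcripts`, the column name `transcript`, and both helper functions are hypothetical. It assumes the two-argument form `ai_classify(content, labels)` and the default backslash escaping for SQL string literals.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (backslash escaping)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_classify_query(table: str, text_column: str, labels: list[str]) -> str:
    """Build a SELECT that labels every row of `table` with ai_classify."""
    if len(labels) < 2:
        # At least two labels are required to run classification.
        raise ValueError("at least two labels are required")
    label_array = ", ".join(sql_literal(label) for label in labels)
    return (
        f"SELECT {text_column}, "
        f"ai_classify({text_column}, ARRAY({label_array})) AS label "
        f"FROM {table}"
    )


# Hypothetical table and column names, for illustration only.
query = build_classify_query(
    "main.support.call_transcripts",
    "transcript",
    ["billing", "cancellation", "technical issue"],
)
print(query)
```

In a notebook, a string like this could be passed to `spark.sql(query)`. The sketch interpolates the table and column names directly, so it should only be used with identifiers you control.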

Limitations

See Limitations