ai_classify function

Applies to: Databricks SQL, Databricks Runtime

Important

This functionality is in Public Preview and is HIPAA compliant.

During the preview:

The ai_classify() function classifies text content according to custom labels you provide. You can use simple label names for basic classification, or add label descriptions and instructions to improve accuracy for use cases like customer support routing, document categorization, and content analysis.

The function accepts text or VARIANT output from other AI functions like ai_parse_document, enabling composable workflows.

To iterate on ai_classify in a UI, see Classification.

Requirements

Apache 2.0 license

The underlying models that might be used at this time are licensed under the Apache 2.0 License, Copyright © The Apache Software Foundation. Customers are responsible for ensuring compliance with applicable model licenses.

Databricks recommends reviewing these licenses to ensure compliance with any applicable terms. If models emerge in the future that perform better according to Databricks's internal benchmarks, Databricks might change the model (and the list of applicable licenses provided on this page).

The model powering this function is made available using Model Serving Foundation Model APIs. See Applicable model developer terms for information about which models are available on Databricks and the licenses and policies that govern the use of those models.

  • This function is available only in some regions. See AI function availability.
  • This function is not available on Azure Databricks SQL Classic.
  • For pricing, check the Databricks SQL pricing page.
  • In Databricks Runtime 15.1 and above, this function is supported in Databricks notebooks, including notebooks that are run as a task in a Databricks workflow.
  • Batch inference workloads require Databricks Runtime 15.4 ML LTS for improved performance.

Syntax

ai_classify(
    content VARIANT | STRING,
    labels STRING,
    [options MAP<STRING, STRING>]
) RETURNS VARIANT

Version 1

ai_classify(
    content STRING,
    labels ARRAY<STRING>,
    [options MAP<STRING, STRING>]
) RETURNS STRING

Arguments

  • content: A VARIANT or STRING expression. Accepts either text as a STRING or VARIANT output from another AI function such as ai_parse_document.

  • labels: A STRING literal defining the classification labels. The labels can be:

    • Simple labels: A JSON array of label names.
      ["urgent", "not_urgent"]
      
    • Labels with descriptions: A JSON object mapping label names to descriptions. Label descriptions must be 0-1000 characters.
      {
        "billing_error": "Payment, invoice, or refund issues",
        "product_defect": "Any malfunction, bug, or breakage",
        "account_issue": "Login failures, password resets"
      }
      

    Each label must be 1-100 characters. The labels argument must contain at least 2 and no more than 500 labels.

  • options: An optional MAP<STRING, STRING> containing configuration options:

    • version: Version switch to support migration ("1.0" for v1 behavior, "2.0" for v2 behavior). The default is determined by the input types and otherwise falls back to "1.0".
    • instructions: Global description of the task and domain to improve classification quality. Must be fewer than 20,000 characters.
    • multilabel: Set to "true" to return multiple labels when multiple categories apply. Default is "false" (single-label classification).
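The limits above can be checked before a query is submitted. The following is a minimal Python sketch, assuming the labels are assembled in a notebook and then passed to ai_classify as a JSON string. The helper names build_labels_literal and build_options are hypothetical and not part of any Databricks API. The uniqueness check reflects the "unique labels" wording in the Limitations section.

```python
import json

# Version 2 limits, copied from the argument descriptions above.
MIN_LABELS, MAX_LABELS = 2, 500
MAX_LABEL_CHARS = 100
MAX_DESCRIPTION_CHARS = 1000
MAX_INSTRUCTION_CHARS = 20_000  # instructions must be shorter than this


def build_labels_literal(labels):
    """Validate labels and return the JSON text for the `labels` argument.

    `labels` is a list of names (simple labels) or a dict mapping each name
    to its description. Hypothetical helper, not a Databricks API.
    """
    names = list(labels)
    if not MIN_LABELS <= len(names) <= MAX_LABELS:
        raise ValueError(f"need {MIN_LABELS}-{MAX_LABELS} labels, got {len(names)}")
    if len(set(names)) != len(names):
        raise ValueError("label names must be unique")
    for name in names:
        if not 1 <= len(name) <= MAX_LABEL_CHARS:
            raise ValueError(f"label {name!r} must be 1-{MAX_LABEL_CHARS} characters")
    if isinstance(labels, dict):
        for name, description in labels.items():
            if len(description) > MAX_DESCRIPTION_CHARS:
                raise ValueError(
                    f"description of {name!r} exceeds {MAX_DESCRIPTION_CHARS} characters"
                )
    return json.dumps(labels)


def build_options(instructions=None, multilabel=False):
    """Return a str-to-str dict mirroring the MAP<STRING, STRING> options."""
    options = {}
    if instructions is not None:
        if len(instructions) >= MAX_INSTRUCTION_CHARS:
            raise ValueError("instructions must be less than 20,000 characters")
        options["instructions"] = instructions
    if multilabel:
        options["multilabel"] = "true"  # option values are strings, not booleans
    return options


print(build_labels_literal(["urgent", "not_urgent"]))
print(build_options(instructions="Classify bug severity.", multilabel=True))
```

The sketch does not escape single quotes, so a label or description containing one would still need escaping before being embedded in a SQL string literal.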

Version 1

  • content: A STRING expression containing the text to be classified.

  • labels: An ARRAY<STRING> literal with the expected output classification labels. Must contain at least 2 elements, and no more than 20 elements. Each label must be 1-50 characters.

  • options: An optional MAP<STRING, STRING> containing configuration options:

    • version: Version switch to support migration ("1.0" for v1 behavior, "2.0" for v2 behavior). The default is determined by the input types and otherwise falls back to "1.0".
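Because the version 1 limits (2 to 20 labels, 1 to 50 characters each) fall inside the version 2 limits, a valid version 1 label list should also be usable as a version 2 simple-label string. The following Python sketch shows one way to produce that string when migrating. The function name v1_labels_to_v2_literal is hypothetical. The sketch covers only the labels argument, and code that consumes the result would still need to handle the change in return type from STRING to VARIANT.

```python
import json

V1_MAX_LABELS = 20       # version 1: at most 20 labels
V1_MAX_LABEL_CHARS = 50  # version 1: each label is 1-50 characters


def v1_labels_to_v2_literal(labels):
    """Turn a version 1 label list into a version 2 JSON-array string.

    Hypothetical migration helper; it re-checks the version 1 limits first.
    """
    labels = list(labels)
    if not 2 <= len(labels) <= V1_MAX_LABELS:
        raise ValueError(f"version 1 expects 2-{V1_MAX_LABELS} labels, got {len(labels)}")
    for label in labels:
        if not 1 <= len(label) <= V1_MAX_LABEL_CHARS:
            raise ValueError(f"label {label!r} must be 1-{V1_MAX_LABEL_CHARS} characters")
    return json.dumps(labels)


print(v1_labels_to_v2_literal(["clothing", "shoes", "accessories", "furniture"]))
```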

Returns

Returns a VARIANT containing:

{
  "response": ["label_name"], // Array with single label (or multiple if multilabel=true)
  "error_message": null // null on success, or error message on failure
}

The response field contains:

  • Single-label mode (default): An array with one element containing the best matching label
  • Multi-label mode (multilabel: "true"): An array with multiple labels when multiple categories apply
  • Label names exactly match those provided in the labels parameter

Returns NULL if content is NULL or if the content cannot be classified.
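Downstream code usually needs only the labels, plus a clear signal when classification failed. The following Python sketch assumes the VARIANT result has already been serialized to a JSON string before it reaches Python. How that happens depends on your pipeline and is not shown here. The function name extract_labels is hypothetical, and the field names follow the structure described above.

```python
import json


def extract_labels(result_json):
    """Return the list of labels from a serialized ai_classify result.

    Returns None for a NULL result (None or JSON null) and raises
    RuntimeError when `error_message` is set. Hypothetical helper.
    """
    if result_json is None:
        return None
    result = json.loads(result_json)
    if result is None:
        return None
    error = result.get("error_message")
    if error is not None:
        raise RuntimeError(f"ai_classify failed: {error}")
    # Single-label mode yields a one-element list; multi-label may yield more.
    return list(result.get("response") or [])


print(extract_labels('{"response": ["urgent"], "error_message": null}'))
```

Returning a list in both modes keeps callers from branching on the multilabel option.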

Version 1

Returns a STRING. The value matches one of the strings provided in the labels argument.

Returns NULL if content is NULL or if the content cannot be classified.

Examples

Simple labels - label names only

> SELECT ai_classify(
    'My password is leaked.',
    '["urgent", "not_urgent"]'
  );
 {
   "response": ["urgent"],
   "error_message": null
 }

Labels with descriptions

> SELECT ai_classify(
    'Customer cannot complete checkout due to payment processing error.',
    '{
      "billing_error": "Payment, invoice, or refund issues",
      "product_defect": "Any malfunction, bug, or breakage",
      "account_issue": "Login failures, password resets",
      "feature_request": "Customer suggestions for improvements"
    }'
  );
 {
   "response": ["billing_error"],
   "error_message": null
 }

Using global instructions

> SELECT ai_classify(
    'User reports app crashes on startup after update.',
    '["critical", "high", "medium", "low"]',
    MAP('instructions', 'Classify bug severity based on user impact and frequency.')
  );
 {
   "response": ["critical"],
   "error_message": null
 }

Multi-label classification

> SELECT ai_classify(
    'Customer wants refund and reports product arrived broken.',
    '{
      "billing_issue": "Payment or refund requests",
      "product_defect": "Damaged or malfunctioning items",
      "shipping_issue": "Delivery problems"
    }',
    MAP('multilabel', 'true')
  );
 {
   "response": ["billing_issue", "product_defect"],
   "error_message": null
 }

Composability with ai_parse_document

> WITH parsed_docs AS (
    SELECT
      path,
      ai_parse_document(
        content,
        MAP('version', '2.0')
      ) AS parsed_content
    FROM READ_FILES('/Volumes/support/tickets/', format => 'binaryFile')
  )
  SELECT
    path,
    ai_classify(
      parsed_content,
      '["billing_error", "product_defect", "account_issue", "feature_request"]',
      MAP('instructions', 'Customer support ticket classification.')
    ) AS ticket_category
  FROM parsed_docs;

Batch classification

> SELECT
    description,
    ai_classify(
      description,
      '["clothing", "shoes", "accessories", "furniture", "electronics"]'
    ) AS category
  FROM products
  LIMIT 10;

Version 1

> SELECT ai_classify("My password is leaked.", ARRAY("urgent", "not urgent"));
  urgent

> SELECT
    description,
    ai_classify(description, ARRAY('clothing', 'shoes', 'accessories', 'furniture')) AS category
  FROM
    products
  LIMIT 10;

Limitations

Version 2 limitations:

  • This function is not available on Azure Databricks SQL Classic.

  • This function cannot be used with views.

  • Label names must be 1–100 characters each.

  • The labels parameter must contain between 2 and 500 unique labels.

  • Label descriptions must be 0–1,000 characters each.

  • The maximum total context size is 128,000 tokens.

Version 1

Version 1 limitations:

  • This function is not available on Azure Databricks SQL Classic.

  • This function cannot be used with views.

  • Label names must be 1–50 characters each.

  • The labels array must contain between 2 and 20 labels.

  • The content input must be less than 128,000 tokens (about 300,000 characters).
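The character figure above gives a cheap way to flag inputs that are likely too long before they are sent for classification. The following Python sketch uses only that approximation. It does not count tokens, so it is a heuristic and not a guarantee. The function name split_by_length and the constant are hypothetical.

```python
# "About 300,000 characters" per the version 1 limitation above; approximate.
V1_APPROX_MAX_CHARS = 300_000


def split_by_length(texts, max_chars=V1_APPROX_MAX_CHARS):
    """Split texts into (likely_ok, likely_too_long) by character count.

    Heuristic only: the real limit is expressed in tokens, not characters.
    """
    likely_ok, likely_too_long = [], []
    for text in texts:
        if len(text) < max_chars:
            likely_ok.append(text)
        else:
            likely_too_long.append(text)
    return likely_ok, likely_too_long


ok, too_long = split_by_length(["short ticket text", "x" * 300_000])
print(len(ok), len(too_long))
```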