`ai_query` function

2025-05-20

Applies to: Databricks SQL, Databricks Runtime

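Important

This feature is in Public Preview.

Invokes an existing Azure Databricks model serving endpoint, then parses and returns its response.

To use ai_query in production workflows, see Perform batch LLM inference using AI Functions.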

Requirements

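- This function is not available on Azure Databricks SQL Classic.
- You must enable Azure Private Link to use this feature on pro SQL warehouses.
- Databricks Runtime 15.4 LTS or above is recommended. Using Databricks Runtime 15.3 or below might result in slower performance.
- Your workspace must be in a supported Model Serving region.
- An existing model serving endpoint with your model loaded. If you are using a Databricks-hosted foundation model, an endpoint is created for you. Otherwise, see Create custom model serving endpoints or Create foundation model serving endpoints.
- Querying Foundation Model APIs is enabled by default. To query endpoints that serve custom models or external models:
  - Enable AI_Query for Custom Models and External Models in the Databricks Previews UI.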

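The current Lakeflow Declarative Pipelines warehouse channel does not use the latest Databricks Runtime version that supports ai_query(). To use ai_query(), set pipelines.channel in the table properties to 'preview'.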

> create or replace materialized view
    ai_query_mv
    TBLPROPERTIES('pipelines.channel' = 'PREVIEW') AS
  SELECT
    ai_query("databricks-meta-llama-3-3-70b-instruct", text) as response
  FROM
    messages
  LIMIT 10;

Syntax

To query an endpoint that serves a foundation model:

ai_query(endpoint, request)

To query a custom model serving endpoint with a model schema:

ai_query(endpoint, request)

To query a custom model serving endpoint without a model schema:

ai_query(endpoint, request, returnType, failOnError)

Arguments and returns

Argument	Description	Returns
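`endpoint`	The name of a Databricks Foundation Model serving endpoint, an external model serving endpoint, or a custom model endpoint in the same workspace for invocations, as a `STRING` literal. The definer must have `CAN QUERY` permission on the endpoint.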
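`request`	The request used to invoke the endpoint, as an expression. If the endpoint is an external model serving endpoint or a Databricks Foundation Model APIs endpoint, the request must be a `STRING`. If the endpoint is a custom model serving endpoint, the request can be a single column or a struct expression. The struct field names should match the input feature names expected by the endpoint.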
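`returnType`	The expected `returnType` from the endpoint, as an expression. This is similar to the schema parameter in the `from_json` function, which accepts either a `STRING` expression or an invocation of the `schema_of_json` function. In Databricks Runtime 15.2 and above, if this expression is not provided, `ai_query()` automatically infers the return type from the model schema of the custom model serving endpoint. In Databricks Runtime 15.1 and below, this expression is required for querying a custom model serving endpoint.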
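`failOnError`	(Optional) A boolean literal that defaults to true. Requires Databricks Runtime 15.3 or above. This flag indicates whether to include the error status in the `ai_query` response.	If `failOnError => true`, the function returns the same result as the existing behavior, which is the parsed response from the endpoint. The data type of the parsed response is inferred from the model type, the endpoint's model schema, or the `returnType` parameter of the `ai_query` function. If `failOnError => false`, the function returns a `STRUCT` object that contains the parsed response and the error status string. If the inference of the row succeeds, the `errorStatus` field is `null`. If the inference of the row fails because of model endpoint errors, the `response` field is `null`. If the inference of the row fails because of other errors, the whole query fails. See Handle errors using `failOnError` for an example.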
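`modelParameters`	(Optional) A struct field that contains chat, completion, and embedding model parameters for serving foundation models or external models. These model parameters must be constant parameters and not data dependent. Requires Databricks Runtime 15.3 or above. When these model parameters are not specified or are set to `null`, the default value is used. With the exception of `temperature`, which has a default value of `0.0`, the default values for these model parameters are the same as those listed in the Foundation model REST API reference. See Configure a model by passing model parameters for an example.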
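`responseFormat`	(Optional) Specifies the response format that you want the chat model to follow. Requires Databricks Runtime 15.4 LTS or above. Only available for querying chat foundation models. Two response format styles are supported: a DDL-style JSON string, and a JSON string. Three JSON string types of response format are supported: `text`, `json_object`, and `json_schema`. See Enforce output schema with structured output for an example.	If `failOnError => false` and you have specified `responseFormat`, the function returns the parsed response and the error status string as a `STRUCT` object. Depending on the JSON string type specified in `responseFormat`, the following response is returned: For `responseFormat => '{"type": "text"}'`, the response is a string such as `"Here is the response"`. For `responseFormat => '{"type": "json_object"}'`, the response is a key-value pair JSON string, such as `{"key": "value"}`. For `responseFormat => '{"type": "json_schema", "json_schema"...}'`, the response is a JSON string. See Enforce output schema with structured output for an example.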

Example: Query a foundation model

To query an external model serving endpoint:

> SELECT ai_query(
    'my-external-model-openai-chat',
    'Describe Databricks SQL in 30 words.'
  ) AS summary

  "Databricks SQL is a cloud-based platform for data analytics and machine learning, providing a unified workspace for collaborative data exploration, analysis, and visualization using SQL queries."

To query a foundation model supported by Databricks Foundation Model APIs:

> SELECT *,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    "Can you tell me the name of the US state that serves the provided ZIP code? zip code: " || pickup_zip
    )
  FROM samples.nyctaxi.trips
  LIMIT 10

Optionally, you can also wrap a call to ai_query() in a UDF for function calling, as follows:

> CREATE FUNCTION correct_grammar(text STRING)
  RETURNS STRING
  RETURN ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('Correct this to standard English:\n', text));
> GRANT EXECUTE ON correct_grammar TO ds;
-- DS fixes grammar issues in a batch.
> SELECT
    * EXCEPT text,
    correct_grammar(text) AS text
  FROM articles;

Example: Query a traditional ML model

To query a custom model or a traditional ML model serving endpoint:


> SELECT text, ai_query(
    endpoint => 'spam-classification-endpoint',
    request => named_struct(
      'timestamp', timestamp,
      'sender', from_number,
      'text', text),
    returnType => 'BOOLEAN') AS is_spam
  FROM messages
  LIMIT 10

> SELECT ai_query(
    'weekly-forecast',
    request => struct(*),
    returnType => 'FLOAT') AS predicted_revenue
  FROM retail_revenue

> SELECT ai_query(
    'custom-llama-chat',
    request => named_struct("messages",
        ARRAY(named_struct("role", "user", "content", "What is ML?"))),
    returnType => 'STRUCT<candidates:ARRAY<STRING>>')

  {"candidates":["ML stands for Machine Learning. It's a subfield of Artificial Intelligence that involves the use of algorithms and statistical models to enable machines to learn from data, make decisions, and improve their performance on a specific task over time."]}

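Examples for advanced scenarios

The following sections provide examples for advanced use cases, such as error handling or how to incorporate ai_query into a user-defined function.

Concatenate the prompt and inference column

There are multiple ways to concatenate the prompt and the inference column, such as using ||, CONCAT(), or format_string():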

SELECT
CONCAT('${prompt}', ${input_column_name}) AS concatenated_prompt
FROM ${input_table_name};

Alternatively:

SELECT
'${prompt}' || ${input_column_name} AS concatenated_prompt
FROM ${input_table_name};

Or using format_string():

SELECT
format_string('%s%s', '${prompt}', ${input_column_name}) AS concatenated_prompt
FROM ${input_table_name};
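
As a hedged illustration of how a concatenated prompt feeds into ai_query, the sketch below builds the prompt inline with format_string() and passes it as the request. It reuses the `messages` table and its `text` column from the earlier examples; the prompt wording and the `sentiment` alias are assumptions made for illustration only.

```sql
-- Sketch: build the prompt inline with format_string() and send it to a foundation model.
-- Assumes a table `messages` with a STRING column `text`, as in the earlier examples.
SELECT
  text,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    format_string('Classify the sentiment of this message as positive or negative: %s', text)
  ) AS sentiment
FROM messages
LIMIT 10;
```

Building the prompt inside the call avoids materializing an intermediate prompt column; either form should work, since the request only needs to be a `STRING` expression.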

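Configure a model by passing model parameters

Customize model behavior by passing specific parameters such as maximum tokens and temperature. For example: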

SELECT text, ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Please summarize the following article: " || text,
    modelParameters => named_struct('max_tokens', 100, 'temperature', 0.7)
) AS summary
FROM uc_catalog.schema.table;

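Handle errors using `failOnError`

Use the failOnError argument of ai_query to handle errors. The following example shows how to make sure that an error in one row does not stop the whole query from running. See Arguments and returns for the expected behavior based on how this argument is set.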


SELECT text, ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Summarize the given text comprehensively, covering key points and main ideas concisely while retaining relevant details and examples. Ensure clarity and accuracy without unnecessary repetition or omissions: " || text,
    failOnError => false
) AS summary
FROM uc_catalog.schema.table;
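
Because the function returns a `STRUCT` when `failOnError => false`, the parsed response and the error status can be separated after the call. The following is a minimal sketch, assuming the struct fields are named `response` and `errorStatus` as described in Arguments and returns. The `messages` table, the CTE name `summarized`, and the output aliases are illustrative assumptions.

```sql
-- Sketch: keep the per-row result as a STRUCT, then split it into two columns.
-- Assumes a table `messages` with a STRING column `text`.
WITH summarized AS (
  SELECT
    text,
    ai_query(
      'databricks-meta-llama-3-3-70b-instruct',
      'Summarize the following text: ' || text,
      failOnError => false
    ) AS result
  FROM messages
)
SELECT
  text,
  result.response AS summary,          -- null when the endpoint returned an error for this row
  result.errorStatus AS error_status   -- null when inference succeeded for this row
FROM summarized;
```

Filtering on `error_status IS NOT NULL` would then isolate the failed rows, for example to retry them separately.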

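Enforce output schema with structured output

Use responseFormat to ensure that the output conforms to a specific schema, which makes downstream processing easier. See Structured outputs on Azure Databricks.

The following example enforces a DDL-style JSON string schema: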

SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Extract research paper details from the following abstract: " || abstract,
    responseFormat => 'STRUCT<research_paper_extraction:STRUCT<title:STRING, authors:ARRAY<STRING>, abstract:STRING, keywords:ARRAY<STRING>>>'
)
FROM research_papers;

Alternatively, using a JSON schema response format:

SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Extract research paper details from the following abstract: " || abstract,
    responseFormat => '{
      "type": "json_schema",
      "json_schema": {
        "name": "research_paper_extraction",
        "schema": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "authors": {"type": "array", "items": {"type": "string"}},
            "abstract": {"type": "string"},
            "keywords": {"type": "array", "items": {"type": "string"}}
          }
        },
        "strict": true
      }
    }'
)
FROM research_papers;

The expected output might look like the following:

{ "title": "Understanding AI Functions in Databricks", "authors": ["Alice Smith", "Bob Jones"], "abstract": "This paper explains how AI functions can be integrated into data workflows.", "keywords": ["Databricks", "AI", "LLM"] }
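
Because the response for a JSON schema format is described as a JSON string, one possible way to turn it into typed columns is `from_json`. The sketch below uses the sample output as a literal stand-in for the ai_query result, so it runs without a serving endpoint. The CTE names and output aliases are illustrative assumptions, not part of the function's interface.

```sql
-- Stand-in for the JSON string that ai_query returns with a json_schema response format.
WITH extracted AS (
  SELECT '{ "title": "Understanding AI Functions in Databricks", "authors": ["Alice Smith", "Bob Jones"], "abstract": "This paper explains how AI functions can be integrated into data workflows.", "keywords": ["Databricks", "AI", "LLM"] }' AS response
),
parsed AS (
  -- Parse the JSON string into a STRUCT using a DDL-style schema.
  SELECT from_json(
    response,
    'title STRING, authors ARRAY<STRING>, abstract STRING, keywords ARRAY<STRING>'
  ) AS paper
  FROM extracted
)
SELECT
  paper.title,
  paper.authors[0] AS first_author,
  size(paper.keywords) AS keyword_count
FROM parsed;
```

In a real query, the `extracted` CTE would select the ai_query call with `responseFormat` instead of the literal string.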

Use `ai_query` in user-defined functions

You can wrap a call to ai_query in a UDF, which makes it easy to use the function across different workflows and to share it.

CREATE FUNCTION correct_grammar(text STRING)
  RETURNS STRING
  RETURN ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('Correct this to standard English:\n', text));

GRANT EXECUTE ON correct_grammar TO ds;

SELECT
    * EXCEPT text,
    correct_grammar(text) AS text
  FROM articles;