`ai_query` function

Article
2025/03/11

Applies to: Databricks SQL, Databricks Runtime

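Important

This feature is in Public Preview.

Invokes an existing Azure Databricks model serving endpoint, then parses and returns its response.

To use ai_query in production workflows, see Perform batch LLM inference using AI Functions.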

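Requirements

Note

In Databricks Runtime 14.2 and above, this function is supported in Databricks notebooks, including notebooks that run as a task in a Databricks workflow.
In Databricks Runtime 14.1 and below, this function is not supported in Databricks notebooks.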

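This function is not available on Azure Databricks SQL Classic.
You must enable Azure Private Link to use this feature on pro SQL warehouses.
Databricks Runtime 15.3 or above is recommended. Using Databricks Runtime 15.2 or below might result in slower performance.
Your workspace must be in a supported Model Serving region.
You need an existing model serving endpoint with your model loaded. If you are using a Databricks-hosted foundation model, an endpoint is created for you. Otherwise, see Create custom model serving endpoints or Create foundation model serving endpoints.
Querying Foundation Model APIs is enabled by default. To query endpoints that serve custom models or external models:
- Enable AI_Query for Custom Models and External Models in the Databricks Previews UI.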

The current DLT warehouse channel does not use the latest Databricks Runtime version that supports ai_query(). Set pipelines.channel to 'preview' in the table properties to use ai_query().

> create or replace materialized view
    ai_query_mv
    TBLPROPERTIES('pipelines.channel' = 'PREVIEW') AS
  SELECT
    ai_query("databricks-dbrx-instruct", text) as response
  FROM
    messages
  LIMIT 10;

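Syntax

To query an endpoint that serves a foundation model, including external models or a custom foundation model:

ai_query(endpoint, request)

To query a custom model serving endpoint that has a model schema:

ai_query(endpoint, request)

To query a custom model serving endpoint that has no model schema:

ai_query(endpoint, request, returnType, failOnError)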

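Arguments and returns

Argument	Description	Returns
`endpoint`	A `STRING` literal: the name of a Databricks Foundation Model serving endpoint, an external model serving endpoint, or a custom model endpoint in the same workspace to invoke. The definer must have `CAN QUERY` permission on the endpoint.
`request`	An expression, the request used to invoke the endpoint. If the endpoint is an external model serving endpoint or a Databricks Foundation Model APIs endpoint, the request must be a `STRING`. If the endpoint is a custom model serving endpoint, the request can be a single column or a struct expression. The struct field names should match the input feature names expected by the endpoint.
`returnType`	An expression, the expected `returnType` from the endpoint. This is similar to the schema parameter of the `from_json` function, which accepts either a `STRING` expression or an invocation of the `schema_of_json` function. In Databricks Runtime 14.2 and above, if this expression is not provided, `ai_query()` automatically infers the return type from the model schema of the custom model serving endpoint. In Databricks Runtime 14.1 and below, this expression is required when querying a custom model serving endpoint.
`failOnError`	(Optional) A boolean literal that defaults to true. This flag indicates whether to include the error status in the `ai_query` response.	If `failOnError => true`, the function returns the same result as the existing behavior, which is the parsed response from the endpoint. The data type of the parsed response is inferred from the model type, the model schema endpoint, or the `returnType` parameter of the `ai_query` function. If `failOnError => false`, the function returns a `STRUCT` object that contains the parsed response and the error status string. If inference for the row succeeds, the `errorStatus` field is `null`. If inference for the row fails because of a model endpoint error, the `response` field is `null`. If inference for the row fails because of any other error, the whole query fails. For an example, see Handle errors using `failOnError`.
`modelParameters`	(Optional) A struct field that contains chat, completion, and embedding model parameters for serving foundation models or external models. These model parameters must be constant and not data dependent. When these model parameters are not specified or are set to `null`, the default values are used. With the exception of `temperature`, which defaults to `0.0`, the default values for these model parameters are the same as those listed in the Foundation model REST API reference. For an example, see Configure a model by passing model parameters.
`responseFormat`	(Optional) A JSON string field that specifies the response format you want the model to follow. Three string types of response format are supported: `text`, `json_object`, `json_schema`.	If `failOnError => false` and you have specified `responseFormat`, the function returns the parsed response and the error status string as a `STRUCT` object. Depending on the JSON string type specified in `responseFormat`, the following response is returned: For `responseFormat => '{"type": "text"}'`, the response is a string such as `"Here is the response"`. For `responseFormat => '{"type": "json_object"}'`, the response is a JSON string of key-value pairs, such as `{"key": "value"}`. For `responseFormat => '{"type": "json_schema", "json_schema"...}'`, the response is a JSON string. For an example, see Enforce output schema with structured output.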
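
As a small illustration of the `returnType` row, the sketch below shows how `schema_of_json` derives a schema string from a sample response. It does not call a model. The sample JSON document is an assumption made up for illustration; it mirrors the `candidates` shape used in the traditional ML examples later in this article.

```sql
-- Derive a DDL schema string from a sample response (the sample document is illustrative).
SELECT schema_of_json('{"candidates": ["example"]}') AS inferred_schema;

-- The result describes a struct with a single array-of-strings field. It corresponds to
-- the literal 'STRUCT<candidates:ARRAY<STRING>>' that can be passed as returnType.
```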

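Configure a model by passing model parameters

Customize model behavior by passing specific parameters, such as the maximum number of tokens and the temperature. For example: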

SELECT text, ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Please summarize the following article: " || text,
    modelParameters => named_struct('max_tokens', 100, 'temperature', 0.7)
) AS summary
FROM uc_catalog.schema.table;

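Handle errors using `failOnError`

Use the failOnError argument of ai_query to handle errors. The following example shows how to make sure that an error in one row does not stop the whole query from running. See Arguments and returns for the expected behavior based on how this argument is set.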


SELECT text, ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Summarize the given text comprehensively, covering key points and main ideas concisely while retaining relevant details and examples. Ensure clarity and accuracy without unnecessary repetition or omissions: " || text,
    failOnError => false
) AS summary
FROM uc_catalog.schema.table;
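
To act on the result, the `STRUCT` returned with `failOnError => false` can be unpacked into separate columns. The sketch below assumes the field names `response` and `errorStatus` given in the argument descriptions; confirm the names in your runtime version before relying on them. The shortened prompt and the `scored` and `result` aliases are illustrative.

```sql
WITH scored AS (
  SELECT
    text,
    ai_query(
      'databricks-meta-llama-3-3-70b-instruct',
      'Summarize the given text: ' || text,
      failOnError => false
    ) AS result
  FROM uc_catalog.schema.table
)
SELECT
  text,
  result.response    AS summary,  -- null when the endpoint returned an error for this row
  result.errorStatus AS error     -- null when inference succeeded for this row
FROM scored;
```

Filtering on `result.errorStatus IS NOT NULL` would isolate the failed rows, for example to retry them in a later run.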

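Enforce output schema with structured output

Make sure the output conforms to a specific schema so that downstream processing is easier. For example, you can enforce a DDL schema response format: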

SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Extract research paper details from the following abstract: " || abstract,
    responseFormat => 'STRUCT<research_paper_extraction:STRUCT<title:STRING, authors:ARRAY<STRING>, abstract:STRING, keywords:ARRAY<STRING>>>'
)
FROM research_papers;

Alternatively, use a JSON schema response format:

SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Extract research paper details from the following abstract: " || abstract,
    responseFormat => '{
      "type": "json_schema",
      "json_schema": {
        "name": "research_paper_extraction",
        "schema": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "authors": {"type": "array", "items": {"type": "string"}},
            "abstract": {"type": "string"},
            "keywords": {"type": "array", "items": {"type": "string"}}
          }
        }
      },
      "strict": true
    }'
)
FROM research_papers;

The expected output might look like this:

{ "title": "Understanding AI Functions in Databricks", "authors": ["Alice Smith", "Bob Jones"], "abstract": "This paper explains how AI functions can be integrated into data workflows.", "keywords": ["Databricks", "AI", "LLM"] }
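
When the response comes back as a JSON string, a downstream query usually has to parse it before reading individual fields. The following is a minimal sketch that applies `from_json` to the sample output above as a literal, so it does not call a model. The DDL schema string and the `parsed` and `t` aliases are assumptions chosen to match that sample.

```sql
SELECT
  parsed.title,
  parsed.authors[0]     AS first_author,
  size(parsed.keywords) AS keyword_count
FROM (
  -- Parse the JSON string into a struct using a DDL schema that matches the sample.
  SELECT from_json(
    '{ "title": "Understanding AI Functions in Databricks", "authors": ["Alice Smith", "Bob Jones"], "abstract": "This paper explains how AI functions can be integrated into data workflows.", "keywords": ["Databricks", "AI", "LLM"] }',
    'title STRING, authors ARRAY<STRING>, abstract STRING, keywords ARRAY<STRING>'
  ) AS parsed
) AS t;
```

In a real query, the literal would be replaced by the column that holds the `ai_query` result.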

Use `ai_query` in user-defined functions

You can wrap a call to ai_query in a UDF, which makes the function easy to reuse and share across workflows.

CREATE FUNCTION correct_grammar(text STRING)
  RETURNS STRING
  RETURN ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('Correct this to standard English:\n', text));

GRANT EXECUTE ON correct_grammar TO ds;

SELECT
    * EXCEPT text,
    correct_grammar(text) AS text
  FROM articles;

Example: Query a foundation model

To query an external model serving endpoint:

> SELECT ai_query(
    'my-external-model-openai-chat',
    'Describe Databricks SQL in 30 words.'
  ) AS summary

  "Databricks SQL is a cloud-based platform for data analytics and machine learning, providing a unified workspace for collaborative data exploration, analysis, and visualization using SQL queries."

To query a foundation model supported by Databricks Foundation Model APIs:

> SELECT *,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    "Can you tell me the name of the US state that serves the provided ZIP code? zip code: " || pickup_zip
    )
  FROM samples.nyctaxi.trips
  LIMIT 10

Optionally, you can also wrap a call to ai_query() in a UDF so that it is easier to call, as follows:

> CREATE FUNCTION correct_grammar(text STRING)
  RETURNS STRING
  RETURN ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('Correct this to standard English:\n', text));
> GRANT EXECUTE ON correct_grammar TO ds;
-- DS fixes grammar issues in a batch.
> SELECT
    * EXCEPT text,
    correct_grammar(text) AS text
  FROM articles;

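Concatenate the prompt and inference column

There are several ways to concatenate the prompt and the inference column, such as using ||, CONCAT(), or format_string():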

SELECT
CONCAT('${prompt}', ${input_column_name}) AS concatenated_prompt
FROM ${input_table_name};

Alternatively:

SELECT
'${prompt}' || ${input_column_name} AS concatenated_prompt
FROM ${input_table_name};

Or using format_string():

SELECT
format_string('%s%s', '${prompt}', ${input_column_name}) AS concatenated_prompt
FROM ${input_table_name};
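
The three approaches build the same string. The following sketch runs them side by side on literal values instead of `${...}` placeholders, so it can be tried without a model endpoint. The inline `VALUES` row and the column names `prompt` and `body` are made up for illustration.

```sql
SELECT
  CONCAT(prompt, body)                AS with_concat,
  prompt || body                      AS with_pipes,
  format_string('%s%s', prompt, body) AS with_format_string
FROM VALUES
  ('Summarize the following text: ', 'Databricks SQL runs queries on a lakehouse.')
  AS t(prompt, body);
```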

Pass `failOnError` and `modelParameters` for foundation models

The following example shows how to pass the failOnError and modelParameters arguments, including max_tokens and temperature, for foundation models.

CREATE OR REPLACE TABLE ${output_table_name} AS (
  SELECT
      ${input_column_name},
      AI_QUERY(
        "${endpoint}",
        CONCAT("${prompt}", ${input_column_name}),
        failOnError => false,
        modelParameters => named_struct('max_tokens', ${num_output_tokens}, 'temperature', ${temperature})
      ) AS response
    FROM ${input_table_name}
    LIMIT ${input_num_rows}
);

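Example: Batch inference use case

The following batch inference example uses failOnError and modelParameters with max_tokens and temperature. It also shows how to use CONCAT() to concatenate the prompt for your model with the inference column.

There are several ways to perform the concatenation, such as using ||, concat(), or format_string().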


CREATE OR REPLACE TABLE ${output_table_name} AS (
  SELECT
      ${input_column_name},
      AI_QUERY(
        "${endpoint}",
        CONCAT("${prompt}", ${input_column_name}),
        failOnError => false,
        modelParameters => named_struct('max_tokens', ${num_output_tokens},'temperature', ${temperature})
      ) as response
    FROM ${input_table_name}
    LIMIT ${input_num_rows}
)

Example: Query a traditional ML model

To query a custom model or a traditional ML model serving endpoint:


> SELECT text, ai_query(
    endpoint => 'spam-classification-endpoint',
    request => named_struct(
      'timestamp', timestamp,
      'sender', from_number,
      'text', text),
    returnType => 'BOOLEAN') AS is_spam
  FROM messages
  LIMIT 10

> SELECT ai_query(
    'weekly-forecast',
    request => struct(*),
    returnType => 'FLOAT') AS predicted_revenue
  FROM retail_revenue

> SELECT ai_query(
    'custom-llama-chat',
    request => named_struct("messages",
        ARRAY(named_struct("role", "user", "content", "What is ML?"))),
    returnType => 'STRUCT<candidates:ARRAY<STRING>>')

  {"candidates":["ML stands for Machine Learning. It's a subfield of Artificial Intelligence that involves the use of algorithms and statistical models to enable machines to learn from data, make decisions, and improve their performance on a specific task over time."]}
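
Because the last query returns a struct, individual values can be read with dot and index notation. This is a minimal sketch that reuses the `custom-llama-chat` endpoint and return type from the example above. The `reply` and `t` aliases are illustrative.

```sql
SELECT reply.candidates[0] AS first_candidate  -- first element of the candidates array
FROM (
  SELECT ai_query(
      'custom-llama-chat',
      request => named_struct('messages',
          ARRAY(named_struct('role', 'user', 'content', 'What is ML?'))),
      returnType => 'STRUCT<candidates:ARRAY<STRING>>'
    ) AS reply
) AS t;
```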
