

AI_GENERATE_CHUNKS (Transact-SQL) (Preview)

Applies to: SQL Server 2025 (17.x) Preview

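Note

AI_GENERATE_CHUNKS is currently in preview in SQL Server 2025.

AI_GENERATE_CHUNKS is a table-valued function that creates "chunks," or fragments of text, based on a chunk type, a chunk size, and a source expression.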

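Compatibility level 170

AI_GENERATE_CHUNKS requires the database compatibility level to be at least 170. When the level is lower than 170, the Database Engine can't find the AI_GENERATE_CHUNKS function.

To change the compatibility level of a database, see View or change the compatibility level of a database.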

Syntax

Transact-SQL syntax conventions

AI_GENERATE_CHUNKS (source = text_expression
                    , chunk_type = FIXED
                   [ [ , ] chunk_size = numeric_expression ]
                   [ [ , ] overlap = numeric_expression ]
                   [ [ , ] enable_chunk_set_id = numeric_expression]
)

Arguments

source

An expression of any character type (for example, nvarchar, varchar, nchar, or char).

chunk_type

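A string literal that names the type, or method, used to chunk the text or document. It can't be NULL or a value taken from a column.

Accepted values for this release: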

  • FIXED

chunk_size

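When chunk_type is FIXED, this parameter sets the size of each chunk as a character/word count. You can specify it as a variable, a literal, or a scalar expression of type tinyint, smallint, int, or bigint. chunk_size can't be NULL, negative, or zero (0).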
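The following Python model is a rough illustration of FIXED chunking without overlap. Python is used only as a neutral way to show the arithmetic; AI_GENERATE_CHUNKS itself is called from Transact-SQL. This is a sketch under stated assumptions, not the engine's implementation. It assumes chunk_size counts characters, as the return sample in this article suggests. The `Chunk` type and the `fixed_chunks` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Chunk(NamedTuple):
    """One output row, mirroring the documented result columns."""

    chunk: str
    chunk_order: int   # 1, 2, 3, ...
    chunk_offset: int  # 1-based start position in the source text
    chunk_length: int  # number of characters in this chunk


def fixed_chunks(text: str, chunk_size: Optional[int]) -> list[Chunk]:
    """Hypothetical model of chunk_type = FIXED without overlap.

    Assumes chunk_size counts characters.
    """
    # Documented rule: chunk_size can't be NULL, negative, or zero.
    if chunk_size is None or chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    rows = []
    # Step through the text in non-overlapping windows of chunk_size characters.
    for order, start in enumerate(range(0, len(text), chunk_size), start=1):
        piece = text[start:start + chunk_size]
        rows.append(Chunk(piece, order, start + 1, len(piece)))
    return rows


for row in fixed_chunks("abcdefghij", 4):
    print(row.chunk_order, row.chunk_offset, row.chunk_length, row.chunk)
```

In this model, every chunk except possibly the last is exactly chunk_size characters long. chunk_offset advances by chunk_size from one chunk to the next.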

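overlap

The overlap parameter determines the percentage of the preceding text that's included in the current chunk. This percentage is applied to the chunk_size parameter to calculate the overlap size in characters. For example, an overlap of 10 with a chunk_size of 100 corresponds to 10 characters. You can specify the overlap value as a variable, a literal, or a scalar expression of type tinyint, smallint, int, or bigint. It must be a whole number between zero (0) and 50, and it can't be NULL or negative. The default is zero (0).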
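The overlap rule can be modeled the same way. This Python sketch converts the percentage into a character count and then slides the window forward by chunk_size minus that count. This article doesn't specify either the rounding rule or exactly where the overlapping text goes. Both are therefore assumptions here: the count is rounded down, and each chunk starts that many characters before the previous one ends. `overlap_chars` and `overlapping_chunks` are hypothetical names.

```python
from typing import Optional


def overlap_chars(chunk_size: int, overlap: Optional[int]) -> int:
    """Turn the overlap percentage into a character count.

    Assumption: the result is rounded down (the rounding rule isn't documented).
    """
    # Documented rule: a whole number from 0 through 50, not NULL or negative.
    if overlap is None or not 0 <= overlap <= 50:
        raise ValueError("overlap must be a whole number from 0 through 50")
    return chunk_size * overlap // 100


def overlapping_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[tuple[int, int, str]]:
    """Return (chunk_order, chunk_offset, chunk) rows with 1-based offsets.

    Assumption: every chunk holds at most chunk_size characters, and each chunk
    after the first starts overlap_chars(...) characters before the previous one ends.
    """
    if chunk_size is None or chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # overlap is at most 50%, so the step is always at least 1 character.
    step = chunk_size - overlap_chars(chunk_size, overlap)
    rows = []
    start = 0
    while start < len(text):
        rows.append((len(rows) + 1, start + 1, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the source
        start += step
    return rows


print(overlap_chars(100, 10))  # the settings used in example B
for order, offset, piece in overlapping_chunks("abcdefghijklmnopqrstuvwxyz", 10, overlap=20):
    print(order, offset, piece)
```

Under this model, chunk_size = 10 with overlap = 20 makes consecutive chunks share two characters ("ij", then "qr").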

enable_chunk_set_id

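An int expression that acts as a flag to enable or disable the chunk_set_id output column. That column returns a number that helps group returned chunks that belong to the same source. A value of 1 enables the column. If enable_chunk_set_id is omitted, is NULL, or has a value of 0, the chunk_set_id column is disabled and isn't returned.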

Return types

AI_GENERATE_CHUNKS returns a table with the following columns:

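| Column name | Data type | Description |
|---|---|---|
| chunk | Same data type as the source expression | The returned text, chunked from the source expression. |
| chunk_order | int | An ordered sequence of numbers for the order of each chunk, starting at 1 and incrementing by 1. |
| chunk_offset | int | The chunk's position in the source data or document, relative to the start of the chunking process. The first chunk has an offset of 1. |
| chunk_length | int | The length, in characters, of the returned text chunk. |
| chunk_set_id | int | Optional column that contains an ID grouping all chunks of a source expression, document, or row. If multiple documents or rows are chunked in a single transaction, each one gets a different chunk_set_id. Visibility is controlled by the enable_chunk_set_id parameter. |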

Return sample

Here's an example of the results that AI_GENERATE_CHUNKS returns with the following parameters:

  • Chunk type of FIXED.

  • Chunk size of 50 characters.

  • chunk_set_id enabled.

  • Chunk text: All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.

| chunk | chunk_order | chunk_offset | chunk_length | chunk_set_id |
|---|---|---|---|---|
| All day long we seemed to dawdle through a country | 1 | 1 | 50 | 1 |
| which was full of beauty of every kind. Sometimes | 2 | 51 | 50 | 1 |
| we saw little towns or castles on the top of stee | 3 | 101 | 50 | 1 |
| p hills such as we see in old missals; sometimes w | 4 | 151 | 50 | 1 |
| e ran by rivers and streams which seemed from the | 5 | 201 | 50 | 1 |
| wide stony margin on each side of them to be subje | 6 | 251 | 50 | 1 |
| ct to great floods. | 7 | 301 | 19 | 1 |
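The offsets in this sample follow directly from the chunk size. Without overlap, chunk n starts at character (n - 1) × 50 + 1, and every chunk except the last is 50 characters long. The following Python check reproduces the chunk_order, chunk_offset, chunk_length, and chunk_set_id columns from the sample text. It assumes only that chunk_size counts characters. The source text is 319 characters long, so the seventh chunk holds the remaining 19.

```python
SAMPLE = (
    "All day long we seemed to dawdle through a country which was full of "
    "beauty of every kind. Sometimes we saw little towns or castles on the "
    "top of steep hills such as we see in old missals; sometimes we ran by "
    "rivers and streams which seemed from the wide stony margin on each side "
    "of them to be subject to great floods."
)
CHUNK_SIZE = 50
CHUNK_SET_ID = 1  # a single source expression, so a single chunk set

rows = []
for start in range(0, len(SAMPLE), CHUNK_SIZE):
    piece = SAMPLE[start:start + CHUNK_SIZE]
    chunk_order = start // CHUNK_SIZE + 1
    chunk_offset = start + 1  # equals (chunk_order - 1) * CHUNK_SIZE + 1
    rows.append((chunk_order, chunk_offset, len(piece), CHUNK_SET_ID))

print(len(SAMPLE))
for row in rows:
    print(*row)
```

In this model, chunks 2 and 3 begin with a space and chunk 5 ends with one. That is consistent with their chunk_length of 50, even though the displayed text looks shorter.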

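Remarks

You can use AI_GENERATE_CHUNKS on a table with multiple rows. Depending on the chunk size and the amount of text being chunked, the chunk_set_id column in the result set indicates when a new row or document starts. In the following example, chunk_set_id changes when the function finishes chunking the text of the first row and moves on to the second row. The values of chunk_order and chunk_offset also reset, to indicate the new starting point.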

CREATE TABLE textchunk (text_id INT IDENTITY(1,1) PRIMARY KEY, text_to_chunk nvarchar(max));
GO

INSERT INTO textchunk (text_to_chunk)
VALUES
('All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.'),
('My Friend, Welcome to the Carpathians. I am anxiously expecting you. Sleep well to-night. At three to-morrow the diligence will start for Bukovina; a place on it is kept for you. At the Borgo Pass my carriage will await you and will bring you to me. I trust that your journey from London has been a happy one, and that you will enjoy your stay in my beautiful land. Your friend, DRACULA')
GO

SELECT c.*
FROM textchunk t
CROSS APPLY
   AI_GENERATE_CHUNKS(source = text_to_chunk, chunk_type = FIXED, chunk_size = 50, enable_chunk_set_id = 1) c

Here's the result set:

| chunk | chunk_order | chunk_offset | chunk_length | chunk_set_id |
|---|---|---|---|---|
| All day long we seemed to dawdle through a country | 1 | 1 | 50 | 1 |
| which was full of beauty of every kind. Sometimes | 2 | 51 | 50 | 1 |
| we saw little towns or castles on the top of stee | 3 | 101 | 50 | 1 |
| p hills such as we see in old missals; sometimes w | 4 | 151 | 50 | 1 |
| e ran by rivers and streams which seemed from the | 5 | 201 | 50 | 1 |
| wide stony margin on each side of them to be subje | 6 | 251 | 50 | 1 |
| ct to great floods. | 7 | 301 | 19 | 1 |
| My Friend, Welcome to the Carpathians. I am anxi | 1 | 1 | 50 | 2 |
| ously expecting you. Sleep well to-night. At three | 2 | 51 | 50 | 2 |
| to-morrow the diligence will start for Bukovina; | 3 | 101 | 50 | 2 |
| a place on it is kept for you. At the Borgo Pass m | 4 | 151 | 50 | 2 |
| y carriage will await you and will bring you to me | 5 | 201 | 50 | 2 |
| . I trust that your journey from London has been a | 6 | 251 | 50 | 2 |
| happy one, and that you will enjoy your stay in m | 7 | 301 | 50 | 2 |
| y beautiful land. Your friend, DRACULA | 8 | 351 | 38 | 2 |
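The reset behavior described in the remarks can be sketched in Python as a loop over rows that numbers chunk sets in row order. This is a hypothetical model, not the engine's implementation. It assumes chunk_set_id values are assigned 1, 2, and so on in the order rows are processed, as in the result set above. It also treats only enable_chunk_set_id = 1 as enabling the column. The `chunk_rows` function is a hypothetical name.

```python
from typing import Optional


def chunk_rows(texts: list[str], chunk_size: int, enable_chunk_set_id: Optional[int] = None) -> list[dict]:
    """Hypothetical model of CROSS APPLY AI_GENERATE_CHUNKS over several rows (no overlap).

    Assumptions: chunk_set_id is numbered 1, 2, ... in row order, and only
    enable_chunk_set_id = 1 adds the column.
    """
    out = []
    for set_id, text in enumerate(texts, start=1):
        for start in range(0, len(text), chunk_size):
            piece = text[start:start + chunk_size]
            row = {
                "chunk": piece,
                "chunk_order": start // chunk_size + 1,  # restarts at 1 for each row
                "chunk_offset": start + 1,               # restarts at 1 for each row
                "chunk_length": len(piece),
            }
            # Omitted/NULL (None) or 0 leaves the column out of the result.
            if enable_chunk_set_id == 1:
                row["chunk_set_id"] = set_id
            out.append(row)
    return out


for r in chunk_rows(["abcdefg", "xyz"], chunk_size=3, enable_chunk_set_id=1):
    print(r["chunk_order"], r["chunk_offset"], r["chunk_length"], r["chunk_set_id"], r["chunk"])
```

In this model, the second row starts again at chunk_order 1 and chunk_offset 1, with chunk_set_id 2.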

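Examples

A. Chunk a text column with a fixed type and a size of 100 characters

The following example uses AI_GENERATE_CHUNKS to chunk a text column. It uses a chunk_type of FIXED and a chunk_size of 100 characters.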

SELECT
    c.chunk
FROM
   docs_table t
CROSS APPLY
   AI_GENERATE_CHUNKS(source = text_column, chunk_type = FIXED, chunk_size = 100) c

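B. Chunk a text column with overlap

The following example uses AI_GENERATE_CHUNKS to chunk a text column with overlap. It uses a chunk_type of FIXED, a chunk_size of 100 characters, and an overlap of 10 percent.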

SELECT
    c.chunk
FROM
   docs_table t
CROSS APPLY
   AI_GENERATE_CHUNKS(source = text_column, chunk_type = FIXED, chunk_size = 100, overlap = 10) c

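C. Use AI_GENERATE_EMBEDDINGS with AI_GENERATE_CHUNKS

This example uses AI_GENERATE_EMBEDDINGS with AI_GENERATE_CHUNKS to create embeddings from text chunks. It then inserts the vector arrays returned by the AI model inference endpoint into a table.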

INSERT INTO
    my_embeddings (chunked_text, vector_embeddings)
SELECT
    c.chunk,
    AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyAzureOpenAiModel)
FROM
    table_with_text t
CROSS APPLY
    AI_GENERATE_CHUNKS(source = t.text_to_chunk, chunk_type = FIXED, chunk_size = 100) c