Applies to: SQL Server 2025 (17.x) Preview
Note
AI_GENERATE_CHUNKS is currently in preview in SQL Server 2025.
AI_GENERATE_CHUNKS is a table-valued function that creates "chunks", or fragments of text, based on a type, a size, and a source expression.
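Compatibility level 170
AI_GENERATE_CHUNKS requires a compatibility level of at least 170. When the level is lower than 170, the Database Engine can't find the AI_GENERATE_CHUNKS function.
To change the compatibility level of a database, see View or change the compatibility level of a database.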
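As a minimal sketch of that requirement, the following statements check the current level and then raise it. The database name MyDatabase is a hypothetical placeholder, not something this article defines.

```sql
-- Check the current compatibility level of every database on the instance.
SELECT name, compatibility_level
FROM sys.databases;

-- Raise the level for one database so that AI_GENERATE_CHUNKS can be resolved.
-- MyDatabase is a hypothetical name; substitute your own database.
ALTER DATABASE MyDatabase
SET COMPATIBILITY_LEVEL = 170;
```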
Syntax
AI_GENERATE_CHUNKS (source = text_expression
, chunk_type = FIXED
[ [ , ] chunk_size = numeric_expression ]
[ [ , ] overlap = numeric_expression ]
[ [ , ] enable_chunk_set_id = numeric_expression ]
)
Arguments
source
An expression of any character type (for example, nvarchar, varchar, nchar, or char).
chunk_type
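A string literal that names the type or method used to chunk the text or document. It can't be NULL or a value from a column.
Accepted values for this release: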
FIXED
chunk_size
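When chunk_type is FIXED, this parameter sets the size of each chunk as a character count. It can be specified as a variable, a literal, or a scalar expression of type tinyint, smallint, int, or bigint.
chunk_size can't be NULL, negative, or zero (0).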
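overlap
The overlap parameter determines the percentage of the preceding text that is included in the current chunk. This percentage is applied to the chunk_size parameter to calculate the overlap size in characters.
The overlap value can be specified as a variable, a literal, or a scalar expression of type tinyint, smallint, int, or bigint. It must be a whole number between zero (0) and 50, and it can't be NULL or negative. The default value is zero (0).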
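The following sketch passes chunk_size and overlap as variables and shows the percentage arithmetic described above. The variable names are illustrative, and docs_table and text_column are the placeholder names used in the examples later in this article. How the function rounds a fractional overlap is not stated here, so the first query is plain arithmetic rather than a description of the engine's behavior.

```sql
-- Hypothetical variables; both arguments accept variables, literals, or scalar expressions.
DECLARE @chunk_size int = 100;
DECLARE @overlap int = 10;   -- a percentage between 0 and 50

-- Plain arithmetic: 10 percent of 100 characters is 10 characters of overlap.
SELECT @chunk_size * @overlap / 100.0 AS overlap_characters;

-- Chunk a column by using the same two variables.
SELECT c.chunk
FROM docs_table AS t
CROSS APPLY AI_GENERATE_CHUNKS(source = t.text_column, chunk_type = FIXED, chunk_size = @chunk_size, overlap = @overlap) AS c;
```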
enable_chunk_set_id
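An int or bit expression that acts as a flag to enable or disable the chunk_set_id output column. This column returns a number that helps group the returned chunks that belong to the same source. A value of 1 enables the column. If enable_chunk_set_id is omitted, is NULL, or has a value of 0, the chunk_set_id column is disabled and isn't returned.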
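Return types
AI_GENERATE_CHUNKS returns a table with the following columns:
Column name | Data type | Description |
---|---|---|
chunk | Same as the source expression data type | The returned text that was chunked from the source expression. |
chunk_order | int | A sequence of ordered numbers that reflects the order in which each chunk was processed, starting at 1 and increasing by 1. |
chunk_offset | int | The position of the chunk in the source data or document, relative to the start of the chunking process. |
chunk_length | int | The character length of the returned text chunk. |
chunk_set_id | int | Optional column that contains an ID that groups all chunks of a source expression, document, or row. If multiple documents or rows are chunked in a single transaction, each one is given a different chunk_set_id. Visibility is controlled by the enable_chunk_set_id parameter. |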
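Return example
The following example shows the results that AI_GENERATE_CHUNKS returns with these parameters:
A chunk type of FIXED.
A chunk size of 50 characters.
The chunk_set_id column is enabled.
Chunk text: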
All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.
chunk | chunk_order | chunk_offset | chunk_length | chunk_set_id |
---|---|---|---|---|
All day long we seemed to dawdle through a country | 1 | 1 | 50 | 1 |
which was full of beauty of every kind. Sometimes | 2 | 51 | 50 | 1 |
we saw little towns or castles on the top of stee | 3 | 101 | 50 | 1 |
p hills such as we see in old missals; sometimes w | 4 | 151 | 50 | 1 |
e ran by rivers and streams which seemed from the | 5 | 201 | 50 | 1 |
wide stony margin on each side of them to be subje | 6 | 251 | 50 | 1 |
ct to great floods. | 7 | 301 | 19 | 1 |
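The article doesn't show the query behind this result set. One plausible way to produce it is sketched below, on the assumption that the function accepts a variable as its source and can be called directly in the FROM clause like other table-valued functions. The variable name @text is hypothetical. In the table above, with no overlap, each chunk_offset equals (chunk_order - 1) * 50 + 1.

```sql
-- Hypothetical variable holding the chunk text shown above.
DECLARE @text nvarchar(max) = N'All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.';

-- List every output column explicitly, including the optional chunk_set_id.
SELECT c.chunk, c.chunk_order, c.chunk_offset, c.chunk_length, c.chunk_set_id
FROM AI_GENERATE_CHUNKS(source = @text, chunk_type = FIXED, chunk_size = 50, enable_chunk_set_id = 1) AS c
ORDER BY c.chunk_order;
```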
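Remarks
AI_GENERATE_CHUNKS can be used on a table with multiple rows. Depending on the chunk size and the amount of text being chunked, the result set uses the chunk_set_id column to indicate where a new row or document starts. In the following example, chunk_set_id changes when the function finishes chunking the text of the first row and moves on to the second row. The values of chunk_order and chunk_offset also reset to indicate the new starting point.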
CREATE TABLE textchunk (text_id INT IDENTITY(1,1) PRIMARY KEY, text_to_chunk nvarchar(max));
GO
INSERT INTO textchunk (text_to_chunk)
VALUES
('All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.'),
('My Friend, Welcome to the Carpathians. I am anxiously expecting you. Sleep well to-night. At three to-morrow the diligence will start for Bukovina; a place on it is kept for you. At the Borgo Pass my carriage will await you and will bring you to me. I trust that your journey from London has been a happy one, and that you will enjoy your stay in my beautiful land. Your friend, DRACULA')
GO
SELECT c.*
FROM textchunk t
CROSS APPLY
AI_GENERATE_CHUNKS(source = text_to_chunk, chunk_type = FIXED, chunk_size = 50, enable_chunk_set_id = 1) c
chunk | chunk_order | chunk_offset | chunk_length | chunk_set_id |
---|---|---|---|---|
All day long we seemed to dawdle through a country | 1 | 1 | 50 | 1 |
which was full of beauty of every kind. Sometimes | 2 | 51 | 50 | 1 |
we saw little towns or castles on the top of stee | 3 | 101 | 50 | 1 |
p hills such as we see in old missals; sometimes w | 4 | 151 | 50 | 1 |
e ran by rivers and streams which seemed from the | 5 | 201 | 50 | 1 |
wide stony margin on each side of them to be subje | 6 | 251 | 50 | 1 |
ct to great floods. | 7 | 301 | 19 | 1 |
My Friend, Welcome to the Carpathians. I am anxi | 1 | 1 | 50 | 2 |
ously expecting you. Sleep well to-night. At three | 2 | 51 | 50 | 2 |
to-morrow the diligence will start for Bukovina; | 3 | 101 | 50 | 2 |
a place on it is kept for you. At the Borgo Pass m | 4 | 151 | 50 | 2 |
y carriage will await you and will bring you to me | 5 | 201 | 50 | 2 |
. I trust that your journey from London has been a | 6 | 251 | 50 | 2 |
happy one, and that you will enjoy your stay in m | 7 | 301 | 50 | 2 |
y beautiful land. Your friend, DRACULA | 8 | 351 | 38 | 2 |
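Because chunk_set_id groups the chunks that come from the same row, it can serve as a grouping key. The following sketch summarizes the textchunk table from the example above. If the result set matches the table shown, it would report 7 chunks (319 characters) for chunk_set_id 1 and 8 chunks (388 characters) for chunk_set_id 2. The column aliases are illustrative.

```sql
-- Summarize the chunks produced for each source row.
-- text_id is included so each chunk_set_id can be tied back to its row.
SELECT t.text_id,
       c.chunk_set_id,
       COUNT(*) AS chunk_count,
       SUM(c.chunk_length) AS total_characters
FROM textchunk AS t
CROSS APPLY AI_GENERATE_CHUNKS(source = t.text_to_chunk, chunk_type = FIXED, chunk_size = 50, enable_chunk_set_id = 1) AS c
GROUP BY t.text_id, c.chunk_set_id
ORDER BY c.chunk_set_id;
```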
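Examples
A. Chunk a text column with a type of FIXED and a size of 100 characters
The following example uses AI_GENERATE_CHUNKS to chunk a text column. It uses a chunk_type of FIXED and a chunk_size of 100 characters.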
SELECT
c.chunk
FROM
docs_table t
CROSS APPLY
AI_GENERATE_CHUNKS(source = text_column, chunk_type = FIXED, chunk_size = 100) c
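B. Chunk a text column with overlap
The following example uses AI_GENERATE_CHUNKS to chunk a text column with overlap. It uses a chunk_type of FIXED, a chunk_size of 100 characters, and an overlap of 10 percent.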
SELECT
c.chunk
FROM
docs_table t
CROSS APPLY
AI_GENERATE_CHUNKS(source = text_column, chunk_type = FIXED, chunk_size = 100, overlap = 10) c
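C. Use AI_GENERATE_EMBEDDINGS with AI_GENERATE_CHUNKS
This example uses AI_GENERATE_EMBEDDINGS with AI_GENERATE_CHUNKS to create embeddings from text chunks, and then inserts the vector arrays returned from the AI model inference endpoint into a table.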
INSERT INTO
my_embeddings (chunked_text, vector_embeddings)
SELECT
c.chunk,
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyAzureOpenAiModel)
FROM
table_with_text t
CROSS APPLY
AI_GENERATE_CHUNKS(source = t.text_to_chunk, chunk_type = FIXED, chunk_size = 100) c
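The example above assumes that the my_embeddings table already exists. A minimal sketch of one possible definition follows. The embedding_id column and the vector dimension of 1536 are assumptions for illustration: the dimension must match whatever the model behind MyAzureOpenAiModel returns.

```sql
-- Hypothetical schema for the target table used in example C.
-- vector(1536) is an assumption; use the dimension count of your embedding model.
CREATE TABLE my_embeddings
(
    embedding_id int IDENTITY(1,1) PRIMARY KEY,
    chunked_text nvarchar(max),
    vector_embeddings vector(1536)
);
```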