reduce operator

Article
07/04/2024

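Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.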

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Learn more about syntax conventions.

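Parameters

Name	Type	Required	Description
Expr	`string`	✔️	The value by which to reduce.
Threshold	`real`		A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples.
Characters	`string`		A list of characters that separate terms. The default is every character that is not an ASCII alphanumeric character. For examples, see Behavior of Characters parameter.
ReduceKind	`string`		The only valid value is `source`. If `source` is specified, the operator appends the `Pattern` column to the existing rows in the table instead of aggregating by `Pattern`.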
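
The effect of `kind=source` described above can be illustrated with a minimal sketch. It reuses the generated `str` values and the `Z` separator from the Characters example in this article. The exact patterns returned are not asserted here and may differ on your cluster.

```kusto
// Build ten sample strings: fooZ1 ... fooZ10 (same input as the Characters example).
range x from 1 to 10 step 1
| project str = strcat("foo", "Z", tostring(x))
// kind=source keeps the input rows and appends a Pattern column,
// instead of returning one aggregated row per pattern.
| reduce kind=source by str with characters="Z"
```

This form can be useful when you want to keep working with the original rows, for example to filter them by the pattern they were assigned.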

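Returns

A table with as many rows as there are groups, and with columns titled pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard, or a placeholder for an arbitrary insertion string. The count is the number of values in the group, and the representative is one of the original values in the group.

For example, the result of reduce by city might include:

Pattern	Count	Representative
San *	5182	San Bernard
Saint *	2846	Saint Lucy
Moscow	3726	Moscow
* -on- *	2730	One -on- One
Paris	2716	Paris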

Examples

Small threshold value

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X"

Output

Pattern	Count	Representative
MachineLearning*	1000	MachineLearningX4

Large threshold value

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X"

Output

Pattern	Count	Representative
MachineLearning*	177	MachineLearningX9
MachineLearning*	102	MachineLearningX0
MachineLearning*	106	MachineLearningX1
MachineLearning*	96	MachineLearningX6
MachineLearning*	110	MachineLearningX4
MachineLearning*	100	MachineLearningX3
MachineLearning*	99	MachineLearningX8
MachineLearning*	104	MachineLearningX7
MachineLearning*	106	MachineLearningX2
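
To compare thresholds without reading through every output row, one option is to summarize the result of `reduce`. The sketch below assumes the output columns are named `Pattern`, `Count`, and `Representative`, as in the tables above. `Groups` and `Values` are illustrative names. Because the input uses `rand()`, the numbers vary between runs.

```kusto
// Same generated input as the threshold examples above.
range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.9, characters="X"
// Assumption: reduce emits a column named Count (see the output tables above).
// Groups = number of patterns found; Values = total values covered by them.
| summarize Groups = count(), Values = sum(Count)
```

Changing `threshold` to `0.001` and running the query again gives a quick way to see how the number of groups changes.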

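Behavior of Characters parameter

If the Characters parameter is unspecified, then every character that is not an ASCII alphanumeric character becomes a term separator.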

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

Pattern	Count	Representative
others	10

However, if you specify that "Z" is a separator, each value in str is treated as two terms, foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

Pattern	Count	Representative
foo*	10	fooZ1

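Apply `reduce` to sanitized input

The following example shows how you might apply the reduce operator to "sanitized" input, in which GUIDs in the column being reduced are replaced before reducing: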

// Start with a few records from the Trace table.
Trace | take 10000
// We will reduce the Text column which includes random GUIDs.
// As random GUIDs interfere with the reduce operation, replace them all
// by the string "GUID".
| extend Text=replace_regex(Text, @"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID")
// Now perform the reduce. In case there are other "quasi-random" identifiers with embedded '-'
// or '_' characters in them, treat these as non-term-breakers.
| reduce by Text with characters="-_"
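
Before reducing a large table, it can help to check the sanitizing step on its own. The following sketch swaps the `Trace` table for a small inline `datatable` and applies the same `replace_regex` call as above. The sample messages and GUIDs are made up for illustration.

```kusto
// Hypothetical sample rows standing in for the Trace table.
datatable(Text:string)
[
    "Request 3f2504e0-4f89-11d3-9a0c-0305e82c3301 failed",
    "Request 9b2e7c1a-0d4f-4c6e-8a3b-5f1d2e3c4b5a failed",
    "Request 0a1b2c3d-4e5f-6789-abcd-ef0123456789 completed"
]
// Replace every GUID with the literal string "GUID", as in the example above.
| extend Text = replace_regex(Text, @"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID")
// Count the distinct sanitized strings to confirm the GUIDs no longer differ.
| summarize count() by Text
```

With the GUIDs replaced, the two "failed" rows collapse into one identical string and the "completed" row forms a second. This is the kind of input on which `reduce` can find stable patterns.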

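Related content

autocluster

Note

The implementation of the reduce operator is largely based on the paper A Data Clustering Algorithm for Mining Patterns From Event Logs, by Risto Vaarandi.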
