log_reduce_train_fl()

Article
08/12/2024

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer

The function log_reduce_train_fl() finds common patterns in semi structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. The function's algorithm and most of the parameters are identical to log_reduce_fl(), but unlike log_reduce_fl() that outputs a patterns summary table, this function outputs the serialized model. The model can be used by the function log_reduce_predict_fl()/log_reduce_predict_full_fl() to predict the matched pattern for new log lines.

Prerequisites

The Python plugin must be enabled on the cluster. This is required for the inline Python used in the function.

The Python plugin must be enabled on the database. This is required for the inline Python used in the function.

Syntax

T | invoke log_reduce_train_fl(reduce_col, model_name [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]])

Learn more about syntax conventions.

Parameters

The following parameters description is a summary. For more information, see More about the algorithm section.

Name	Type	Required	Description
reduce_col	`string`	✔️	The name of the string column the function is applied to.
model_name	`string`	✔️	The name of the output model.
use_logram	`bool`		Enable or disable the Logram algorithm. Default value is `true`.
use_drain	`bool`		Enable or disable the Drain algorithm. Default value is `true`.
custom_regexes	`dynamic`		A dynamic array containing pairs of regular expression and replacement symbols to be searched in each input row, and replaced with their respective matching symbol. Default value is `dynamic([])`. The default regex table replaces numbers, IPs and GUIDs.
custom_regexes_policy	`string`		Either 'prepend', 'append' or 'replace'. Controls whether custom_regexes are prepend/append/replace the default ones. Default value is 'prepend'.
delimiters	`dynamic`		A dynamic array containing delimiter strings. Default value is `dynamic([" "])`, defining space as the only single character delimiter.
similarity_th	`real`		Similarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined databases. Default value is 0.5. If Drain is disabled, then this parameter has no effect.
tree_depth	`int`		Increasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, then this parameter has no effect.
trigram_th	`int`		Decreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, then this parameter has no effect.
bigram_th	`int`		Decreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram, then is disabled this parameter has no effect.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined
Stored

Define the function using the following let statement. No permissions are required.

Important

A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of log_reduce_fl(), see Example.

let log_reduce_train_fl=(tbl:(*), reduce_col:string, model_name:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'model');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(model:string), code, kwargs)
    | project name=model_name, timestamp=now(), model
};
// Write your query to use the function here.

Define the stored function once using the following .create function. Database User permissions are required.

Important

You must run this code to create the function before you can use the function as shown in the Example.

.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a model')
log_reduce_train_fl(tbl:(*), reduce_col:string, model_name:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'model');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(model:string), code, kwargs)
    | project name=model_name, timestamp=now(), model
}

Example

The following example uses the invoke operator to run the function.

Query-defined
Stored

To use a query-defined function, invoke it after the embedded function definition.

//
// Finding common patterns in HDFS logs, export and store the trained model in ML_Models table
//
.set-or-append ML_Models <|
//
let log_reduce_train_fl=(tbl:(*), reduce_col:string, model_name:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'model');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(model:string), code, kwargs)
    | project name=model_name, timestamp=now(), model
};
HDFS_log_100k
| take 100000
| invoke log_reduce_train_fl(reduce_col="data", model_name="HDFS_100K")

Important

For this example to run successfully, you must first run the Function definition code to store the function.

//
// Finding common patterns in HDFS logs, export and store the trained model in ML_Models table
//
.set-or-append ML_Models <|
//
HDFS_log_100k
| take 100000
| invoke log_reduce_train_fl(reduce_col="data", model_name="HDFS_100K")

Output

ExtentId	OriginalSize	ExtentSize	CompressedSize	IndexSize	RowCount
3734a525-cc08-44b9-a992-72de97b32414	10383	11546	10834	712	1

Share via

log_reduce_train_fl()

Prerequisites

Syntax

Parameters

Function definition

Example

Feedback

Additional resources