ks_test_fl()
Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer
The function ks_test_fl() is a user-defined function (UDF) that performs the two-sample Kolmogorov-Smirnov test.
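For intuition, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The following is a minimal pure-Python sketch of that statistic only. The helper name `ks_statistic` is hypothetical and is not part of ks_test_fl(), which delegates to `scipy.stats.ks_2samp`; the sketch does not compute a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Return the largest absolute gap between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Each empirical CDF is a step function that changes only at observed
    # values, so it is enough to compare the two CDFs at every data point.
    return max(
        abs(bisect_right(s1, x) / n1 - bisect_right(s2, x) / n2)
        for x in s1 + s2
    )


# Two samples that do not overlap at all give the maximum statistic.
print(ks_statistic([20.85, 21.89, 23.41], [35.09, 30.02, 26.52]))
```

A statistic near 0 means the two empirical distributions are close everywhere; a statistic of 1 means the samples are completely separated.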
Prerequisites
- The Python plugin must be enabled on the cluster (Azure Data Explorer) or on the database (Microsoft Fabric). This is required for the inline Python used in the function.
Syntax
T | invoke ks_test_fl(data1, data2, test_statistic, p_value)
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
test_statistic | string | ✔️ | The name of the column in which to store the test statistic. |
p_value | string | ✔️ | The name of the column in which to store the p-value. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Define the function using the following let statement. No permissions are required.
Important
A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of ks_test_fl(), see Example.
let ks_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        # kargs is the dictionary that the Python plugin builds from the kwargs property bag.
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            # Run the two-sample test on the two arrays stored in this row.
            statistics = stats.ks_2samp(row[data1], row[data2])
            return statistics[0], statistics[1]
        # df is the input table as a pandas DataFrame; result is returned to the query.
        result = df
        result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
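The data flow of the embedded Python can be sketched without the plugin. The following illustration assumes rows are plain dictionaries rather than a pandas DataFrame, and the names `apply_two_sample_test` and `placeholder_test` are hypothetical. The placeholder is not a statistical test: it returns the two sample sizes so that the path from input columns to output columns is easy to see. In the real function, that role is played by `scipy.stats.ks_2samp`.

```python
def apply_two_sample_test(rows, data1, data2, test_statistic, p_value, test_fn):
    """Apply test_fn to each row's two arrays and store both results by column name."""
    out = []
    for row in rows:
        # Column names arrive as strings, just as they do through kargs.
        stat, p = test_fn(row[data1], row[data2])
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[test_statistic] = stat
        new_row[p_value] = p
        out.append(new_row)
    return out


def placeholder_test(a, b):
    """Stand-in returning a (statistic, p-value) shaped pair; NOT a real test."""
    return float(len(a)), float(len(b))


rows = [{"id": "r1", "s1": [1, 2, 3], "s2": [4, 5], "stat": 0.0, "p": 0.0}]
print(apply_two_sample_test(rows, "s1", "s2", "stat", "p", placeholder_test))
```

Passing the column names as strings is what lets a single function definition work on any table, as long as the named columns exist.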
Example
The following example uses the invoke operator to run the function.
To use a query-defined function, invoke it after the embedded function definition.
let ks_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.ks_2samp(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]] = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
    'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
    'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
    'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
// typeof(*) keeps the input schema, so add the result columns before invoking the function.
| extend test_stat = 0.0, p_val = 0.0
| invoke ks_test_fl('sample1', 'sample2', 'test_stat', 'p_val')
Output
id | sample1 | sample2 | test_stat | p_val |
---|---|---|---|---|
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 0.66666666666666674 | 0.3197243332709643 |
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1 | 0.03262165165202116 |
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 1 | 0.01106563701580386 |