percentiles_linear_fl()
Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer
The function percentiles_linear_fl()
is a user-defined function (UDF) that calculates percentiles using linear interpolation between closest ranks, the same method used by Excel's PERCENTILES.INC function. Kusto native percentile functions use the nearest rank method. For large sets of values the difference between both methods is insignificant, and we recommend using the native function for best performance. For further details on these and additional percentile calculation methods have a look at percentile article on Wikipedia.
The function accepts a table containing the column to calculate on and an optional grouping key, and a dynamic array of the required percentiles, and returns a column containing dynamic array of the percentiles' values per each group.
Syntax
T | invoke percentiles_linear_fl(
val_col,
pct_arr [,
aggr_col ])
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
val_col | string |
✔️ | The name of the column that contains the values with which to calculate the percentiles. |
pct_arr | dynamic |
✔️ | A numerical array containing the required percentiles. Each percentile should be in the range [0-100]. |
aggr_col | string |
The name of the column that contains the grouping key. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Define the function using the following let statement. No permissions are required.
Important
A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of percentiles_linear_fl()
, see Example.
let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
};
// Write your query to use the function here.
Example
The following example uses the invoke operator to run the function.
To use a query-defined function, invoke it after the embedded function definition.
let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
};
datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals
Output
name | x | pct_arr | pct_val |
---|---|---|---|
A | [5,7,9] | [0,25,50,75,100] | [5,6,7,8,9] |
B | [5,7,7,10] | [0,25,50,75,100] | [5,6.5,7,7.75,10] |