# percentile(), percentiles() (aggregation function)

Calculates an estimate for the specified nearest-rank percentile of the population defined by `*Expr*`. The accuracy depends on the density of population in the region of the percentile.

Note

This function is used in conjunction with the summarize operator.

• `percentiles()` is like `percentile()`, but calculates a number of percentile values, which is faster than calculating each percentile individually.
• `percentilesw()` is like `percentilew()`, but calculates a number of weighted percentile values, which is faster than calculating each percentile individually.
• `percentilew()` and `percentilesw()` let you calculate weighted percentiles. Weighted percentiles calculate the given percentiles in a "weighted" way, by treating each value as if it was repeated `weight` times, in the input.

To add a percentage calculation to your results, see the percentages example.

## Syntax

`percentile` `(`Expr`,` Percentile`)`

`percentiles` `(`Expr`,` Percentile1 [`,` Percentile2]`)`

`percentiles_array` `(`Expr`,` Percentile1 [`,` Percentile2]`)`

`percentiles_array` `(`Expr`,` Dynamic array`)`

`percentilew` `(`Expr`,` WeightExpr`,` Percentile`)`

`percentilesw` `(`Expr`,` WeightExpr`,` Percentile1 [`,` Percentile2]`)`

`percentilesw_array` `(`Expr`,` WeightExpr`,` Percentile1 [`,` Percentile2]`)`

`percentilesw_array` `(`Expr`,` WeightExpr`,` Dynamic array`)`

## Arguments

Name Type Required Description
Expr string Expression that will be used for aggregation calculation.
WeightExpr string Expression that will be used as the weight of values for aggregation calculation.
Percentile double A constant that specifies the percentile.
Dynamic array dynamic A list of percentiles in a dynamic array of integers or floating point numbers.

## Returns

Returns an estimate for `*Expr*` of the specified percentiles in the group.

## Examples

Example 1

The following example shows the value of `DamageProperty` being larger than 95% of the sample set and smaller than 5% of the sample set.

``````StormEvents | summarize percentile(DamageProperty, 95) by State
``````

Results

The results table shown includes only the first 10 rows.

State percentile_DamageProperty_95
ATLANTIC SOUTH 0
FLORIDA 40000
GEORGIA 143333
MISSISSIPPI 80000
AMERICAN SAMOA 250000
KENTUCKY 35000
OHIO 150000
KANSAS 51392
MICHIGAN 49167
ALABAMA 50000

Example 2

The following example shows the value of `DamageProperty` simultaneously calculated using 5, 50 (median) and 95.

``````StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State
``````

Results

The results table shown includes only the first 10 rows.

State percentile_DamageProperty_5 percentile_DamageProperty_50 percentile_DamageProperty_95
ATLANTIC SOUTH 0 0 0
FLORIDA 0 0 40000
GEORGIA 0 0 143333
MISSISSIPPI 0 0 80000
AMERICAN SAMOA 0 0 250000
KENTUCKY 0 0 35000
OHIO 0 2000 150000
KANSAS 0 0 51392
MICHIGAN 0 0 49167
ALABAMA 0 0 50000
... ...

## Weighted percentiles

Assume you repetitively measure the time (Duration) it takes an action to complete. Instead of recording every value of the measurement, you record each value of Duration, rounded to 100 msec, and how many times the rounded value appeared (BucketSize).

Use `summarize percentilesw(Duration, BucketSize, ...)` to calculate the given percentiles in a "weighted" way. Treat each value of Duration as if it was repeated BucketSize times in the input, without actually needing to materialize those records.

## Example

The following example shows weighted percentiles. Using the following set of latency values in milliseconds: `{ 1, 1, 2, 2, 2, 5, 7, 7, 12, 12, 15, 15, 15, 18, 21, 22, 26, 35 }`.

To reduce bandwidth and storage, do pre-aggregation to the following buckets: `{ 10, 20, 30, 40, 50, 100 }`. Count the number of events in each bucket to produce the following table:

``````let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
``````

The table displays:

• Eight events in the 10-ms bucket (corresponding to subset `{ 1, 1, 2, 2, 2, 5, 7, 7 }`)
• Six events in the 20-ms bucket (corresponding to subset `{ 12, 12, 15, 15, 15, 18 }`)
• Three events in the 30-ms bucket (corresponding to subset `{ 21, 22, 26 }`)
• One event in the 40-ms bucket (corresponding to subset `{ 35 }`)

At this point, the original data is no longer available. Only the number of events in each bucket. To compute percentiles from this data, use the `percentilesw()` function. For the 50, 75, and 99.9 percentiles, use the following query:

``````let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw(LatencyBucket, ReqCount, 50, 75, 99.9)
``````

Results

percentile_LatencyBucket_50 percentile_LatencyBucket_75 percentile_LatencyBucket_99_9
20 20 40

## Getting multiple percentiles in an array

Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns.

``````TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName
``````

Results

The results table displays only the first 10 rows.

SensorName percentiles_Value avg_Value
sensor-82 ["0.048141473520867069","0.24407515500271132","0.48974511106780577","0.74160998970950343","0.94587903204190071"] 0.493950914
sensor-130 ["0.049200214398937764","0.25735850440187535","0.51206374010048239","0.74182335059053839","0.95210342463616771"] 0.505111463
sensor-56 ["0.04857779335488676","0.24709868149337144","0.49668762923789589","0.74458470404241883","0.94889104840865857"] 0.497955018
sensor-24 ["0.051507199150534679","0.24803904945640423","0.50397070213183581","0.75653888126010793","0.9518782718727431"] 0.501084379
sensor-47 ["0.045991246974755672","0.24644331118208851","0.48089197707088743","0.74475142784472248","0.9518322864959039"] 0.49386228
sensor-135 ["0.05132897529660399","0.24204987641954018","0.48470113942206461","0.74275730068433621","0.94784079559229406"] 0.494817619
sensor-74 ["0.048914714739047828","0.25160926036445724","0.49832498850160978","0.75257887767110776","0.94932261924236094"] 0.501627252
sensor-173 ["0.048333149363009836","0.26084250046756496","0.51288012531934613","0.74964772791583412","0.95156058795294"] 0.505401226
sensor-28 ["0.048511161184567046","0.2547387968731824","0.50101318228599656","0.75693845702682039","0.95243122486483989"] 0.502066244
sensor-34 ["0.049980293859462954","0.25094722564949412","0.50914023067384762","0.75571549713447961","0.95176564809278674"] 0.504309494
... ... ...

Similarly, weighted percentiles can be returned as a dynamic array using `percentilesw_array`.

Percentiles for `percentiles_array` and `percentilesw_array` can be specified in a dynamic array of integer or floating-point numbers. The array must be constant but doesn't have to be literal.

``````CallDetailRecords
| summarize percentiles_array(Duration, dynamic([5, 25, 50, 75, 95])), avg(Duration)
``````
``````TransformedSensorsData
| summarize percentiles_array(Value, range(0, 100, 5)), avg(Value) by SensorName
``````

## Nearest-rank percentile

P-th percentile (0 < P <= 100) of a list of ordered values, sorted from least to greatest, is the smallest value in the list. The P percent of the data is less or equal to P-th percentile value (from Wikipedia article on percentiles).

Define 0-th percentiles to be the smallest member of the population.

Note

Given the approximating nature of the calculation, the actual returned value may not be a member of the population. Nearest-rank definition means that P=50 does not conform to the interpolative definition of the median. When evaluating the significance of this discrepancy for the specific application, the size of the population and an estimation error should be taken into account.

## Estimation error in percentiles

The percentiles aggregate provides an approximate value using T-Digest.

Note

• The bounds on the estimation error vary with the value of the requested percentile. The best accuracy is at both ends of the [0..100] scale. Percentiles 0 and 100 are the exact minimum and maximum values of the distribution. The accuracy gradually decreases towards the middle of the scale. It's worst at the median and is capped at 1%.
• Error bounds are observed on the rank, not on the value. Suppose percentile(X, 50) returned a value of Xm. The estimate guarantees that at least 49% and at most 51% of the values of X are less or equal to Xm. There is no theoretical limit on the difference between Xm and the actual median value of X.
• The estimation may sometimes result in a precise value but there are no reliable conditions to define when it will be the case.