# percentile(), percentiles() (aggregation function)

Calculates an estimate for the specified nearest-rank percentile of the population defined by `*Expr*`

.
The accuracy depends on the density of population in the region of the percentile.

Note

This function is used in conjunction with the summarize operator.

`percentiles()`

is like`percentile()`

, but calculates a number of percentile values, which is faster than calculating each percentile individually.`percentilesw()`

is like`percentilew()`

, but calculates a number of weighted percentile values, which is faster than calculating each percentile individually.`percentilew()`

and`percentilesw()`

let you calculate weighted percentiles. Weighted percentiles calculate the given percentiles in a "weighted" way, by treating each value as if it was repeated`weight`

times, in the input.

To add a percentage calculation to your results, see the percentages example.

## Syntax

`percentile`

`(`

*Expr*`,`

*Percentile*`)`

`percentiles`

`(`

*Expr*`,`

*Percentile1* [`,`

*Percentile2*]`)`

`percentiles_array`

`(`

*Expr*`,`

*Percentile1* [`,`

*Percentile2*]`)`

`percentiles_array`

`(`

*Expr*`,`

*Dynamic array*`)`

`percentilew`

`(`

*Expr*`,`

*WeightExpr*`,`

*Percentile*`)`

`percentilesw`

`(`

*Expr*`,`

*WeightExpr*`,`

*Percentile1* [`,`

*Percentile2*]`)`

`percentilesw_array`

`(`

*Expr*`,`

*WeightExpr*`,`

*Percentile1* [`,`

*Percentile2*]`)`

`percentilesw_array`

`(`

*Expr*`,`

*WeightExpr*`,`

*Dynamic array*`)`

## Arguments

Name | Type | Required | Description |
---|---|---|---|

Expr |
string | ✓ | Expression that will be used for aggregation calculation. |

WeightExpr |
string | ✓ | Expression that will be used as the weight of values for aggregation calculation. |

Percentile |
double | ✓ | A constant that specifies the percentile. |

Dynamic array |
dynamic | ✓ | A list of percentiles in a dynamic array of integers or floating point numbers. |

## Returns

Returns an estimate for `*Expr*`

of the specified percentiles in the group.

## Examples

**Example 1**

The following example shows the value of `DamageProperty`

being larger than 95% of the sample set and smaller than 5% of the sample set.

```
StormEvents | summarize percentile(DamageProperty, 95) by State
```

**Results**

The results table shown includes only the first 10 rows.

State | percentile_DamageProperty_95 |
---|---|

ATLANTIC SOUTH | 0 |

FLORIDA | 40000 |

GEORGIA | 143333 |

MISSISSIPPI | 80000 |

AMERICAN SAMOA | 250000 |

KENTUCKY | 35000 |

OHIO | 150000 |

KANSAS | 51392 |

MICHIGAN | 49167 |

ALABAMA | 50000 |

**Example 2**

The following example shows the value of `DamageProperty`

simultaneously calculated using 5, 50 (median) and 95.

```
StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State
```

**Results**

The results table shown includes only the first 10 rows.

State | percentile_DamageProperty_5 | percentile_DamageProperty_50 | percentile_DamageProperty_95 |
---|---|---|---|

ATLANTIC SOUTH | 0 | 0 | 0 |

FLORIDA | 0 | 0 | 40000 |

GEORGIA | 0 | 0 | 143333 |

MISSISSIPPI | 0 | 0 | 80000 |

AMERICAN SAMOA | 0 | 0 | 250000 |

KENTUCKY | 0 | 0 | 35000 |

OHIO | 0 | 2000 | 150000 |

KANSAS | 0 | 0 | 51392 |

MICHIGAN | 0 | 0 | 49167 |

ALABAMA | 0 | 0 | 50000 |

... | ... |

## Weighted percentiles

Assume you repetitively measure the time (Duration) it takes an action to complete. Instead of recording every value of the measurement, you record each value of Duration, rounded to 100 msec, and how many times the rounded value appeared (BucketSize).

Use `summarize percentilesw(Duration, BucketSize, ...)`

to calculate the given
percentiles in a "weighted" way. Treat each value of Duration as if it was repeated
BucketSize times in the input, without actually needing to materialize those records.

## Example

The following example shows weighted percentiles.
Using the following set of latency values in milliseconds:
`{ 1, 1, 2, 2, 2, 5, 7, 7, 12, 12, 15, 15, 15, 18, 21, 22, 26, 35 }`

.

To reduce bandwidth and storage, do pre-aggregation to the
following buckets: `{ 10, 20, 30, 40, 50, 100 }`

. Count the number of events in each bucket to produce the following table:

```
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
```

The table displays:

- Eight events in the 10-ms bucket (corresponding to subset
`{ 1, 1, 2, 2, 2, 5, 7, 7 }`

) - Six events in the 20-ms bucket (corresponding to subset
`{ 12, 12, 15, 15, 15, 18 }`

) - Three events in the 30-ms bucket (corresponding to subset
`{ 21, 22, 26 }`

) - One event in the 40-ms bucket (corresponding to subset
`{ 35 }`

)

At this point, the original data is no longer available. Only the number of events in each bucket. To compute percentiles from this data, use the `percentilesw()`

function.
For the 50, 75, and 99.9 percentiles, use the following query:

```
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw(LatencyBucket, ReqCount, 50, 75, 99.9)
```

**Results**

percentile_LatencyBucket_50 | percentile_LatencyBucket_75 | percentile_LatencyBucket_99_9 |
---|---|---|

20 | 20 | 40 |

## Getting multiple percentiles in an array

Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns.

```
TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName
```

**Results**

The results table displays only the first 10 rows.

SensorName | percentiles_Value | avg_Value |
---|---|---|

sensor-82 | ["0.048141473520867069","0.24407515500271132","0.48974511106780577","0.74160998970950343","0.94587903204190071"] | 0.493950914 |

sensor-130 | ["0.049200214398937764","0.25735850440187535","0.51206374010048239","0.74182335059053839","0.95210342463616771"] | 0.505111463 |

sensor-56 | ["0.04857779335488676","0.24709868149337144","0.49668762923789589","0.74458470404241883","0.94889104840865857"] | 0.497955018 |

sensor-24 | ["0.051507199150534679","0.24803904945640423","0.50397070213183581","0.75653888126010793","0.9518782718727431"] | 0.501084379 |

sensor-47 | ["0.045991246974755672","0.24644331118208851","0.48089197707088743","0.74475142784472248","0.9518322864959039"] | 0.49386228 |

sensor-135 | ["0.05132897529660399","0.24204987641954018","0.48470113942206461","0.74275730068433621","0.94784079559229406"] | 0.494817619 |

sensor-74 | ["0.048914714739047828","0.25160926036445724","0.49832498850160978","0.75257887767110776","0.94932261924236094"] | 0.501627252 |

sensor-173 | ["0.048333149363009836","0.26084250046756496","0.51288012531934613","0.74964772791583412","0.95156058795294"] | 0.505401226 |

sensor-28 | ["0.048511161184567046","0.2547387968731824","0.50101318228599656","0.75693845702682039","0.95243122486483989"] | 0.502066244 |

sensor-34 | ["0.049980293859462954","0.25094722564949412","0.50914023067384762","0.75571549713447961","0.95176564809278674"] | 0.504309494 |

... | ... | ... |

Similarly, weighted percentiles can be returned as a dynamic array using `percentilesw_array`

.

Percentiles for `percentiles_array`

and `percentilesw_array`

can be specified in a dynamic array of integer or floating-point numbers. The array must be constant but doesn't have to be literal.

```
CallDetailRecords
| summarize percentiles_array(Duration, dynamic([5, 25, 50, 75, 95])), avg(Duration)
```

```
TransformedSensorsData
| summarize percentiles_array(Value, range(0, 100, 5)), avg(Value) by SensorName
```

## Nearest-rank percentile

*P*-th percentile (0 < *P* <= 100) of a list of ordered values, sorted from least to greatest, is the smallest value in the list. The *P* percent of the data is less or equal to *P*-th percentile value (from Wikipedia article on percentiles).

Define *0*-th percentiles to be the smallest member of the population.

Note

Given the approximating nature of the calculation, the actual returned value may not be a member of the population.
Nearest-rank definition means that *P*=50 does not conform to the interpolative definition of the median. When evaluating the significance of this discrepancy for the specific application, the size of the population and an estimation error should be taken into account.

## Estimation error in percentiles

The percentiles aggregate provides an approximate value using T-Digest.

Note

- The bounds on the estimation error vary with the value of the requested percentile. The best accuracy is at both ends of the [0..100] scale. Percentiles 0 and 100 are the exact minimum and maximum values of the distribution. The accuracy gradually decreases towards the middle of the scale. It's worst at the median and is capped at 1%.
- Error bounds are observed on the rank, not on the value. Suppose percentile(X, 50) returned a value of Xm. The estimate guarantees that at least 49% and at most 51% of the values of X are less or equal to Xm. There is no theoretical limit on the difference between Xm and the actual median value of X.
- The estimation may sometimes result in a precise value but there are no reliable conditions to define when it will be the case.