top-nested operator

Produces a hierarchical aggregation and top values selection, where each level is a refinement of the previous one.

T | top-nested 3 of Location with others="Others" by sum(MachinesNumber), top-nested 4 of bin(Timestamp,5m) by sum(MachinesNumber)

The top-nested operator accepts tabular data as input, and one or more aggregation clauses. The first aggregation clause (left-most) subdivides the input records into partitions, according to the unique values of some expression over those records. The clause then keeps a certain number of records that maximize or minimize this expression over the records. The next aggregation clause then applies a similar function, in a nested fashion. Each following clause is applied to the partition produced by the previous clause. This process continues for all aggregation clauses.

For example, the top-nested operator can be used to answer the following question: "For a table containing sales figures, such as country/region, salesperson, and amount sold: what are the top five countries/regions by sales? What are the top three salespeople in each of these countries/regions?"

Syntax

T | top-nested TopNestedClause [, top-nested TopNestedClause2]...

Where TopNestedClause has the following syntax:

[ N ] of [ExprName =] Expr [with others = ConstExpr] by [AggName =] Aggregation [asc | desc]

Parameters

Name Type Required Description
T string The input tabular expression.
N long The number of top values to return for this hierarchy level. If omitted, all distinct values will be returned.
ExprName string If specified, sets the name of the output column corresponding to the values of Expr.
Expr string An expression over the input record indicating which value to return for this hierarchy level. Typically it's a column reference from T, or some calculation, such as bin(), over such a column.
ConstExpr string If specified, for each hierarchy level, 1 record will be added with the value that is the aggregation over all records that didn't "make it to the top".
AggName string If specified, this identifier sets the column name in the output for the value of Aggregation.
Aggregation string The aggregation function to apply to all records sharing the same value of Expr. The value of this aggregation determines which of the resulting records are "top". For the possible values, see supported aggregation functions.
asc or desc string Controls whether selection is actually from the "bottom" or "top" of the range of aggregated values. The default is desc.

Supported aggregation functions

The following aggregation functions are supported:

Note

Any algebraic combination of the aggregations is also supported.

Returns

This operator returns a table that has two columns for each aggregation clause:

  • One column holds the distinct values of the clause's Expr calculation (having the column name ExprName if specified)

  • One column holds the result of the Aggregation calculation (having the column name AggregationName if specified)

Notes

Input columns that aren't specified as Expr values aren't outputted. To get all values at a certain level, add an aggregation count that:

  • Omits the value of N
  • Uses the column name as the value of Expr
  • Uses Ignore=max(1) as the aggregation, and then ignore (or project-away) the column Ignore.

The number of records may grow exponentially with the number of aggregation clauses ((N1+1) * (N2+1) * ...). Record growth is even faster if no N limit is specified. Take into account that this operator may consume a considerable amount of resources.

If the distribution of the aggregation is considerably non-uniform, limit the number of distinct values to return (by using N) and use the with others= ConstExpr option to get an indication for the "weight" of all other cases.

Examples

StormEvents
| top-nested 2 of State       by sum(BeginLat),
  top-nested 3 of Source      by sum(BeginLat),
  top-nested 1 of EndLocation by sum(BeginLat)

Output

State aggregated_State Source aggregated_Source EndLocation aggregated_EndLocation
KANSAS 87771.2355000001 Law Enforcement 18744.823 FT SCOTT 264.858
KANSAS 87771.2355000001 Public 22855.6206 BUCKLIN 488.2457
KANSAS 87771.2355000001 Trained Spotter 21279.7083 SHARON SPGS 388.7404
TEXAS 123400.5101 Public 13650.9079 AMARILLO 246.2598
TEXAS 123400.5101 Law Enforcement 37228.5966 PERRYTON 289.3178
TEXAS 123400.5101 Trained Spotter 13997.7124 CLAUDE 421.44

Use the option 'with others':

StormEvents
| top-nested 2 of State with others = "All Other States" by sum(BeginLat),
  top-nested 3 of Source by sum(BeginLat),
  top-nested 1 of EndLocation with others = "All Other End Locations" by sum(BeginLat)

Output

State aggregated_State Source aggregated_Source EndLocation aggregated_EndLocation
KANSAS 87771.2355000001 Law Enforcement 18744.823 FT SCOTT 264.858
KANSAS 87771.2355000001 Public 22855.6206 BUCKLIN 488.2457
KANSAS 87771.2355000001 Trained Spotter 21279.7083 SHARON SPGS 388.7404
TEXAS 123400.5101 Public 13650.9079 AMARILLO 246.2598
TEXAS 123400.5101 Law Enforcement 37228.5966 PERRYTON 289.3178
TEXAS 123400.5101 Trained Spotter 13997.7124 CLAUDE 421.44
KANSAS 87771.2355000001 Law Enforcement 18744.823 All Other End Locations 18479.965
KANSAS 87771.2355000001 Public 22855.6206 All Other End Locations 22367.3749
KANSAS 87771.2355000001 Trained Spotter 21279.7083 All Other End Locations 20890.9679
TEXAS 123400.5101 Public 13650.9079 All Other End Locations 13404.6481
TEXAS 123400.5101 Law Enforcement 37228.5966 All Other End Locations 36939.2788
TEXAS 123400.5101 Trained Spotter 13997.7124 All Other End Locations 13576.2724
KANSAS 87771.2355000001 All Other End Locations 24891.0836
TEXAS 123400.5101 All Other End Locations 58523.2932000001
All Other States 1149279.5923 All Other End Locations 1149279.5923

The following query shows the same results for the first level used in the example above.

StormEvents
| where State !in ('TEXAS', 'KANSAS')
| summarize sum(BeginLat)

Output

sum_BeginLat
1149279.5923

Request another column (EventType) to the top-nested result.

StormEvents
| top-nested 2 of State       by sum(BeginLat),
  top-nested 2 of Source      by sum(BeginLat),
  top-nested 1 of EndLocation by sum(BeginLat),
  top-nested   of EventType   by tmp = max(1)
| project-away tmp

Output

State aggregated_State Source aggregated_Source EndLocation aggregated_EndLocation EventType
KANSAS 87771.2355000001 Trained Spotter 21279.7083 SHARON SPGS 388.7404 Thunderstorm Wind
KANSAS 87771.2355000001 Trained Spotter 21279.7083 SHARON SPGS 388.7404 Hail
KANSAS 87771.2355000001 Trained Spotter 21279.7083 SHARON SPGS 388.7404 Tornado
KANSAS 87771.2355000001 Public 22855.6206 BUCKLIN 488.2457 Hail
KANSAS 87771.2355000001 Public 22855.6206 BUCKLIN 488.2457 Thunderstorm Wind
KANSAS 87771.2355000001 Public 22855.6206 BUCKLIN 488.2457 Flood
TEXAS 123400.5101 Trained Spotter 13997.7124 CLAUDE 421.44 Hail
TEXAS 123400.5101 Law Enforcement 37228.5966 PERRYTON 289.3178 Hail
TEXAS 123400.5101 Law Enforcement 37228.5966 PERRYTON 289.3178 Flood
TEXAS 123400.5101 Law Enforcement 37228.5966 PERRYTON 289.3178 Flash Flood

Give an index sort order for each value in this level (per group) to sort the result by the last nested level (in this example by EndLocation):

StormEvents
| top-nested 2 of State  by sum(BeginLat),    top-nested 2 of Source by sum(BeginLat),    top-nested 4 of EndLocation by  sum(BeginLat)
| order by State , Source, aggregated_EndLocation
| summarize EndLocations = make_list(EndLocation, 10000) , endLocationSums = make_list(aggregated_EndLocation, 10000) by State, Source
| extend indicies = range(0, array_length(EndLocations) - 1, 1)
| mv-expand EndLocations, endLocationSums, indicies

Output

State Source EndLocations endLocationSums indices
TEXAS Trained Spotter CLAUDE 421.44 0
TEXAS Trained Spotter AMARILLO 316.8892 1
TEXAS Trained Spotter DALHART 252.6186 2
TEXAS Trained Spotter PERRYTON 216.7826 3
TEXAS Law Enforcement PERRYTON 289.3178 0
TEXAS Law Enforcement LEAKEY 267.9825 1
TEXAS Law Enforcement BRACKETTVILLE 264.3483 2
TEXAS Law Enforcement GILMER 261.9068 3
KANSAS Trained Spotter SHARON SPGS 388.7404 0
KANSAS Trained Spotter ATWOOD 358.6136 1
KANSAS Trained Spotter LENORA 317.0718 2
KANSAS Trained Spotter SCOTT CITY 307.84 3
KANSAS Public BUCKLIN 488.2457 0
KANSAS Public ASHLAND 446.4218 1
KANSAS Public PROTECTION 446.11 2
KANSAS Public MEADE STATE PARK 371.1 3

The following example returns the two most-recent events for each US state, with some information per event. Note the use of the max(1) (which is then projected away) for columns which just require propagation through the operator without any selection logic.

StormEvents
| top-nested of State by Ignore0=max(1),
  top-nested 2 of StartTime by Ignore1=max(StartTime),
  top-nested of EndTime by Ignore2=max(1),
  top-nested of EpisodeId by Ignore3=max(1)
| project-away Ignore*
| order by State asc, StartTime desc

Retrieve the latest records per identity

If you have a table with an ID column and a timestamp column, you can use the top-nested operator to query the latest two records for each unique value of ID. The latest records are defined by the highest value of timestamp.

datatable(id: string, timestamp: datetime, otherInformation: string)   
[
    "Barak", datetime(2015-01-01), "1",
    "Barak", datetime(2016-01-01), "2",
    "Barak", datetime(2017-01-20), "3",
    "Donald", datetime(2017-01-20), "4",
    "Donald", datetime(2017-01-18), "5",
    "Donald", datetime(2017-01-19), "6"
]
| top-nested of id by dummy0=max(1),  
top-nested 2 of timestamp by dummy1=max(timestamp),  
top-nested of otherInformation by dummy2=max(1)
| project-away dummy0, dummy1, dummy2 

Output

id timestamp otherInformation
Barak 2016-01-01T00:00:00Z 2
Donald 2017-01-19T00:00:00Z 6
Barak 2017-01-20T00:00:00Z 3
Donald 2017-01-20T00:00:00Z 4

Here's a step-by-step explanation of the query:

  1. The datatable creates a test dataset.
  2. The first top-nested clause returns all distinct values of id.
  3. The second top-nested clause selects the top two records with the highest timestamp for each id.
  4. The third top-nested clause adds the otherInformation column for each record.
  5. The project-away operator removes the dummy columns introduced by the top-nested operator.