materialize()
Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel
Captures the value of a tabular expression for the duration of the query execution so that it can be referenced multiple times by the query without recalculation.
Syntax
materialize(
expression)
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
expression | string |
✔️ | The tabular expression to be evaluated and cached during query execution. |
Remarks
The materialize()
function is useful in the following scenarios:
- To speed up queries that perform heavy calculations whose results are used multiple times in the query.
- To evaluate a tabular expression only once and use it many times in a query. This is commonly required if the tabular expression is non-deterministic. For example, if the expression uses the
rand()
or thedcount()
functions.
Note
Materialize has a cache size limit of 5 GB. This limit is per cluster node and is mutual for all queries running concurrently. If a query uses materialize()
and the cache can't hold any more data, the query will abort with an error.
Tip
Another way to perform materialization of tabular expression is by using the hint.materialized
flag
of the as operator and partition operator. They all share a
single materialization cache.
Tip
- Push all possible operators that reduce the materialized dataset and keep the semantics of the query. For example, use common filters on top of the same materialized expression.
- Use materialize with join or union when their operands have mutual subqueries that can be executed once. For example, join/union fork legs. See example of using join operator.
- Materialize can only be used in let statements if you give the cached result a name. See example of using let statements).
Examples of query performance improvement
The following example shows how materialize()
can be used to improve performance of the query.
The expression _detailed_data
is defined using materialize()
function and therefore is calculated only once.
let _detailed_data = materialize(StormEvents | summarize Events=count() by State, EventType);
_detailed_data
| summarize TotalStateEvents=sum(Events) by State
| join (_detailed_data) on State
| extend EventPercentage = Events*100.0 / TotalStateEvents
| project State, EventType, EventPercentage, Events
| top 10 by EventPercentage
Output
State | EventType | EventPercentage | Events |
---|---|---|---|
HAWAII WATERS | Waterspout | 100 | 2 |
LAKE ONTARIO | Marine Thunderstorm Wind | 100 | 8 |
GULF OF ALASKA | Waterspout | 100 | 4 |
ATLANTIC NORTH | Marine Thunderstorm Wind | 95.2127659574468 | 179 |
LAKE ERIE | Marine Thunderstorm Wind | 92.5925925925926 | 25 |
E PACIFIC | Waterspout | 90 | 9 |
LAKE MICHIGAN | Marine Thunderstorm Wind | 85.1648351648352 | 155 |
LAKE HURON | Marine Thunderstorm Wind | 79.3650793650794 | 50 |
GULF OF MEXICO | Marine Thunderstorm Wind | 71.7504332755633 | 414 |
HAWAII | High Surf | 70.0218818380744 | 320 |
The following example generates a set of random numbers and calculates:
- How many distinct values in the set (
Dcount
) - The top three values in the set
- The sum of all these values in the set
This operation can be done using batches and materialize:
let randomSet =
materialize(
range x from 1 to 3000000 step 1
| project value = rand(10000000));
randomSet | summarize Dcount=dcount(value);
randomSet | top 3 by value;
randomSet | summarize Sum=sum(value)
Result set 1:
Dcount |
---|
2578351 |
Result set 2:
value |
---|
9999998 |
9999998 |
9999997 |
Result set 3:
Sum |
---|
15002960543563 |
Examples of using materialize()
Tip
Materialize your column at ingestion time if most of your queries extract fields from dynamic objects across millions of rows.
To use the let
statement with a value that you use more than once, use the materialize() function. Try to push all possible operators that will reduce the materialized dataset and still keep the semantics of the query. For example, use filters, or project only required columns.
let materializedData = materialize(Table
| where Timestamp > ago(1d));
union (materializedData
| where Text !has "somestring"
| summarize dcount(Resource1)), (materializedData
| where Text !has "somestring"
| summarize dcount(Resource2))
The filter on Text
is mutual and can be pushed to the materialize expression.
The query only needs columns Timestamp
, Text
, Resource1
, and Resource2
. Project these columns inside the materialized expression.
let materializedData = materialize(Table
| where Timestamp > ago(1d)
| where Text !has "somestring"
| project Timestamp, Resource1, Resource2, Text);
union (materializedData
| summarize dcount(Resource1)), (materializedData
| summarize dcount(Resource2))
If the filters aren't identical, as in the following query:
let materializedData = materialize(Table
| where Timestamp > ago(1d));
union (materializedData
| where Text has "String1"
| summarize dcount(Resource1)), (materializedData
| where Text has "String2"
| summarize dcount(Resource2))
When the combined filter reduces the materialized result drastically, combine both filters on the materialized result by a logical or
expression as in the following query. However, keep the filters in each union leg to preserve the semantics of the query.
let materializedData = materialize(Table
| where Timestamp > ago(1d)
| where Text has "String1" or Text has "String2"
| project Timestamp, Resource1, Resource2, Text);
union (materializedData
| where Text has "String1"
| summarize dcount(Resource1)), (materializedData
| where Text has "String2"
| summarize dcount(Resource2))