Estimate table size

Artykuł
08/12/2024

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer

Understanding the size of a table can be helpful for efficient resource management and optimized query performance. In this article, you'll learn different methods to estimate table sizes and how to use them effectively.

Original size of ingested data

Use the .show table details to estimate the original data size of a table. For an example, see Use .show table details.

This command provides an estimation of the uncompressed size of data ingested into your table based on the assumption that the data was transferred in CSV format. The estimation is based on approximate lengths of numeric values, such as integers, longs, datetimes, and guids, by considering their string representations.

Example use case: Track the size of incoming data over time to make informed decisions about capacity planning.

Table size in terms of access bytes

Use the estimate_data_size() along with the sum() aggregation function to estimate table size based on data types and their respective byte sizes. For an example, see Use estimate_data_size().

This method provides a more precise estimation by considering the byte sizes of numeric values without formatting them as strings. For example, integer values require 4 bytes whereas long and datetime values require 8 bytes. By using this approach, you can accurately estimate the data size that would fit in memory.

Example use case: Determine the cost of a query in terms of bytes to be scanned.

Combined size of multiple tables

You can use the union operator along with the estimate_data_size() and sum() functions to estimate the combined size of multiple tables in terms of access bytes. For an example, see Use union with estimate_data_size().

Example use case: Assess the memory requirements for consolidating data from multiple tables into a single dataset.

Note

This approach may inflate the estimated data size due to empty columns, as union combines all columns from the specified tables and estimate_data_size() takes into account empty columns when calculating the data size.

Examples

Use .show table details

The following query estimates the original data size of the StormEvents table.

Run the query

.show table StormEvents details
| project TotalOriginalSize

Output

TotalOriginalSize
60192011

Tip

To format the bytes result to MB, GB, or another unit, use format_bytes().

Use estimate_data_size()

The following query estimates the original data size of the StormEvents table in bytes.

Run the query

::: moniker-end

StormEvents
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize=sum(sizeEstimateOfColumn)

Output

totalSize
58608932

Note

The output is smaller even though the calculation is done over the same table. This is because this method provides a more precise estimation by considering the byte sizes of numeric values without formatting them as strings.

Use union with estimate_data_size()

The following query estimates the data size based for all tables in the Samples database.

Run the query

union withsource=_TableName *
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize=sum(sizeEstimateOfColumn)
| extend sizeGB = format_bytes(totalSize,2,"GB")

totalSize	sizeGB
1761782453926	1640.79 GB

Udostępnij za pośrednictwem

Estimate table size

Original size of ingested data

Table size in terms of access bytes

Combined size of multiple tables

Examples

Use .show table details

Use estimate_data_size()

Use union with estimate_data_size()

Opinia

Dodatkowe zasoby