Thanks for using MS Q&A platform and posting your query.
I'll address your questions one by one:
Should you worry about a high number of extents?
In general, a high number of extents can lead to: * Increased storage overhead due to additional metadata and index storage. * Slower performance for certain operations, like .show table [table] extents
, as you've observed. * Potential issues with query performance, especially if the extent count grows exponentially.
However, in your case, the query performance is better on Table A, which has a higher extent count. This is likely due to the efficient filtering on MetricId
using the hash partition key.
The warning threshold of 500K extents mentioned in the post you linked is not a hard limit. It's a guideline to encourage optimization and monitoring of extent growth. If your extent count continues to grow, it's essential to monitor performance and adjust your partitioning strategy as needed.
How will the extent stats change with more metrics and timestamps?
With more metrics (200K - 1M) and the same number of timestamps per metric (30M), the extent count will likely increase. However, the growth rate will depend on the distribution of data across the metrics.
If the data is evenly distributed across metrics, the extent count might not grow exponentially. On the other hand, if some metrics have a significantly larger number of timestamps, the extent count could increase more rapidly.
To mitigate this, you can consider: * Using a more granular partition key, like a combination of MetricId
and a timestamp bucket (e.g., MetricId
+ Timestamp
truncated to the hour or day level). * Implementing a data retention policy to remove older data and maintain a stable extent count.
Why can't a hash partition key be a numeric type (int)?
Hash partition keys in ADX are designed to work with string or GUID types. This is because the hash function used to distribute data across partitions is optimized for these data types.
Using a numeric type like int
as a hash partition key could lead to uneven data distribution, as the hash function might not be able to effectively spread the data across partitions. This could result in poor query performance and increased storage overhead.
If you need to use a numeric type as a partition key, consider using a range-based partition key (like Timestamp
) or a combination of columns (like MetricId
+ Timestamp
).
In conclusion, while a high extent count can be a concern, it's not necessarily a deal-breaker in your case, given the better query performance on Table A. Monitor your extent growth and adjust your partitioning strategy as needed to ensure optimal performance and storage efficiency.
Hope this helps. Do let us know if you any further queries.
If this answers your query, do click Accept Answer
and Yes
for was this answer helpful. And, if you have any further query do let us know.