How can I improve query performance in Azure Data Explorer for high-volume application logs?

pr2380 105 Reputation points
2025-10-16T18:33:02.59+00:00

I'm analyzing web platform logs stored in an AppLogs table with columns like LogID, Timestamp, EventType, and Details. Most queries filter by EventType or time ranges to track errors.

During periods of high log volume, I’m seeing issues like:

Slow query responses (up to 15 seconds, target is <2 seconds)

Errors such as “Query exceeded resource limits”

I've already:

Indexed EventType

Applied ingestion-time policies

However, queries over large datasets are still sluggish, affecting real-time dashboards.

What specific steps or best practices can I follow to troubleshoot and optimize query performance in ADX under high load?


Answer accepted by question author
  Pratyush Vashistha 5,120 Reputation points Microsoft External Staff Moderator
    2025-10-16T18:46:13.45+00:00

    Hello pr2380!

    To resolve “Query exceeded resource limits”, start by optimizing query patterns: narrow the dataset with precise filters (e.g., where Timestamp > ago(1d) and EventType == 'Error') to reduce scan scope, and use the summarize operator for aggregations instead of returning raw rows; this pattern has cut response times to under 2 seconds in past projects. Additionally, you can increase the cluster’s cache size or scale out to add nodes for higher query throughput.
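
    A minimal sketch of that pattern, assuming the AppLogs schema from your question:

        AppLogs
        | where Timestamp > ago(1d)            // time filter first to prune extents
        | where EventType == 'Error'           // exact match on an indexed column
        | summarize ErrorCount = count() by bin(Timestamp, 1h)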

    For slow performance, configure a partitioning policy that partitions extents by Timestamp (e.g., daily ranges), which improves extent pruning for time-based filters, and update the table’s caching policy to retain hot data for 30 days so frequently accessed logs are served from local SSD.
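
    A sketch of the corresponding management commands (run each separately; the table name comes from your question, and the partitioning JSON follows the documented datetime-column format, so adjust the range size to your workload):

        // keep the last 30 days of AppLogs in hot cache
        .alter table AppLogs policy caching hot = 30d

        // partition extents by day on Timestamp; RangeSize here is an example value
        .alter table AppLogs policy partitioning @'{"PartitionKeys":[{"ColumnName":"Timestamp","Kind":"UniformRange","Properties":{"Reference":"2025-01-01T00:00:00","RangeSize":"1.00:00:00","OverrideCreationTime":false}}]}'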

    Best practices you can follow:

    • You can use materialized views for pre-aggregated log metrics to speed up dashboards.
    • You can set query limits to prevent resource-intensive operations.
    • You can configure Azure Monitor alerts for query durations above 5 seconds (a query-log sketch for spotting these follows this list).
    • You can periodically review extent sizes to optimize partitioning.
    • You can use ingestion batching to streamline log data loading.
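
    On the alerting point, a quick way to spot slow queries before wiring up alerts is the cluster’s query log; a sketch using the .show queries command (the filter values are examples):

        .show queries
        | where StartedOn > ago(1h) and Duration > 5s
        | project StartedOn, Duration, User, Text, State
        | order by Duration desc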

    References:

    https://learn.microsoft.com/en-us/kusto/query/best-practices?view=microsoft-fabric

    https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-overview

    If this answers your query, do click Upvote, and if you have any further questions, do let us know.

    Thanks

    Pratyush

    1 person found this answer helpful.

1 additional answer

  Marcin Policht 67,980 Reputation points MVP Volunteer Moderator
    2025-10-16T18:46:42.9166667+00:00

    Here are a few options you can try:

    1. Review query patterns - slow queries often come from scanning too much data unnecessarily. Consider:
    • Time filters first: Always filter by Timestamp or ingestion time at the very start (ADX performs better when the filter reduces the dataset early). For example:
        AppLogs
        | where Timestamp >= ago(1h)
        | where EventType == "Error"
      
    • Avoid contains or regex over large string columns if possible; use has or == for term/exact matches, since those can use the term index (see the example after this list).
    • Summarize early: Aggregate before doing joins or complex operations.
        AppLogs
        | where Timestamp >= ago(1h)
        | summarize count() by EventType
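
    For the string-matching point above, a small sketch of the difference (Details is the free-text column from the question; "timeout" is just an example term):

        AppLogs
        | where Timestamp >= ago(1h)
        | where Details has "timeout"          // whole-term match, can use the term index
        // slower: | where Details contains "timeout"   // substring scan over every row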
      
    2. Leverage materialized views and caching

    For dashboards and repeated queries:

    • Materialized views:
        .create materialized-view AppLogs_ErrorCount
        on table AppLogs
        {
            AppLogs
            | summarize ErrorCount = count() by bin(Timestamp, 1m), EventType
        }
      
      • Speeds up repeated queries.
      • Reduces scan over raw logs.
    • Query result caching: ADX can serve repeated queries from its results cache, which is useful for dashboards; you opt in per query (see the example after this list).
    • Pre-aggregations: If you track error trends by hour/minute, maintain a summary table instead of scanning raw logs each time.
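
    To opt in to the results cache, set a maximum result age in front of the query, for example:

        set query_results_cache_max_age = time(5m);
        AppLogs
        | where Timestamp >= ago(1h)
        | summarize count() by EventType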
    3. Review resource and service-level optimizations
    • Cluster sizing: Check if your cluster is under-provisioned for peak load; consider scale-out for more nodes or scale-up for more powerful nodes (this is useful for queries that process high-cardinality data).
    • Use shuffle strategy hints for high-cardinality aggregations (the hint goes on the operator itself, not on its own pipe stage):
        AppLogs
        | summarize hint.strategy=shuffle count() by EventType
    • Query throttling: Use set query_take_max_records or set max_memory_consumption_per_query_per_node to keep individual queries from hitting cluster limits (see the sketch below).
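
    A sketch of those set statements in front of a query (the limits shown are illustrations, not recommendations):

        set query_take_max_records = 1000000;
        set max_memory_consumption_per_query_per_node = 8589934592;  // ~8 GB, example value
        AppLogs
        | where Timestamp >= ago(1d)
        | summarize count() by EventType, bin(Timestamp, 1h)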
    4. Optimize ingestion and retention
    • Batching: Larger ingestion batches reduce overhead.
    • Retention policies: Keep raw logs only as long as needed; older data can move to cold storage and be queried separately.
    • Hot/cold separation: Use hot cache for last N days to improve dashboard performance.
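
    The corresponding policy commands, as a sketch (run separately; the windows are example values to tune):

        // keep raw logs for 90 days total, with the last 7 days in hot cache
        .alter-merge table AppLogs policy retention softdelete = 90d
        .alter table AppLogs policy caching hot = 7d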

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

