Query performance insights

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

This page describes the performance insights that Azure Databricks returns in query history and how to act on them.

When queries run, Azure Databricks might return insights that identify opportunities to improve performance.

Find insights and recommendations for your query

Insights appear in your query history and in the query profile. The query details panel shows a summary of insights, ranked by their estimated effect on total task duration. The Performance insights tab in the query profile shows the full detail for each insight.

Optimize with Genie Code

When a query has actionable insights, select Optimize to open Genie Code. For insights that require a query change, Genie Code rewrites the query and presents the changes for your approval. For insights that involve table or compute changes, Genie Code summarizes the recommended actions as plain-language text.

To learn more about working with Genie Code, see Genie Code.

Query optimization insights

COVERAGE_FILTER_KEYS_CLUSTERING

The table is clustered by one or more keys that aren't used in filters during the table scan.

Recommendation: Add filters on the clustering keys to reduce bytes read.

COVERAGE_FILTER_KEYS_PARTITIONING

The table is partitioned by one or more keys that aren't used in filters during the table scan.

Recommendation: Add filters on the partitioning keys to reduce bytes read.

COVERAGE_PHOTON

Photon can't accelerate this operation, so the query uses the standard runtime engine.

Recommendation: Review Photon limitations and adjust the query to use a supported execution path.

EXPLODING_JOIN

The join produces significantly more rows than it reads.

Recommendation: Determine which result subset you need, then update the join condition or reduce the number of input rows from both relations.

FLOW_FULL_RECOMPUTE

The flow runs as a full recompute.

Recommendation: Rewrite the query for incremental support to reduce bytes read.

REDUNDANT_AGGREGATION

An aggregate operation did not change the query result.

Recommendation: Remove the aggregate or apply primary and foreign key constraints.

SELECTIVE_JOIN

The join produces significantly fewer rows than it reads.

Recommendation: Determine which result subset you need, then add filters before the join to reduce input rows.

WIDE_PROJECTION

The query projects all columns from the table.

Recommendation: Project only the columns you need to reduce bytes read.

Data layout insights

AUTO_LIQUID_CLUSTERING

The table is manually optimized and can benefit from automatic liquid clustering.

Recommendations:

  • Convert the table from external to managed for better performance and automatic maintenance.
  • Enable Predictive Optimization on the table for automatic maintenance operations.
  • Enable automatic clustering on the table to reduce bytes read.

CONCURRENT_WRITE

Concurrent writes to the table cause conflicts that are automatically resolved or fail.

Recommendation: Review Delta history to identify concurrent writes and adjust scheduling to avoid conflicts.

COVERAGE_STATS_DELTA

Delta data skipping statistics are missing or incomplete for the table scan file filters, so the query uses in-file filtering.

The statistics status for each filter can be one of the following:

  • Full: Statistics are available for all filters.
  • Partial: Statistics are available for a subset of filters.
  • Unavailable: Statistics are not available for any filter.
  • Unused: Statistics can't be used because the filter converts the data type.

Recommendation: Collect Delta statistics to reduce bytes read.

COVERAGE_STATS_OPTIMIZER

Cost-based optimizer statistics are missing or incomplete, so the query plan uses standard heuristics.

Recommendation: Collect statistics to enable the optimizer to produce a better plan.

DATA_SKEW

Data is distributed unevenly across computing resources.

Recommendation: Review the data distribution, then use key salting or pre-aggregation to balance the workload.

Compute and resource insights

DATA_SPILL

Data spilled to disk during query execution because the data did not fit in memory.

Recommendation: Increase the warehouse size to add memory. Reduce the number of rows, columns, or the size of large columns (strings, arrays, maps, structs) to reduce memory usage.

EXCESSIVE_QUEUE_TIME

The query waited in the warehouse queue.

Recommendation: Increase the maximum number of clusters on the warehouse to reduce queue time.

IO_THROTTLING

A cloud storage request was throttled by the cloud provider.

Recommendation: Contact your administrator to request increased storage request limits from your cloud provider.

Additional resources

For a broader overview of performance best practices, see the Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads.