Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Important
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
This page describes the performance insights that Azure Databricks returns in query history and how to act on them.
When queries run, Azure Databricks might return insights that identify opportunities to improve performance.
Find insights and recommendations for your query
Insights appear in your query history and in the query profile. The query details panel shows a summary of insights, ranked by their estimated effect on total task duration. The Performance insights tab in the query profile shows the full detail for each insight.
Optimize with Genie Code
When a query has actionable insights, select Optimize to open Genie Code. For insights that require a query change, Genie Code rewrites the query and presents the changes for your approval. For insights that involve table or compute changes, Genie Code summarizes the recommended actions as plain-language text.
To learn more about working with Genie Code, see Genie Code.
Query optimization insights
COVERAGE_FILTER_KEYS_CLUSTERING
The table is clustered by one or more keys that aren't used in filters during the table scan.
Recommendation: Add filters on the clustering keys to reduce bytes read.
COVERAGE_FILTER_KEYS_PARTITIONING
The table is partitioned by one or more keys that aren't used in filters during the table scan.
Recommendation: Add filters on the partitioning keys to reduce bytes read.
COVERAGE_PHOTON
Photon can't accelerate this operation, so the query uses the standard runtime engine.
Recommendation: Review Photon limitations and adjust the query to use a supported execution path.
EXPLODING_JOIN
The join produces significantly more rows than it reads.
Recommendation: Determine which result subset you need, then update the join condition or reduce the number of input rows from both relations.
FLOW_FULL_RECOMPUTE
The flow runs as a full recompute.
Recommendation: Rewrite the query for incremental support to reduce bytes read.
REDUNDANT_AGGREGATION
An aggregate operation did not change the query result.
Recommendation: Remove the aggregate or apply primary and foreign key constraints.
SELECTIVE_JOIN
The join produces significantly fewer rows than it reads.
Recommendation: Determine which result subset you need, then add filters before the join to reduce input rows.
WIDE_PROJECTION
The query projects all columns from the table.
Recommendation: Project only the columns you need to reduce bytes read.
Data layout insights
AUTO_LIQUID_CLUSTERING
The table is manually optimized and can benefit from automatic liquid clustering.
Recommendations:
- Convert the table from external to managed for better performance and automatic maintenance.
- Enable Predictive Optimization on the table for automatic maintenance operations.
- Enable automatic clustering on the table to reduce bytes read.
CONCURRENT_WRITE
Concurrent writes to the table cause conflicts that are automatically resolved or fail.
Recommendation: Review Delta history to identify concurrent writes and adjust scheduling to avoid conflicts.
COVERAGE_STATS_DELTA
Delta data skipping statistics are missing or incomplete for the table scan file filters, so the query uses in-file filtering.
The statistics status for each filter can be one of the following:
- Full: Statistics are available for all filters.
- Partial: Statistics are available for a subset of filters.
- Unavailable: Statistics are not available for any filter.
- Unused: Statistics can't be used because the filter converts the data type.
Recommendation: Collect Delta statistics to reduce bytes read.
COVERAGE_STATS_OPTIMIZER
Cost-based optimizer statistics are missing or incomplete, so the query plan uses standard heuristics.
Recommendation: Collect statistics to enable the optimizer to produce a better plan.
DATA_SKEW
Data is distributed unevenly across computing resources.
Recommendation: Review the data distribution, then use key salting or pre-aggregation to balance the workload.
Compute and resource insights
DATA_SPILL
Data spilled to disk during query execution because the data did not fit in memory.
Recommendation: Increase the warehouse size to add memory. Reduce the number of rows, columns, or the size of large columns (strings, arrays, maps, structs) to reduce memory usage.
EXCESSIVE_QUEUE_TIME
The query waited in the warehouse queue.
Recommendation: Increase the maximum number of clusters on the warehouse to reduce queue time.
IO_THROTTLING
A cloud storage request was throttled by the cloud provider.
Recommendation: Contact your administrator to request increased storage request limits from your cloud provider.
Additional resources
For a broader overview of performance best practices, see the Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads.