Predictive optimization system table reference
Important
This system table is in Public Preview. To access the table, the schema must be enabled in your system
catalog. For more information, see Enable system table schemas.
Note
To have access to this table, you must enable the storage
schema (see Enable system table schemas) and your region must support predictive optimization (see Azure Databricks regions).
This article outlines the predictive optimization operation history table schema and provides sample queries. Predictive optimization optimizes your data layout for peak performance and cost efficiency. The system table tracks the operation history of this feature. For information on predictive optimization, see Predictive optimization for Unity Catalog managed tables.
Table path: This system table is located at system.storage.predictive_optimization_operations_history
.
Delivery considerations
- Data can take up to 24 hours to be populated.
- Predictive optimization might run multiple operations on the same cluster. If so, the share of DBUs attributed to each of the multiple operations is approximated. This is why the
usage_unit
is set toESTIMATED_DBU
. Still, the total number of DBUs spent on the cluster will be accurate.
Predictive optimization table schema
The predictive optimization operation history system table uses the following schema:
Column name | Data type | Description | Example |
---|---|---|---|
account_id |
string | ID of the account. | 11e22ba4-87b9-4cc2-9770-d10b894b7118 |
workspace_id |
string | The ID of the workspace in which predictive optimization ran the operation. | 1234567890123456 |
start_time |
timestamp | The time at which the operation started. Timezone information is recorded at the end of the value with +00:00 representing UTC. |
2023-01-09 10:00:00.000+00:00 |
end_time |
timestamp | The time at which the operation ended. Timezone information is recorded at the end of the value with +00:00 representing UTC. |
2023-01-09 11:00:00.000+00:00 |
metastore_name |
string | The name of the metastore to which the optimized table belongs. | metastore |
catalog_name |
string | The name of the catalog to which the optimized table belongs. | catalog |
schema_name |
string | The name of the schema to which the optimized table belongs. | schema |
table_id |
string | The ID of the optimized table. | 138ebb4b-3757-41bb-9e18-52b38d3d2836 |
table_name |
string | The name of the optimized table. | table1 |
operation_type |
string | The optimization operation which was performed. The value will be COMPACTION , VACUUM , or CLUSTERING . |
COMPACTION |
operation_id |
string | The ID for the optimization operation. | 4dad1136-6a8f-418f-8234-6855cfaff18f |
operation_status |
string | The status of the optimization operation. The value will be SUCCESSFUL or FAILED: INTERNAL_ERROR . |
SUCCESSFUL |
operation_metrics |
map[string, string] | Additional details about the specific optimization that was performed. See Operation metrics. | {"number_of_output_files":"100","number_of_compacted_files":"1000","amount_of_output_data_bytes":"4000","amount_of_data_compacted_bytes":"10000"} |
usage_unit |
string | The unit of usage that this operation incurred. Can only be one value: ESTIMATED_DBU . |
ESTIMATED_DBU |
usage_quantity |
decimal | The amount of the usage unit that was used by this operation. | 2.12 |
Operation metrics
The metrics recorded in the operation_metrics
column vary depending on the operation type:
COMPACTION
:number_of_compacted_files
,amount_of_data_compacted_bytes
,number_of_output_files
,amount_of_output_data_bytes
VACUUM
:number_of_deleted_files
,amount_of_data_deleted_bytes
CLUSTERING
:number_of_removed_files
,number_of_clustered_files
,amount_of_data_removed_bytes
,amount_of_clustered_data_bytes
Example queries
The following sections include sample queries you can use to gain insights into the predictive optimization system table. For these queries to work, you need to replace the values within curly brackets {{}}
with your own parameters.
This article includes the following example queries:
- How many DBUs has predictive optimization used in the last 30 days?
- On which tables did predictive optimization spend the most in the last 30 days?
- On which tables is predictive optimization performing the most operations?
- For a given catalog, how many total bytes have been compacted?
- What tables had the most bytes vacuumed?
- What is the success rate for operations run by predictive optimizations?
How many estimated DBUs has predictive optimization used in the last 30 days?
SELECT SUM(usage_quantity)
FROM system.storage.predictive_optimization_operations_history
WHERE
usage_unit = "ESTIMATED_DBU"
AND timestampdiff(day, start_time, Now()) < 30
On which tables did predictive optimization spend the most in the last 30 days (estimated cost)?
SELECT
metastore_name,
catalog_name,
schema_name,
table_name,
SUM(usage_quantity) as totalDbus
FROM system.storage.predictive_optimization_operations_history
WHERE
usage_unit = "ESTIMATED_DBU"
AND timestampdiff(day, start_time, Now()) < 30
GROUP BY ALL
ORDER BY totalDbus DESC
On which tables is predictive optimization performing the most operations?
SELECT
metastore_name,
catalog_name,
schema_name,
table_name,
operation_type,
COUNT(DISTINCT operation_id) as operations
FROM system.storage.predictive_optimization_operations_history
GROUP BY ALL
ORDER BY operations DESC
For a given catalog, how many total bytes have been compacted?
SELECT
schema_name,
table_name,
SUM(operation_metrics["amount_of_data_compacted_bytes"]) as bytesCompacted
FROM system.storage.predictive_optimization_operations_history
WHERE
metastore_name = {{metastore_name}}
AND catalog_name = {{catalog_name}}
AND operation_type = "COMPACTION"
GROUP BY ALL
ORDER BY bytesCompacted DESC
What tables had the most bytes vacuumed?
SELECT
metastore_name,
catalog_name,
schema_name,
table_name,
SUM(operation_metrics["amount_of_data_deleted_bytes"]) as bytesVacuumed
FROM system.storage.predictive_optimization_operations_history
WHERE operation_type = "VACUUM"
GROUP BY ALL
ORDER BY bytesVacuumed DESC
What is the success rate for operations run by predictive optimizations?
WITH operation_counts AS (
SELECT
COUNT(DISTINCT (CASE WHEN operation_status = "SUCCESSFUL" THEN operation_id END)) as successes,
COUNT(DISTINCT operation_id) as total_operations
FROM system.storage.predictive_optimization_operations_history
)
SELECT successes / total_operations as success_rate
FROM operation_counts