预测优化系统表参考

项目
10/19/2024

重要

此系统表为公共预览版。若要访问表，必须在 system 目录中启用架构。有关详细信息，请参阅启用系统表架构。

注意

若要访问此表，必须启用 storage 架构（请参阅启用系统表架构”），并且你的区域必须支持预测优化（请参阅 Azure Databricks 区域）。

本文概述预测优化操作历史记录表架构并提供示例查询。预测优化可优化数据布局，实现最佳性能和成本效益。系统表跟踪此功能的操作历史记录。有关预测优化的信息，请参阅 Unity Catalog 托管表的预测性优化。

表路径：此系统表位于 system.storage.predictive_optimization_operations_history.

交付注意事项

数据最多可能需要 24 小时才能填充。
预测优化可能在同一群集上运行多个操作。如果是这样，则表示每个操作归因于每个操作的 DBU 共享是近似的。这就是为什么 usage_unit 设置为 ESTIMATED_DBU 的原因。不过，在群集上使用的 DBU 总数是准确的。

预测优化表架构

预测优化操作历史记录系统表使用以下架构：

列名称	数据类型	说明	示例
`account_id`	string	帐户的 ID。	`11e22ba4-87b9-4cc2-9770-d10b894b7118`
`workspace_id`	string	预测优化在其中运行操作的工作区的 ID。	`1234567890123456`
`start_time`	timestamp	操作开始的时间。时区信息记录在值的末尾，其中 `+00:00` 表示 UTC。	`2023-01-09 10:00:00.000+00:00`
`end_time`	timestamp	操作结束的时间。时区信息记录在值的末尾，其中 `+00:00` 表示 UTC。	`2023-01-09 11:00:00.000+00:00`
`metastore_name`	string	优化表所属的元存储的名称。	`metastore`
`catalog_name`	string	优化表所属的目录的名称。	`catalog`
`schema_name`	string	优化表所属的架构的名称。	`schema`
`table_id`	string	优化表的 ID。	`138ebb4b-3757-41bb-9e18-52b38d3d2836`
`table_name`	string	优化表的名称。	`table1`
`operation_type`	string	执行的优化操作。该值为`COMPACTION`或 `VACUUMCLUSTERING`。	`COMPACTION`
`operation_id`	string	优化操作的 ID。	`4dad1136-6a8f-418f-8234-6855cfaff18f`
`operation_status`	string	优化操作的状态。值为 `SUCCESSFUL` 或 `FAILED: INTERNAL_ERROR`。	`SUCCESSFUL`
`operation_metrics`	map[string, string]	有关执行的特定优化的其他详细信息。请参阅操作指标。	`{"number_of_output_files":"100","number_of_compacted_files":"1000","amount_of_output_data_bytes":"4000","amount_of_data_compacted_bytes":"10000"}`
`usage_unit`	string	此操作发生的使用单位。只能是一个值：`ESTIMATED_DBU`。	`ESTIMATED_DBU`
`usage_quantity`	decimal	此操作使用的使用单位量。	`2.12`

操作指标

列中记录 operation_metrics 的指标因操作类型而异：

COMPACTION：number_of_compacted_files、、number_of_output_filesamount_of_data_compacted_bytes、、amount_of_output_data_bytes
VACUUM： number_of_deleted_files， amount_of_data_deleted_bytes
CLUSTERING：number_of_removed_files、、amount_of_data_removed_bytesnumber_of_clustered_files、、amount_of_clustered_data_bytes

示例查询

以下部分包括可用于深入了解预测优化系统表的示例查询。为使这些查询正常工作，需要将大括号 {{}} 中的值替换为你自己的参数。

本文包括以下示例查询：

过去 30 天内有多少 DBU 使用了预测优化？
预测优化过去 30 天内在哪些表上使用最多？
预测优化在哪些表上执行的操作最多？
对于给定目录，总字节数压缩了多少？
哪些表清空的字节最多？
预测优化运行的操作的成功率是多少？

估计过去 30 天内有多少 DBU 使用了预测优化？

SELECT SUM(usage_quantity)
FROM system.storage.predictive_optimization_operations_history
WHERE
     usage_unit = "ESTIMATED_DBU"
     AND  timestampdiff(day, start_time, Now()) < 30

过去 30 天内哪些表的预测优化开支最多（估计成本）？

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     SUM(usage_quantity) as totalDbus
FROM system.storage.predictive_optimization_operations_history
WHERE
    usage_unit = "ESTIMATED_DBU"
    AND timestampdiff(day, start_time, Now()) < 30
GROUP BY ALL
ORDER BY totalDbus DESC

预测优化在哪些表上执行的操作最多？

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     operation_type,
     COUNT(DISTINCT operation_id) as operations
FROM system.storage.predictive_optimization_operations_history
GROUP BY ALL
ORDER BY operations DESC

对于给定目录，总字节数压缩了多少？

SELECT
     schema_name,
     table_name,
     SUM(operation_metrics["amount_of_data_compacted_bytes"]) as bytesCompacted
FROM system.storage.predictive_optimization_operations_history
WHERE
    metastore_name = {{metastore_name}}
    AND catalog_name = {{catalog_name}}
    AND operation_type = "COMPACTION"
GROUP BY ALL
ORDER BY bytesCompacted DESC

哪些表清空的字节最多？

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     SUM(operation_metrics["amount_of_data_deleted_bytes"]) as bytesVacuumed
FROM system.storage.predictive_optimization_operations_history
WHERE operation_type = "VACUUM"
GROUP BY ALL
ORDER BY bytesVacuumed DESC

预测优化运行的操作的成功率是多少？

WITH operation_counts AS (
     SELECT
           COUNT(DISTINCT (CASE WHEN operation_status = "SUCCESSFUL" THEN operation_id END)) as successes,
           COUNT(DISTINCT operation_id) as total_operations
    FROM system.storage.predictive_optimization_operations_history
 )
SELECT successes / total_operations as success_rate
FROM operation_counts

通过