分享方式:


檢查專用 SQL 集區的統計數據精確度

適用於:Azure Synapse Analytics

更新的統計數據對於產生最佳執行計劃至關重要。 您應該評估兩個不同的檢視方塊,因為其與判斷統計數據精確度有關:

步驟 1:確認控制節點數據列計數正確性

專用 SQL 集區中,建立分散式查詢計劃的主要引擎必須持續更新計算節點上的數據列數目。 執行下列查詢,以識別數據列計數之間有差異的數據表:

SELECT objIdsWithStats.[object_id]
    ,actualRowCounts.[schema]
    ,actualRowCounts.logical_table_name
    ,statsRowCounts.stats_row_count
    ,actualRowCounts.actual_row_count
    ,row_count_difference = CASE 
        WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count
            THEN actualRowCounts.actual_row_count - statsRowCounts.stats_row_count
        ELSE statsRowCounts.stats_row_count - actualRowCounts.actual_row_count
        END
    ,percent_deviation_from_actual = CASE 
        WHEN actualRowCounts.actual_row_count = 0
            THEN statsRowCounts.stats_row_count
        WHEN statsRowCounts.stats_row_count = 0
            THEN actualRowCounts.actual_row_count
        WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count
            THEN CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (actualRowCounts.actual_row_count - statsRowCounts.stats_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
        ELSE CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (statsRowCounts.stats_row_count - actualRowCounts.actual_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
        END
    ,'UPDATE STATISTICS ' + quotename(actualRowCounts.[schema]) + '.' + quotename(actualRowCounts.logical_table_name) + ';' as update_stats_stmt
FROM (
    SELECT DISTINCT object_id
    FROM sys.stats
    WHERE stats_id > 1
    ) objIdsWithStats
LEFT JOIN (
    SELECT object_id
        ,sum(rows) AS stats_row_count
    FROM sys.partitions
    GROUP BY object_id
    ) statsRowCounts ON objIdsWithStats.object_id = statsRowCounts.object_id
LEFT JOIN (
    SELECT sm.name [schema]
        ,tb.name logical_table_name
        ,tb.object_id object_id
        ,SUM(rg.row_count) actual_row_count
    FROM sys.schemas sm
    INNER JOIN sys.tables tb ON sm.schema_id = tb.schema_id
    INNER JOIN sys.pdw_table_mappings mp ON tb.object_id = mp.object_id
    INNER JOIN sys.pdw_nodes_tables nt ON nt.name = mp.physical_name
    INNER JOIN sys.dm_pdw_nodes_db_partition_stats rg ON rg.object_id = nt.object_id
        AND rg.pdw_node_id = nt.pdw_node_id
        AND rg.distribution_id = nt.distribution_id
    INNER JOIN sys.indexes ind on tb.object_id = ind.object_id
    WHERE rg.index_id < 2 -- In case this condition removed the number of rows will gets duplicated based on the number of index.
    AND ind.type_desc IN ('CLUSTERED COLUMNSTORE', 'HEAP') -- Switch between the CCI (Column store) and HEAP, You should at least keep one value or else the total number of rows will gets duplicated based on the number of indexes.
    GROUP BY sm.name
        ,tb.name
        ,tb.object_id
    ) actualRowCounts ON objIdsWithStats.object_id = actualRowCounts.object_id

步驟 2:確定統計數據是最新的

更新數據可能會大幅影響用來產生有效執行計劃的統計數據直方圖。 執行下列查詢,以判斷統計數據的上次更新日期是否與數據表的修改模式一致:

SELECT ob.[object_id],max(sm.[name]) AS [schema_name]
    ,max(tb.[name]) AS [table_name]
    ,st.[stats_id]
    ,max(st.[name]) AS [stats_name]
    ,string_agg(co.[name], ',') AS [stats_column_names]
    ,STATS_DATE(ob.[object_id], st.[stats_id]) AS [stats_last_updated_date]
    ,'UPDATE STATISTICS ' + quotename(max(sm.[name])) + '.' + quotename(max(tb.[name])) + ';' as [update_stats_stmt]
FROM sys.objects ob
JOIN sys.stats st ON ob.[object_id] = st.[object_id]
JOIN sys.stats_columns sc ON st.[stats_id] = sc.[stats_id]
    AND st.[object_id] = sc.[object_id]
JOIN sys.columns co ON sc.[column_id] = co.[column_id]
    AND sc.[object_id] = co.[object_id]
JOIN sys.types ty ON co.[user_type_id] = ty.[user_type_id]
JOIN sys.tables tb ON co.[object_id] = tb.[object_id]
JOIN sys.schemas sm ON tb.[schema_id] = sm.[schema_id]
WHERE st.[stats_id] > 1
GROUP BY ob.[object_id], st.[stats_id]
ORDER BY stats_last_updated_date

步驟 3:更新已識別數據表的統計數據

在識別先前步驟中的候選數據表之後,請 (查詢數據行中 update_stats_stmt 產生的) 執行語句,以更新統計數據。

注意事項

我們不建議更新個別統計數據,即使使用者已建立也一般。 UPDATE STATISTICS藉由執行 而不指定統計數據名稱,將會更新與數據表相關聯的所有統計數據,以及控制節點數據列計數。 您可以考慮使用 WITH FULLSCANWITH SAMPLE <SamplePercent> PERCENT 來覆寫默認掃描百分比,以達到大型數據表的適當精確度。 如需完整語法,請參閱 UPDATE STATISTICS (Transact-SQL)

例如:

UPDATE STATISTICS [dbo].[MyAwesomeTable];

更新統計數據之後,請重新執行問題查詢,以判斷統計數據更新是否已改善執行持續時間。

統計數據維護的更多資源