Share via


Example queries for monitoring SQL warehouse activity

Use these example SQL queries with system tables to monitor SQL warehouse performance, usage, and costs. Modify the queries to fit your organization's needs. Add alerts to get notified of unexpected values.

Requirements

Tables for SQL warehouse monitoring

System table Description
system.compute.warehouse_events Tracks warehouse start, stop, scale-up, and scale-down events.
system.compute.warehouses Contains snapshots of warehouse configurations.
system.query.history Records details about every query executed on SQL warehouses.
system.billing.usage Contains billing records for all Azure Databricks usage.

Example: Warehouse usage

Use the following queries to understand how your warehouse is being used, including which queries, users, and applications drive the most activity.

Find the slowest queries on a warehouse

SELECT
  statement_id,
  executed_by,
  statement_type,
  execution_status,
  total_duration_ms,
  execution_duration_ms,
  compilation_duration_ms,
  waiting_at_capacity_duration_ms,
  read_rows,
  produced_rows,
  start_time,
  statement_text
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND start_time >= NOW() - INTERVAL 1 DAY
ORDER BY
  total_duration_ms DESC
LIMIT 50
SELECT
  DATE(start_time) AS query_date,
  COUNT(*) AS total_queries,
  COUNT(CASE WHEN execution_status = 'FINISHED' THEN 1 END) AS successful_queries,
  COUNT(CASE WHEN execution_status = 'FAILED' THEN 1 END) AS failed_queries,
  ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms,
  ROUND(PERCENTILE(total_duration_ms, 0.5), 0) AS p50_duration_ms,
  ROUND(PERCENTILE(total_duration_ms, 0.95), 0) AS p95_duration_ms,
  ROUND(AVG(waiting_at_capacity_duration_ms), 0) AS avg_queue_wait_ms
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND start_time >= NOW() - INTERVAL 30 DAY
GROUP BY
  DATE(start_time)
ORDER BY
  query_date DESC

Find the most active users on a warehouse

SELECT
  executed_by,
  COUNT(*) AS query_count,
  ROUND(SUM(total_duration_ms) / 1000 / 60, 2) AS total_duration_minutes,
  ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
  executed_by
ORDER BY
  query_count DESC

Find the top client applications

SELECT
  client_application,
  CASE
    WHEN query_source.job_info.job_id IS NOT NULL THEN 'Job'
    WHEN query_source.dashboard_id IS NOT NULL THEN 'Dashboard'
    WHEN query_source.legacy_dashboard_id IS NOT NULL THEN 'Legacy Dashboard'
    WHEN query_source.alert_id IS NOT NULL THEN 'Alert'
    WHEN query_source.notebook_id IS NOT NULL THEN 'Notebook'
    WHEN query_source.genie_space_id IS NOT NULL THEN 'Genie Space'
    WHEN query_source.sql_query_id IS NOT NULL THEN 'SQL Editor'
    ELSE 'Other'
  END AS source_type,
  COUNT(*) AS query_count,
  ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
  client_application,
  source_type
ORDER BY
  query_count DESC

Monitor failed queries

SELECT
  DATE(start_time) AS failure_date,
  execution_status,
  error_message,
  COUNT(*) AS failure_count,
  COLLECT_SET(executed_by) AS affected_users
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND execution_status IN ('FAILED', 'CANCELED')
  AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
  DATE(start_time),
  execution_status,
  error_message
ORDER BY
  failure_date DESC,
  failure_count DESC

Example: Warehouse sizing

Use the following queries to determine whether your warehouse is sized correctly. Queries waiting at capacity suggest that you need to increase max_clusters. Queries with excessive disk spill suggest that you need to increase the warehouse size.

Identify queries waiting at capacity

Queries with high waiting_at_capacity_duration_ms values are spending time queued instead of running. Consider increasing the warehouse max_clusters setting to allow the warehouse to scale.

SELECT
  statement_id,
  executed_by,
  total_duration_ms,
  waiting_at_capacity_duration_ms,
  execution_duration_ms,
  start_time,
  statement_text
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND start_time >= NOW() - INTERVAL 7 DAY
  AND waiting_at_capacity_duration_ms > 0
ORDER BY
  waiting_at_capacity_duration_ms DESC
LIMIT 50

Identify queries with excessive disk spill

Disk spill occurs when a query requires more memory than is available. Consider increasing the warehouse size to give queries more memory. Excessive spill typically means queries need optimization or that the warehouse size is too small for the workload.

SELECT
  statement_id,
  executed_by,
  spilled_local_bytes / (1024 * 1024) AS spilled_mb,
  read_bytes / (1024 * 1024) AS read_mb,
  total_duration_ms,
  start_time,
  statement_text
FROM
  system.query.history
WHERE
  compute.warehouse_id = '<warehouse-id>'
  AND start_time >= NOW() - INTERVAL 7 DAY
  AND spilled_local_bytes > 0
ORDER BY
  spilled_local_bytes DESC
LIMIT 50

Example: Warehouse costs

Use the following queries to understand and track the costs associated with your SQL warehouses.

Monitor warehouse cost by day

SELECT
  usage_date,
  sku_name,
  ROUND(SUM(usage_quantity), 2) AS total_dbus,
  ROUND(SUM(usage_quantity * list_prices.pricing.default), 2) AS estimated_list_cost
FROM
  system.billing.usage
  LEFT JOIN system.billing.list_prices ON usage.sku_name = list_prices.sku_name
    AND price_end_time IS NULL
WHERE
  usage_metadata.warehouse_id = '<warehouse-id>'
  AND usage_date >= NOW() - INTERVAL 30 DAY
GROUP BY
  usage_date,
  sku_name
ORDER BY
  usage_date DESC

Correlate warehouse events with query volume

This query helps you understand the relationship between warehouse scaling events and query activity to identify cost optimization opportunities.

WITH hourly_events AS (
  SELECT
    DATE_TRUNC('hour', event_time) AS event_hour,
    warehouse_id,
    MAX(cluster_count) AS max_clusters,
    COLLECT_SET(event_type) AS event_types
  FROM
    system.compute.warehouse_events
  WHERE
    warehouse_id = '<warehouse-id>'
    AND event_time >= NOW() - INTERVAL 7 DAY
  GROUP BY
    DATE_TRUNC('hour', event_time),
    warehouse_id
),
hourly_queries AS (
  SELECT
    DATE_TRUNC('hour', start_time) AS query_hour,
    COUNT(*) AS query_count,
    ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms,
    ROUND(AVG(waiting_at_capacity_duration_ms), 0) AS avg_queue_wait_ms
  FROM
    system.query.history
  WHERE
    compute.warehouse_id = '<warehouse-id>'
    AND start_time >= NOW() - INTERVAL 7 DAY
  GROUP BY
    DATE_TRUNC('hour', start_time)
)
SELECT
  COALESCE(e.event_hour, q.query_hour) AS hour,
  q.query_count,
  q.avg_duration_ms,
  q.avg_queue_wait_ms,
  e.max_clusters,
  e.event_types
FROM
  hourly_events e
  FULL OUTER JOIN hourly_queries q ON e.event_hour = q.query_hour
ORDER BY
  hour DESC