Compute system tables reference

This article provides you with a reference guide for the compute system tables. You can use these tables to monitor the activity and metrics of classic all-purpose compute, jobs compute, and Lakeflow Spark Declarative Pipelines compute in your account. These classic compute tables include:

clusters: Records compute configurations in your account.
node_types: Includes a single record for each of the currently available node types, including hardware information.
node_timeline: Includes minute-by-minute records of your compute's utilization metrics.
instance_events: Captures state transitions of classic compute instances.
instance_pools: Records instance pool configurations in your account.

Cluster table schema

The cluster table is a slow-changing dimension table that contains the full history of compute configurations over time for all-purpose compute, jobs compute, Lakeflow Spark Declarative Pipelines compute, and pipeline maintenance compute.

Table path: This system table is located at system.compute.clusters

Column name	Data type	Description	Example
`account_id`	string	ID of the account where this cluster was created.	`23e22ba4-87b9-` `4cc2-9770-d10b894b7118`
`workspace_id`	string	ID of the workspace where this cluster was created.	`1234567890123456`
`cluster_id`	string	ID of the cluster for which this record is associated.	`0000-123456-crmpt124`
`cluster_name`	string	User defined name for the cluster.	`My cluster`
`owned_by`	string	Username of the cluster owner. Defaults to the cluster creator, but can be changed through the Clusters API.	`sample_user@email.com`
`create_time`	timestamp	Timestamp of the change to this compute definition.	`2023-01-09 11:00:00.000`
`delete_time`	timestamp	Timestamp of when the cluster was deleted. The value is `null` if the cluster is not deleted.	`2023-01-09 11:00:00.000`
`driver_node_type`	string	Driver node type name. This matches the instance type name from the cloud provider.	`Standard_D16s_v3`
`worker_node_type`	string	Worker node type name. This matches the instance type name from the cloud provider.	`Standard_D16s_v3`
`worker_count`	bigint	Number of workers. Defined for fixed-size clusters only.	`4`
`min_autoscale_workers`	bigint	The set minimum number of workers. This field is valid only for autoscaling clusters.	`1`
`max_autoscale_workers`	bigint	The set maximum number of workers. This field is valid only for autoscaling clusters.	`1`
`auto_termination_minutes`	bigint	The configured autotermination duration.	`120`
`enable_elastic_disk`	boolean	Autoscaling disk enablement status.	`true`
`tags`	map	User-defined tags for the cluster (does not include default tags).	`{"ResourceClass":"SingleNode"}`
`cluster_source`	string	Source of the cluster. The `UI` or `API` values only apply to all-purpose compute. All job compute is logged as `JOB`. Pipelines are `PIPELINE` or `PIPELINE_MAINTENANCE`.	`UI`
`init_scripts`	array	Set of paths for init scripts.	`"/Users/example@email.com` `/files/scripts/install-python-pacakges.sh"`
`aws_attributes`	struct	AWS specific settings.	`null`
`azure_attributes`	struct	Azure specific settings.	`{` `"first_on_demand": "0",` `"availability": "ON_DEMAND_AZURE",` `"spot_bid_max_price": "—1"` `}`
`gcp_attributes`	struct	GCP specific settings. This field will be empty.	`null`
`driver_instance_pool_id`	string	Instance pool ID if the driver is configured on top of an instance pool.	`1107-555555-crhod16-pool-DIdnjazB`
`worker_instance_pool_id`	string	Instance Pool ID if the worker is configured on top of an instance pool.	`1107-555555-crhod16-pool-DIdnjazB`
`dbr_version`	string	The Databricks Runtime of the cluster.	`14.x-snapshot-scala2.12`
`change_time`	timestamp	Timestamp of change to the compute definition.	`2023-01-09 11:00:00.000`
`change_date`	date	Change date. Used for retention.	`2023-01-09`
`data_security_mode`	string	The access mode of the compute resource. See Access mode reference.	`USER_ISOLATION`
`policy_id`	string	ID of the cluster's compute policy, if applicable.	`1234F35636110A5B`

Access mode reference

The following table translates the possible values contained in the data_security_mode column. The column can also be null for certain pipelines and system-created clusters.

Value	Access mode
`USER_ISOLATION`	Standard
`SINGLE_USER`	Dedicated

The legacy access modes are recorded with the following values:

Value	Access mode
`LEGACY_PASSTHROUGH`	Credential passthrough (shared)
`LEGACY_SINGLE_USER`	Credential passthrough (single user)
`LEGACY_TABLE_ACL`	Custom
`NONE`	No isolation shared

Node types table schema

The node type table captures the currently available node types with their basic hardware information.

Table path: This system table is located at system.compute.node_types.

Column name	Data type	Description	Example
`account_id`	string	ID of the account where this cluster was created.	`23e22ba4-87b9-4cc2-9770-d10b894b7118`
`node_type`	string	Unique identifier for node type.	`Standard_D16s_v3`
`core_count`	double	Number of vCPUs for the instance.	`48.0`
`memory_mb`	long	Total memory for the instance.	`393216`
`gpu_count`	long	Number of GPUs for the instance.	`0`

Node timeline table schema

The node timeline table captures node-level resource utilization data at minute granularity. Each record contains data for a given minute of time per instance. This table captures node timelines for the all-purpose compute, jobs compute, Lakeflow Spark Declarative Pipelines compute, and pipeline maintenance compute resources in your account.

Table path: This system table is located at system.compute.node_timeline.

Column name	Data type	Description	Example
`account_id`	string	ID of the account where this compute resource is running.	`23e22ba4-87b9-4cc2-9770-d10b894b7118`
`workspace_id`	string	ID of the workspace where this compute resource is running.	`1234567890123456`
`cluster_id`	string	ID of the compute resource.	`0000-123456-crmpt124`
`instance_id`	string	ID for the specific instance.	`i-1234a6c12a2681234`
`start_time`	timestamp	Start time for the record in UTC.	`2024-07-16T12:00:00Z`
`end_time`	timestamp	End time for the record in UTC.	`2024-07-16T13:00:00Z`
`driver`	boolean	Whether the instance is a driver or worker node.	`true`
`cpu_user_percent`	double	Percentage of time the CPU spent in userland.	`34.76163817234407`
`cpu_system_percent`	double	Percentage of time the CPU spent in the kernel.	`1.0895310279488264`
`cpu_wait_percent`	double	Percentage of time the CPU spent waiting for I/O.	`0.03445157400629276`
`mem_used_percent`	double	Percentage of the compute's memory that was used during the time period (including memory used by background processes running on the compute).	`45.34858216779041`
`mem_swap_percent`	double	Percentage of memory usage attributed to memory swap.	`0.014648443087939`
`network_sent_bytes`	bigint	The number of bytes sent out in network traffic.	`517376`
`network_received_bytes`	bigint	The number of received bytes from network traffic.	`179234`
`disk_free_bytes_per_mount_point`	map	The disk utilization grouped by mount point. This is ephemeral storage provisioned only while the compute is running.	`{"/var/lib/lxc":123455551234,"/":` `123456789123,"/local_disk0":123412341234}`
`node_type`	string	The name of the node type. This will match the instance type name from the cloud provider.	`Standard_D16s_v3`
`private_ip`	string	The private IP address assigned to the node.	`10.0.0.42`

Instance events table schema

Important

This system table is in Public Preview.

The instance events table captures state transitions of classic compute instances. Each row represents a state change for a single instance. This table includes records for all-purpose compute, jobs compute, and Lakeflow Spark Declarative Pipelines compute from all workspaces in your account deployed within the same cloud region.

Table path: This system table is located at system.compute.instance_events.

Column name	Data type	Description	Example
`account_id`	string	ID of the account where this instance is launched.	`23e22ba4-87b9-` `4cc2-9770-d10b894b7118`
`workspace_id`	string	ID of the workspace where this instance is launched.	`1234567890123456`
`instance_id`	string	ID of the instance.	`i-0a1b2c3d4e5f67890`
`event_time`	timestamp	Timestamp of the event.	`2024-01-15 10:30:00.000`
`event_type`	string	Event type. Possible values are `INSTANCE_LAUNCHING` and `STATE_TRANSITION`.	`STATE_TRANSITION`
`instance_pool_id`	string	Instance pool ID if the instance belongs to a pool.	`1107-555555-pool-abcd1234`
`cluster_id`	string	ID of the cluster this instance is placed on. Only populated when `state` is `INSTANCE_PLACED`. See cluster_id details.	`0000-123456-xxxxxxxx`
`node_type`	string	The name of the node type. This matches the instance type name from the cloud provider.	`Standard_D16s_v3`
`state`	string	Instance state. See Instance states.	`INSTANCE_PLACED`
`availability_type`	string	Availability type of the instance. Possible values are `ON_DEMAND` and `SPOT` (AWS, Azure) or `ON_DEMAND` and `PREEMPTIBLE` (GCP).	`ON_DEMAND`

Instance states

INSTANCE_LAUNCHING: The instance is being initialized.
INSTANCE_READY: The instance is fully initialized and ready to be used, but not currently in use.
INSTANCE_PLACED: The instance is currently in use (joined a cluster).
INSTANCE_TERMINATED: The instance is terminated.

When is cluster_id populated?

The cluster_id field is only populated when the instance is in the INSTANCE_PLACED state. For all other states (INSTANCE_LAUNCHING, INSTANCE_READY, INSTANCE_TERMINATED), cluster_id is null. This behavior is consistent for both pooled and non-pooled instances.

The instance_events table only includes placement events for all-purpose, jobs, and Lakeflow Spark Declarative Pipelines compute. Placement events for other workloads, such as SQL warehouses, are not included in this table.

Instance pools table schema

Important

This system table is in Public Preview.

The instance pools table is a slow-changing dimension table that contains the full history of instance pool configurations over time. When a configuration changes, a new row is emitted, logically replacing the previous one.

Table path: This system table is located at system.compute.instance_pools.

Column name	Data type	Description	Example
`account_id`	string	ID of the account where this instance pool was created.	`23e22ba4-87b9-` `4cc2-9770-d10b894b7118`
`workspace_id`	string	ID of the workspace where this instance pool was created.	`1234567890123456`
`instance_pool_id`	string	ID of the instance pool.	`1107-555555-pool-abcd1234`
`change_time`	timestamp	Timestamp of change to instance pool configuration.	`2024-01-15 10:30:00.000`
`create_time`	timestamp	Timestamp of instance pool creation.	`2024-01-10 08:00:00.000`
`delete_time`	timestamp	Timestamp of instance pool deletion. The value is `null` if the instance pool is not deleted.	`null`
`instance_pool_name`	string	User defined name of the instance pool.	`My instance pool`
`tags`	map	User-defined tags for the instance pool (does not include default tags).	`{"team":"data-engineering"}`
`node_type`	string	Node type used for instances in the pool. This matches the instance type name from the cloud provider.	`Standard_D16s_v3`
`idle_instance_autotermination_minutes`	bigint	The number of minutes that idle instances in the pool cache are automatically terminated after they are inactive.	`120`
`min_idle_instances`	bigint	Minimum number of idle instances to keep in the instance pool.	`2`
`max_capacity`	bigint	Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances.	`10`
`enable_elastic_disk`	boolean	Autoscaling Local Storage: when enabled, instances in this pool dynamically acquire additional disk space when Spark workers are running low on disk space.	`true`
`disk_spec`	struct	The specification of the disks that are attached to all Spark containers.	`{` `"disk_type": "PREMIUM_LRS",` `"disk_count": 2,` `"disk_size": 100` `}`
`preloaded_docker_images`	array	Custom Docker images preloaded on the pool.	`[]`
`preloaded_spark_version`	string	Preloaded Spark image version for the pool, if defined.	`14.3.x-scala2.12`
`aws_attributes`	struct	Attributes related to instance pools running on AWS.	`null`
`azure_attributes`	struct	Attributes related to instance pools running on Azure.	`{` `"availability": "ON_DEMAND_AZURE",` `"spot_bid_max_price": -1` `}`
`gcp_attributes`	struct	Attributes related to instance pools running on GCP.	`null`

Known limitations

Compute resources that were marked deleted before October 23, 2023 do not appear in the clusters table. This might result in joins from the system.billing.usage table not matching records in the clusters table. All active compute resources have been backfilled.
These tables only includes records for all-purpose and jobs compute. They do not contain records for serverless compute or SQL warehouses.
Nodes that ran for less than 10 minutes might not appear in the node_timeline table.

Sample queries

You can use the following sample queries to answer common questions:

Join cluster records with the most recent billing records
Identify the compute resources with the highest average utilization and peak utilization
Get the most recent version of each instance pool
Calculate instance idle and active time

Note

Some of these examples join the cluster table with the system.billing.usage table. Since billing records are cross-regional and cluster records region-sepcific, billing records only match cluster records for the region in which you are querying. To see records from another region, please execute the query in that region.

Join cluster records with the most recent billing records

This query can help you understand spending over time. Once you update the usage_start_time to the most current billing period, it grabs the most recent updates to the billing records to join into clusters data.

Each record is associated with the cluster owner during that particular run. So, if the cluster owner changes, costs will roll up to the correct owner based on when the cluster was used.

SELECT
  u.record_id,
  c.cluster_id,
  c.owned_by,
  c.change_time,
  u.usage_start_time,
  u.usage_quantity
FROM
  system.billing.usage u
  JOIN system.compute.clusters c
  JOIN (SELECT u.record_id, c.cluster_id, max(c.change_time) change_time
    FROM system.billing.usage u
    JOIN system.compute.clusters c
    WHERE
      u.usage_metadata.cluster_id is not null
      and u.usage_start_time >= '2023-01-01'
      and u.usage_metadata.cluster_id = c.cluster_id
      and date_trunc('HOUR', c.change_time) <= date_trunc('HOUR', u.usage_start_time)
    GROUP BY all) config
WHERE
  u.usage_metadata.cluster_id is not null
  and u.usage_start_time >= '2023-01-01'
  and u.usage_metadata.cluster_id = c.cluster_id
  and u.record_id = config.record_id
  and c.cluster_id = config.cluster_id
  and c.change_time = config.change_time
ORDER BY cluster_id, usage_start_time desc;

Identify the compute resources with the highest average utilization and peak utilization

Identify the all-purpose and jobs compute that have the highest average CPU utilization and the highest peak CPU utilization.

SELECT
        distinct cluster_id,
driver,
avg(cpu_user_percent + cpu_system_percent) as `Avg CPU Utilization`,
max(cpu_user_percent + cpu_system_percent) as `Peak CPU Utilization`,
        avg(cpu_wait_percent) as `Avg CPU Wait`,
        max(cpu_wait_percent) as `Max CPU Wait`,
        avg(mem_used_percent) as `Avg Memory Utilization`,
        max(mem_used_percent) as `Max Memory Utilization`,
avg(network_received_bytes)/(1024^2) as `Avg Network MB Received per Minute`,
avg(network_sent_bytes)/(1024^2) as `Avg Network MB Sent per Minute`
FROM
        node_timeline
WHERE
        start_time >= date_add(now(), -1)
GROUP BY
        cluster_id,
        driver
ORDER BY
        3 desc;

Get the most recent version of each instance pool

The instance_pools table is SCD2 type, where instead of updating existing records, a new record is created every time a change is made. To get the most recent version, take the entry with the largest change_time.

SELECT *
FROM system.compute.instance_pools
QUALIFY row_number() OVER (
  PARTITION BY workspace_id, instance_pool_id
  ORDER BY change_time DESC
) = 1;

Calculate instance idle and active time

This query calculates the total idle time and active time for each instance using state transitions from the instance_events table.

WITH instance_states AS (
  SELECT
    *,
    event_time AS start_time,
    lead(event_time) OVER (
      PARTITION BY workspace_id, instance_id
      ORDER BY event_time
    ) AS end_time
  FROM system.compute.instance_events
  WHERE event_type IN ('INSTANCE_LAUNCHING', 'STATE_TRANSITION')
)
SELECT
  workspace_id,
  instance_id,
  instance_pool_id,
  sum(if(state = 'INSTANCE_READY',
    TIMESTAMPDIFF(SECOND, start_time, end_time), 0)) / 60 AS idle_minutes,
  sum(if(state = 'INSTANCE_PLACED',
    TIMESTAMPDIFF(SECOND, start_time, end_time), 0)) / 60 AS active_minutes
FROM instance_states
GROUP BY workspace_id, instance_id, instance_pool_id;

Feedback

Was this page helpful?

Last updated on 2026-05-21