Replication system table reference

Important

This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.

This page provides a reference for the states table in the replication system schema, which tracks the replication status of Azure Databricks managed disaster recovery (DR). Each row is a status event for a failover group, emitted periodically and whenever the failover group changes. Use this table to monitor the current and historical replication status of your failover groups, including replication lag and any errors that are blocking replication of at least one asset.

Table path: system.replication.states

States table schema reference

The states table uses the following schema:

Column name Data type Description Example
event_id string Unique identifier for the status event. ca886134-876c-4671-a38b-332edf48c602
event_time timestamp Timestamp of when the event was emitted. 2024-01-05T00:00:00.000+00:00
account_id string ID of the account that the failover group belongs to. ca886134-876c-4671-a38b-332edf48c602
failover_group_name string Fully qualified name of the failover group. accounts/account1/failover-group/group1
replication_state string State of the replication when the event was emitted. Possible values are INITIALIZING, REPLICATING, CREATED, UPDATED, DELETED, FAILOVER_STARTED, FAILOVER_FINISHED, and FAILOVER_ABORTED. REPLICATING
errors array When replication_state is REPLICATING, an aggregated list of the errors that are blocking replication of at least one asset, including how many assets each error affects. Otherwise, the errors that caused the operation to fail. See Errors reference. See Errors reference
replication_lag_ms long Milliseconds since the last successful replication when the event was emitted. A null value indicates that at least one asset has never been replicated from the source to the replica. 2323
effective_primary_region string The primary region of the failover group at the time the event was emitted. us-west-2
managed_assets struct The assets managed by the failover group when the event was emitted. See Managed assets reference. See Managed assets reference

Errors reference

The errors column contains an array of the errors that are blocking replication. Each element of the array is a struct with the following fields:

Field name Data type Description
error struct Details about the error. Contains error_class (string), a class name for the error; parameters (map), key-value parameters with details about the error; and message (string), a human-readable error message.
affected_assets_counts array One entry per affected asset type. Each entry contains asset_type (string), the type of the affected asset, and failing_count (long), the number of assets of that type that the error affects.

Managed assets reference

The managed_assets column is a struct that describes the assets managed by the failover group when the event was emitted. It contains the following fields:

Field name Data type Description
metastores array The metastores managed by the failover group.
workspace_sets array The workspace sets managed by the failover group. Each entry contains name (string) and workspace_ids (array).
catalogs array The catalogs managed by the failover group. Each entry contains name (string).

Considerations when reading the table

Take note of the following when you analyze the states table:

  • Data can take up to 3 hours to populate after an event occurs.
  • The table contains events for all failover groups in the Azure Databricks account.

Example queries

The following query returns the most recent replication status for a given failover group, including its current replication_state, replication lag, and any blocking errors:

SELECT
  event_time,
  replication_state,
  replication_lag_ms,
  errors
FROM system.replication.states
WHERE failover_group_name = :failover_group_name
ORDER BY event_time DESC
LIMIT 1

The following query returns the distribution of replication lag for a given failover group:

SELECT histogram_numeric(replication_lag_ms, 20) AS replication_lag_distribution
FROM system.replication.states
WHERE failover_group_name = :failover_group_name