Row tracking in Azure Databricks

Available in Databricks Runtime 14.1 and above, row tracking assigns stable row IDs and row commit versions to each row, enabling row-level lineage tracking. Some incremental updates for materialized views require this feature.

All Apache Iceberg v3 tables include row tracking. See Use Apache Iceberg v3 features. For Delta Lake tables, you must explicitly enable row tracking.

Note

Enabling row tracking upgrades the table writer protocol and might affect compatibility with external Delta Lake clients. See Delta Lake feature compatibility and protocols.

Enable row tracking on Delta Lake tables

To enable row tracking on a Delta Lake table, set the table property delta.enableRowTracking = true during table creation:

CREATE TABLE table_name
TBLPROPERTIES (delta.enableRowTracking = true)
AS SELECT * FROM source_table;

To enable row tracking on an existing Delta Lake table, use the following example:

ALTER TABLE table_name SET TBLPROPERTIES (delta.enableRowTracking = true);

Important

Enabling row tracking on existing tables automatically assigns row IDs and row commit versions to all existing rows in the table. This process might result in the creation of multiple new versions of the table and take a significant amount of time to complete.

Warning

If your table is a target for continuous writes, such as Structured Streaming workloads, pause write operations before enabling row tracking and resume after the operation completes. Using ALTER TABLE to enable row tracking updates table metadata, causing any concurrent write operations to fail with a MetadataChangedException for the duration of the operation. See Conflict exceptions.

Cloning a table creates a separate history, so the row IDs and row commit versions on cloned tables don't match those in the original table.

Metadata fields

Row tracking adds two hidden metadata fields to the table. You can explicitly add these fields to your query to return the values.

Column name Type Values Explanation
_metadata.row_id Long The unique identifier of the row. A row keeps the same ID whenever it is modified using a MERGE or UPDATE statement.
_metadata.row_commit_version Long The Delta log or table version at which the row was last inserted or updated. A row is assigned a new version whenever it is modified using a MERGE or UPDATE statement.

Some operations store these metadata fields using the transaction log. Running OPTIMIZE or REORG operations on a table with row tracking enabled rewrites data files to store these fields.

Turn off row tracking on Delta Lake tables

To turn off row tracking on a Delta Lake table, set the table property to false.

ALTER TABLE table_name SET TBLPROPERTIES (delta.enableRowTracking = false);

Important

Disabling row tracking doesn't remove the corresponding table feature and doesn't downgrade the table protocol version. It also doesn't remove the metadata fields from the target table. To fully remove the table feature and downgrade the protocol, use DROP FEATURE. See Drop a Delta Lake table feature and downgrade table protocol.

After turning off row tracking, the generated row IDs are no longer reliable for tracking unique rows.

Limitations

The row IDs and row commit versions metadata fields can't be accessed while reading the change data feed. See Use change data feed on Azure Databricks.