Store Postgres changes in the lakehouse

Note

Lakebase Change Data Feed is in Public Preview.

Set up Lakebase Change Data Feed (CDF) on a Postgres table, then watch row-level changes appear in the destination Delta table.

Steps:Enable change capture → ② Start the feed → ③ Follow a row into the lakehouse → ④ Change the row, see it flow through

Note

This is a quickstart. For complete documentation, see Lakebase Change Data Feed.

Before you begin

  • Make sure you completed Get a Postgres database. You need a Lakebase project with the playing_with_lakebase sample table.
  • A Unity Catalog catalog and schema where you have CREATE TABLE permission.

Step 1: Enable change capture

Postgres needs full row data in the write-ahead log for CDF to work. Setting replica identity to full tells Postgres to record both the old and new row state for every change.

In the Lakebase SQL Editor, run:

ALTER TABLE playing_with_lakebase REPLICA IDENTITY FULL;

Learn more: Set replica identity on all tables in a schema and auto-apply it to new tables

Step 2: Start the feed

Lakebase CDF is configured at the schema level. Every current and future table in the source schema is included automatically, so you don't pick individual tables.

From your production branch, open the Change Data Feed tab and click Start. Choose public as the source schema, then pick a destination Unity Catalog catalog and schema. The initial snapshot begins immediately, and lb_playing_with_lakebase_history appears as a Delta table in your destination.

Start dialog with source and destination selection.

Learn more: Start the change data feed

Step 3: Follow a row into the lakehouse

Pick a row from Lakebase. Take a look at row id=2:

SELECT * FROM playing_with_lakebase WHERE id = 2;

Now find the same row in the Delta history table. Switch to a Databricks SQL warehouse or notebook and run:

SELECT * FROM <catalog>.<schema>.lb_playing_with_lakebase_history
WHERE id = 2;

Replace <catalog> and <schema> with the destination you chose in Step 2. You'll see row id=2 with the same name and value as in Lakebase, plus extra columns. The initial snapshot wrote every existing row into Delta as an insert event, which is what that row represents.

Those extra columns describe what kind of event each row represents (_pg_change_type), when it happened (_timestamp), and the Postgres ordering info (_pg_lsn, _pg_xid).

Learn more: Destination table schema | Data type mapping

Step 4: Change the row, see it flow through

Back in the Lakebase SQL Editor, update row id=2:

UPDATE playing_with_lakebase SET value = 55.5 WHERE id = 2;

Wait a few seconds for the change to appear in the feed, then re-query the history table:

SELECT id, value, _pg_change_type, _timestamp
FROM <catalog>.<schema>.lb_playing_with_lakebase_history
WHERE id = 2
ORDER BY _pg_lsn DESC;

Delta history table showing three rows for id=2: update_preimage, update_postimage, and insert

Row id=2 now appears three times: the original insert, an update_preimage with the old value, and an update_postimage with the new value. Every change to the row becomes a new history row, so you always have a complete audit trail. Deletes work the same way, appending one row with _pg_change_type = 'delete'.

Learn more: Common change patterns | Build downstream pipelines

Next steps