Query-based connector limitations

General limitations

Query-based connectors query the source on a schedule. They don't provide continuous ingestion. If you need lower latency, use a managed CDC database connector.
The cursor column must be a single column. Composite cursors (multiple columns combined) aren't supported. The column value must be monotonically increasing. Rows with cursor values at or below the stored high-water mark aren't reingested.
Rows with a NULL cursor column are not ingested.

Source table requirements and recommendations

Consider the following source table characteristics when you design a query-based ingestion workload. They affect how a table is ingested and how well the workload performs.

Primary key, unique key, and cursor

Query-based connectors use a primary key (PK) or unique key (UK) to identify rows and a cursor column to detect changes. The ingestion behavior depends on which of these are available on the source table:

Source table	Ingestion behavior
PK or UK with a cursor	Regular snapshot and incremental ingestion.
PK or UK without a cursor	Batch snapshot mode (`APPLY CHANGES FROM SNAPSHOT`). The connector re-reads the full table on each run, so this mode is best suited to smaller tables.
No PK or UK, cursor available	The connector generates a synthetic key from all columns. This works for tables with no duplicate rows. For tables with fully duplicate rows, use `APPEND_ONLY` storage mode.
No PK or UK, no cursor	The connector generates a synthetic key from all columns. Use `APPEND_ONLY` storage mode with any column as the cursor, and always full refresh. For tables with fully duplicate rows, `APPEND_ONLY` is required because the synthetic key can't distinguish identical rows.

Partitioning and indexing recommendations

Partitioning column: The connector automatically selects a column to partition the source query for parallel reads. It first tries the leading primary key column, provided it's a numeric, string, date, or timestamp type. If that column isn't suitable, it selects a unique key column instead. For balanced partitioning, the column should have:
- High cardinality: At least 1,000 distinct values is recommended for tables approaching 1 TB.
- Low data skew: Avoid columns whose values cluster heavily around a few values. Skewed columns create oversized partitions that can cause query timeouts and lose progress when a run restarts.
If the automatically selected column doesn't meet these criteria, contact your Databricks account team to request a partitioning column override.
Indexed cursor column: Index the cursor column on the source database. An unindexed cursor column degrades incremental query performance.

API-only features

The following features are supported for query-based connectors, but only using the API:

Row filtering
Soft-deletion tracking (deletion_condition)
Hard-deletion tracking (Beta)
Append-only ingestion (APPEND_ONLY SCD mode)
Multi-destination catalog and schema

Feedback

Was this page helpful?

Last updated on 2026-06-02