This page helps you understand the SQL Server ingestion workflow, including the factors that determine your setup approach and the steps involved for different user personas.
Standard CDC vs. integrated CDC
SQL Server supports two ingestion architectures. The following table compares them:
| Feature | Standard CDC (gateway-based) | Integrated CDC (Beta) |
|---|---|---|
| Number of pipelines | Two (ingestion gateway and ingestion pipeline) | One (unified pipeline) |
| Setup | Create a gateway, then create an ingestion pipeline that references the gateway ID | Create a single pipeline that references a Unity Catalog connection |
| Gateway mode | The gateway runs continuously | The pipeline embeds extraction in each update |
| Connection reference | ingestion_gateway_id | connection_name (a Unity Catalog connection) |
| Connector type | Implicit | Explicit: connector_type: CDC |
| Staging volume | The gateway manages the staging volume internally | You configure the staging volume through data_staging_options. If you don't specify one, the pipeline creates it automatically. |
The same source database configuration applies to both architectures. See Configure Microsoft SQL Server for ingestion into Azure Databricks. For more information about the integrated architecture, see Create an integrated CDC pipeline for SQL Server.
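The difference in connection references can be illustrated with a minimal Python sketch. Only the field names `ingestion_gateway_id`, `connection_name`, `connector_type`, and `data_staging_options` come from the comparison table. The helper functions, the flat dictionary layout, and the placeholder values are assumptions made for illustration and do not reproduce the actual pipeline specification.

```python
from typing import Any, Optional


def standard_cdc_fields(gateway_id: str) -> dict[str, Any]:
    """Fields an ingestion pipeline uses to reference an existing gateway."""
    return {"ingestion_gateway_id": gateway_id}


def integrated_cdc_fields(
    connection_name: str,
    staging_options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Fields a single integrated CDC pipeline uses (hypothetical layout)."""
    fields: dict[str, Any] = {
        "connection_name": connection_name,  # a Unity Catalog connection
        "connector_type": "CDC",             # explicit for integrated CDC
    }
    # Optional: per the table, the pipeline creates a staging volume
    # automatically when none is specified.
    if staging_options is not None:
        fields["data_staging_options"] = staging_options
    return fields


def architecture_of(fields: dict[str, Any]) -> str:
    """Classify a field set by which connection reference it carries."""
    has_gateway = "ingestion_gateway_id" in fields
    has_connection = "connection_name" in fields
    if has_gateway == has_connection:
        raise ValueError(
            "Expected exactly one of ingestion_gateway_id or connection_name"
        )
    return "standard" if has_gateway else "integrated"


# Placeholder identifiers, not real resources.
print(architecture_of(standard_cdc_fields("<gateway-id>")))        # standard
print(architecture_of(integrated_cdc_fields("<connection-name>")))  # integrated
```

The check treats the two references as mutually exclusive, which matches the table: a gateway-based pipeline points at a gateway, while an integrated pipeline points directly at a connection.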
Feature availability
| Feature | Availability |
|---|---|
| UI-based pipeline authoring | |
| API-based pipeline authoring | |
| Declarative Automation Bundles | |
| Incremental ingestion | |
| Unity Catalog governance | |
| Orchestration using Lakeflow Jobs | |
| SCD type 2 | |
| API-based column selection and deselection | |
| API-based row filtering | |
| Automated schema evolution: New and deleted columns | |
| Automated schema evolution: Data type changes | |
| Automated schema evolution: Column renames | Requires a full refresh. |
| Automated schema evolution: New tables | If you ingest the entire schema. See the limitations on the number of tables per pipeline. |
| Maximum number of tables per pipeline | 250 |
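If a source contains more tables than the per-pipeline limit of 250, one possible approach is to split the table list across several pipelines. The following Python sketch shows only the grouping arithmetic. The function name and the table names are hypothetical, and the sketch does not call any Azure Databricks API.

```python
MAX_TABLES_PER_PIPELINE = 250  # limit from the feature availability table


def split_into_pipelines(
    tables: list[str],
    max_tables: int = MAX_TABLES_PER_PIPELINE,
) -> list[list[str]]:
    """Group table names into chunks of at most max_tables each."""
    if max_tables < 1:
        raise ValueError("max_tables must be at least 1")
    return [
        tables[start:start + max_tables]
        for start in range(0, len(tables), max_tables)
    ]


# Hypothetical table names for illustration.
all_tables = [f"dbo.table_{i}" for i in range(600)]
groups = split_into_pipelines(all_tables)
print([len(group) for group in groups])  # [250, 250, 100]
```

Each group can then be assigned to its own pipeline. Whether automated ingestion of new tables still applies after such a split depends on how each pipeline selects its tables.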
Authentication methods
| Authentication method | Availability |
|---|---|
| OAuth U2M | |
| OAuth M2M | |
| OAuth (manual refresh token) | |
| Basic authentication (username/password) | |
| Basic authentication (API key) | |
| Basic authentication (service account JSON key) | |
What to know before you start
| Topic | Why it matters |
|---|---|
| Azure Databricks user persona | The workflow depends on your Azure Databricks user persona (admin or non-admin). |
| Database variation | The source database configuration depends on the SQL Server deployment environment. |
| Change tracking method | The source database configuration depends on how you choose to track changes in the source. |
| Authentication method | The steps to create a connection depend on the authentication method you choose. |
| Interface | The steps to create a connection, a gateway, and a pipeline depend on the interface. |
| Ingestion frequency | The pipeline schedule depends on your latency and cost requirements. |
| Common patterns | Depending on your ingestion needs, the pipeline might use configurations like history tracking, column selection, and row filtering. Supported configurations vary by connector. See Feature availability. |
Start ingesting from SQL Server
The following table provides an overview of the end-to-end SQL Server ingestion workflow, based on user type:
| User | Steps |
|---|---|
| Admin | |
| Non-admin | Use any supported interface to create a gateway and a pipeline. See Ingest data from SQL Server. |