Choose between SQL and Python

Lakeflow Spark Declarative Pipelines (SDP) supports both SQL and Python interfaces for defining batch and streaming pipelines. Both interfaces produce the same underlying dataflow graph, so they provide equivalent functionality for most data processing. They differ in flexibility, accessibility, and feature coverage.

Use this guidance to decide which interface to use:

If you can express your logic in SQL, use SQL.
If you need programmatic control or a Python-only feature, use Python.
If you're more comfortable with Python, use Python. It covers the full pipeline feature set, so familiarity is reason enough. The same isn't true in reverse: SQL doesn't cover every feature, so don't choose it on familiarity alone.

You can also combine both interfaces in the same pipeline. See Mix SQL and Python.

When to use SQL

SQL is a good fit when you want:

Readable, declarative definitions: Clear logic that data engineers and analysts can maintain.
Standard table types: Pipelines built mostly from streaming tables and materialized views.
Linear transformation chains: Straightforward ingestion and transformation, such as a bronze-to-silver-to-gold flow, without procedural logic.
Standalone tables: Standalone streaming tables or materialized views, which you author in SQL.

For an overview of developing pipelines in SQL, see Develop Lakeflow Spark Declarative Pipelines code with SQL.

When to use Python

Python is a good fit when you need:

Programmatic control: Loops, conditionals, and metaprogramming to generate pipeline definitions dynamically.
External libraries: Python packages such as faker or boto3. See Manage Python dependencies for pipelines.
User-defined functions (UDFs): You define UDFs in Python and can call them from both Python and SQL source files. See User-defined scalar functions - Python.
Python-only features:
- create_auto_cdc_from_snapshot_flow() to apply change data capture from a database snapshot.
- create_sink() and foreach_batch_sink() to write to external event streaming or Delta destinations.

For an overview of developing pipelines in Python, see Develop pipeline code with Python.

Mix SQL and Python

A single pipeline can combine SQL and Python definitions, but each language must be in a separate source file. For example, you can define your bronze and silver tables in Python and your gold tables in SQL.

Feature availability

The following table compares how each interface supports common pipeline features:

Feature	SQL	Python
streaming table	`CREATE STREAMING TABLE`	`create_streaming_table()`, `table()`
materialized view	`CREATE MATERIALIZED VIEW`	`materialized_view()`
Temporary view	`CREATE TEMPORARY VIEW`	`temporary_view()`
Private table	`CREATE PRIVATE STREAMING TABLE`, `CREATE PRIVATE MATERIALIZED VIEW`	`table(private=True)`
Auto CDC	`AUTO CDC ... INTO`	`create_auto_cdc_flow()`
Auto CDC from snapshot	Not supported	`create_auto_cdc_from_snapshot_flow()`
Flow	`CREATE FLOW`	`append_flow()`
Sink	Not supported	`create_sink()`, `foreach_batch_sink()`
Expectations	`CONSTRAINT ... EXPECT`	`expect()`, `expect_or_drop()`, `expect_or_fail()`, and the `expect_all` variants

Decision summary

If you need...

Goal	Recommended interface
Simplicity and readability	SQL
Quick declarative setup	SQL
Standalone streaming table or materialized view	SQL
Conditional or looping logic	Python
UDFs or external Python libraries	Python
Auto CDC from snapshot or sinks	Python
Full programmatic control and modularity	Python

Feedback

Was this page helpful?

Last updated on 2026-06-22