Choose between SQL and Python

Lakeflow Spark Declarative Pipelines (SDP) supports both SQL and Python interfaces for defining batch and streaming pipelines. Both interfaces produce the same underlying dataflow graph, so they provide equivalent functionality for most data processing. They differ in flexibility, accessibility, and feature coverage.

Use this guidance to decide which interface to use:

  • If you can express your logic in SQL, use SQL.
  • If you need programmatic control or a Python-only feature, use Python.
  • If you're more comfortable with Python, use Python. It covers the full pipeline feature set, so familiarity is reason enough. The same isn't true in reverse: SQL doesn't cover every feature, so don't choose it on familiarity alone.

You can also combine both interfaces in the same pipeline. See Mix SQL and Python.

When to use SQL

SQL is a good fit when you want:

  • Readable, declarative definitions: Clear logic that data engineers and analysts can maintain.
  • Standard table types: Pipelines built mostly from streaming tables and materialized views.
  • Linear transformation chains: Straightforward ingestion and transformation, such as a bronze-to-silver-to-gold flow, without procedural logic.
  • Standalone tables: Standalone streaming tables or materialized views, which you author in SQL.

For an overview of developing pipelines in SQL, see Develop Lakeflow Spark Declarative Pipelines code with SQL.

When to use Python

Python is a good fit when you need:

  • Programmatic control: Loops, conditionals, and metaprogramming to generate pipeline definitions dynamically.
  • External libraries: Python packages such as faker or boto3. See Manage Python dependencies for pipelines.
  • User-defined functions (UDFs): You define UDFs in Python and can call them from both Python and SQL source files. See User-defined scalar functions - Python.
  • Python-only features:
    • create_auto_cdc_from_snapshot_flow() to apply change data capture from a database snapshot.
    • create_sink() and foreach_batch_sink() to write to external event streaming or Delta destinations.

For an overview of developing pipelines in Python, see Develop pipeline code with Python.

Mix SQL and Python

A single pipeline can combine SQL and Python definitions, but each language must be in a separate source file. For example, you can define your bronze and silver tables in Python and your gold tables in SQL.

Feature availability

The following table compares how each interface supports common pipeline features:

Feature SQL Python
streaming table CREATE STREAMING TABLE create_streaming_table(), table()
materialized view CREATE MATERIALIZED VIEW materialized_view()
Temporary view CREATE TEMPORARY VIEW temporary_view()
Private table CREATE PRIVATE STREAMING TABLE, CREATE PRIVATE MATERIALIZED VIEW table(private=True)
Auto CDC AUTO CDC ... INTO create_auto_cdc_flow()
Auto CDC from snapshot Not supported create_auto_cdc_from_snapshot_flow()
Flow CREATE FLOW append_flow()
Sink Not supported create_sink(), foreach_batch_sink()
Expectations CONSTRAINT ... EXPECT expect(), expect_or_drop(), expect_or_fail(), and the expect_all variants

Decision summary

If you need...

Goal Recommended interface
Simplicity and readability SQL
Quick declarative setup SQL
Standalone streaming table or materialized view SQL
Conditional or looping logic Python
UDFs or external Python libraries Python
Auto CDC from snapshot or sinks Python
Full programmatic control and modularity Python