The following are limitations of Lakeflow Spark Declarative Pipelines that are important to know as you develop your pipelines:
An Azure Databricks workspace is limited to 200 concurrent pipeline updates. The number of datasets that a single pipeline can contain is determined by the pipeline configuration and workload complexity.
The configuration of a pipeline includes references to source files and folders:

- If the configuration references only individual notebooks or files, the limit is 100 source files per pipeline.
- If the configuration includes folders, you can include up to 50 source entries made up of files or folders.
- Referencing a folder indirectly references the files within that folder. In this case, the limit on the number of files referenced (directly or indirectly) is 1,000.
If you need more than 100 source files, organize them into folders. To learn how to use folders to contain source files, see Pipeline assets browser in the Lakeflow pipeline editor.
Pipeline datasets can be defined only once. Because of this, they can be the target of only a single operation across all pipelines. The exception is streaming tables with append flow processing, which allows you to write to the streaming table from multiple streaming sources. See Using multiple flows to write to a single target.
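For example, a single streaming table can be the target of multiple append flows. The following is a minimal sketch, assuming the Python `dlt` interface; the table and source names are hypothetical:

```python
import dlt

# Target streaming table that multiple flows append to.
dlt.create_streaming_table("combined_orders")

# First append flow, reading from a hypothetical streaming source.
@dlt.append_flow(target="combined_orders")
def orders_region_a():
    return spark.readStream.table("raw.orders_region_a")

# Second append flow, writing to the same streaming table target.
@dlt.append_flow(target="combined_orders")
def orders_region_b():
    return spark.readStream.table("raw.orders_region_b")
```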
Identity columns have the following limitations. To learn more about identity columns in Delta tables, see Use identity columns in Delta Lake.
- Identity columns are not supported with tables that are the target of AUTO CDC processing.
- Identity columns might be recomputed during updates to a materialized view. Because of this, Databricks recommends using identity columns in pipelines only with streaming tables, as shown in the sketch after this list.
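The following sketch defines an identity column on a streaming table, assuming the Python `dlt` interface and that the identity column is declared in a DDL schema string; the table, column, and source names are hypothetical:

```python
import dlt

# Identity column declared in a DDL schema string; values are generated on write.
@dlt.table(
    name="events_with_id",
    schema="event_id BIGINT GENERATED ALWAYS AS IDENTITY, payload STRING, ingest_ts TIMESTAMP",
)
def events_with_id():
    # A streaming read makes this a streaming table, so identity values are
    # assigned once rather than recomputed on refresh.
    return spark.readStream.table("raw.events").select("payload", "ingest_ts")
```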
Materialized views and streaming tables published from pipelines, including those created by Databricks SQL, can be accessed only by Azure Databricks clients and applications. However, to make your materialized views and streaming tables accessible externally, you can use the sink API to write to tables in an external Delta instance. See Sinks in Lakeflow Spark Declarative Pipelines.

There are limitations for the Databricks compute required to run and query Unity Catalog pipelines. See Requirements for pipelines that publish to Unity Catalog.
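A hedged sketch of exporting a streaming table's rows through a Delta sink, assuming the Python `dlt.create_sink` API; the `path` option key, storage URI, and table names are illustrative assumptions:

```python
import dlt

# Define a sink that writes to an external Delta location (option key and
# storage URI are assumptions; adjust for your environment).
dlt.create_sink(
    name="external_delta_sink",
    format="delta",
    options={"path": "abfss://container@account.dfs.core.windows.net/external/orders"},
)

# Append the streaming table's rows to the sink.
@dlt.append_flow(target="external_delta_sink")
def export_orders():
    return spark.readStream.table("catalog.schema.orders_st")
```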
Delta Lake time travel queries are supported only with streaming tables and are not supported with materialized views. See Work with table history.
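For example, a time travel read against a streaming table might look like the following sketch; the table name, version, and timestamp are hypothetical:

```python
# Read an earlier version of a streaming table.
previous = spark.read.option("versionAsOf", 12).table("catalog.schema.events_st")

# Or read the table as of a timestamp.
previous_by_time = (
    spark.read.option("timestampAsOf", "2024-01-01")
    .table("catalog.schema.events_st")
)
```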
You cannot enable Iceberg reads on materialized views and streaming tables.
The `pivot()` function is not supported. The `pivot` operation in Spark requires the eager loading of input data to compute the output schema, a capability that is not supported in pipelines.
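A common alternative to `pivot()` is conditional aggregation over a fixed, known set of categories, which keeps the output schema static. The following is a sketch with hypothetical table, column, and category names:

```python
import dlt
from pyspark.sql import functions as F

# The output columns are enumerated up front, so the schema is static and no
# eager scan of the input is needed to discover pivot values.
CHANNELS = ["web", "mobile", "store"]

@dlt.table(name="sales_by_channel")
def sales_by_channel():
    df = spark.read.table("raw.sales")
    aggs = [
        F.sum(F.when(F.col("channel") == c, F.col("amount"))).alias(c)
        for c in CHANNELS
    ]
    return df.groupBy("customer_id").agg(*aggs)
```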
For Lakeflow Spark Declarative Pipelines resource quotas, see Resource limits.