Use parameters with pipelines

Pipeline parameters let you reuse the same pipeline source code across environments or datasets. For example, you can run the same transformations against dev and prod catalogs, or ingest from a different source path on each run. You define parameters on the pipeline (or override them when starting an update) and reference them from your SQL source code.

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

This page describes the pipeline parameters feature, which is available in SQL source code only. To parameterize Python source code in a pipeline, continue to use the Configuration field as described in Reference parameters using the configuration field. The Configuration field also sets Spark configuration values that pipelines read at runtime. For details on these settings, see Pipeline properties reference.

What are pipeline parameters?

Pipeline parameters are key-value pairs that you can:

Declare as defaults in pipeline settings.
Override when starting an update from the pipeline UI, the Start update API, or the Run with different settings dialog.
Override on the pipeline task in a job, with optional pushdown of job-level parameters.
Reference from SQL source code using the named parameter syntax.

Parameter values are always strings. Keys can contain alphanumeric characters, underscores (_), hyphens (-), and periods (.).
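The key and value rules above can be checked before you save a parameter map. The following is a minimal sketch, not part of any Azure Databricks API: the function name `validate_parameters` is hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Allowed key characters per the rule above: letters, digits, "_", "-", ".".
# Restricting "alphanumeric" to ASCII is an assumption of this sketch.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def validate_parameters(parameters):
    """Return a list of problems found in a parameter map (empty if none)."""
    problems = []
    for key, value in parameters.items():
        if not isinstance(key, str) or not _KEY_PATTERN.fullmatch(key):
            problems.append(f"invalid key: {key!r}")
        if not isinstance(value, str):
            problems.append(f"value for {key!r} is not a string: {value!r}")
    return problems
```

A check like this is mainly useful for catching numeric or boolean values that should have been quoted as strings.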

Pipeline parameters and the Configuration field serve different purposes:

Use Parameters for...	Use Configuration for...
Values that change between updates (target catalog, source path, date range).	Spark configuration that controls pipeline behavior (`pipelines.enzyme.enabled`, `pipelines.clusterLabelsV2Enabled`).
Values you want pushed down from a job or task.	Static, structural pipeline properties.
Values you reference in SQL with named parameter syntax.	Values you reference with `${key}` syntax in SQL or `spark.conf.get("key")` in Python.

Define pipeline parameters

You can define default parameter values in pipeline settings. When an update runs without overrides, the pipeline uses these defaults.

Use the pipeline UI

In your workspace, click Jobs and Pipelines in the sidebar and select your pipeline.
Click Settings.
In the Pipeline settings sidebar, find the Parameters section and click Edit.
Add Key and Value entries, then click Save.

Use JSON or the REST API

Add a parameters map to the pipeline definition:

{
  "name": "Sales pipeline",
  "parameters": {
    "source_catalog": "dev_catalog",
    "source_schema": "sales",
    "start_date": "2026-01-01"
  }
}

For the full pipeline JSON reference, see Pipeline configurations.

Reference parameters in SQL source code

Reference a parameter by prefixing the key with a colon. Azure Databricks binds the value as a string at update time:

CREATE OR REFRESH MATERIALIZED VIEW transaction_summary AS
SELECT account_id,
  COUNT(txn_id) AS txn_count,
  SUM(txn_amount) AS account_revenue
FROM IDENTIFIER(:source_catalog || '.sales.transactions')
WHERE txn_date >= :start_date
GROUP BY account_id

To use a parameter in an identifier position, such as a catalog, schema, or table name, wrap it in IDENTIFIER():

USE CATALOG IDENTIFIER(:source_catalog);
USE SCHEMA IDENTIFIER(:source_schema);

CREATE OR REFRESH MATERIALIZED VIEW daily_sales AS
SELECT date(timestamp) AS date,
  SUM(price) AS total_sales
FROM transactions
GROUP BY date;

If your source code references a parameter that has no value at update time, the update fails with an error. The pipeline ignores extra parameters that the code doesn't reference.
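Because a missing value fails the update while extra values are ignored, it can help to compare the markers in a source file against the parameter map before starting an update. The sketch below is a naive local check, not something the pipeline runs: the function name `check_parameters` is hypothetical, the regular expression only recognizes identifier-style names, and it does not parse string literals or comments.

```python
import re

# Matches ":name" markers but skips "::" casts. Naive: it does not parse
# string literals or comments, and only recognizes identifier-style names.
_MARKER = re.compile(r"(?<![:\w]):([A-Za-z_]\w*)")


def check_parameters(sql, parameters):
    """Return (missing, unused) parameter names for one SQL source string."""
    referenced = set(_MARKER.findall(sql))
    provided = set(parameters)
    missing = sorted(referenced - provided)  # these would fail the update
    unused = sorted(provided - referenced)   # these would be ignored
    return missing, unused
```

For example, calling `check_parameters` on a query that references `:start_date` with a map that only defines `source_catalog` reports `start_date` as missing and `source_catalog` as unused.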

Override parameters at update time

You can override parameter values for a single update without changing the saved defaults.

From the pipeline UI, click Run with different settings and edit the Parameters section.
From a pipeline task in a job, set parameter overrides in the task's Parameters field. See Parameters.
From the API, pass a parameters map in the Start update request.

Azure Databricks records the parameters for a specific update in the update history and displays them in the Run parameters column of the pipeline runs list.
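As an illustration of the API option, the following sketch builds (but does not send) a Start update request that carries a parameters map. It assumes the standard `POST /api/2.0/pipelines/{pipeline_id}/updates` endpoint and a top-level `parameters` field in the request body. The helper name and the host, pipeline ID, and token values are hypothetical placeholders, so check the API reference for the exact request shape in your workspace.

```python
import json
import urllib.request


def build_start_update_request(host, pipeline_id, token, parameters):
    """Build (but do not send) a Start update request with overrides."""
    # The "parameters" body field is assumed from the description above.
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. The saved defaults in pipeline settings stay unchanged because the override applies to that update only.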

Parameter precedence

When you define the same key in more than one place, the value with the highest precedence wins. From highest to lowest:

Job run parameters: values supplied for a single job run (overrides).
Job parameters: defaults defined on the parent job.
Pipeline task parameters: values set on the pipeline task.
Pipeline parameters: defaults defined in pipeline settings.

This is the same precedence that applies to other task types that accept job parameters.
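The precedence order can be modeled as a dictionary merge applied from the lowest-precedence source to the highest. This is a sketch of the rule as listed above, not the service's implementation, and the function name `resolve_parameters` is hypothetical.

```python
def resolve_parameters(pipeline_defaults, task_parameters,
                       job_parameters, job_run_parameters):
    """Merge parameter maps so that higher-precedence sources win."""
    resolved = {}
    # Apply from lowest to highest precedence; later updates overwrite
    # values set by earlier ones.
    for source in (pipeline_defaults, task_parameters,
                   job_parameters, job_run_parameters):
        resolved.update(source)
    return resolved
```

With pipeline defaults of `source_catalog=dev_catalog` and `start_date=2026-01-01`, a job parameter `source_catalog=prod_catalog`, and a job run parameter `start_date=2026-03-01`, the merge yields `prod_catalog` and `2026-03-01`.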

Pipeline parameters in Lakeflow Jobs

When you schedule a pipeline as a pipeline task in a job, the task can supply parameters that override the pipeline's defaults. Parameter values can use dynamic value references to inject job-run-time values such as {{job.trigger.time.iso_date}} or {{job.parameters.region}}.

Lakeflow Jobs also pushes all job parameters down to pipeline tasks automatically, the same way it pushes them down to notebook and SQL tasks. The pipeline source code can reference any pushed-down value with named parameter syntax. Declaring a parameter in pipeline settings is optional and only sets a default for runs without an override.

Caveats and known limitations

Pipelines run one update at a time: A pipeline cannot run overlapping updates. To prevent jobs from failing when multiple updates would otherwise overlap, Azure Databricks caps concurrency at 1 in two scenarios:
- A job that contains a pipeline task and is configured with max_concurrent_runs greater than 1.
- A pipeline task wrapped in a for-each task, regardless of the iteration count.
The job UI shows a notification when this cap takes effect. Plan around the cap when designing parameterized pipelines that you intend to run with many parameter combinations.
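One way to plan around the cap is to enumerate the parameter combinations up front and run them one after another. The sketch below assumes a hypothetical `start_update` callable that starts an update with the given parameters and blocks until it finishes. Both function names are illustrative.

```python
from itertools import product


def parameter_combinations(options):
    """Expand {key: [values]} into one parameter map per combination."""
    keys = list(options)
    return [dict(zip(keys, values))
            for values in product(*(options[k] for k in keys))]


def run_sequentially(combinations, start_update):
    """Call start_update for each map in turn, never overlapping runs.

    start_update is a hypothetical callable that starts an update and
    blocks until it finishes.
    """
    return [start_update(parameters) for parameters in combinations]
```

Because each call completes before the next one starts, the total run time grows linearly with the number of combinations, which is worth budgeting for when the list is long.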

Date filters can trigger full refreshes: A common parameterization use case is to filter data by date. Take care with the predicate: a filter that bounds the date on both sides (a lower and an upper bound) invalidates incremental processing on materialized views and triggers a full refresh on each update. A filter with only a lower bound still processes incrementally.

-- Triggers a full refresh on each update
CREATE OR REFRESH MATERIALIZED VIEW recent_orders AS
SELECT * FROM orders
WHERE order_date >= :start_date AND order_date < :end_date;

-- Processes incrementally
CREATE OR REFRESH MATERIALIZED VIEW recent_orders AS
SELECT * FROM orders
WHERE order_date >= :start_date;

Named parameters are SQL-only: In this Beta, named parameter syntax can only be used in SQL source code. To parameterize Python source code, continue to use the Configuration field with spark.conf.get(). See Reference parameters using the configuration field.

Reference parameters using the configuration field

The Configuration field on a pipeline accepts arbitrary key-value pairs that are exposed as Spark configuration values. This is the legacy parameterization mechanism and continues to work alongside pipeline parameters. Use it for Python source code and for keys that you want to read with spark.conf.get() rather than named parameter syntax.

The following example uses a mypipeline.start_date configuration value to limit a development pipeline to a subset of input data:

SQL

CREATE OR REFRESH MATERIALIZED VIEW customer_events
AS SELECT * FROM source_table WHERE date > '${mypipeline.start_date}';

Python

from pyspark import pipelines as dp
from pyspark.sql.functions import col

@dp.table
def customer_events():
  # `spark` is the SparkSession that the pipeline runtime provides.
  # The configuration value is read as a string.
  start_date = spark.conf.get("mypipeline.start_date")
  return spark.read.table("source_table").where(col("date") > start_date)

You set Configuration values in the Configuration section of pipeline settings or in the configuration field of the pipeline JSON. Avoid keys that conflict with reserved pipeline or Apache Spark configuration values.

Last updated on 2026-05-26