Delta Lake has the following distinct options for selective overwrites:
| Option | Use case | Supported compute types | Minimum version |
|---|---|---|---|
| REPLACE WHERE | Atomically overwrite rows that match a predicate. Use for replacements with a fixed matching condition, such as colA = 5 or int_col IN (1, 2, 3). | All compute types. | SQL in Databricks Runtime 12.2 LTS and above. Python and Scala in Databricks Runtime 9.1 LTS and above. |
| REPLACE USING | Dynamic data overwrite. Replaces all rows in the target that match the specified columns, based on equality comparison of the column values in the provided data set. | All compute types. | SQL in Databricks Runtime 16.3 and above. Python and Scala in Databricks Runtime 18.2 and above. |
| REPLACE ON | Dynamic data overwrite by boolean expression. Use for replacements with a complex or NULL-safe matching condition, such as s.colA <=> t.colA AND s.colB <=> t.colB. | All compute types. | SQL in Databricks Runtime 17.1 and above. Python and Scala in Databricks Runtime 18.2 and above. |
| partitionOverwriteMode | Legacy dynamic partition overwrite, which overwrites all existing data in each partition for which the write commits new data. Not recommended for new workloads. | SQL supports classic compute only. Python and Scala support all compute types. | SQL, Python, and Scala in Databricks Runtime 11.3 LTS and above. |
For most use cases, Databricks recommends using REPLACE USING or REPLACE WHERE. Use REPLACE ON only if your use case requires complex or NULL-safe matching conditions.
For detail on each option's replacement behavior, see INSERT.
In Scala and Python, replaceOn and replaceUsing can't be used in combination with replaceWhere, partitionOverwriteMode, or overwriteSchema.
For empty source queries, neither REPLACE USING nor REPLACE ON deletes data; REPLACE WHERE, however, might delete data.
Important
If data has been accidentally overwritten, you can use restore to undo the change.
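As an illustrative sketch, a restore can be issued as a Delta Lake RESTORE SQL statement; the table name and version number below are placeholders, and you would look up the version to restore with DESCRIBE HISTORY:

```python
# Sketch: build a Delta Lake RESTORE statement to undo an accidental
# overwrite. "events" and version 5 are illustrative placeholders.
def restore_statement(table: str, version: int) -> str:
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"

stmt = restore_statement("events", 5)
# On an active Spark session: spark.sql(stmt)
```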
REPLACE WHERE
With REPLACE WHERE, you can selectively overwrite only the data that matches an arbitrary expression.
To atomically replace events in January in the target table, which is partitioned by start_date, with the data in replace_data:
Python
(replace_data.write
.mode("overwrite")
.option("replaceWhere", "start_date >= '2017-01-01' AND end_date <= '2017-01-31'")
.saveAsTable("events")
)
Scala
replace_data.write
.mode("overwrite")
.option("replaceWhere", "start_date >= '2017-01-01' AND end_date <= '2017-01-31'")
.saveAsTable("events")
SQL
INSERT INTO TABLE events REPLACE WHERE start_date >= '2017-01-01' AND end_date <= '2017-01-31' SELECT * FROM replace_data
This sample code writes out the data in replace_data, validates that all rows match the predicate, and performs an atomic replacement using overwrite semantics. If any values in the operation fall outside the predicate, this operation fails with an error by default.
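As a small illustrative helper (not part of the Delta Lake API), the predicate string passed to replaceWhere can be built programmatically for any month window:

```python
import calendar
from datetime import date

# Hypothetical helper: build a replaceWhere predicate covering one
# calendar month. Column names are caller-supplied.
def month_predicate(start_col: str, end_col: str, year: int, month: int) -> str:
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    return f"{start_col} >= '{first}' AND {end_col} <= '{last}'"

pred = month_predicate("start_date", "end_date", 2017, 1)
# pred == "start_date >= '2017-01-01' AND end_date <= '2017-01-31'"
# Use it with the writer:
# replace_data.write.mode("overwrite").option("replaceWhere", pred).saveAsTable("events")
```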
On classic compute, to change this behavior to overwrite values within the predicate range and insert records outside the specified range, remove the constraint check by setting spark.databricks.delta.replaceWhere.constraintCheck.enabled to false:
Python
spark.conf.set("spark.databricks.delta.replaceWhere.constraintCheck.enabled", False)
Scala
spark.conf.set("spark.databricks.delta.replaceWhere.constraintCheck.enabled", false)
SQL
SET spark.databricks.delta.replaceWhere.constraintCheck.enabled=false
Note
REPLACE WHERE accepts a boolean_expression with some restrictions. See INSERT in the SQL language reference.
For empty source queries, REPLACE WHERE might delete table rows.
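A plain-Python model of this behavior (an illustrative assumption about the semantics, not Spark code) shows why: REPLACE WHERE removes every row matching the predicate before inserting the source rows, so an empty source still deletes:

```python
# Pure-Python model of REPLACE WHERE semantics (illustrative only):
# rows matching the predicate are removed, then source rows are inserted.
def replace_where(target, source, predicate):
    kept = [row for row in target if not predicate(row)]
    return kept + list(source)

events = [{"start_date": "2017-01-15"}, {"start_date": "2017-02-10"}]
january = lambda row: "2017-01-01" <= row["start_date"] <= "2017-01-31"

# An empty source still removes the January row:
result = replace_where(events, [], january)
# result == [{"start_date": "2017-02-10"}]
```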
Legacy behavior
Legacy replaceWhere is only available on classic compute. See Classic compute overview.
If you use the legacy behavior of replaceWhere, queries overwrite data that matches a predicate only over partition columns. The following command would atomically replace the month of January in the target table, which is partitioned by date, with the data in df:
Python
(df.write
.mode("overwrite")
.option("replaceWhere", "birthDate >= '2017-01-01' AND birthDate <= '2017-01-31'")
.saveAsTable("people10m")
)
Scala
df.write
.mode("overwrite")
.option("replaceWhere", "birthDate >= '2017-01-01' AND birthDate <= '2017-01-31'")
.saveAsTable("people10m")
To use legacy behavior, set spark.databricks.delta.replaceWhere.dataColumns.enabled to false:
Python
spark.conf.set("spark.databricks.delta.replaceWhere.dataColumns.enabled", False)
Scala
spark.conf.set("spark.databricks.delta.replaceWhere.dataColumns.enabled", false)
SQL
SET spark.databricks.delta.replaceWhere.dataColumns.enabled=false
Dynamic data overwrites
Dynamic data overwrites selectively replace data that matches the specified key columns or boolean expression, leaving all other data unchanged. Partitioned tables, unpartitioned tables, and tables with liquid clustering are all supported.
Dynamic partition overwrites are a subset of dynamic data overwrite behavior. A dynamic partition overwrite replaces all the existing data in each partition for which the write commits new data and leaves all other partitions unchanged. Only partitioned tables are supported.
REPLACE USING
SQL supported in Databricks Runtime 16.3 and above. Python and Scala supported in Databricks Runtime 18.2 and above. For behavior differences in Databricks Runtime 16.3 to 17.1, see Legacy behavior.
REPLACE USING enables compute-independent, atomic overwrite behavior that works on Databricks SQL warehouses, serverless compute, and classic compute. REPLACE USING doesn't require that you set a Spark session configuration.
REPLACE USING replaces target rows whose values in the specified columns are equal to the values in a source row. All other data remains unchanged.
Use dynamic data overwrite with REPLACE USING:
Python
(sourceDataDF.write
.mode("overwrite")
.option("replaceUsing", "event_id, start_date")
.saveAsTable("events")
)
Scala
sourceDataDF.write
.mode("overwrite")
.option("replaceUsing", "event_id, start_date")
.saveAsTable("events")
SQL
INSERT INTO TABLE events
REPLACE USING (event_id, start_date)
SELECT * FROM source_data
For empty source queries, REPLACE USING does not delete any table rows.
For complex or NULL-safe matching conditions, use REPLACE ON instead. See REPLACE ON.
See INSERT in the SQL language reference.
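To make the matching rule concrete, here is a pure-Python model of REPLACE USING semantics (an illustrative sketch, not Spark code): target rows whose key columns match any source row are replaced, and an empty source neither replaces nor deletes anything.

```python
# Pure-Python model of REPLACE USING semantics (illustrative only).
def replace_using(target, source, keys):
    source_keys = {tuple(row[k] for k in keys) for row in source}
    kept = [r for r in target if tuple(r[k] for k in keys) not in source_keys]
    return kept + list(source)

target = [
    {"event_id": 1, "start_date": "2017-01-01", "value": "old"},
    {"event_id": 2, "start_date": "2017-01-02", "value": "old"},
]
source = [{"event_id": 1, "start_date": "2017-01-01", "value": "new"}]

result = replace_using(target, source, ["event_id", "start_date"])
# The row keyed (1, 2017-01-01) is replaced; (2, 2017-01-02) is untouched.
untouched = replace_using(target, [], ["event_id", "start_date"])
# An empty source query deletes no rows: untouched == target.
```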
Legacy behavior
In Databricks Runtime 16.3 to 17.1, REPLACE USING uses legacy behavior and only allows dynamic partition overwrites; Databricks Runtime 17.2 and above allows dynamic data overwrites.
Keep the following constraints and behaviors in mind for REPLACE USING legacy behavior:
- You must specify the full set of the table's partition columns in the USING clause.
- Always validate that the data written touches only the expected partitions. A single row in the wrong partition can unintentionally overwrite the entire partition.
REPLACE ON
SQL supported in Databricks Runtime 17.1 and above. Python and Scala supported in Databricks Runtime 18.2 and above.
REPLACE ON replaces rows when they match a user-defined boolean condition, unlike REPLACE USING, which matches rows by column equality. Use REPLACE ON when you need matching logic that REPLACE USING doesn't support, such as treating NULL values as equal.
Optionally, use the targetAlias option to specify an alias for the target table and the .as() or .alias() APIs to specify an alias for the source data.
For SQL syntax, see INSERT.
Python
(sourceDataDF.alias("s")
.write
.mode("overwrite")
.option("targetAlias", "t")
.option("replaceOn", "s.event_id <=> t.event_id AND s.start_date <=> t.start_date")
.saveAsTable("events")
)
Scala
sourceDataDF.as("s")
.write
.mode("overwrite")
.option("targetAlias", "t")
.option("replaceOn", "s.event_id <=> t.event_id AND s.start_date <=> t.start_date")
.saveAsTable("events")
SQL
INSERT INTO TABLE events AS t
REPLACE ON (s.event_id <=> t.event_id AND s.start_date <=> t.start_date)
(SELECT * FROM source_data) AS s
For empty source queries, REPLACE ON does not delete any table rows.
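The NULL-safe operator is the key difference from REPLACE USING. A small illustrative model of SQL's null-safe equality (<=>), as used in the REPLACE ON condition above:

```python
# Illustrative model of SQL null-safe equality (<=>): unlike ordinary SQL
# equality, NULL <=> NULL evaluates to true, so rows with NULL key values
# can still match and be replaced.
def null_safe_eq(a, b):
    if a is None or b is None:
        return a is None and b is None
    return a == b

assert null_safe_eq(None, None) is True   # ordinary equality returns NULL here
assert null_safe_eq(None, 1) is False
assert null_safe_eq(1, 1) is True
```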
Dynamic partition overwrites with partitionOverwriteMode (legacy)
Important
This feature is in Public Preview.
Databricks Runtime 11.3 LTS and above supports dynamic partition overwrites for partitioned tables using overwrite mode: either INSERT OVERWRITE in SQL, or a DataFrame write with df.write.mode("overwrite"). This type of overwrite is only available for classic compute, not Databricks SQL warehouses or serverless compute.
Warning
When possible, use INSERT REPLACE USING instead of partition overwrites with INSERT OVERWRITE PARTITION and spark.sql.sources.partitionOverwriteMode=dynamic. Partition overwrites may use stale data when the partitioning changes.
To use dynamic partition overwrite mode, set the Spark session configuration spark.sql.sources.partitionOverwriteMode to dynamic. Alternatively, you can set the DataFrameWriter option partitionOverwriteMode to dynamic. If present, the query-specific option overrides the mode defined in the session configuration. The default value for spark.sql.sources.partitionOverwriteMode is static.
The following example uses partitionOverwriteMode:
SQL
SET spark.sql.sources.partitionOverwriteMode=dynamic;
INSERT OVERWRITE TABLE default.people10m SELECT * FROM morePeople;
Python
(df.write
.mode("overwrite")
.option("partitionOverwriteMode", "dynamic")
.saveAsTable("default.people10m")
)
Scala
df.write
.mode("overwrite")
.option("partitionOverwriteMode", "dynamic")
.saveAsTable("default.people10m")
Keep the following constraints and behaviors in mind for partitionOverwriteMode:
- You can't set overwriteSchema to true.
- You can't specify both partitionOverwriteMode and replaceWhere in the same DataFrameWriter operation.
- If you specify a replaceWhere condition using a DataFrameWriter option, Delta Lake applies that condition to control which data is overwritten. This option takes precedence over the partitionOverwriteMode session-level configuration.
- Always validate that the data written touches only the expected partitions. A single row in the wrong partition can unintentionally overwrite the entire partition.
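The last constraint is worth modeling. In this pure-Python sketch of dynamic partition overwrite semantics (illustrative, not Spark code), any partition that receives even one new row is fully replaced:

```python
# Pure-Python model of dynamic partition overwrite (illustrative only):
# every partition receiving at least one new row is fully replaced;
# partitions with no new rows are left unchanged.
def dynamic_partition_overwrite(target, new_rows, part_col):
    touched = {row[part_col] for row in new_rows}
    kept = [row for row in target if row[part_col] not in touched]
    return kept + list(new_rows)

target = [
    {"date": "2017-01-01", "name": "a"},
    {"date": "2017-01-01", "name": "b"},
    {"date": "2017-02-01", "name": "c"},
]
# A single stray row in the 2017-01-01 partition wipes both existing rows:
result = dynamic_partition_overwrite(
    target, [{"date": "2017-01-01", "name": "x"}], "date"
)
# Only "c" (untouched partition) and the new row "x" remain.
```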