

Delta table properties reference

Delta Lake reserves Delta table properties whose names begin with the delta. prefix. These properties may have specific meanings and affect table behavior when they are set.

Note

All operations that set or update table properties conflict with other concurrent write operations, causing them to fail. Databricks recommends you modify a table property only when there are no concurrent write operations on the table.

How do table properties and SparkSession properties interact?

Delta table properties are set per table. If a property is set on a table, that setting applies to the table by default.

Some table properties have associated SparkSession configurations which always take precedence over table properties. Some examples include the spark.databricks.delta.autoCompact.enabled and spark.databricks.delta.optimizeWrite.enabled configurations, which turn on auto compaction and optimized writes at the SparkSession level rather than the table level. Databricks recommends using table-scoped configurations for most workloads.
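The precedence rule can be modeled in a few lines of Python. This is a hypothetical, stdlib-only sketch rather than Databricks code. Plain dictionaries stand in for the SparkSession configuration and the table properties, and the helper name effective_setting is invented for illustration.

```python
from typing import Mapping, Optional


def effective_setting(
    session_conf: Mapping[str, str],
    table_props: Mapping[str, str],
    session_key: str,
    table_key: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical resolution of a setting that exists at both levels.

    A SparkSession value, when present, always wins; otherwise the table
    property applies; otherwise the built-in default is returned.
    """
    if session_key in session_conf:
        return session_conf[session_key]
    if table_key in table_props:
        return table_props[table_key]
    return default


# Optimized writes are enabled on the table but disabled for the session.
session = {"spark.databricks.delta.optimizeWrite.enabled": "false"}
table = {"delta.autoOptimize.optimizeWrite": "true"}
print(effective_setting(session, table,
                        "spark.databricks.delta.optimizeWrite.enabled",
                        "delta.autoOptimize.optimizeWrite"))  # prints: false
```

Because the session value wins even when the table explicitly enables the feature, a session-level setting can silently mask a table property. This is one reason to prefer table-scoped configuration.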

For every Delta table property, you can set a default value for new tables using a SparkSession configuration, overriding the built-in default. This setting only affects new tables and does not override or replace properties set on existing tables. The SparkSession configuration uses a different prefix from the corresponding table property, as shown in the following table:

Delta Lake conf SparkSession conf
delta.<conf> spark.databricks.delta.properties.defaults.<conf>

For example, to set the delta.appendOnly = true property for all new Delta Lake tables created in a session, set the following:

SET spark.databricks.delta.properties.defaults.appendOnly = true
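The prefix mapping, and how session defaults seed a new table, can be sketched as follows. The helpers session_default_key and new_table_properties are hypothetical. This sketch also assumes that properties given explicitly when a table is created take priority over session defaults.

```python
from typing import Dict, Mapping

TABLE_PREFIX = "delta."
SESSION_DEFAULT_PREFIX = "spark.databricks.delta.properties.defaults."


def session_default_key(table_key: str) -> str:
    """Map a table property such as delta.appendOnly to its session-default key."""
    if not table_key.startswith(TABLE_PREFIX):
        raise ValueError(f"not a Delta table property: {table_key!r}")
    return SESSION_DEFAULT_PREFIX + table_key[len(TABLE_PREFIX):]


def new_table_properties(session_conf: Mapping[str, str],
                         explicit_props: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical model of the properties a newly created table starts with.

    Session defaults are translated back to delta.* keys, and unrelated
    session settings are ignored. Explicit creation-time properties are
    assumed to override the defaults.
    """
    props = {
        TABLE_PREFIX + key[len(SESSION_DEFAULT_PREFIX):]: value
        for key, value in session_conf.items()
        if key.startswith(SESSION_DEFAULT_PREFIX)
    }
    props.update(explicit_props)
    return props


session = {session_default_key("delta.appendOnly"): "true",
           "spark.sql.shuffle.partitions": "8"}
print(new_table_properties(session, {"delta.enableChangeDataFeed": "true"}))
# {'delta.appendOnly': 'true', 'delta.enableChangeDataFeed': 'true'}
```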

To modify the properties of an existing table, use the ALTER TABLE statement with SET TBLPROPERTIES.
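As a rough illustration, the following hypothetical helper builds such a statement from a dictionary of properties. The table name main.default.events is a placeholder. The helper rejects quotes and backslashes rather than trying to escape them.

```python
from typing import Mapping


def set_tblproperties_sql(table_name: str, props: Mapping[str, str]) -> str:
    """Build an ALTER TABLE ... SET TBLPROPERTIES statement (hypothetical helper)."""
    if not props:
        raise ValueError("at least one property is required")
    for text in list(props.keys()) + list(props.values()):
        # Reject characters that would need escaping instead of guessing the rules.
        if "'" in text or "\\" in text:
            raise ValueError(f"unsupported character in {text!r}")
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    # table_name is inserted verbatim; it is neither validated nor quoted here.
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({assignments})"


print(set_tblproperties_sql("main.default.events", {"delta.appendOnly": "true"}))
# ALTER TABLE main.default.events SET TBLPROPERTIES ('delta.appendOnly' = 'true')
```

In a notebook, you could pass the resulting string to spark.sql. Run it only when no concurrent writes are targeting the table.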

Delta table properties

Available Delta table properties include the following:

Property
delta.appendOnly

true for this Delta table to be append-only. If append-only, existing records cannot be deleted, and existing values cannot be updated.

See Delta table properties reference.

Data type: Boolean

Default: false
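A toy, in-memory model may help show what append-only means for writers. The ToyTable and AppendOnlyViolation classes are hypothetical and do not reflect how Delta Lake enforces the property. They only show that appends succeed while deletes and updates are rejected.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class AppendOnlyViolation(Exception):
    """Raised when a delete or update targets an append-only toy table."""


class ToyTable:
    """Hypothetical in-memory stand-in; not how Delta Lake is implemented."""

    def __init__(self, append_only: bool = False) -> None:
        self.append_only = append_only  # plays the role of delta.appendOnly
        self.rows: List[Row] = []

    def append(self, row: Row) -> None:
        # Appending new records is allowed whether or not the table is append-only.
        self.rows.append(dict(row))

    def delete_where(self, predicate: Callable[[Row], bool]) -> None:
        if self.append_only:
            raise AppendOnlyViolation("existing records cannot be deleted")
        self.rows = [r for r in self.rows if not predicate(r)]

    def update_where(self, predicate: Callable[[Row], bool], changes: Row) -> None:
        if self.append_only:
            raise AppendOnlyViolation("existing values cannot be updated")
        for r in self.rows:
            if predicate(r):
                r.update(changes)


events = ToyTable(append_only=True)
events.append({"id": 1, "status": "new"})
try:
    events.update_where(lambda r: r["id"] == 1, {"status": "done"})
except AppendOnlyViolation as err:
    print("rejected:", err)
```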
delta.autoOptimize.autoCompact

auto for Delta Lake to automatically optimize the layout of the files for this Delta table.

See Auto compaction for Delta Lake on Azure Databricks.

Data type: Boolean

Default: (none)
delta.autoOptimize.optimizeWrite

true for Delta Lake to automatically optimize the layout of the files for this Delta table during writes.

See Optimized writes for Delta Lake on Azure Databricks.

Data type: Boolean

Default: (none)
delta.checkpoint.writeStatsAsJson

true for Delta Lake to write file statistics in checkpoints in JSON format for the stats column.

See Manage column-level statistics in checkpoints.

Data type: Boolean

Default: true
delta.checkpoint.writeStatsAsStruct

true for Delta Lake to write file statistics to checkpoints in struct format for the stats_parsed column and to write partition values as a struct for partitionValues_parsed.

See Manage column-level statistics in checkpoints.

Data type: Boolean

Default: (none)
delta.checkpointPolicy

classic for classic Delta Lake checkpoints. v2 for v2 checkpoints.

See Compatibility for tables with liquid clustering.

Data type: String

Default: classic
delta.columnMapping.mode

Whether column mapping is enabled. Column mapping allows Delta table columns and the corresponding Parquet columns to use different names.

See Rename and drop columns with Delta Lake column mapping.

Note: Enabling delta.columnMapping.mode automatically enables delta.randomizeFilePrefixes.

Data type: DeltaColumnMappingMode

Default: none
delta.dataSkippingNumIndexedCols

The number of columns for Delta Lake to collect statistics about for data skipping. A value of -1 means to collect statistics for all columns.

See Data skipping for Delta Lake.

Data type: Int

Default: 32
delta.dataSkippingStatsColumns

A comma-separated list of column names on which Delta Lake collects statistics to enhance data skipping functionality. This property takes precedence over delta.dataSkippingNumIndexedCols.

See Data skipping for Delta Lake.

Data type: String

Default: (none)
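The interaction between the two data-skipping properties can be sketched as follows. The sketch assumes a flat schema with no nested fields, and it assumes that statistics cover the first N columns in schema order. The function stats_columns is hypothetical.

```python
from typing import List, Optional, Sequence


def stats_columns(
    schema_columns: Sequence[str],
    num_indexed_cols: int = 32,
    stats_columns_prop: Optional[str] = None,
) -> List[str]:
    """Hypothetical model of which columns get data-skipping statistics.

    delta.dataSkippingStatsColumns (stats_columns_prop), when set, takes
    precedence. Otherwise the first num_indexed_cols columns are used
    (assumed to be in schema order), and -1 means all columns.
    """
    if stats_columns_prop:
        return [c.strip() for c in stats_columns_prop.split(",") if c.strip()]
    if num_indexed_cols == -1:
        return list(schema_columns)
    return list(schema_columns[:num_indexed_cols])


columns = [f"c{i}" for i in range(40)]
print(len(stats_columns(columns)))                      # 32
print(stats_columns(columns, 2, "c5, c7"))              # ['c5', 'c7']
```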
delta.deletedFileRetentionDuration

The shortest duration for Delta Lake to keep logically deleted data files before deleting them physically. This prevents failures in stale readers after compactions or partition overwrites.

This value should be large enough to ensure that:

- It is larger than the longest possible duration of a job if you run VACUUM when there are concurrent readers or writers accessing the Delta table.
- If you run a streaming query that reads from the table, that query does not stop for longer than this value. Otherwise, the query may not be able to restart, as it must still read old files.

See Configure data retention for time travel queries.

Data type: CalendarInterval

Default: interval 1 week
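The following sketch illustrates the sizing guidance for this property. The parse_interval function converts the CalendarInterval strings used in this reference (for example, interval 1 week) into Python timedelta values. The retention_is_safe function checks the two conditions in the list. Both helpers are hypothetical and understand only a small subset of interval syntax.

```python
import re
from datetime import timedelta

# Units this sketch understands; the real CalendarInterval grammar is richer.
_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def parse_interval(text: str) -> timedelta:
    """Parse strings such as 'interval 1 week' or 'interval 30 days'."""
    match = re.fullmatch(r"\s*interval\s+(\d+)\s+([a-z]+?)s?\s*", text.lower())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def retention_is_safe(retention: timedelta, longest_job: timedelta,
                      longest_stream_pause: timedelta) -> bool:
    """Hypothetical check of the two conditions for deletedFileRetentionDuration."""
    # The retention must exceed the longest job that can overlap VACUUM, and no
    # streaming reader may stay stopped for longer than the retention.
    return retention > longest_job and longest_stream_pause <= retention


retention = parse_interval("interval 1 week")
print(retention_is_safe(retention, timedelta(hours=6), timedelta(days=2)))  # True
```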
delta.enableChangeDataFeed

true to enable change data feed.

See Enable change data feed.

Data type: Boolean

Default: false
delta.enableDeletionVectors

true to enable deletion vectors and predictive I/O for updates.

See What are deletion vectors?

Data type: Boolean

Default: Depends on workspace admin settings and Databricks Runtime version. See Auto-enable deletion vectors.
delta.isolationLevel

The degree to which a transaction must be isolated from modifications made by concurrent transactions.

Valid values are Serializable and WriteSerializable.

See Isolation levels and write conflicts on Azure Databricks.

Data type: String

Default: WriteSerializable
delta.logRetentionDuration

How long the history for a Delta table is kept. VACUUM operations override this retention threshold.

Each time a checkpoint is written, Delta Lake automatically cleans up log entries older than the retention interval. If you set this property to a large enough value, many log entries are retained. This should not impact performance, because operations against the log are constant time. Operations on history are parallel, but they become more expensive as the log grows.

See Configure data retention for time travel queries.

Data type: CalendarInterval

Default: interval 30 days
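The following simplified model lists the log versions whose commits fall outside the retention window. The function expired_log_versions is hypothetical. It ignores the checkpoint bookkeeping that real log cleanup performs, so treat it as an approximation only.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired_log_versions(commit_times: Dict[int, datetime], now: datetime,
                         retention: timedelta = timedelta(days=30)) -> List[int]:
    """Versions whose commits are older than the retention window.

    Simplified model: real cleanup also keeps whatever is needed to
    reconstruct the table state from the latest checkpoint.
    """
    cutoff = now - retention
    return sorted(v for v, ts in commit_times.items() if ts < cutoff)


commits = {0: datetime(2024, 1, 1), 1: datetime(2024, 2, 15), 2: datetime(2024, 2, 28)}
print(expired_log_versions(commits, now=datetime(2024, 3, 1)))  # [0]
```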
delta.minReaderVersion

The minimum protocol reader version that a reader must support to read from this Delta table.

Databricks recommends against manually configuring this property.

See How does Azure Databricks manage Delta Lake feature compatibility?

Data type: Int

Default: 1
delta.minWriterVersion

The minimum protocol writer version that a writer must support to write to this Delta table.

Databricks recommends against manually configuring this property.

See How does Azure Databricks manage Delta Lake feature compatibility?

Data type: Int

Default: 2
delta.randomizeFilePrefixes

true for Delta Lake to generate a random prefix for a file path instead of partition information.

Data type: Boolean

Default: false
delta.randomPrefixLength

When delta.randomizeFilePrefixes is set to true, the number of characters that Delta Lake generates for random prefixes.

Data type: Int

Default: 2
delta.setTransactionRetentionDuration

The shortest duration for which new snapshots retain transaction identifiers (for example, SetTransactions). When a new snapshot sees a transaction identifier whose age is greater than or equal to the duration specified by this property, the snapshot considers the identifier expired and ignores it. The SetTransaction identifier is used to make writes idempotent. See Idempotent table writes in foreachBatch for details.

Data type: CalendarInterval

Default: (none)
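The expiration rule and its role in idempotent writes can be modeled with a toy transaction log. TxnLog is hypothetical. It assumes that an unset retention duration means identifiers never expire. It also assumes that a batch is skipped when its version is not newer than the last committed version for the same application ID.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


@dataclass
class TxnLog:
    """Toy record of app_id -> (last committed version, commit time)."""
    retention: Optional[timedelta] = None  # stands in for the table property
    txns: Dict[str, Tuple[int, datetime]] = field(default_factory=dict)

    def is_expired(self, app_id: str, now: datetime) -> bool:
        # An identifier whose age is >= the retention is treated as expired.
        # With no retention configured, this sketch never expires identifiers.
        if app_id not in self.txns or self.retention is None:
            return False
        _, updated = self.txns[app_id]
        return now - updated >= self.retention

    def should_write(self, app_id: str, version: int, now: datetime) -> bool:
        """Return False for a batch version that was already committed."""
        if app_id not in self.txns or self.is_expired(app_id, now):
            return True
        last_version, _ = self.txns[app_id]
        return version > last_version

    def commit(self, app_id: str, version: int, now: datetime) -> None:
        self.txns[app_id] = (version, now)


log = TxnLog(retention=timedelta(days=1))
t0 = datetime(2024, 1, 1)
log.commit("app", 5, t0)
print(log.should_write("app", 5, t0))  # False: batch 5 is a replay
```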
delta.targetFileSize

The target file size, in bytes or larger units, for file tuning. For example, 104857600 (bytes) or 100mb.

See Configure Delta Lake to control data file size.

Data type: String

Default: (none)
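The following sketch converts delta.targetFileSize values to bytes. The function target_file_size_bytes is hypothetical. It accepts only the b, kb, mb, and gb suffixes and assumes 1024-based multipliers. This matches the example in which 100mb equals 104857600 bytes.

```python
import re

# Assumption: binary (1024-based) multipliers, consistent with 100mb == 104857600.
_MULTIPLIERS = {"": 1, "b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def target_file_size_bytes(value: str) -> int:
    """Convert a value such as '100mb' or '104857600' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if not match or match.group(2) not in _MULTIPLIERS:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]


print(target_file_size_bytes("100mb"))  # 104857600
```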
delta.tuneFileSizesForRewrites

true to always use lower file sizes for all data layout optimization operations on the Delta table.

false to never tune to lower file sizes, that is, prevent auto-detection from being activated.

See Configure Delta Lake to control data file size.

Data type: Boolean

Default: (none)