Delta table properties reference
Delta Lake reserves Delta table properties starting with delta. These properties have specific meanings and affect behavior when they are set.
Note
All operations that set or update table properties conflict with other concurrent write operations, causing those operations to fail. Databricks recommends modifying a table property only when there are no concurrent write operations on the table.
How do table properties and SparkSession properties interact?
Delta table properties are set per table. If a property is set on a table, that setting is followed by default.
Some table properties have associated SparkSession configurations that always take precedence over table properties. Examples include the spark.databricks.delta.autoCompact.enabled and spark.databricks.delta.optimizeWrite.enabled configurations, which turn on auto compaction and optimized writes at the SparkSession level rather than the table level. Databricks recommends using table-scoped configurations for most workloads.
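As a sketch of such a session-level override, using the spark.databricks.delta.autoCompact.enabled configuration mentioned above:

```sql
-- Turns on auto compaction for writes in this session, taking precedence
-- over each table's delta.autoOptimize.autoCompact property.
SET spark.databricks.delta.autoCompact.enabled = true;
```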
For every Delta table property you can set a default value for new tables using a SparkSession configuration, overriding the built-in default. This setting only affects new tables and does not override or replace properties set on existing tables. The prefix used in the SparkSession is different from the configurations used in the table properties, as shown in the following table:
| Delta Lake conf | SparkSession conf |
|---|---|
| delta.&lt;conf&gt; | spark.databricks.delta.properties.defaults.&lt;conf&gt; |
For example, to set the delta.appendOnly = true property for all new Delta Lake tables created in a session, set the following:

```sql
SET spark.databricks.delta.properties.defaults.appendOnly = true
```
To modify table properties of existing tables, use SET TBLPROPERTIES.
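As a sketch of this statement (the table name my_table is a placeholder):

```sql
-- Makes an existing Delta table append-only; only the property key and
-- value come from this reference, the table name is hypothetical.
ALTER TABLE my_table SET TBLPROPERTIES ('delta.appendOnly' = 'true');
```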
Delta table properties
Available Delta table properties include the following:
| Property |
|---|
| delta.appendOnly true for this Delta table to be append-only. If append-only, existing records cannot be deleted, and existing values cannot be updated. See Delta table properties reference. Data type: Boolean Default: false |
| delta.autoOptimize.autoCompact auto for Delta Lake to automatically optimize the layout of the files for this Delta table. See Auto compaction for Delta Lake on Azure Databricks. Data type: Boolean Default: (none) |
| delta.autoOptimize.optimizeWrite true for Delta Lake to automatically optimize the layout of the files for this Delta table during writes. See Optimized writes for Delta Lake on Azure Databricks. Data type: Boolean Default: (none) |
| delta.checkpoint.writeStatsAsJson true for Delta Lake to write file statistics in checkpoints in JSON format for the stats column. See Manage column-level statistics in checkpoints. Data type: Boolean Default: true |
| delta.checkpoint.writeStatsAsStruct true for Delta Lake to write file statistics to checkpoints in struct format for the stats_parsed column and to write partition values as a struct for partitionValues_parsed. See Manage column-level statistics in checkpoints. Data type: Boolean Default: (none) |
| delta.checkpointPolicy classic for classic Delta Lake checkpoints. v2 for v2 checkpoints. See Compatibility for tables with liquid clustering. Data type: String Default: classic |
| delta.columnMapping.mode Whether column mapping is enabled for Delta table columns and the corresponding Parquet columns that use different names. See Rename and drop columns with Delta Lake column mapping. Note: Enabling delta.columnMapping.mode automatically enables delta.randomizeFilePrefixes. Data type: DeltaColumnMappingMode Default: none |
| delta.dataSkippingNumIndexedCols The number of columns for Delta Lake to collect statistics about for data skipping. A value of -1 means to collect statistics for all columns. See Data skipping for Delta Lake. Data type: Int Default: 32 |
| delta.dataSkippingStatsColumns A comma-separated list of column names on which Delta Lake collects statistics to enhance data skipping functionality. This property takes precedence over delta.dataSkippingNumIndexedCols. See Data skipping for Delta Lake. Data type: String Default: (none) |
| delta.deletedFileRetentionDuration The shortest duration for Delta Lake to keep logically deleted data files before deleting them physically. This prevents failures in stale readers after compactions or partition overwrites. This value should be large enough to ensure that: (1) it is larger than the longest possible duration of a job if you run VACUUM when there are concurrent readers or writers accessing the Delta table; (2) if you run a streaming query that reads from the table, that query does not stop for longer than this value; otherwise, the query may not be able to restart, as it must still read old files. See Configure data retention for time travel queries. Data type: CalendarInterval Default: interval 1 week |
| delta.enableChangeDataFeed true to enable change data feed. See Enable change data feed. Data type: Boolean Default: false |
| delta.enableDeletionVectors true to enable deletion vectors and predictive I/O for updates. See What are deletion vectors?. Data type: Boolean Default: Depends on workspace admin settings and Databricks Runtime version. See Auto-enable deletion vectors |
| delta.isolationLevel The degree to which a transaction must be isolated from modifications made by concurrent transactions. Valid values are Serializable and WriteSerializable. See Isolation levels and write conflicts on Azure Databricks. Data type: String Default: WriteSerializable |
| delta.logRetentionDuration How long the history for a Delta table is kept. VACUUM operations override this retention threshold. Each time a checkpoint is written, Delta Lake automatically cleans up log entries older than the retention interval. If you set this property to a large enough value, many log entries are retained. This should not impact performance, as operations against the log are constant time. Operations on history are parallel but become more expensive as the log size increases. See Configure data retention for time travel queries. Data type: CalendarInterval Default: interval 30 days |
| delta.minReaderVersion The minimum required protocol reader version for a reader that is allowed to read from this Delta table. Databricks recommends against manually configuring this property. See How does Azure Databricks manage Delta Lake feature compatibility?. Data type: Int Default: 1 |
| delta.minWriterVersion The minimum required protocol writer version for a writer that is allowed to write to this Delta table. Databricks recommends against manually configuring this property. See How does Azure Databricks manage Delta Lake feature compatibility?. Data type: Int Default: 2 |
| delta.randomizeFilePrefixes true for Delta Lake to generate a random prefix for a file path instead of partition information. Data type: Boolean Default: false |
| delta.randomPrefixLength When delta.randomizeFilePrefixes is set to true, the number of characters that Delta Lake generates for random prefixes. Data type: Int Default: 2 |
| delta.setTransactionRetentionDuration The shortest duration within which new snapshots retain transaction identifiers (for example, SetTransactions). When a new snapshot sees a transaction identifier older than or equal to the duration specified by this property, the snapshot considers it expired and ignores it. The SetTransaction identifier is used to make writes idempotent. See Idempotent table writes in foreachBatch for details. Data type: CalendarInterval Default: (none) |
| delta.targetFileSize The target file size in bytes or higher units for file tuning. For example, 104857600 (bytes) or 100mb. See Configure Delta Lake to control data file size. Data type: String Default: (none) |
| delta.tuneFileSizesForRewrites true to always use lower file sizes for all data layout optimization operations on the Delta table. false to never tune to lower file sizes, that is, to prevent auto-detection from being activated. See Configure Delta Lake to control data file size. Data type: Boolean Default: (none) |
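As a sketch tying several of the properties above together, properties can also be set when a table is created (the table name, column names, and chosen values are placeholders; only the property keys come from this reference):

```sql
-- Creates a Delta table with several properties from the table above.
CREATE TABLE events (id BIGINT, ts TIMESTAMP, payload STRING)
USING DELTA
TBLPROPERTIES (
  'delta.appendOnly' = 'true',                  -- disallow deletes and updates
  'delta.logRetentionDuration' = 'interval 60 days',  -- keep history longer than the 30-day default
  'delta.dataSkippingNumIndexedCols' = '8'      -- collect statistics on the first 8 columns only
);
```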