Table properties reference

Delta Lake and Apache Iceberg use table properties to control table behavior and features. Each property has a specific meaning and affects the table's behavior when set.

Note

All operations that set or update table properties conflict with other concurrent write operations, causing them to fail. Databricks recommends that you modify a table property only when there are no concurrent write operations on the table.

Modify table properties

To modify table properties of existing tables, use SET TBLPROPERTIES.
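
For example, the following statements set and then remove a property on an existing Delta table. The table name (main.default.people) is a placeholder:

-- Set a table property on an existing Delta table
ALTER TABLE main.default.people SET TBLPROPERTIES ('delta.appendOnly' = 'true')

-- Remove the property from the table
ALTER TABLE main.default.people UNSET TBLPROPERTIES ('delta.appendOnly')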

Delta and Iceberg formats

Delta Lake and Apache Iceberg tables share the same table property names, but require different prefixes:

  • Delta tables: Use the delta. prefix
  • Iceberg tables: Use the iceberg. prefix

For example:

  • To enable deletion vectors on a Delta table: delta.enableDeletionVectors
  • To enable deletion vectors on an Iceberg table: iceberg.enableDeletionVectors
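
In SQL, the property name is the same for both formats; only the prefix differs. The table names below are placeholders:

-- Enable deletion vectors on a Delta table
ALTER TABLE my_delta_table SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')

-- Enable deletion vectors on an Iceberg table
ALTER TABLE my_iceberg_table SET TBLPROPERTIES ('iceberg.enableDeletionVectors' = 'true')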

Table properties and SparkSession properties

Each table has its own table properties that control its behavior. Some SparkSession configurations always override table properties: for example, spark.databricks.delta.autoCompact.enabled and spark.databricks.delta.optimizeWrite.enabled enable auto compaction and optimized writes at the SparkSession level for every table written in the session. Databricks recommends using table-scoped configurations for most workloads.
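
The following sketch shows these session-level overrides; it assumes the Delta variants of the configurations named above:

-- Enable auto compaction and optimized writes for the whole session
SET spark.databricks.delta.autoCompact.enabled = true
SET spark.databricks.delta.optimizeWrite.enabled = true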

You can set default values for new tables using SparkSession configurations. These defaults only apply to new tables and don't affect existing table properties. SparkSession configurations use a different prefix than table properties, as shown in the following table:

Table property    SparkSession configuration
delta.<conf>      spark.databricks.delta.properties.defaults.<conf>
iceberg.<conf>    spark.databricks.iceberg.properties.defaults.<conf>

For example, to set the appendOnly = true property for all new tables created in a session, set the following:

-- For Delta tables
SET spark.databricks.delta.properties.defaults.appendOnly = true

-- For Iceberg tables
SET spark.databricks.iceberg.properties.defaults.appendOnly = true

Table properties

The following table properties are available for both Delta Lake and Apache Iceberg tables. Use the delta. prefix for Delta tables and iceberg. prefix for Iceberg tables.

autoOptimize.optimizeWrite
true to automatically optimize the layout of the files for this table during writes.
See Optimized writes.
Data type: Boolean
Default: (none)

dataSkippingNumIndexedCols
The number of columns to collect statistics about for data skipping. A value of -1 means to collect statistics for all columns.
See Data skipping.
Data type: Int
Default: 32

dataSkippingStatsColumns
A comma-separated list of column names on which to collect statistics to enhance data skipping functionality. This property takes precedence over dataSkippingNumIndexedCols.
See Data skipping.
Data type: String
Default: (none)

deletedFileRetentionDuration
The shortest duration to keep logically deleted data files before deleting them physically. This prevents failures in stale readers after compactions or partition overwrites.
Set this value large enough to ensure that:
  • The value exceeds the longest possible duration of a job when you run VACUUM with concurrent readers or writers accessing the table.
  • Streaming queries reading from the table don't stop for longer than this value. Otherwise, the query can't restart because it must read old files.
See Configure data retention for time travel queries.
Data type: CalendarInterval
Default: interval 1 week

enableDeletionVectors
true to enable deletion vectors and predictive I/O for updates.
See Deletion vectors in Databricks and Enable deletion vectors.
Data type: Boolean
Default: Depends on workspace admin settings and Databricks Runtime version. See Auto-enable deletion vectors.

logRetentionDuration
How long to keep the history for a table. VACUUM operations override this retention threshold.
Databricks automatically cleans up log entries older than the retention interval each time a checkpoint is written. Setting this property to a large value retains many log entries. This doesn't impact performance because operations against the log are constant time. Operations on history are parallel but become more expensive as the log size increases.
See Configure data retention for time travel queries.
Data type: CalendarInterval
Default: interval 30 days

minReaderVersion
(Delta Lake only) The minimum required protocol reader version to read from this table.
Databricks recommends against manually configuring this property.
See Delta Lake feature compatibility and protocols.
Data type: Int
Default: 1

minWriterVersion
(Delta Lake only) The minimum required protocol writer version to write to this table.
Databricks recommends against manually configuring this property.
See Delta Lake feature compatibility and protocols.
Data type: Int
Default: 2

format-version
(Apache Iceberg managed tables only) The Iceberg table format version.
Databricks recommends against manually configuring this property.
See Use Apache Iceberg v3 features.
Data type: Int
Default: 2

randomizeFilePrefixes
true to generate a random prefix for a file path instead of partition information.
Data type: Boolean
Default: false

targetFileSize
The target file size in bytes or higher units for file tuning. For example, 104857600 (bytes) or 100mb.
See Control data file size.
Data type: String
Default: (none)

parquet.compression.codec
The compression codec for a table.
Valid values: ZSTD, SNAPPY, GZIP, LZ4, BROTLI (support varies by format)
This property ensures that all future writes to the table use the chosen codec, overriding the cluster or session default (spark.sql.parquet.compression.codec). However, one-off DataFrame .write.option("compression", "...") settings still take precedence. Available in Databricks Runtime 16.0 and later. Existing files aren't rewritten automatically; to recompress existing data with the chosen codec, use OPTIMIZE table_name FULL.
Data type: String
Default: ZSTD

appendOnly
true to make the table append-only. Append-only tables don't allow deleting existing records or updating existing values.
Data type: Boolean
Default: false

autoOptimize.autoCompact
Automatically combines small files within table partitions to reduce small file problems. Accepts auto (recommended), true, legacy, or false.
See Auto compaction.
Data type: String
Default: (none)

checkpoint.writeStatsAsJson
true to write file statistics in checkpoints in JSON format for the stats column.
Data type: Boolean
Default: false

checkpoint.writeStatsAsStruct
true to write file statistics to checkpoints in struct format for the stats_parsed column and to write partition values as a struct for partitionValues_parsed.
Data type: Boolean
Default: true

checkpointPolicy
classic for classic checkpoints. v2 for v2 checkpoints.
See Compatibility for tables with liquid clustering.
Data type: String
Default: classic

columnMapping.mode
Enables column mapping for table columns and the corresponding Parquet columns that use different names.
See Rename and drop columns with Delta Lake column mapping.
Note: Enabling columnMapping.mode automatically enables randomizeFilePrefixes.
Data type: DeltaColumnMappingMode
Default: none

compatibility.symlinkFormatManifest.enabled
(Delta Lake only) true to configure the Delta table so that all write operations on the table automatically update the manifests.
Data type: Boolean
Default: false

enableChangeDataFeed
true to enable change data feed.
See Enable change data feed.
Data type: Boolean
Default: false

enableTypeWidening
true to enable type widening.
See Type widening.
Data type: Boolean
Default: false

isolationLevel
The degree to which a transaction must be isolated from modifications made by concurrent transactions.
Valid values are Serializable and WriteSerializable.
See Isolation levels and write conflicts on Azure Databricks.
Data type: String
Default: WriteSerializable

randomPrefixLength
The number of characters to generate for random prefixes when randomizeFilePrefixes is true.
Data type: Int
Default: 2

setTransactionRetentionDuration
The shortest duration within which new snapshots retain transaction identifiers (for example, SetTransactions). New snapshots expire and ignore transaction identifiers older than or equal to the duration specified by this property. The SetTransaction identifier is used when making writes idempotent. See Idempotent table writes in foreachBatch for details.
Data type: CalendarInterval
Default: (none)

tuneFileSizesForRewrites
true to always use lower file sizes for all data layout optimization operations on the table.
false prevents tuning to lower file sizes and disables auto-detection.
See Control data file size.
Data type: Boolean
Default: (none)
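
As a closing sketch, the following combines several of the properties above on a hypothetical Delta table named events: properties are set at creation, inspected, and then applied to existing files.

-- Create a Delta table with properties set at creation time
CREATE TABLE events (id BIGINT, ts TIMESTAMP)
TBLPROPERTIES (
  'delta.dataSkippingNumIndexedCols' = '8',
  'delta.logRetentionDuration' = 'interval 60 days'
)

-- Inspect the properties currently set on the table
SHOW TBLPROPERTIES events

-- Switch the compression codec, then rewrite existing files with it
ALTER TABLE events SET TBLPROPERTIES ('delta.parquet.compression.codec' = 'SNAPPY')
OPTIMIZE events FULL

Because OPTIMIZE events FULL rewrites every existing file, it can be expensive on large tables; run it only when you need existing data recompressed.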