Delta Lake and Apache Iceberg use table properties to control table behavior and features. Each property is a key-value pair with a specific meaning that affects table behavior when set.
Note
All operations that set or update table properties conflict with other concurrent write operations, causing them to fail. Databricks recommends you modify a table property only when there are no concurrent write operations on the table.
Modify table properties
To modify table properties of existing tables, use SET TBLPROPERTIES.
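For example, the following sketch enables the append-only property on an existing Delta table (the table name `events` is hypothetical):

```sql
-- Set a table property on an existing Delta table.
ALTER TABLE events SET TBLPROPERTIES ('delta.appendOnly' = 'true');

-- Inspect the table's current properties.
SHOW TBLPROPERTIES events;
```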
Delta and Iceberg formats
Delta Lake and Apache Iceberg tables share the same table property names, but require different prefixes:
- Delta tables: Use the `delta.` prefix
- Iceberg tables: Use the `iceberg.` prefix
For example:
- To enable deletion vectors on a Delta table: `delta.enableDeletionVectors`
- To enable deletion vectors on an Iceberg table: `iceberg.enableDeletionVectors`
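In SQL, the prefix is part of the property key you pass to `SET TBLPROPERTIES`. A minimal sketch, assuming hypothetical tables `sales_delta` and `sales_iceberg`:

```sql
-- Delta table: the property key uses the delta. prefix.
ALTER TABLE sales_delta SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');

-- Iceberg table: the same feature uses the iceberg. prefix.
ALTER TABLE sales_iceberg SET TBLPROPERTIES ('iceberg.enableDeletionVectors' = 'true');
```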
Table properties and SparkSession properties
Each table has its own table properties that control its behavior, but some SparkSession configurations always override table properties. For example, the `autoCompact.enabled` and `optimizeWrite.enabled` configurations enable auto compaction and optimized writes for every write in the SparkSession, regardless of table-level settings. Databricks recommends using table-scoped configurations for most workloads.
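For instance, a minimal sketch of setting these at the session level; the full configuration keys shown here are the Delta Lake variants and should be verified against your Databricks Runtime:

```sql
-- Session-level overrides: these apply to every Delta write in this
-- SparkSession, regardless of table-level property settings.
SET spark.databricks.delta.autoCompact.enabled = true;
SET spark.databricks.delta.optimizeWrite.enabled = true;
```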
You can set default values for new tables using SparkSession configurations. These defaults only apply to new tables and don't affect existing table properties. SparkSession configurations use a different prefix than table properties, as shown in the following table:
| Table property | SparkSession configuration |
|---|---|
| `delta.<conf>` | `spark.databricks.delta.properties.defaults.<conf>` |
| `iceberg.<conf>` | `spark.databricks.iceberg.properties.defaults.<conf>` |
For example, to set the appendOnly = true property for all new tables created in a session, set the following:
```sql
-- For Delta tables
SET spark.databricks.delta.properties.defaults.appendOnly = true;

-- For Iceberg tables
SET spark.databricks.iceberg.properties.defaults.appendOnly = true;
```
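To confirm that a session default was applied, you can create a table and inspect its properties; a sketch with a hypothetical table name `demo`:

```sql
-- Tables created after the session default is set inherit the property.
CREATE TABLE demo (id INT, value STRING);

-- delta.appendOnly should now appear with the value true.
SHOW TBLPROPERTIES demo;
```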
Table properties
The following table properties are available for both Delta Lake and Apache Iceberg tables. Use the `delta.` prefix for Delta tables and the `iceberg.` prefix for Iceberg tables.
| Property | Description |
|---|---|
| `autoOptimize.optimizeWrite` | `true` to automatically optimize the layout of the files for this table during writes. See Optimized writes.<br>Data type: `Boolean`<br>Default: (none) |
| `dataSkippingNumIndexedCols` | The number of columns to collect statistics about for data skipping. A value of `-1` means to collect statistics for all columns. See Data skipping.<br>Data type: `Int`<br>Default: `32` |
| `dataSkippingStatsColumns` | A comma-separated list of column names on which to collect statistics to enhance data skipping functionality. This property takes precedence over `dataSkippingNumIndexedCols`. See Data skipping.<br>Data type: `String`<br>Default: (none) |
| `deletedFileRetentionDuration` | The shortest duration to keep logically deleted data files before deleting them physically. This prevents failures in stale readers after compactions or partition overwrites. Set this value large enough that stale readers and long-running concurrent operations can still read the data files they reference. See Configure data retention for time travel queries.<br>Data type: `CalendarInterval`<br>Default: `interval 1 week` |
| `enableDeletionVectors` | `true` to enable deletion vectors and predictive I/O for updates. See Deletion vectors in Databricks and Enable deletion vectors.<br>Data type: `Boolean`<br>Default: Depends on workspace admin settings and Databricks Runtime version. See Auto-enable deletion vectors. |
| `logRetentionDuration` | How long to keep the history for a table. `VACUUM` operations override this retention threshold. Databricks automatically cleans up log entries older than the retention interval each time a checkpoint is written. Setting this property to a large value retains many log entries. This doesn't impact performance because operations against the log are constant time. Operations on history are parallel but become more expensive as the log size increases. See Configure data retention for time travel queries.<br>Data type: `CalendarInterval`<br>Default: `interval 30 days` |
| `minReaderVersion` (Delta Lake only) | The minimum required protocol reader version to read from this table. Databricks recommends against manually configuring this property. See Delta Lake feature compatibility and protocols.<br>Data type: `Int`<br>Default: `1` |
| `minWriterVersion` (Delta Lake only) | The minimum required protocol writer version to write to this table. Databricks recommends against manually configuring this property. See Delta Lake feature compatibility and protocols.<br>Data type: `Int`<br>Default: `2` |
| `format-version` (Apache Iceberg managed tables only) | The Iceberg table format version. Databricks recommends against manually configuring this property. See Use Apache Iceberg v3 features.<br>Data type: `Int`<br>Default: `2` |
| `randomizeFilePrefixes` | `true` to generate a random prefix for a file path instead of partition information.<br>Data type: `Boolean`<br>Default: `false` |
| `targetFileSize` | The target file size in bytes or higher units for file tuning. For example, `104857600` (bytes) or `100mb`. See Control data file size.<br>Data type: `String`<br>Default: (none) |
| `parquet.compression.codec` | The compression codec for a table. Valid values: `ZSTD`, `SNAPPY`, `GZIP`, `LZ4`, `BROTLI` (support varies by format). This property ensures that all future writes to the table use the chosen codec, overriding the cluster or session default (`spark.sql.parquet.compression.codec`). However, one-off DataFrame `.write.option("compression", "...")` settings still take precedence. Available in Databricks Runtime 16.0 and later. Note that existing files aren't rewritten automatically. To recompress existing data with your chosen codec, use `OPTIMIZE table_name FULL`.<br>Data type: `String`<br>Default: `ZSTD` |
| `appendOnly` | `true` to make the table append-only. Append-only tables don't allow deleting existing records or updating existing values.<br>Data type: `Boolean`<br>Default: `false` |
| `autoOptimize.autoCompact` | Automatically combines small files within table partitions to reduce small file problems. Accepts `auto` (recommended), `true`, `legacy`, or `false`. See Auto compaction.<br>Data type: `String`<br>Default: (none) |
| `checkpoint.writeStatsAsJson` | `true` to write file statistics in checkpoints in JSON format for the `stats` column.<br>Data type: `Boolean`<br>Default: `false` |
| `checkpoint.writeStatsAsStruct` | `true` to write file statistics to checkpoints in struct format for the `stats_parsed` column and to write partition values as a struct for `partitionValues_parsed`.<br>Data type: `Boolean`<br>Default: `true` |
| `checkpointPolicy` | `classic` for classic checkpoints. `v2` for v2 checkpoints. See Compatibility for tables with liquid clustering.<br>Data type: `String`<br>Default: `classic` |
| `columnMapping.mode` | Enables column mapping for table columns and the corresponding Parquet columns that use different names. See Rename and drop columns with Delta Lake column mapping. Note: Enabling `columnMapping.mode` automatically enables `randomizeFilePrefixes`.<br>Data type: `DeltaColumnMappingMode`<br>Default: `none` |
| `compatibility.symlinkFormatManifest.enabled` (Delta Lake only) | `true` to configure the Delta table so that all write operations on the table automatically update the manifests.<br>Data type: `Boolean`<br>Default: `false` |
| `enableChangeDataFeed` | `true` to enable change data feed. See Enable change data feed.<br>Data type: `Boolean`<br>Default: `false` |
| `enableTypeWidening` | `true` to enable type widening. See Type widening.<br>Data type: `Boolean`<br>Default: `false` |
| `isolationLevel` | The degree to which a transaction must be isolated from modifications made by concurrent transactions. Valid values are `Serializable` and `WriteSerializable`. See Isolation levels and write conflicts on Azure Databricks.<br>Data type: `String`<br>Default: `WriteSerializable` |
| `randomPrefixLength` | The number of characters to generate for random prefixes when `randomizeFilePrefixes` is `true`.<br>Data type: `Int`<br>Default: `2` |
| `setTransactionRetentionDuration` | The shortest duration within which new snapshots retain transaction identifiers (for example, `SetTransaction`s). New snapshots expire and ignore transaction identifiers older than or equal to the duration specified by this property. The `SetTransaction` identifier is used when making writes idempotent. See Idempotent table writes in foreachBatch for details.<br>Data type: `CalendarInterval`<br>Default: (none) |
| `tuneFileSizesForRewrites` | `true` to always use lower file sizes for all data layout optimization operations on the table. `false` prevents tuning to lower file sizes and disables auto-detection. See Control data file size.<br>Data type: `Boolean`<br>Default: (none) |
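As a concrete illustration of several of these properties together, the following sketch (the table name `events_delta` is hypothetical) tunes retention, file size, and compression on a Delta table, then rewrites existing files with the chosen codec:

```sql
-- Adjust retention, target file size, and compression codec.
ALTER TABLE events_delta SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 14 days',
  'delta.targetFileSize' = '100mb',
  'delta.parquet.compression.codec' = 'ZSTD'
);

-- Recompress existing data files with the chosen codec
-- (the codec property requires Databricks Runtime 16.0 or later).
OPTIMIZE events_delta FULL;
```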