Use variant shredding to optimize performance

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

Variant shredding improves query performance on VARIANT columns by storing commonly occurring fields as separate columns in the underlying Parquet files. Shredding reduces the I/O required to read fields and improves compression by using a columnar format instead of a binary blob.

See VARIANT type, Variant type support for Apache Iceberg and Delta Lake, and Query variant data.

Requirements

Databricks Runtime 17.2 or above is required to read and write shredded VARIANT tables.

Enable shredding

Workspace admins can enable shredding from the workspace Previews page. See Manage Azure Databricks previews.

No code changes are required to read or write VARIANT data with shredding.
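As a rough illustration of what "no code changes" means in practice, the sketch below writes and reads VARIANT data with ordinary statements. The table name events, the column names, and the JSON payload are hypothetical and chosen only for this example.

```sql
-- Hypothetical table and column names, used only for illustration.
CREATE TABLE IF NOT EXISTS events (id BIGINT, payload VARIANT);

-- Write: parse a JSON string into a VARIANT value, the same way as without shredding.
INSERT INTO events
SELECT 1, PARSE_JSON('{"customer": {"id": "c-42"}, "event_type": "click"}');

-- Read: extract fields with the usual path syntax and cast them to a SQL type.
SELECT payload:customer.id::STRING AS customer_id,
       payload:event_type::STRING  AS event_type
FROM events;
```

Queries like the SELECT above, which read a few commonly occurring fields, are the kind that shredding is intended to speed up.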

After you enable the feature for your workspace, shredding is automatically enabled on tables for the following scenarios:

  • CREATE TABLE with one or more VARIANT columns.
  • CREATE OR REPLACE TABLE with one or more VARIANT columns.
  • ALTER TABLE when adding one or more VARIANT columns.
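The three scenarios above might look like the following statements. This is a minimal sketch; the table and column names are hypothetical.

```sql
-- Hypothetical names. Each statement creates or adds at least one VARIANT column.

-- New table with a VARIANT column.
CREATE TABLE events (id BIGINT, payload VARIANT);

-- Replacing a table with a definition that includes VARIANT columns.
CREATE OR REPLACE TABLE events (id BIGINT, payload VARIANT, context VARIANT);

-- Adding a VARIANT column to an existing table.
ALTER TABLE events ADD COLUMNS (extra VARIANT);
```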

For existing tables, you can manually opt in to shredding by setting the enableVariantShredding table property to true, or opt out by setting it to false, provided that shredding is enabled at the workspace level. Prefix the property name with delta. or iceberg. to match the table format:

Delta Lake

ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'true');

Iceberg table

ALTER TABLE my_table SET TBLPROPERTIES ('iceberg.enableVariantShredding' = 'true');

To verify that shredding is enabled, check that the enableVariantShredding table property (delta.enableVariantShredding or iceberg.enableVariantShredding, depending on the table format) is set to true.
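One way to check the property is with SHOW TBLPROPERTIES, sketched here for the same my_table used in the examples above. Use the key that matches your table format.

```sql
-- Delta Lake table: look up the shredding property by key.
SHOW TBLPROPERTIES my_table ('delta.enableVariantShredding');

-- Iceberg table: same check with the Iceberg-prefixed key.
SHOW TBLPROPERTIES my_table ('iceberg.enableVariantShredding');
```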

Opt out of shredding for a specific table

If you enable the shredding Beta for your workspace but want to exclude a specific table, set the enableVariantShredding table property to false:

Delta Lake

ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'false');

Iceberg table

ALTER TABLE my_table SET TBLPROPERTIES ('iceberg.enableVariantShredding' = 'false');

Remove shredding from an existing table

To remove shredding from an existing table, drop the table feature with the ALTER TABLE command. This operation also rewrites shredded VARIANT data in place to the unshredded VARIANT format and sets the enableVariantShredding table property to false.

ALTER TABLE my_table DROP FEATURE "variantShredding-preview";

Limitations

  • Shredding data introduces some overhead on writes.
  • Enabling shredding doesn't automatically convert existing VARIANT data in a table. It only applies to data written after the feature is enabled. To rewrite existing VARIANT data, use REORG TABLE my_table APPLY (SHRED VARIANT).
  • Shredding applies only to top-level VARIANT columns and VARIANT fields in structs. VARIANT data stored inside arrays or maps is not shredded.
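The last two limitations can be sketched as follows. The first part combines the opt-in property and the REORG command described above for a Delta Lake table. The second part uses a hypothetical table, shredding_scope, whose column names are invented for illustration, to show which VARIANT data is in scope.

```sql
-- Existing Delta Lake table: opt in, then rewrite VARIANT data that was
-- written before shredding was enabled.
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'true');
REORG TABLE my_table APPLY (SHRED VARIANT);

-- Hypothetical table showing where shredding applies.
CREATE TABLE shredding_scope (
  v   VARIANT,                   -- top-level VARIANT column: shredding applies
  s   STRUCT<inner_v: VARIANT>,  -- VARIANT field in a struct: shredding applies
  arr ARRAY<VARIANT>,            -- VARIANT inside an array: not shredded
  m   MAP<STRING, VARIANT>       -- VARIANT inside a map: not shredded
);
```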