Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Hive-style partitioning divides a Delta table into physical subdirectories based on the values of one or more partition columns. Each unique combination of partition column values creates a separate directory. This layout enables partition pruning: the engine skips entire directories when a query filters on the partition column.
Tip
For most workloads starting in Fabric Runtime 2.0, liquid clustering is the recommended data layout strategy for read performance. The primary reason to use partitioning is to enable concurrent write operations that don't conflict.
For full liquid clustering guidance, see Liquid clustering.
When to use partitioning
The primary use case for partitioning in Delta Lake is enabling concurrent write operations that don't conflict. Delta Lake uses optimistic concurrency control, and two operations that touch the same files can conflict. Partitioning makes it possible for concurrent operations to target disjoint sets of files by operating on separate partitions.
Use partitioning when:
- You have concurrent writers that need to update, delete, or merge into the same table without conflicting—for example, multiple pipelines processing different business units or regions simultaneously.
- Your partition column has low to moderate cardinality (tens to hundreds of distinct values, not thousands). Note: larger tables can accommodate more partitions. Target at least 1 GB of data in each partition.
- Partition values align with your write patterns—each writer naturally targets a specific partition.
Important
For file skipping and read performance alone, liquid clustering is more effective than partitioning. Liquid clustering eliminates the risk of small-file issues from high-cardinality partition columns and lets you change clustering strategy over the table's lifecycle. Choose partitioning primarily when you need to isolate concurrent writers.
Create a partitioned table
CREATE TABLE sales.orders (
order_id BIGINT,
order_date DATE,
region STRING,
amount DECIMAL(10,2)
)
USING DELTA
PARTITIONED BY (region)
Partitioning and concurrent writes
Partitioning is the primary mechanism in Delta Lake for avoiding conflicts between concurrent write operations. When a table is partitioned, operations that target different partitions operate on disjoint sets of files and don't conflict with each other.
For example, two concurrent MERGE INTO operations on a table partitioned by region don't conflict as long as each targets a different region—provided the partition column is explicitly included in the merge condition:
-- Pipeline A: processes North America only
MERGE INTO sales.orders AS target
USING staged_orders AS source
ON target.order_id = source.order_id
AND target.region = 'NA'
AND source.region = 'NA'
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
Without partitioning—or without including the partition column in the operation condition—these same operations could conflict even if they logically modify different rows. The partition column must appear in the merge condition itself, not just in the source data. Without it, Delta Lake can't determine at validation time that the two operations touched disjoint file sets.
For a complete guide to conflict types and resolution strategies, see Concurrency control.
Common pitfalls
- High cardinality partition columns (for example,
user_idwith millions of values) create thousands of tiny directories and files, which degrades both write and read performance.- Date columns should be chosen with caution. For many tables, partitioning by a date column results in too many small partitions. Target at least 1 GB of data in each partition.
- Partition columns can't be changed after table creation without rewriting the entire table.
- Small file problem is common with streaming or frequent appends into many partitions, because each write creates at least one file per partition.
- Partitioning and liquid clustering are incompatible on the same table. You must choose one strategy.
Compare partitioning and liquid clustering
| Aspect | Hive-style partitioning | Liquid clustering |
|---|---|---|
| Best for | Concurrent writer isolation | General-purpose file skipping and read optimization |
| Granularity | One directory per distinct value (or combination) | File-level value ranges, no directories |
| High cardinality | Creates thousands of small files/directories | Handles naturally; bins data into right-sized files |
| Column changes | Requires full table rewrite | ALTER TABLE CLUSTER BY applies on next OPTIMIZE |
| Write path | Partition column must be known at write time | Any column can be clustered after the fact |
| Concurrent writes | Disjoint partitions avoid conflicts | Append-only without conflicts; updates/deletes/merges can conflict on unpartitioned tables |
| Small file problem | Common with streaming or frequent inserts | Managed by OPTIMIZE compaction |