Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Use the REORG command when you need to rewrite part of a Delta table for a specific maintenance goal. In Fabric, REORG reorganizes the physical layout of a table by rewriting data files or updating table metadata, depending on the option you choose.
REORG is different from routine file compaction. It targets scenarios such as physically removing rows that deletion vectors marked as deleted.
What REORG does
REORG TABLE rewrites table state for a defined maintenance operation.
APPLY (PURGE)rewrites affected data files so rows that were soft-deleted through deletion vectors are physically removed from the underlying Parquet files.
Use REORG when you need a specific rewrite outcome, not just smaller or fewer files.
Use PURGE to physically remove soft-deleted rows
Deletion vectors let Delta Lake mark rows as deleted without rewriting the original Parquet files right away. That behavior keeps delete operations efficient, but the deleted row data still exists in the physical files until you rewrite them.
When you run REORG TABLE ... APPLY (PURGE), Fabric rewrites the affected files and removes the soft-deleted rows from the active Parquet files. After PURGE finishes, those deleted rows are actually gone from the rewritten files.
Note
PURGE is not typically required as a separate maintenance step. OPTIMIZE automatically purges files where greater than 5% of records are referenced by deletion vectors during compaction. Reserve PURGE for scenarios where you need explicit control over when soft-deleted rows are physically removed.
This option is useful when:
- You have compliance or GDPR requirements and need deleted data to be physically removed on a specific schedule.
- You want to force-purge files that fall below the 5% threshold that
OPTIMIZEuses.
Review the syntax
Use the following examples in a Fabric notebook.
Purge soft-deleted rows from a table
Purge soft-deleted rows for matching data only
Use a WHERE clause when you want to target specific partitions or a smaller slice of data.
Choose REORG or OPTIMIZE
REORG and OPTIMIZE solve different problems.
OPTIMIZEperforms bin compaction. It consolidates small files into larger files to improve scan efficiency.REORG ... APPLY (PURGE)physically removes rows that deletion vectors marked as deleted.
You can use both commands together. For example, you might run REORG ... APPLY (PURGE) to physically remove deleted data, then run OPTIMIZE to improve file layout.
Understand deletion vectors
Deletion vectors are metadata structures that mark rows as deleted without immediately rewriting the Parquet files that contain those rows. Readers honor the deletion vectors, so the deleted rows don't appear in query results, but the bytes still remain in storage until a rewrite happens.
REORG ... APPLY (PURGE) is the step that physically removes those rows from the rewritten Parquet files.
Run REORG in Fabric
Run REORG from Spark-based experiences in Fabric, such as:
- Fabric notebooks
- Spark job definitions
Don't run REORG from the SQL analytics endpoint. REORG is a Spark SQL maintenance command for Delta tables.
Follow best practices
- Run
PURGEonly when you have a specific need, such as compliance requirements.OPTIMIZEautomatically purges files where greater than 5% of records are referenced by deletion vectors, so routine maintenance usually doesn't require a separatePURGEstep. - Run
PURGEbeforeVACUUMwhen you need deleted data to be physically removed for compliance or GDPR requirements. - Use
WHEREpredicates to target specific partitions when you don't need to rewrite the full table. - Use
UPGRADE UNIFORMwhen you need Iceberg readers, but plan for the extra metadata that UniForm maintains. - Treat
REORGandOPTIMIZEas complementary maintenance operations, not interchangeable ones.