ANALYZE TABLE

Applies to: ✓ Databricks SQL ✓ Databricks Runtime

Use the ANALYZE TABLE statement to:

  1. Collect estimated statistics for a specific table or for all tables in a specified schema, or
  2. Compute storage metrics for a specific table.

Estimated statistics are used by the query optimizer to generate an optimal query plan. Predictive optimization automatically runs ANALYZE on Unity Catalog managed tables to collect this information. Databricks recommends enabling predictive optimization for all Unity Catalog managed tables to simplify data maintenance and reduce storage costs. See Predictive optimization for Unity Catalog managed tables.

Separately, run ANALYZE TABLE … COMPUTE STORAGE METRICS on a table to return its total storage size.

Syntax

ANALYZE TABLE table_name [ PARTITION clause ]
    COMPUTE [ DELTA ] STATISTICS [ NOSCAN | FOR COLUMNS col1 [, ...] | FOR ALL COLUMNS ]

ANALYZE TABLES [ { FROM | IN } schema_name ] COMPUTE STATISTICS [ NOSCAN ]

ANALYZE TABLE table_name COMPUTE STORAGE METRICS

Parameters

  • table_name

    Identifies the table to be analyzed. The name must not include a temporal specification, an options specification, or a path. If the table cannot be found, Azure Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.

  • PARTITION clause

    Optionally limits the command to a subset of partitions.

    This clause is not supported for Delta Lake tables.

  • DELTA

    Applies to: ✓ Databricks SQL ✓ Databricks Runtime 14.3 LTS and above

    Recomputes statistics stored in the Delta log for the columns configured for statistics collection in a Delta table.

    When the DELTA keyword is specified, normal statistics for the query optimizer are not collected.

    Databricks recommends running ANALYZE TABLE table_name COMPUTE DELTA STATISTICS after setting new columns for data skipping, so that statistics are updated for all rows in the table. For optimized performance, run ANALYZE TABLE table_name COMPUTE STATISTICS to update the query plan after the Delta log update completes. A sketch of this sequence appears after the parameter list.

  • [ NOSCAN | FOR COLUMNS col [, …] | FOR ALL COLUMNS ]

    If no analyze option is specified, ANALYZE TABLE collects the table's number of rows and size in bytes.

    • NOSCAN

      Collect only the table's size in bytes (which does not require scanning the entire table).

    • FOR COLUMNS col [, …] | FOR ALL COLUMNS

      Collect column statistics for each column specified, or alternatively for every column, as well as table statistics.

      Column statistics are not supported in combination with the PARTITION clause.

  • { FROM | IN } schema_name

    Specifies the name of the schema to be analyzed. Without a schema name, ANALYZE TABLES collects statistics for all tables in the current schema that the current user has permission to analyze.
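
As a sketch of the DELTA workflow described above: the table name, the column list, and the use of the delta.dataSkippingStatsColumns table property to configure data skipping columns are illustrative assumptions, not part of this reference.

-- Illustrative only: choose which columns Delta collects file-level statistics for.
> ALTER TABLE events SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'event_time, country');

-- Recompute the statistics stored in the Delta log for the configured columns.
> ANALYZE TABLE events COMPUTE DELTA STATISTICS;

-- Separately, refresh the statistics used by the query optimizer.
> ANALYZE TABLE events COMPUTE STATISTICS;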

COMPUTE STORAGE METRICS

Applies to: ✓ Databricks Runtime 18.0 and above

The ANALYZE TABLE … COMPUTE STORAGE METRICS command computes total storage size metrics for a table. Unlike ANALYZE TABLE … COMPUTE STATISTICS, which collects statistics to optimize query performance, this command provides a detailed storage breakdown for cost analysis and optimization.

Description

Computes total storage size metrics for a specific table. This command returns comprehensive storage information including total bytes, active bytes, vacuumable bytes, and time travel bytes, along with the associated number of files for each category.

Use this command to identify large or unused tables, optimize storage costs, and understand why the total storage size differs from the active table size. This is particularly useful for platform administrators who need to analyze storage patterns across multiple tables or track storage changes over time.

Output metrics

The command returns the following metrics in a single row:

Field               Description
-----------------   -----------
total_bytes         Total storage size for the table. This equals delta log size + active bytes + vacuumable bytes + time travel bytes.
total_files         Total number of files, including delta log files, active files, vacuumable files, and time travel files.
active_bytes        Size of data files actively referenced by the table (same as sizeInBytes).
active_files        Total number of files actively referenced by the table.
vacuumable_bytes    Size of data that can be removed by running VACUUM or enabling predictive optimization.
vacuumable_files    Number of vacuumable files.
time_travel_bytes   Size of historical data used for rollbacks and time travel operations. Also known as tombstoned bytes or failsafe bytes.
time_travel_files   Number of files used for time travel.

Details

  • The command recursively lists files to compute storage information. Execution typically completes within minutes but can take up to several hours for very large tables.
  • This command works for both Unity Catalog managed and external tables.
  • The storage metrics are computed at the time the command is run and aren't stored in Unity Catalog or returned by DESCRIBE EXTENDED.
  • To track storage changes over time, run this command periodically and store the results in a table. Run this command in a loop across multiple tables to analyze storage patterns across your data estate.
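
A minimal sketch of that tracking pattern, assuming a hypothetical history table ops.storage_metrics_history; the metric values are copied from the command's single-row output, since ANALYZE TABLE does not write its results to another table:

> CREATE TABLE IF NOT EXISTS ops.storage_metrics_history (
    table_name        STRING,
    collected_at      TIMESTAMP,
    total_bytes       BIGINT,
    active_bytes      BIGINT,
    vacuumable_bytes  BIGINT,
    time_travel_bytes BIGINT);

> ANALYZE TABLE main.my_schema.my_table COMPUTE STORAGE METRICS;

-- Record the returned metrics (placeholder values shown) with a collection timestamp.
> INSERT INTO ops.storage_metrics_history VALUES
    ('main.my_schema.my_table', current_timestamp(),
     5368709120, 4294967296, 805306368, 268435456);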

Table type considerations

  • Materialized views and streaming tables: total_bytes includes the size of the table and its associated metadata. The active_bytes metric excludes the vacuumable and time travel portions of the table.
  • Shallow clones: total_bytes includes only the clone's own metadata and delta log files, excluding source table files. active_bytes equals zero since the clone references the source table's data files.
  • Converted tables: Tables that have been recently converted from external to managed may include data across both managed and external locations. For example, data remains in the external location during the rollback window. See Convert an external table to a managed Unity Catalog table.

Examples

COMPUTE STATISTICS examples

> CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY (student_id);
> INSERT INTO students PARTITION (student_id = 111111) VALUES ('Mark');
> INSERT INTO students PARTITION (student_id = 222222) VALUES ('John');

> ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;

> DESC EXTENDED students;
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           student_id                  int    null
                  ...                  ...     ...
           Statistics            864 bytes
                  ...                  ...     ...

> ANALYZE TABLE students COMPUTE STATISTICS;

> DESC EXTENDED students;
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           student_id                  int    null
                  ...                  ...     ...
           Statistics    864 bytes, 2 rows
                  ...                  ...     ...

-- Note: ANALYZE TABLE .. PARTITION is not supported for Delta tables.
> ANALYZE TABLE students PARTITION (student_id = 111111) COMPUTE STATISTICS;

> DESC EXTENDED students PARTITION (student_id = 111111);
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           student_id                  int    null
                  ...                  ...     ...
 Partition Statistics    432 bytes, 1 rows
                  ...                  ...     ...
         OutputFormat org.apache.hadoop...

> ANALYZE TABLE students COMPUTE STATISTICS FOR COLUMNS name;

> DESC EXTENDED students name;
      info_name info_value
 -------------- ----------
       col_name       name
      data_type     string
        comment       NULL
            min       NULL
            max       NULL
      num_nulls          0
 distinct_count          2
    avg_col_len          4
    max_col_len          4
      histogram       NULL

> ANALYZE TABLES IN school_schema COMPUTE STATISTICS NOSCAN;
> DESC EXTENDED teachers;
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           teacher_id                  int    null
                  ...                  ...     ...
           Statistics           1382 bytes
                  ...                  ...     ...

> DESC EXTENDED students;
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           student_id                  int    null
                  ...                  ...     ...
           Statistics            864 bytes
                  ...                  ...     ...

> ANALYZE TABLES COMPUTE STATISTICS;
> DESC EXTENDED teachers;
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           teacher_id                  int    null
                  ...                  ...     ...
           Statistics   1382 bytes, 2 rows
                  ...                  ...     ...

> DESC EXTENDED students;
             col_name            data_type comment
 -------------------- -------------------- -------
                 name               string    null
           student_id                  int    null
                  ...                  ...     ...
           Statistics    864 bytes, 2 rows
                  ...                  ...     ...

> ANALYZE TABLE some_delta_table COMPUTE DELTA STATISTICS;

COMPUTE STORAGE METRICS examples

> ANALYZE TABLE main.my_schema.my_table COMPUTE STORAGE METRICS;
total_bytes  total_files  active_bytes  active_files  vacuumable_bytes  vacuumable_files  time_travel_bytes  time_travel_files
-----------  -----------  ------------  ------------  ----------------  ----------------  -----------------  -----------------
 5368709120         1250    4294967296          1000        805306368               150         268435456                100

The output shows:

  • Total storage: 5.37 GB across 1,250 files
  • Active data: 4.29 GB in 1,000 files (current table version)
  • Vacuumable data: 805 MB in 150 files (can be reclaimed with VACUUM)
  • Time travel data: 268 MB in 100 files (for historical queries)
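
In this illustration, active_bytes + vacuumable_bytes + time_travel_bytes (4,294,967,296 + 805,306,368 + 268,435,456 bytes) accounts for the entire total_bytes value of 5,368,709,120 bytes; on a real table, Delta log files contribute an additional, usually small, share of total_bytes and total_files.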