SHOW STATISTICS

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 18.3 and above

Returns optimizer statistics for a table as a JSON document, including table-level statistics and optionally per-column statistics.

Databricks recommends SHOW STATISTICS for reading optimizer statistics programmatically. It returns only statistics, without the schema, storage, properties, partitions, and other metadata included in DESCRIBE TABLE EXTENDED ... AS JSON. Use ANALYZE TABLE ... COMPUTE STATISTICS (or predictive optimization) to collect the statistics that this command reports.

Required permissions

To run SHOW STATISTICS on a Unity Catalog table, you need:

  • USE CATALOG on the parent catalog.
  • USE SCHEMA on the parent schema.
  • SELECT on the table, or ownership of the table.

Syntax

SHOW STATISTICS [ { FROM | IN } ] table_name
    [ FOR COLUMNS column_name [, ...] | FOR ALL COLUMNS ]
    AS JSON

The FROM and IN keywords are optional and interchangeable.

Parameters

  • table_name

    Identifies the table to report statistics for. The name must not include a temporal specification or options specification.

    If the table cannot be found, Azure Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.

    Views (temporary or persistent) are not supported. Invoking SHOW STATISTICS on a view raises EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE. Use DESCRIBE EXTENDED for view metadata.

  • FOR COLUMNS column_name [, ...]

    Adds an entry to statistics.column_statistics for each named column.

    If any named column does not exist on the table, the command raises COLUMN_NOT_FOUND.

  • FOR ALL COLUMNS

    Adds an entry to statistics.column_statistics for every atomic-typed top-level column of the table.

    Columns whose data type is STRUCT, ARRAY, or MAP are excluded, because the optimizer does not collect statistics on complex types. Nested fields are not enumerated.

  • AS JSON

    Required. The result is returned as a single JSON-formatted STRING.

JSON-formatted output

The returned JSON document uses the following schema:

{
  "table_name": "<table_name>",
  "catalog_name": "<catalog_name>",
  "namespace": ["<schema_name>"],
  "schema_name": "<schema_name>",
  "statistics": {
    "size_in_bytes": <bytes>,
    "num_rows": <count>,
    "collection_source": "<collection_source>",
    "created_at": "<created_at_timestamp>",
    "column_statistics": {
      "<column_name>": {
        "min": "<min_value>",
        "max": "<max_value>",
        "distinct_count": <count>,
        "num_nulls": <count>,
        "avg_len": <bytes>,
        "max_len": <bytes>,
        "has_histogram": <boolean>,
        "has_mcv": <boolean>,
        "has_kll_sketch": <boolean>
      }
    }
  }
}
  • statistics.size_in_bytes

    Total size of the table data in bytes, as known to the optimizer. Omitted when no table-level statistics exist.

  • statistics.num_rows

    Row count, as known to the optimizer. null if not collected.

  • statistics.collection_source

    How the table-level statistics were produced. One of:

    • Manual Analyze — collected by ANALYZE TABLE ... COMPUTE STATISTICS.
    • Auto Stats — collected automatically during a write.
    • Predictive Analyze — collected by predictive optimization.
    • Unknown — collected by an unrecognized source.
    • null — no table-level statistics are available.
  • statistics.created_at

    When the table-level statistics were collected, as an ISO 8601 UTC timestamp (for example, 2026-05-29T08:14:24.845Z). null if not collected.

  • statistics.column_statistics

    Included only when FOR COLUMNS or FOR ALL COLUMNS is specified. Maps each requested column to its per-column statistics.

    The following fields are always present, with null when not collected:

    • min, max — minimum and maximum values, rendered as JSON strings.
    • distinct_count — number of distinct values (NDV).
    • num_nulls — number of NULL values.
    • avg_len, max_len — average and maximum value length in bytes.
    • has_histogramtrue if an equi-height histogram is available for the column.

    The following fields appear only when the corresponding sketch collection is enabled, and are omitted otherwise:

    • has_mcvtrue if a most-common-values list is available for the column.
    • has_kll_sketchtrue if a KLL sketch is available for the column.

Examples

-- Set up a table and collect column statistics.
> CREATE TABLE customer(cust_id INT, name STRING, state STRING) USING parquet;
> INSERT INTO customer VALUES (100, 'Mike', 'AR'), (200, 'Jane', 'CA');
> ANALYZE TABLE customer COMPUTE STATISTICS FOR ALL COLUMNS;

-- Table-level statistics only.
> SHOW STATISTICS FROM customer AS JSON;
 {"table_name":"customer","catalog_name":"spark_catalog",
  "namespace":["default"],"schema_name":"default",
  "statistics":{"size_in_bytes":864,"num_rows":2,
                "collection_source":"Manual Analyze",
                "created_at":"2026-05-29T08:14:24.845Z"}}

-- Per-column statistics for a list of columns.
> SHOW STATISTICS FROM customer FOR COLUMNS cust_id, name AS JSON;
 {"table_name":"customer","catalog_name":"spark_catalog",
  "namespace":["default"],"schema_name":"default",
  "statistics":{"size_in_bytes":864,"num_rows":2,
                "collection_source":"Manual Analyze",
                "created_at":"2026-05-29T08:14:24.845Z",
                "column_statistics":{
                  "cust_id":{"min":"100","max":"200","distinct_count":2,"num_nulls":0,
                             "avg_len":4,"max_len":4,"has_histogram":false,"has_mcv":false},
                  "name":   {"min":"Jane","max":"Mike","distinct_count":2,"num_nulls":0,
                             "avg_len":4,"max_len":4,"has_histogram":false,"has_mcv":false}}}}

-- All atomic top-level columns. STRUCT, ARRAY, and MAP columns are skipped.
> SHOW STATISTICS FROM customer FOR ALL COLUMNS AS JSON;