Partitions

Article
11/21/2024

Applies to: check marked yes Databricks SQL Databricks Runtime

A partition is composed of a subset of rows in a table that share the same value for a predefined subset of columns called the partitioning columns. Using partitions can speed up queries against the table as well as data manipulation.

To use partitions, you define the set of partitioning column when you create a table by including the PARTITIONED BY clause.

When inserting or manipulating rows in a table Azure Databricks automatically dispatches rows into the appropriate partitions.

You can also specify the partition directly using a PARTITION clause.

This syntax is also available for tables that don’t use Delta Lake format, to DROP, ADD or RENAME partitions quickly by using the ALTER TABLE statement.

PARTITIONED BY

The PARTITIONED BY clause specified a list of columns along which the new table is partitioned.

Syntax

PARTITIONED BY ( { partition_column [ column_type ] } [, ...] )

Parameters

partition_column

An identifier may reference a column_identifier in the table. If you specify more than one column there must be no duplicates. If you reference all columns in the table’s column_specification an error is raised.
column_type

Unless the partition_column refers to a column_identifier in the table’s column_specification, column_type defines the data type of the partition_column.

Not all data types supported by Databricks SQL are supported by all data sources.

Notes

Unless you define a Delta Lake table partitioning columns referencing the columns in the column specification are always moved to the end of the table.

PARTITION

You use the PARTITION clause to identify a partition to be queried or manipulated.

A partition is identified by naming all its columns and associating each with a value. You need not specify them in a specific order.

Unless you are adding a new partition to an existing table you may omit columns or values to indicate that the operation applies to the all matching partitions matching the subset of columns.

PARTITION ( { partition_column  [ = partition_value | LIKE pattern ] } [ , ... ] )

Parameters

partition_column

A column named as a partition column of the table. You may not specify the same column twice.
= partition_value

A literal of a data type matching the type of the partition column. If you omit a partition value the specification will match all values for this partition column.
LIKE pattern

This form is only allowed in ALTER SHARE ADD TABLE.

Matches the string representation of partition_column to pattern. pattern must be a string literal as used in LIKE.

Examples

SQL

-- Use the PARTTIONED BY clause in a table definition
> CREATE TABLE student(university STRING,
                       major      STRING,
                       name       STRING)
         PARTITIONED BY(university, major)

> CREATE TABLE professor(name STRING)
         PARTITIONED BY(university STRING,
                        department STRING);

-- Use the PARTITION specification to INSERT into a table
> INSERT INTO student
         PARTITION(university= 'TU Kaiserslautern') (major, name)
         SELECT major, name FROM freshmen;

-- Use the partition specification to add and drop a partition
> CREATE TABLE log(date DATE, id INT, event STRING)
     USING CSV
     PARTITIONED BY (date);

> ALTER TABLE log ADD PARTITION(date = DATE'2021-09-10');

> ALTER TABLE log DROP PARTITION(date = DATE'2021-09-10');

-- Drop all partitions from the named university, independent of the major.
> ALTER TABLE student DROP PARTITION(university = 'TU Kaiserslautern');

Additional resources

Documentation

ALTER TABLE … PARTITION - Azure Databricks - Databricks SQL

Learn how to use the ALTER TABLE … PARTITION syntax of the SQL language in Databricks SQL and Databricks Runtime.
SHOW PARTITIONS - Azure Databricks - Databricks SQL

Learn how to use the SHOW PARTITIONS syntax of the SQL language in Databricks SQL and Databricks Runtime.
INSERT - Azure Databricks - Databricks SQL

Learn how to use the INSERT syntax of the SQL language in Databricks SQL and Databricks Runtime.
When to partition tables on Azure Databricks - Azure Databricks

Learn when and how to create partitions when using Delta Lake on Azure Databricks.
ANALYZE TABLE - Azure Databricks - Databricks SQL

Learn how to use the ANALYZE TABLE … STATISTICS syntax of the SQL language in Databricks SQL and Databricks Runtime.
Hints - Azure Databricks - Databricks SQL

Learn how to use hints syntax of the SQL language in Databricks SQL and Databricks Runtime
DELETE FROM - Azure Databricks - Databricks SQL

Learn how to use the DELETE FROM syntax of the Delta Lake SQL language in Databricks SQL and Databricks Runtime.
CREATE TABLE [USING] - Azure Databricks - Databricks SQL

Learn how to use the CREATE TABLE \[USING] syntax of the SQL language in Databricks SQL and Databricks Runtime.

Training

Module

Work with Delta Lake tables in Microsoft Fabric - Training

Tables in a Microsoft Fabric lakehouse are based on the Delta Lake technology commonly used in Apache Spark. By using the enhanced capabilities of delta tables, you can create advanced analytics solutions.

Certification

Microsoft Certified: Azure Data Engineer Associate - Certifications

Demonstrate understanding of common data engineering tasks to implement and manage data engineering workloads on Microsoft Azure, using a number of Azure services.

Share via

Partitions

PARTITIONED BY

Syntax

Parameters

Notes

PARTITION

Parameters

Examples

Feedback

Additional resources