Volumes are Unity Catalog objects that govern access to non-tabular data. They provide a logical layer over cloud object storage so you can store, organize, and manage files with centralized governance.
For comprehensive documentation on volumes, see What are Unity Catalog volumes?.
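Every file in a volume is addressed by a path of the form /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/&lt;path&gt;. As a minimal sketch (the catalog, schema, volume, and file names below are hypothetical):

```python
# Build the path for a file in a Unity Catalog volume.
# Volume paths follow /Volumes/<catalog>/<schema>/<volume>/<relative path>.
# "main", "landing", and "raw_files" are hypothetical names.
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    return "/".join(["/Volumes", catalog, schema, volume, *parts])

path = volume_path("main", "landing", "raw_files", "2024", "events.json")
print(path)  # /Volumes/main/landing/raw_files/2024/events.json
```

Because this path looks like an ordinary file system path, any tool that accepts a path can use it, while Unity Catalog enforces access control underneath.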
Unity Catalog supports two types of volumes:
- Managed volumes: Azure Databricks manages the lifecycle and cloud storage location
- External volumes: You control the cloud storage location and lifecycle
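Both kinds of volume are created with SQL. A hedged sketch, run via spark.sql on Databricks compute; the catalog, schema, volume names, and storage URL are hypothetical, and the external location must already be registered in Unity Catalog:

```python
# Sketch: create a managed and an external volume with SQL.
# All object names and the abfss:// URL are hypothetical examples.
MANAGED_SQL = "CREATE VOLUME IF NOT EXISTS main.landing.raw_files"
EXTERNAL_SQL = (
    "CREATE EXTERNAL VOLUME IF NOT EXISTS main.landing.ext_files "
    "LOCATION 'abfss://data@myaccount.dfs.core.windows.net/landing'"
)

def create_volumes(spark):
    # Requires Databricks compute with Unity Catalog enabled.
    spark.sql(MANAGED_SQL)   # Databricks manages the storage location.
    spark.sql(EXTERNAL_SQL)  # You control the storage location.
```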
What can you do with Unity Catalog volumes?
You can perform file management operations with volumes using multiple interfaces and tools:
- Upload, download, and browse files in Catalog Explorer. See What is Catalog Explorer?.
- Read and write data programmatically using Apache Spark, pandas, or SQL. See Programmatically work with files in volumes.
- Manage files using dbutils.fs, magic commands, or bash shell commands. See Utility commands for files in volumes.
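The interfaces above can be sketched in a single notebook cell. The volume path and file names are hypothetical, and the spark and dbutils objects exist only on Databricks compute:

```python
# Sketch of common file operations against a volume path.
# "/Volumes/main/landing/raw_files" is a hypothetical volume path.
VOLUME = "/Volumes/main/landing/raw_files"

def list_and_read(spark, dbutils):
    # List files with dbutils.fs (similar listings are available
    # through %fs magic commands or %sh shell commands).
    for info in dbutils.fs.ls(VOLUME):
        print(info.path)
    # Read all JSON files under the volume into a Spark DataFrame.
    return spark.read.format("json").load(f"{VOLUME}/*.json")

def read_with_pandas():
    # pandas can read a volume path directly, like a local file.
    import pandas as pd
    return pd.read_csv(f"{VOLUME}/lookup.csv")
```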
You can use volumes with Databricks features that require a file system path. Volumes give you a governed path that works consistently across users and workspaces. For example:
- Data ingestion: Use volumes as the source location for data ingestion. Start from files in a volume and ingest them into tables using:
  - COPY INTO: Load files from a volume into a table using SQL. See COPY INTO.
  - Auto Loader: Incrementally ingest new files that arrive in a volume directory into a table. See What is Auto Loader?.
  - Spark read APIs: Use Spark read APIs (for example, spark.read.load) to load files from a volume path into a DataFrame and write them to a table. See Programmatically work with files in volumes.
  - Databricks UI: Create a table directly from files stored in a volume. See Create a table from data in a volume.
- Compute log delivery: Configure compute log delivery to write logs into a volume path, so log access is governed by Unity Catalog. See Compute log delivery.
- File arrival triggers: Use file arrival triggers to start Lakeflow Jobs when new files arrive in a volume. See Trigger jobs when new files arrive.
- Cluster libraries: Install cluster libraries from a volume (JARs, wheels, requirements.txt), so library access is governed by Unity Catalog. See Install libraries from a volume.
- Init scripts: Store and run cluster-scoped init scripts from a volume, so access to init scripts is governed by Unity Catalog. See Cluster-scoped init scripts.
- ML experiment artifacts: Store ML experiment artifacts (models, metrics, and output files) in a volume so access to your MLflow experiment outputs is governed by Unity Catalog. See Organize training runs with MLflow experiments.
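As an illustration of the ingestion options above, here is a hedged sketch of loading files from a volume with COPY INTO and with Auto Loader. The table names, volume paths, and checkpoint location are hypothetical, and both functions require Databricks compute:

```python
# Sketch: ingest files from a volume into a table two ways.
# All names and paths below are hypothetical examples.
SOURCE = "/Volumes/main/landing/raw_files/events"
CHECKPOINT = "/Volumes/main/landing/raw_files/_checkpoints/events"

def copy_into(spark):
    # One-shot, idempotent batch load with SQL COPY INTO;
    # assumes the target table main.landing.events already exists.
    spark.sql(
        f"COPY INTO main.landing.events "
        f"FROM '{SOURCE}' "
        f"FILEFORMAT = JSON"
    )

def auto_loader(spark):
    # Incremental ingestion: Auto Loader tracks which files it has
    # seen and picks up only new arrivals in the volume directory.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", CHECKPOINT)
        .load(SOURCE)
        .writeStream
        .option("checkpointLocation", CHECKPOINT)
        .trigger(availableNow=True)
        .toTable("main.landing.events")
    )
```

Because both the source files and the checkpoint live under volume paths, read and write access to them is governed by Unity Catalog like any other volume content.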