Explore storage and find data files

This article focuses on discovering and exploring directories and data files managed with Unity Catalog volumes, including UI-based instructions for exploring volumes with Catalog Explorer. This article also provides examples for programmatic exploration of data in cloud object storage using volume paths and cloud URIs.

Databricks recommends using volumes to manage access to data in cloud object storage. For more information on connecting to data in cloud object storage, see Connect to data sources.

For a full walkthrough of how to interact with files in all locations, see Work with files on Azure Databricks.

Important

When searching for Files in the workspace UI, you might discover data files stored as workspace files. Databricks recommends using workspace files primarily for code (such as scripts and libraries), init scripts, or configuration files. You should ideally limit data stored as workspace files to small datasets that might be used for tasks such as testing during development and QA. See What are workspace files?.

Volumes vs. legacy cloud object configurations

When you use volumes to manage access to data in cloud object storage, you can access the data only with volume paths, and these paths are available on all Unity Catalog-enabled compute. You cannot use volumes to register data files backing Unity Catalog tables. Databricks recommends using table names instead of file paths to interact with structured data registered as Unity Catalog tables. See How do paths work for data managed by Unity Catalog?.

If you use a legacy method for configuring access to data in cloud object storage, Azure Databricks reverts to legacy table ACL permissions. Users who want to access data using cloud URIs from SQL warehouses or compute configured with shared access mode require the ANY FILE permission. See Hive metastore table access control (legacy).
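
For example, a workspace admin could grant that permission with a statement like the following minimal sketch; the user name is a hypothetical placeholder:

# Hypothetical example: grant the legacy ANY FILE permission to a user
spark.sql("GRANT SELECT ON ANY FILE TO `user@example.com`")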

Azure Databricks provides several APIs for listing files in cloud object storage. Most examples in this article focus on using volumes. For examples on interacting with data on object storage configured without volumes, see List files with URIs.

Explore volumes

You can use Catalog Explorer to explore data in volumes and review the details of a volume. You can see only the volumes that you have permission to read, so you can query any data you discover this way.

You can use SQL to explore volumes and their metadata. To list files in volumes, you can use SQL, the %fs magic command, or Databricks utilities. When interacting with data in volumes, you use the path provided by Unity Catalog, which always has the following format:

/Volumes/catalog_name/schema_name/volume_name/path/to/data
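
For example, the following minimal sketch reads a file from a volume path with Spark; the file name sales.csv is a hypothetical placeholder:

# Hypothetical example: read a CSV file stored in a volume
df = spark.read.format("csv").option("header", "true").load("/Volumes/catalog_name/schema_name/volume_name/sales.csv")
display(df)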

Display volumes

SQL

Run the following command to see a list of volumes in a given schema.

SHOW VOLUMES IN catalog_name.schema_name;

See SHOW VOLUMES.
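
You can also run the same statement from Python and capture the output as a DataFrame. A minimal sketch:

# List volumes in a schema and work with the results as a DataFrame
volumes_df = spark.sql("SHOW VOLUMES IN catalog_name.schema_name")
display(volumes_df)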

Catalog Explorer

To display volumes in a given schema with Catalog Explorer, do the following:

  1. Select the Catalog icon.
  2. Select a catalog.
  3. Select a schema.
  4. Click Volumes to expand all volumes in the schema.

Note

If no volumes are registered to a schema, the Volumes option is not displayed. Instead, you see a list of available tables.

See volume details

SQL

Run the following command to describe a volume.

DESCRIBE VOLUME volume_name

See DESCRIBE VOLUME.
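
You can run the same statement from Python to inspect a volume's metadata programmatically. This sketch assumes a fully qualified volume name:

# Capture the DESCRIBE VOLUME output as a DataFrame
volume_info = spark.sql("DESCRIBE VOLUME catalog_name.schema_name.volume_name")
display(volume_info)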

Catalog Explorer

Click the volume name and select the Details tab to review volume details.

See files in volumes

SQL

Run the following command to list the files in a volume.

LIST '/Volumes/catalog_name/schema_name/volume_name/'

Catalog Explorer

Click the volume name to open the volume and browse the files and directories it contains.

%fs

Run the following command to list the files in a volume.

%fs ls /Volumes/catalog_name/schema_name/volume_name/

Databricks utilities

Run the following command to list the files in a volume.

dbutils.fs.ls("/Volumes/catalog_name/schema_name/volume_name/")
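
The FileInfo objects returned by dbutils.fs.ls include path, name, size, and modificationTime fields, so you can filter the listing in Python. For example, a minimal sketch that keeps only CSV files (the .csv extension is an arbitrary example):

# Keep only CSV files from the volume listing
files = dbutils.fs.ls("/Volumes/catalog_name/schema_name/volume_name/")
csv_paths = [f.path for f in files if f.name.endswith(".csv")]
print(csv_paths)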

List files with URIs

You can query cloud object storage configured with methods other than volumes by using URIs. You must be connected to compute that has privileges to access the cloud location. The ANY FILE permission is required on SQL warehouses and compute configured with shared access mode.

Note

URI access to object storage configured with volumes is not supported. You cannot use Catalog Explorer to review contents of object storage not configured with volumes.

The following examples show URIs for data stored with Azure Data Lake Storage Gen2, S3, and GCS.

SQL

Run the following command to list files in cloud object storage.

-- ADLS 2
LIST 'abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/data'

-- S3
LIST 's3://bucket-name/path/to/data'

-- GCS
LIST 'gs://bucket-name/path/to/data'
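
You can also issue these statements from Python and capture the results as a DataFrame, assuming your compute supports the LIST SQL command:

# Run LIST against a cloud URI and display the results
listing = spark.sql("LIST 'abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/data'")
display(listing)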

%fs

Run the following command to list files in cloud object storage.

# ADLS 2
%fs ls abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/data

# S3
%fs ls s3://bucket-name/path/to/data

# GCS
%fs ls gs://bucket-name/path/to/data

Databricks utilities

Run the following command to list files in cloud object storage.

# ADLS 2
dbutils.fs.ls("abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/data")

# S3
dbutils.fs.ls("s3://bucket-name/path/to/data")

# GCS
dbutils.fs.ls("gs://bucket-name/path/to/data")
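
dbutils.fs.ls lists only a single directory level. If you need to walk nested directories, you can recurse over the results; the helper below is a minimal sketch (list_files_recursively is a hypothetical name) and works with volume paths and cloud URIs alike:

# Hypothetical helper: recursively yield every file path under a directory
def list_files_recursively(path):
    for item in dbutils.fs.ls(path):
        if item.isDir():
            yield from list_files_recursively(item.path)
        else:
            yield item.path

for p in list_files_recursively("gs://bucket-name/path/to/data"):
    print(p)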