Sample datasets

Third parties provide a variety of datasets that you can upload to your Azure Databricks workspace and use. Databricks also provides a variety of datasets that are already mounted to DBFS in your Azure Databricks workspace.

Third-party sample datasets

Azure Databricks has built-in tools to quickly upload third-party sample datasets as comma-separated values (CSV) files into Azure Databricks workspaces. The following popular third-party sample datasets are available in CSV format:

Sample dataset | To download the sample dataset as a CSV file…
The Squirrel Census | On the Data webpage, click Park Data, Squirrel Data, or Stories.
OWID Dataset Collection | In the GitHub repository, click the datasets folder. Click the subfolder that contains the target dataset, and then click the dataset’s CSV file.
CSV datasets | On the search results webpage, click the target search result, and next to the CSV icon, click Download.
Diamonds | (Requires a Kaggle account) On the dataset’s webpage, on the Data tab, next to diamonds.csv, click the Download icon.
NYC Taxi Trip Duration | (Requires a Kaggle account) On the dataset’s webpage, on the Data tab, next to the dataset’s ZIP file, click the Download icon. To find the dataset’s CSV files, extract the contents of the downloaded ZIP file.
UFO Sightings | (Requires an account) On the dataset’s webpage, next to nuforc_reports.csv, click the Download icon.

To use third-party sample datasets in your Azure Databricks workspace, do the following:

  1. Follow the third party’s instructions to download the dataset as a CSV file to your local machine.
  2. Upload the CSV file from your local machine into your Azure Databricks workspace.
  3. To work with the imported data, use Databricks SQL to query the data, or use a notebook to load the data as a DataFrame.
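
As a minimal sketch of the notebook option in step 3, the following Python helper loads an uploaded CSV file as a Spark DataFrame. The function name `load_csv_as_dataframe` and the example path are hypothetical, and the sketch assumes the file has a header row; adjust the options to match your data.

```python
def load_csv_as_dataframe(spark, path):
    """Read a CSV file that has a header row into a Spark DataFrame."""
    return (
        spark.read
        .option("header", "true")       # treat the first line as column names
        .option("inferSchema", "true")  # let Spark infer column types
        .csv(path)
    )

# In a notebook, `spark` is predefined. The path below is a hypothetical
# placeholder for wherever the uploaded file ended up:
# df = load_csv_as_dataframe(spark, "/path/to/uploaded/diamonds.csv")
# display(df)
```

Passing the session in as a parameter keeps the helper easy to exercise outside a notebook.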

Databricks datasets (databricks-datasets)

Azure Databricks includes a variety of datasets mounted to DBFS.


The availability and location of Databricks datasets are subject to change without notice.

Browse Databricks datasets

To browse these files from a notebook in Data Science & Engineering or Databricks Machine Learning, you can use Databricks Utilities with Python, Scala, or R. The following example lists all of the available Databricks datasets.

%fs ls "/databricks-datasets"
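
The same listing can be sketched in Python with Databricks Utilities. The helper name `list_dataset_paths` is hypothetical. It assumes the `dbutils` object that notebooks provide, and it relies on `dbutils.fs.ls` returning entries that expose a `path` attribute.

```python
def list_dataset_paths(dbutils, root="/databricks-datasets"):
    """Return the path of every entry directly under `root`."""
    # Each entry returned by dbutils.fs.ls describes one file or folder.
    return [info.path for info in dbutils.fs.ls(root)]

# In a notebook, `dbutils` is predefined:
# for path in list_dataset_paths(dbutils):
#     print(path)
```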

Unity Catalog datasets

Unity Catalog provides access to a number of sample datasets in the samples catalog. You can review these datasets in the Data Explorer UI and reference them directly using the <catalog_name>.<database_name>.<table_name> pattern.

The nyctaxi database contains the table trips, which has details about taxi rides in New York City stored using Delta Lake. The following code example returns all records in this table:

SELECT * FROM samples.nyctaxi.trips

The tpch database contains data from the TPC-H Benchmark. To see tables in this database, run:

SHOW TABLES IN samples.tpch
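
The three-level naming pattern can also be used from a Python notebook. The sketch below is illustrative only: `three_level_name` and `preview_table` are hypothetical helper names, and the code assumes the notebook's predefined `spark` session.

```python
def three_level_name(catalog, schema, table):
    """Build a <catalog_name>.<database_name>.<table_name> reference."""
    return f"{catalog}.{schema}.{table}"

def preview_table(spark, catalog, schema, table, limit=10):
    """Return a DataFrame holding at most `limit` rows of the table."""
    return spark.table(three_level_name(catalog, schema, table)).limit(limit)

# In a notebook, `spark` is predefined:
# display(preview_table(spark, "samples", "nyctaxi", "trips"))
```

Limiting the preview avoids pulling every record when you only want to inspect the schema and a few rows.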

Get information about Databricks datasets

To get more information about a dataset, you can use a local file API to print the dataset README (if one is available) from a Python, Scala, or R notebook in Data Science & Engineering or Databricks Machine Learning, as shown in these code examples. Each path below stops at the datasets root; append the location of the README you want to read.

Python

f = open('/dbfs/databricks-datasets/', 'r')
print(f.read())

Scala

scala.io.Source.fromFile("/dbfs/databricks-datasets/").foreach {
  print
}

R

library(readr)

f = read_lines("/dbfs/databricks-datasets/", skip = 0, n_max = -1L)
print(f)
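
Because a README is not always available, a small guard can help. The following Python sketch returns `None` when the file is missing. The helper name `read_readme` and the default file name `README.md` are assumptions, not something the dataset layout guarantees.

```python
from pathlib import Path

def read_readme(dataset_dir, filename="README.md"):
    """Return the README text for a dataset folder, or None if it has none."""
    readme = Path(dataset_dir) / filename  # file name is an assumption
    if not readme.is_file():
        return None
    return readme.read_text(encoding="utf-8", errors="replace")

# Hypothetical usage; replace the placeholder with a real dataset folder:
# text = read_readme("/dbfs/databricks-datasets/<dataset-folder>")
# print(text if text is not None else "No README found")
```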

Create a table based on a Databricks dataset

This code example demonstrates how to use SQL in the Databricks SQL query editor, or how to use Python, Scala, or R in a notebook in Data Science & Engineering or Databricks Machine Learning, to create a table based on a Databricks dataset:


SQL

CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/')

Python

spark.sql("CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/')")

Scala

spark.sql("CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/')")

R

sql("CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/')")
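
If you expect to rerun the statement, one option is to wrap it in a small Python helper. This sketch uses `IF NOT EXISTS` so a second run does not fail when the table is already present. The helper name `create_table_from_path` is hypothetical, and the code assumes trusted inputs because the values are interpolated directly into the SQL text.

```python
def create_table_from_path(spark, table_name, path):
    """Create a table over the files at `path` unless it already exists."""
    # Inputs are interpolated as-is, so pass only trusted values.
    statement = (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"OPTIONS (PATH '{path}')"
    )
    spark.sql(statement)
    return statement

# In a notebook, `spark` is predefined:
# create_table_from_path(
#     spark,
#     "default.people10m",
#     "dbfs:/databricks-datasets/learning-spark-v2/people/",
# )
# display(spark.table("default.people10m").limit(10))
```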