This section describes the Apache Spark data sources you can use in Azure Databricks. Many include a notebook that demonstrates how to use the data source to read and write data.
The following data sources are either directly supported in Databricks Runtime or require simple shell commands to enable access:
- Avro file
- Binary file
- CSV file
- Hive table
- JSON file
- LZO compressed file
- MLflow experiment
- Parquet file
- XML file
- Zip file
In addition, Azure Databricks supports Delta Lake and makes it easy to create Delta tables from multiple data formats.
To learn how to access metadata for file-based data sources, see File metadata column.
The following storage data sources require you to configure the connection to storage. Some also require that you create an Azure Databricks library and install it in a cluster:
- Accessing Azure Data Lake Storage Gen2 and Blob Storage with Azure Databricks
- Accessing Azure Data Lake Storage Gen1 from Azure Databricks
- Azure Cosmos DB
- Azure Synapse Analytics
- SQL databases using JDBC
- SQL Databases using the Apache Spark connector
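For the storage sources above that connect over JDBC, the read typically looks like the following sketch. The server, database, table, and credential values are placeholders, not working settings; substitute your own connection details (ideally pulling credentials from a secret scope rather than hard-coding them).

```python
# Hypothetical connection details — replace the placeholders with your own.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "<schema>.<table>")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
```
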