Connect to data sources from Azure Databricks

This article provides links to the Azure data sources that you can connect to Azure Databricks. Follow the examples in these links to read data from Azure data sources (for example, Azure Blob Storage or Azure Event Hubs) into an Azure Databricks cluster and run analytical jobs on it.

Prerequisites

  • You must have an Azure Databricks workspace and a Spark cluster. Follow the instructions at Get started.

Data sources for Azure Databricks

The following list includes some of the Azure data sources that you can use with Azure Databricks. For a complete list of supported data sources, see Data sources for Azure Databricks.

  • Azure SQL Database

    This link shows how to use the DataFrame API to connect to SQL databases using JDBC, and how to control the parallelism of reads through the JDBC interface. The topic provides detailed examples using the Scala API, with abbreviated Python and Spark SQL examples at the end. A parallel-read sketch appears after this list.

  • Azure Data Lake Storage

    This link shows how to use a Microsoft Entra ID (formerly Azure Active Directory) service principal to authenticate with Azure Data Lake Storage, and how to access the data in Azure Data Lake Storage from Azure Databricks. An authentication sketch appears after this list.

  • Azure Blob Storage

    This link shows how to access Azure Blob Storage directly from Azure Databricks by using an account access key or a SAS token for a given container. It also describes how to access Azure Blob Storage from Azure Databricks by using the RDD API. A direct-access sketch appears after this list.

  • Azure Event Hubs

    This link provides instructions on how to use the Azure Event Hubs Spark connector from Azure Databricks to access data in Azure Event Hubs. A streaming-read sketch appears after this list.

  • Azure Synapse Analytics

    This link provides instructions on how to query data in Azure Synapse Analytics from Azure Databricks. A query sketch appears after this list.
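
The sketches below illustrate typical connection patterns for these sources. They are minimal examples, not the full procedures in the linked topics: every server name, storage account, container, path, and credential is a placeholder, and real secrets should come from a Databricks secret scope rather than plain text. First, a parallel JDBC read from Azure SQL Database using the DataFrame API; the numeric partition column and its bounds are assumptions about the target table.

    # A minimal sketch, assuming a table with a numeric "id" column to partition on.
    # In a Databricks notebook, `spark` is predefined; all connection values are
    # placeholders.
    df = (spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
          .option("dbtable", "dbo.<table>")
          .option("user", "<username>")
          .option("password", "<password>")
          # These four options split the read into parallel partitions.
          .option("partitionColumn", "id")
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          .option("numPartitions", "8")
          .load())
    df.show(5)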
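
Next, a sketch of Microsoft Entra ID service principal authentication against Azure Data Lake Storage Gen2 through the OAuth configuration keys of the ABFS driver; the tenant ID, client ID, secret scope, and path are placeholders.

    # A minimal sketch of service principal (OAuth) access to ADLS Gen2.
    # `dbutils` is predefined in Databricks notebooks.
    account = "<storage-account>"
    spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net",
                   "<client-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net",
                   dbutils.secrets.get(scope="<scope>", key="<client-secret-key>"))
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    df = spark.read.parquet(f"abfss://<container>@{account}.dfs.core.windows.net/<path>")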
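
A sketch of direct Azure Blob Storage access with an account access key; the same pattern works with a SAS token by setting the fs.azure.sas.&lt;container&gt;.&lt;account&gt;.blob.core.windows.net key instead.

    # A minimal sketch; the storage account, container, and secret scope are
    # placeholders.
    spark.conf.set(
        "fs.azure.account.key.<storage-account>.blob.core.windows.net",
        dbutils.secrets.get(scope="<scope>", key="<storage-key>"))

    df = spark.read.csv(
        "wasbs://<container>@<storage-account>.blob.core.windows.net/<path>",
        header=True)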
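
A sketch of a Structured Streaming read with the Azure Event Hubs Spark connector; it assumes the com.microsoft.azure:azure-eventhubs-spark library is attached to the cluster and that the connection string is stored in a secret scope.

    # The connector expects an encrypted connection string; `sc` (the SparkContext)
    # is predefined in Databricks notebooks.
    conn = dbutils.secrets.get(scope="<scope>", key="<eventhubs-connection-string>")
    eh_conf = {
        "eventhubs.connectionString":
            sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn)
    }

    stream = (spark.readStream
              .format("eventhubs")
              .options(**eh_conf)
              .load())

    # Event payloads arrive in the binary "body" column.
    messages = stream.selectExpr("CAST(body AS STRING) AS body")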
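
Finally, a sketch of querying Azure Synapse Analytics with the Databricks Synapse connector, which stages data through a temporary directory in Azure storage; all connection values and the query are placeholders.

    # A minimal sketch; forwardSparkAzureStorageCredentials reuses the storage
    # credentials already configured in the Spark session for the staging directory.
    df = (spark.read
          .format("com.databricks.spark.sqldw")
          .option("url",
                  "jdbc:sqlserver://<server>.database.windows.net:1433;"
                  "database=<database>;user=<user>;password=<password>")
          .option("tempDir",
                  "abfss://<container>@<storage-account>.dfs.core.windows.net/tempdir")
          .option("forwardSparkAzureStorageCredentials", "true")
          .option("query", "SELECT TOP 10 * FROM dbo.<table>")
          .load())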

Next steps

To learn about the sources from which you can import data into Azure Databricks, see Data sources for Azure Databricks.