SQL query on blob storage

Shambhu Rai 1,401 Reputation points

Hi Expert,

How can I write SQL queries on Azure Blob Storage files from a Databricks notebook, without using Azure Synapse?

Tags: Azure Data Lake Storage, Azure Synapse Analytics, Azure Databricks, Azure Data Factory, Azure Data Lake Analytics

2 answers

  1. Sander van de Velde 26,806 Reputation points MVP

    Hello @Shambhu Rai ,

    To connect to Azure Blob Storage, see the Azure Databricks documentation on connecting to Azure Data Lake Storage Gen2 and Blob Storage.

    Use the fully qualified ABFS URI to access data secured with Unity Catalog.

    A Python example from that page:

    spark.sql("SELECT * FROM parquet.`abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data`")

    Pay extra attention to the credentials needed, shown at the bottom of that documentation.
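
    As a concrete illustration, here is a minimal sketch of the same direct-query approach. All account, container, and path names below are placeholders, and the account-key configuration shown is one alternative to Unity Catalog credentials:

```python
# Sketch: query Parquet files directly over abfss://.
# Account, container, path, and key values are placeholders.

def abfss_uri(container: str, account: str, path: str) -> str:
    """Build the fully qualified ABFS URI for a path in a container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

uri = abfss_uri("mycontainer", "mystorageacct", "external-location/path/to/data")

# On a real cluster (requires Azure credentials; `spark` is predefined there):
# spark.conf.set("fs.azure.account.key.mystorageacct.dfs.core.windows.net",
#                "<access-key>")
# df = spark.sql(f"SELECT * FROM parquet.`{uri}`")
```

    The backticks around the URI in the SQL statement are required, since the path is used directly as a table identifier.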

    If the response helped, please click "Accept Answer". If it doesn't work, please let us know your progress. All community members with similar issues will benefit. Your contribution is highly appreciated.

  2. PRADEEPCHEEKATLA-MSFT 73,651 Reputation points Microsoft Employee

    @Shambhu Rai - Thanks for the question and using MS Q&A platform.

    To write SQL queries on Azure Blob Storage files using Databricks notebook, you can follow the steps below:

    Step1: Create an Azure Databricks workspace, cluster, and notebook.

    Step2: Mount the Azure Blob Storage container to the Databricks file system. You can use the following code snippet to mount the container:

    Replace <application-id>, <application-secret>, <tenant-id>, <container-name>, <storage-account-name>, and <mount-name> with your own values.

    configs = {"fs.azure.account.auth.type": "OAuth",
               "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
               "fs.azure.account.oauth2.client.id": "<application-id>",
               "fs.azure.account.oauth2.client.secret": "<application-secret>",
               "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

    dbutils.fs.mount(
        source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
        mount_point="/mnt/<mount-name>",
        extra_configs=configs)

    Step3: Read the files from the mounted directory using the spark.read function. You can use the following code snippet to read a CSV file:

    Replace <mount-name> and <file-name> with your own values.

    df = spark.read.format("csv").option("header", "true").load("/mnt/<mount-name>/<file-name>.csv")

    Step4: Register the DataFrame as a temporary view, then run SQL queries on it using the spark.sql function. You can use the following code snippet:

    Replace <table-name> and <condition> with your own values.

    df.createOrReplaceTempView("<table-name>")
    result = spark.sql("SELECT * FROM <table-name> WHERE <condition>")
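
    Putting steps 3 and 4 together, here is a minimal sketch. The mount, file, table, and column names are hypothetical, and the Spark calls are commented out because the `spark` session only exists on a running cluster:

```python
# Hypothetical names for illustration only.
mount_name = "mydata"
file_name = "sales"

# Path to the CSV file under the mount created in Step2.
csv_path = f"/mnt/{mount_name}/{file_name}.csv"

# On a Databricks cluster, where `spark` is predefined:
# df = spark.read.format("csv").option("header", "true").load(csv_path)
# df.createOrReplaceTempView("sales")                     # name used in SQL below
# result = spark.sql("SELECT * FROM sales WHERE amount > 100")
# result.show()
```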

    For more details, refer to Connect to Azure Data Lake Storage Gen2 and Blob Storage.

    Hope this helps. Do let us know if you have any further queries.

    If this answers your query, please click Accept Answer and Yes for "was this answer helpful".