SQL query on blob storage

Shambhu Rai 1,406 Reputation points
2023-07-27T20:32:12.95+00:00

Hi Expert,

How do I write SQL queries on Azure Blob Storage files using a Databricks notebook, without using Azure Synapse?


2 answers

  1. Sander van de Velde 28,546 Reputation points MVP
    2023-07-27T21:08:35.7566667+00:00

    Hello @Shambhu Rai ,

    To connect to Azure Blob Storage, check the documentation.

    Use the fully qualified ABFS URI to access data secured with Unity Catalog.

    A Python example from that page:

    dbutils.fs.ls("abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data")
    
    spark.read.format("parquet").load("abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data")
    
    spark.sql("SELECT * FROM parquet.`abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data`")
    
    

    Pay extra attention to the credentials needed, shown at the bottom of that documentation.
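
    For example, direct access with a service principal can be configured in the notebook session before using the ABFS URIs above. This is a minimal sketch using the documented per-account Spark properties; `<storage-account>`, `<application-id>`, `<secret-scope>`/`<secret-key>`, and `<tenant-id>` are placeholders you must supply:

    # A minimal sketch of service-principal (OAuth) access, set per storage account.
    # All angle-bracket values are placeholders, not real names.
    spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net",
                   dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>"))
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    Storing the client secret in a Databricks secret scope, as sketched here, avoids putting credentials in the notebook itself.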


    If the response helped, please click "Accept Answer". If it doesn't work, please let us know your progress. All community members with similar issues will benefit. Your contribution is highly appreciated.


  2. PRADEEPCHEEKATLA-MSFT 77,336 Reputation points Microsoft Employee
    2023-07-31T07:07:16.4+00:00

    @Shambhu Rai - Thanks for the question and using MS Q&A platform.

    To write SQL queries on Azure Blob Storage files using a Databricks notebook, you can follow the steps below:

    Step 1: Create an Azure Databricks workspace, cluster, and notebook.

    Step 2: Mount the Azure Blob Storage container to the Databricks file system. You can use the following code snippet to mount the container:

    Replace <application-id>, <application-secret>, <tenant-id>, <container-name>, <storage-account-name>, and <mount-name> with your own values.

    configs = {"fs.azure.account.auth.type": "OAuth",
               "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
               "fs.azure.account.oauth2.client.id": "<application-id>",
               "fs.azure.account.oauth2.client.secret": "<application-secret>",
               "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}
    
    dbutils.fs.mount(
      source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
      mount_point="/mnt/<mount-name>",
      extra_configs=configs)
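
    As a quick sanity check (assuming the mount above succeeded), you can list the mounted directory:

    # Verify the mount by listing its contents
    display(dbutils.fs.ls("/mnt/<mount-name>"))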
    

    Step 3: Read the files from the mounted directory using the spark.read function. You can use the following code snippet to read a CSV file:

    Replace <mount-name> and <file-name> with your own values.

    # Read the CSV with a header row; all columns are read as strings by default
    df = spark.read.format("csv").option("header", "true").load("/mnt/<mount-name>/<file-name>.csv")
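
    If you want Spark to infer numeric and date column types instead of reading everything as strings, you can add the standard inferSchema option (a sketch of the same read with type inference enabled):

    # Optionally let Spark infer column types from the data
    df = spark.read.format("csv") \
        .option("header", "true") \
        .option("inferSchema", "true") \
        .load("/mnt/<mount-name>/<file-name>.csv")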
    

    Step 4: Run SQL queries on the DataFrame using the spark.sql function. You can use the following code snippet to run a SQL query:

    Replace <table-name> and <condition> with your own values.

    # Register the DataFrame as a temporary view so it can be queried with SQL
    df.createOrReplaceTempView("<table-name>")
    result = spark.sql("SELECT * FROM <table-name> WHERE <condition>")
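
    For illustration, here is a hypothetical concrete version of Step 4; the view name and columns (sales, region, amount) are made up for the example. Once the temp view exists, you could also query it from a %sql notebook cell instead of spark.sql:

    # Hypothetical example: "sales", "region", and "amount" are placeholder names
    df.createOrReplaceTempView("sales")
    result = spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
    display(result)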
    

    For more details, refer to Connect to Azure Data Lake Storage Gen2 and Blob Storage.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful?". And if you have any further queries, do let us know.