sql query on blob storage

Shambhu Rai 1,411 Reputation points
2023-07-27T20:32:12.95+00:00

Hi Expert,

how to write sql query on azure blob storage files using databricks notebook without using azure synapse

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,489 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,000 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,219 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,854 questions
Azure Data Lake Analytics
{count} votes

2 answers

Sort by: Most helpful
  1. Sander van de Velde | MVP 33,136 Reputation points MVP
    2023-07-27T21:08:35.7566667+00:00

    Hello @Shambhu Rai ,

    To connect to Azure Blob Storage check the documentation.

    Use the fully qualified ABFS URI to access data secured with Unity Catalog.

    A Python example from that page:

    dbutils.fs.ls("abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data")
    
    spark.read.format("parquet").load("abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data")
    
    spark.sql("SELECT * FROM parquet.`abfss://container@storageAccount.dfs.core.windows.net/external-location/path/to/data`")
    
    

    Pay extra attention to the credentials needed, shown at the bottom of that documentation.


    If the response helped, do "Accept Answer". If it doesn't work, please let us know the progress. All community members with similar issues will benefit by doing so. Your contribution is highly appreciated.


  2. PRADEEPCHEEKATLA 90,231 Reputation points
    2023-07-31T07:07:16.4+00:00

    @Shambhu Rai - Thanks for the question and using MS Q&A platform.

    To write SQL queries on Azure Blob Storage files using Databricks notebook, you can follow the steps below:

    Step1: Create an Azure Databricks workspace, cluster, and notebook.

    **Step2:**Mount the Azure Blob Storage container to the Databricks file system. You can use the following code snippet to mount the container:

    Replace <application-id>, <application-secret>, <tenant-id>, <container-name>, <storage-account-name>, and <mount-name> with your own values.

    configs = {"fs.azure.account.auth.type": "OAuth",
               "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
               "fs.azure.account.oauth2.client.id": "<application-id>",
               "fs.azure.account.oauth2.client.secret": "<application-secret>",
               "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}
    
    dbutils.fs.mount(
      source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
      mount_point="/mnt/<mount-name>",
      extra_configs=configs)
    

    Step3: Read the files from the mounted directory using the spark.read function. You can use the following code snippet to read a CSV file:

    Replace <mount-name> and <file-name> with your own values.

    df = spark.read.format("csv").option("header", "true").load("/mnt/<mount-name>/<file-name>.csv")
    

    Step4: Run SQL queries on the DataFrame using the spark.sql function. You can use the following code snippet to run a SQL query:

    Replace <table-name> and <condition> with your own values.

    df.createOrReplaceTempView("<table-name>")
    result = spark.sql("SELECT * FROM <table-name> WHERE <condition>")
    

    For more details, refer to Connect to Azure Data Lake Storage Gen2 and Blob Storage.

    Hope this helps. Do let us know if you any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.