@Shambhu Rai - Thanks for the question and using MS Q&A platform.
To write SQL queries on Azure Blob Storage files using Databricks notebook, you can follow the steps below:
Step1: Create an Azure Databricks workspace, cluster, and notebook.
**Step2:** Mount the Azure Blob Storage container to the Databricks file system (DBFS) using an Azure AD service principal. You can use the following code snippet to mount the container:
Replace <application-id>, <application-secret>, <tenant-id>, <container-name>, <storage-account-name>, and <mount-name> with your own values.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<application-secret>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# OAuth with ClientCredsTokenProvider goes through the ABFS driver, so the
# source must use abfss:// and the DFS endpoint rather than wasbs://.
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs)
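After the mount succeeds, you can verify it by listing the mounted directory. A minimal sanity check, assuming the same <mount-name> placeholder as above:

# List the files visible through the new mount point
display(dbutils.fs.ls("/mnt/<mount-name>"))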
Step3: Read the files from the mounted directory using spark.read. You can use the following code snippet to read a CSV file:
Replace <mount-name> and <file-name> with your own values.
df = spark.read.format("csv").option("header", "true").load("/mnt/<mount-name>/<file-name>.csv")
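To confirm the file loaded as expected, it can help to inspect the schema and preview a few rows. A quick check on the df DataFrame from the snippet above:

# Print the column names (taken from the header row) and their types
df.printSchema()

# Preview the first few rows in the notebook
display(df.limit(10))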
Step4: Run SQL queries on the DataFrame using the spark.sql function. You can use the following code snippet to register a temporary view and run a SQL query:
Replace <table-name> and <condition> with your own values.
df.createOrReplaceTempView("<table-name>")
result = spark.sql("SELECT * FROM <table-name> WHERE <condition>")
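The result is itself a DataFrame, so you can preview it with display(result) or result.show(). Once the temporary view exists, you can also query it directly from a %sql cell in the notebook; the <table-name> below refers to the same view registered above:

%sql
SELECT COUNT(*) AS row_count FROM <table-name>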
For more details, refer to Connect to Azure Data Lake Storage Gen2 and Blob Storage.
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?".