@Shambhu Rai - Thanks for the question and using MS Q&A platform.
To write SQL queries on Azure Blob Storage files using Databricks notebook, you can follow the steps below:
Step1: Create an Azure Databricks workspace, cluster, and notebook.
**Step2:** Mount the Azure Blob Storage container to the Databricks file system (DBFS) using an Azure AD service principal. You can use the following code snippet to mount the container:
Replace <application-id>, <application-secret>, <tenant-id>, <container-name>, <storage-account-name>, and <mount-name> with your own values.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<application-secret>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# OAuth with ClientCredsTokenProvider goes through the ABFS driver, so the
# source must use abfss:// and the DFS endpoint rather than wasbs://.
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs)
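After the mount succeeds, you can verify it by listing the mounted directory. A minimal sanity check, assuming the same <mount-name> placeholder as above:

# List the files visible through the new mount point
display(dbutils.fs.ls("/mnt/<mount-name>"))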
Step3: Read the files from the mounted directory using spark.read. You can use the following code snippet to read a CSV file:
Replace <mount-name> and <file-name> with your own values.
df = spark.read.format("csv").option("header", "true").load("/mnt/<mount-name>/<file-name>.csv")
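To confirm the file loaded as expected, it can help to inspect the schema and preview a few rows. A quick check on the df DataFrame from the snippet above:

# Print the column names (taken from the header row) and their types
df.printSchema()

# Preview the first few rows in the notebook
display(df.limit(10))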
Step4: Run SQL queries on the DataFrame using the spark.sql function. You can use the following code snippet to register a temporary view and run a SQL query:
Replace <table-name> and <condition> with your own values.
df.createOrReplaceTempView("<table-name>")
result = spark.sql("SELECT * FROM <table-name> WHERE <condition>")
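The result is itself a DataFrame, so you can preview it with display(result) or result.show(). Once the temporary view exists, you can also query it directly from a %sql cell in the notebook; the <table-name> below refers to the same view registered above:

%sql
SELECT COUNT(*) AS row_count FROM <table-name>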
For more details, refer to Connect to Azure Data Lake Storage Gen2 and Blob Storage.
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?".