How to read a blob or multiple blobs from a path at Azure using Spark using SAS Token

Question

Priya Jauhari 25

I have tried to read blobs from Azure using Spark. To do that, I first have to add the files to the SparkContext and then read them back from the SparkContext. I was not able to read directly from the Azure URI. Below is the logic I have used. I want to skip adding the file to the SparkContext and read directly from the Azure blob URL.

String blobUrl = String.format("https://%s.blob.core.windows.net/%s/%s?%s", storageAccountName, containerName, blobPath, sasToken);
spark.sparkContext().addFile(blobUrl);
Dataset

Luis Arias 8,621 Reputation points Volunteer Moderator

2024-01-24T13:37:51.25+00:00

Not relevant.
Priya Jauhari 25 Reputation points

2024-01-25T07:08:13.23+00:00

Hi Luis,

Thanks for your response.

I am using Java and Apache Spark here. Databricks is not used.

So, can you please suggest how to read using a SAS token with Java and Apache Spark?
Luis Arias 8,621 Reputation points Volunteer Moderator

2024-01-25T07:10:35.7366667+00:00

Thanks for the feedback, and please ignore the comment.

Accepted answer

Answer 1

Hi Priya Jauhari,

Thank you for posting your query here!

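Apologies for the delay in response. Please note that to read a blob from Azure using Spark, you can pass a wasbs:// URI to spark.read.format(...).load(). Here wasbs is the URI scheme handled by the hadoop-azure connector, not a value for format(); the format is the file type, such as csv.

You can read multiple files using the example code below: pointing load() at a folder reads the files inside it. Copy the access key from the storage account and paste it in place of <access_key>.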

Example code:

spark.conf.set("fs.azure.account.key.<storage_account>.blob.core.windows.net","<access_key>")

df = spark.read.format("csv").option("header",True).option("inferSchema",True).load("wasbs://<container>@<storage_account>.blob.core.windows.net/folder/")
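Since the question asks about a SAS token rather than an account key, here is a minimal sketch of the same read using a SAS token. It assumes the hadoop-azure (WASB) connector and its Azure storage dependency are on the Spark classpath, and that the connector reads a container-level SAS token from the property fs.azure.sas.<container>.<storage_account>.blob.core.windows.net. The helper names (sas_conf_key, wasbs_uri, read_csv_with_sas) are hypothetical and introduced only for illustration, and stripping a leading "?" from the token is also an assumption about how the token was copied.

```python
from typing import Iterable, Union


def sas_conf_key(storage_account: str, container: str) -> str:
    # Property name the WASB connector is assumed to look up for a container SAS token.
    return f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"


def wasbs_uri(storage_account: str, container: str, blob_path: str = "") -> str:
    # Same URI layout as in the example above: wasbs://<container>@<account>.blob.core.windows.net/<path>
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/{blob_path.lstrip('/')}"


def read_csv_with_sas(spark, storage_account: str, container: str,
                      sas_token: str, blob_paths: Union[str, Iterable[str]]):
    # `spark` is an existing SparkSession; it is passed in so this sketch has no import-time
    # dependency on pyspark.
    # Assumption: the token should be set without the leading "?" that the Azure portal shows.
    spark.conf.set(sas_conf_key(storage_account, container), sas_token.lstrip("?"))

    # Accept one blob path or several; a folder path or a glob such as "folder/*.csv" also works
    # as a single entry.
    if isinstance(blob_paths, str):
        blob_paths = [blob_paths]
    uris = [wasbs_uri(storage_account, container, p) for p in blob_paths]

    # DataFrameReader.load accepts a list of paths, so several blobs are read into one DataFrame.
    return (spark.read.format("csv")
            .option("header", True)
            .option("inferSchema", True)
            .load(uris))
```

With a real session this would be called as, for example, df = read_csv_with_sas(spark, "<storage_account>", "<container>", "<sas_token>", ["folder/a.csv", "folder/b.csv"]). The same property name and wasbs:// URIs should carry over to the Java API (spark.conf().set(...) and spark.read().format("csv").load(...)), though that translation is not tested here.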

Source: https://stackoverflow.com/questions/74679629/reading-a-blob-file-with-spark

Also, this might help: https://koiralo.com/2018/02/12/how-to-data-from-azure-blob-storage-with-apache-spark/

Other reference: https://stackoverflow.com/questions/64493290/how-do-you-read-a-file-from-azure-blob-w-apache-spark-without-databricks-but-wi

Please let us know if you have any further queries. I’m happy to assist you further.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

Priya Jauhari 25 Reputation points

2024-01-29T11:07:05.7866667+00:00

Thanks Anand Prakash Yadav! It's working
