How to read one or more blobs from an Azure Blob Storage path with Spark using a SAS token

Priya Jauhari 25 Reputation points
2024-01-24T12:06:11.3733333+00:00

I have tried to read blobs from Azure using Spark. To do so, I first have to add the files to the SparkContext and then read them from the SparkContext itself; I was not able to read directly from the Azure URI. Below is the logic I have used. I want to skip adding the file to the SparkContext and read directly from the Azure blob URL.

String blobUrl = String.format("https://%s.blob.core.windows.net/%s/%s?%s", storageAccountName, containerName, blobPath, sasToken);
spark.sparkContext().addFile(blobUrl);
Dataset

Accepted answer
  1. Anand Prakash Yadav 7,855 Reputation points Microsoft External Staff
    2024-01-29T08:16:41.9533333+00:00

    Hi Priya Jauhari,

    Thank you for posting your query here!

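    Apologies for the delay in response. To read a blob from Azure with Spark, you can pass a wasbs:// URI directly to spark.read.format(...).load(). Note that wasbs is the URI scheme of the path, not a value for format(); format() takes the file type, such as "csv".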

    You can read multiple files at once by pointing load() at a folder, as in this example code. Copy the access key from the storage account and paste it in place of <access_key>.

    Example code:

    spark.conf.set("fs.azure.account.key.<storage_account>.blob.core.windows.net","<access_key>")
    
    df = spark.read.format("csv").option("header",True).option("inferSchema",True).load("wasbs://<container>@<storage_account>.blob.core.windows.net/folder/")
    

    Source: https://stackoverflow.com/questions/74679629/reading-a-blob-file-with-spark

    Also, this might help: https://koiralo.com/2018/02/12/how-to-data-from-azure-blob-storage-with-apache-spark/

    Other reference: https://stackoverflow.com/questions/64493290/how-do-you-read-a-file-from-azure-blob-w-apache-spark-without-databricks-but-wi
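
Since the question asks about a SAS token rather than an account key, the example above could be adapted roughly as follows. This is a minimal sketch, not a tested recipe: it assumes the hadoop-azure (WASB) connector is on the classpath, that the SAS token is supplied through the fs.azure.sas.<container>.<storage_account>.blob.core.windows.net setting, and that spark.conf.set reaches the connector the same way it does for the account key above. The helper names (sas_conf, wasbs_uri, read_csv_blobs) are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def sas_conf(storage_account: str, container: str, sas_token: str) -> Dict[str, str]:
    """Build the WASB setting that carries a container-level SAS token."""
    # Assumption: the connector looks the token up under this per-container key.
    key = f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"
    return {key: sas_token}


def wasbs_uri(storage_account: str, container: str, path: str = "") -> str:
    """Build a wasbs:// URI for a single blob or for a folder prefix."""
    # A leading slash in the blob path is dropped so the URI has exactly one.
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/{path.lstrip('/')}"


def read_csv_blobs(spark, storage_account: str, container: str,
                   sas_token: str, paths: List[str]):
    """Read one or more CSV blobs into one DataFrame, without addFile()."""
    # Register the SAS token on the session before any read is attempted.
    for key, value in sas_conf(storage_account, container, sas_token).items():
        spark.conf.set(key, value)
    # load() accepts a list of paths, so several blobs or folders can be mixed.
    uris = [wasbs_uri(storage_account, container, p) for p in paths]
    return (spark.read.format("csv")
            .option("header", True)
            .option("inferSchema", True)
            .load(uris))
```

With an existing SparkSession, a call such as read_csv_blobs(spark, "<storage_account>", "<container>", "<sas_token>", ["folder/"]) would then take the place of the addFile() step. If the setting is not picked up at runtime in your environment, it may need to be supplied when the session is created instead.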

    Please let us know if you have any further queries. I’m happy to assist you further.  

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

    2 people found this answer helpful.
