Read parquet file from a blob storage

Keerthana J 71 Reputation points
2024-02-01T10:32:35.7933333+00:00

How can I read a snappy.parquet file that is in my blob container (already mounted) from Azure Databricks?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

2 answers

  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2024-02-02T19:47:45.57+00:00

    Hello @KEERTHANA JAYADEVAN

    You can use the spark.read.parquet() method to read the Parquet file from a mounted blob container in Azure Databricks.

    Here is an example:

    # Mount the blob container (skip this step if it is already mounted).
    # The storage account name in the source URL and in the config key must match.
    dbutils.fs.mount(
        source = "wasbs://******@blobstorageaccount.blob.core.windows.net/",
        mount_point = "/mnt/nyctrip",
        extra_configs = {"fs.azure.account.key.blobstorageaccount.blob.core.windows.net": "key"})

    # Define the path to your Parquet file
    parquet_file_path = "/mnt/nyctrip/NYCTripSmall.parquet"

    # Read the Parquet file into a DataFrame
    df = spark.read.parquet(parquet_file_path)

    # Show the DataFrame
    df.show()


    I hope this answers your question.

    If this answers your question, please consider accepting it by selecting Accept answer and up-voting, as that helps the community find answers to similar questions.


  2. Keerthana J 71 Reputation points
    2024-02-09T08:28:12.8666667+00:00

    Hi, thanks for the reply. I am facing an issue while reading the file. I completed the mounting process, and I can see the list of files under the container as well. I tried Parquet, CSV, and Excel files. Attaching the Excel code:

    storage_account_name = "abcd"
    storage_account_key = "xxxx"
    container = "lmnop"

    spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)

    dbutils.fs.mount(
        source = "wasbs://{0}@{1}.blob.core.windows.net".format(container, storage_account_name),
        mount_point = "/mnt/lmnop",
        extra_configs = {"fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name): storage_account_key})

    CAL = pd.read_excel('/mnt/lmnop/demo.xlsx')
    CAL.head()

    Error:

    FileNotFoundError: [Errno 2] No such file or directory: '/mnt/lmnop/demo.xlsx'
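
One likely cause of this FileNotFoundError, offered as a sketch rather than a confirmed diagnosis: Spark and dbutils resolve "/mnt/..." against DBFS, while pandas uses the driver's local file system, where DBFS is normally exposed under the "/dbfs" prefix. The helper below is hypothetical (the name to_local_fuse_path is not part of any Databricks API) and only rewrites the path string. It assumes the cluster exposes the /dbfs local mount, which may not hold for every cluster configuration.

```python
def to_local_fuse_path(dbfs_path: str) -> str:
    """Map a DBFS path to the /dbfs-prefixed path that local file APIs expect.

    Hypothetical helper for illustration; assumes DBFS is exposed at /dbfs.
    """
    # Drop the optional "dbfs:" scheme, e.g. "dbfs:/mnt/x" -> "/mnt/x".
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    # Leave paths that already carry the local prefix untouched.
    if dbfs_path.startswith("/dbfs/"):
        return dbfs_path
    return "/dbfs/" + dbfs_path.lstrip("/")


local_path = to_local_fuse_path("/mnt/lmnop/demo.xlsx")

# On a Databricks cluster, pandas would then be pointed at the rewritten path:
# import pandas as pd
# CAL = pd.read_excel(local_path)
```

The same mount path without the prefix ("/mnt/lmnop/demo.xlsx") remains the right form for spark.read.parquet and dbutils.fs.ls, since those go through DBFS directly.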
