How to get Parquet file data from Data Lake Gen2 in tabular form using Postman or a Web API

randeep gurjar 0 Reputation points
2024-08-15T17:34:42.1366667+00:00

I have a Parquet file in my Blob Storage Gen2 container. I want to fetch this file's data in tabular form, via a query, using an API or Postman, without using Azure Synapse, Azure SQL, or Databricks.


1 answer

  1. Nehruji R 8,146 Reputation points Microsoft Vendor
    2024-08-19T06:28:30.6966667+00:00

    Hello randeep gurjar,

    Greetings! Welcome to Microsoft Q&A Platform.

    To read the data in Blob Storage, you can try the Azure Blob Storage REST API. For Parquet specifically, see the "Reading a Parquet File from Azure Blob Storage" section of the pyarrow document "Reading and Writing the Apache Parquet Format": list the blob names that share a prefix such as dataset_name using the Azure Storage SDK for Python (for example, the list_blob_names(container_name, prefix=None, num_results=None, include=None, delimiter=None, marker=None, timeout=None) API), then read those blobs one by one and load them into tabular form with pyarrow/pandas.
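    A minimal sketch of that approach, assuming the current azure-storage-blob (v12) SDK and pyarrow, with placeholder account, container, and prefix names. Note that the Get Blob REST call only returns the raw Parquet bytes, so a client-side reader such as pyarrow is still needed to turn them into rows and columns:

    ```python
    import io

    import pyarrow.parquet as pq
    from azure.storage.blob import BlobServiceClient

    # Placeholder values - replace with your own storage account, container, and prefix.
    ACCOUNT_URL = "https://<storage-account>.blob.core.windows.net"
    CONTAINER = "<container-name>"
    PREFIX = "dataset_name/"

    # Authenticate with the account key (a SAS token or Azure AD credential also works).
    service = BlobServiceClient(account_url=ACCOUNT_URL, credential="<account-key>")
    container = service.get_container_client(CONTAINER)

    # List the Parquet blobs under the prefix (the v12 equivalent of list_blob_names with prefix).
    blob_names = [b.name for b in container.list_blobs(name_starts_with=PREFIX)]

    # Download the first blob into memory and parse it as a Parquet table.
    data = container.download_blob(blob_names[0]).readall()
    table = pq.read_table(io.BytesIO(data))

    # Convert to a pandas DataFrame for a tabular view of the data.
    df = table.to_pandas()
    print(df.head())
    ```

    If you must stay in Postman, you can still call the Get Blob endpoint directly (GET https://<storage-account>.blob.core.windows.net/<container>/<blob> with a SAS token or authorization header), but the response body is binary Parquet, so it has to be opened with a Parquet-aware tool before it appears as a table.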

    There are a few things you can consider for better performance:

    1. If your workloads require low, consistent latency and/or a high number of input/output operations per second (IOPS), consider using a premium block blob storage account. This type of account makes data available via high-performance hardware. Data is stored on solid-state drives (SSDs), which are optimized for low latency and provide higher throughput than traditional hard drives. The storage costs of premium performance are higher, but transaction costs are lower, so if your workloads execute a large number of transactions, a premium block blob account can be economical. https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#consider-premium

    2. To achieve the best performance, use all available throughput by performing as many reads and writes in parallel as possible (see the parallel-read sketch below). https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#configure-data-ingestion-tools-for-maximum-parallelization

    3. Larger files lead to better performance and reduced costs.

    4. File format, file size, and directory structure can all impact performance and cost.

    Refer to detailed documentation here: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
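    To illustrate point 2 above, here is a hedged sketch (same placeholder names as before) that reads several Parquet blobs in parallel with a thread pool and concatenates them into a single table:

    ```python
    import io
    from concurrent.futures import ThreadPoolExecutor

    import pyarrow as pa
    import pyarrow.parquet as pq
    from azure.storage.blob import BlobServiceClient

    # Placeholder connection details - replace with your own account and credential.
    service = BlobServiceClient(
        account_url="https://<storage-account>.blob.core.windows.net",
        credential="<account-key>",
    )
    container = service.get_container_client("<container-name>")

    def read_parquet_blob(name: str) -> pa.Table:
        """Download one blob and parse it as a Parquet table."""
        data = container.download_blob(name).readall()
        return pq.read_table(io.BytesIO(data))

    # Blobs that share the dataset prefix.
    names = [b.name for b in container.list_blobs(name_starts_with="dataset_name/")]

    # Read blobs concurrently to use more of the available throughput.
    with ThreadPoolExecutor(max_workers=8) as pool:
        tables = list(pool.map(read_parquet_blob, names))

    # Combine the per-file tables into one result (assumes all files share a schema).
    combined = pa.concat_tables(tables)
    print(combined.num_rows)
    ```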

    Hope this answer helps! Please let us know if you have any further queries; I'm happy to assist you further.


    Please "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.

