Query Data From Data Files in Azure Storage

Muneeb Mirza 41 Reputation points
2021-07-05T15:22:16.123+00:00

In AWS we have an S3 bucket that contains CSV or Parquet files. Our customers need catalog information for those data files and need to query the data inside them. Downloading the files and querying them locally is not an option because the CSV files can be large, so we used AWS Glue and Athena. We can now catalog the data inside the CSV files and query it as well.

Is it possible to achieve this in Azure Cloud with Azure Storage?

Storage can be Azure Blob Storage or Data Lake Storage, whichever gets the job done.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.

Accepted answer
  1. Sumarigo-MSFT 47,466 Reputation points Microsoft Employee Moderator
    2021-07-07T05:41:45.187+00:00

    @Muneeb Mirza Firstly, apologies for the delay in responding here and any inconvenience this issue may have caused.

    I see you have posted a similar query on the SO forum. Please try the steps mentioned there, and let us know if you still have any questions.

    For example, the CSV might look like…

    Server,schema,database,table,tableDescription
    Server1,dbo,mydb,table0001,"This is a really great table!"

    If so, you’ll have to write some code on your own to parse the data and conform it to the Atlas Entity REST API calls.
    You might take inspiration from the work I’ve done on PyApacheAtlas: pyapacheatlas/reader.py at master · wjohnson/pyapacheatlas (github.com)

    Alternatively, you might work with them to restructure the data (in a Python program) to conform to the templates that I provided. The required headers are:
    • "typeName"
    • "name"
    • "qualifiedName"
    • "classifications"

    If you have that information as a list of dictionaries in Python, you can call Reader.parse_bulk_entities(list_of_dicts), then pass the results to client.upload_entities(results) to bulk upload the entities from that CSV.
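As a rough illustration of the restructuring step described above, the sketch below reshapes rows like the sample CSV into a list of dictionaries carrying the four required headers. It uses only the Python standard library. The function name rows_to_bulk_entities, the "DataSet" type name, the qualifiedName layout, and the empty classifications value are all assumptions made for illustration; they are not taken from the answer or from PyApacheAtlas.

```python
import csv
import io

# Sample data in the shape shown earlier in this answer.
SAMPLE_CSV = '''Server,schema,database,table,tableDescription
Server1,dbo,mydb,table0001,"This is a really great table!"
'''


def rows_to_bulk_entities(csv_text, type_name="DataSet"):
    """Reshape custom CSV rows into dicts with the four required headers.

    Hypothetical helper: the type name and the qualifiedName layout are
    illustrative assumptions and should be replaced with whatever your
    catalog actually expects.
    """
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed naming scheme: server/database/schema/table.
        qualified_name = "/".join(
            [row["Server"], row["database"], row["schema"], row["table"]]
        )
        entities.append(
            {
                "typeName": type_name,
                "name": row["table"],
                "qualifiedName": qualified_name,
                # Left empty in this sketch; fill in if you classify entities.
                "classifications": "",
            }
        )
    return entities


list_of_dicts = rows_to_bulk_entities(SAMPLE_CSV)
```

The resulting list_of_dicts is the kind of input the answer suggests passing to Reader.parse_bulk_entities before calling client.upload_entities. How the reader and client are constructed is not shown here, so check the PyApacheAtlas documentation for that part.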

    Hope this helps!
    Kindly let us know if the above helps or you need further assistance on this issue.

    ---------------------------------------------------------------------------------------------------------------------------------------------------

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

