Querying Parquet file from ADL 2.0

rajendar erabathini 616 Reputation points
2021-03-01T09:37:43.37+00:00

Hi - I need some design recommendations for my requirement. I have a list of Parquet files stored in Data Lake. I need to execute some SQL statements against them, e.g. select max(amount), count(*), and return the results so that I can write them back to my on-premises SQL Server. Please note that I am looking for a solution that does not stage the data in an intermediate Dedicated SQL Pool using Copy or external tables.

  1. Using the OPENROWSET() function in the SQL serverless pool only helps to analyse the data, but I can't return the result set and store it on my on-premises server using data pipelines. Please correct me if my understanding is not right here.
  2. Data pipelines allow transferring data from ADLS to other stores, but they don't have the capability to query and return results so that I can sink them to on-premises SQL Server using a Copy activity. So it looks like this can't be achieved without first storing to a SQL pool either.
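On point 1, the serverless pool does expose a standard SQL endpoint, so a result set from OPENROWSET can in principle be fetched by any SQL client. Below is a minimal, hedged sketch in Python with pyodbc; the workspace endpoint, credentials, container path, and the `amount` column are all placeholders, not values from this thread.

```python
# Sketch: run an aggregate OPENROWSET query against a Synapse serverless
# SQL endpoint and fetch the one-row result. All names are placeholders.

def build_query(parquet_path: str) -> str:
    """Build a serverless-pool aggregate query over a Parquet path."""
    return (
        "SELECT MAX(amount) AS max_amount, COUNT(*) AS row_count "
        f"FROM OPENROWSET(BULK '{parquet_path}', FORMAT = 'PARQUET') AS rows"
    )

def fetch_aggregates(conn_str: str, parquet_path: str):
    """Execute the query over an ODBC connection (not run in this sketch)."""
    import pyodbc  # deferred import; requires the MS ODBC driver
    with pyodbc.connect(conn_str) as conn:
        return conn.execute(build_query(parquet_path)).fetchone()

# Show the generated T-SQL for a hypothetical lake path.
query = build_query(
    "https://<account>.dfs.core.windows.net/<container>/sales/*.parquet"
)
print(query)
```

The fetched row could then be inserted into the on-premises database directly, or staged for a pipeline to pick up, which is what the answer below ends up recommending.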

One more option I am considering: can I pass the Parquet file name to a Spark notebook, and does it have the capability to run the queries and return the results so that I can store the results to the on-premises server? I would use a Data pipeline Copy activity to automate the workflow. Is it possible to implement it the way I am thinking? Please confirm.

Please let me know other possible options to achieve my objective.

thanks

Tags: Azure Synapse Analytics, Azure Data Factory, Azure Data Lake Analytics

1 answer

  1. Samara Soucy - MSFT 5,051 Reputation points
    2021-03-04T21:28:22.39+00:00

    It doesn't have to be a SQL Pool, but you will need to write staging data somewhere. This includes using a Notebook/Spark Pool to do the transformations. The Data Flow activity in Pipelines is a codeless method of doing the transformations you need that runs on top of a Spark cluster, but it also requires outputting the data into a file or database.

    Writing back to the Data Lake is going to be your best option, especially if you don't want to create a SQL Pool. It will also give you a way to troubleshoot bugs, and if the sink back to on-prem fails due to network issues you won't have to start over completely, so there are benefits to doing it that way.
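    To make the staging step concrete, here is a hypothetical sketch of a notebook's final step: writing the one-row aggregate result to a staging file that a Copy activity can then sink to on-premises SQL Server. The staging path, folder layout, and column names are assumptions; in Synapse the path would be an `abfss://` lake URI rather than a local folder.

```python
# Sketch: stage a small aggregate result as CSV for a Copy activity
# to pick up. Paths and column names are placeholders.
import os
import pandas as pd

# One-row aggregate result produced by the earlier query step.
result = pd.DataFrame({"max_amount": [25.5], "row_count": [3]})

# Write a single CSV under a dated staging folder (local stand-in for
# an abfss:// path in the lake).
staging_path = "staging/2021-03-01/aggregates.csv"
os.makedirs(os.path.dirname(staging_path), exist_ok=True)
result.to_csv(staging_path, index=False)
print(open(staging_path).read())
```

    A dated folder per run also gives you the restartability mentioned above: if the on-prem sink fails, the Copy activity can simply re-read the staged file.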
