Query a large CSV in Gen2 and join it with a table in Azure SQL DB

Question

Currently we use Azure SQL DB for a small data warehouse, and we recently got a requirement to pull data from CSV files. Those CSVs are huge: one single file is around 250 GB and another single file is around 750 GB.

Instead of loading this huge CSV data into Azure SQL DB, can I store the files in Gen2 and join them with a table in Azure SQL DB for reporting?

We have a way to split those files by "Year"; that I can do.

Can someone suggest the best way to query these huge CSV files? We don't have the budget for a "dedicated pool", so I am looking for other options.

Appreciate your help on this.

Answer

Hello @Giri, Seshu,

Welcome to the MS Q&A platform.

You can store those CSV files in Azure Data Lake Storage Gen2 and then join them with tables in Azure SQL Database. This approach can help you avoid storing huge CSV data in Azure SQL Database.

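You can use a serverless SQL pool in Azure Synapse Analytics to let Azure SQL Database read the files in Azure Data Lake Storage. A serverless pool is billed by the amount of data processed, so it does not require a dedicated pool.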
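
As a rough sketch of the serverless side, assuming the files have been split into one folder per year (for example sales/year=2023/) and each file has a header row, a view over the files could look like the following. The storage path, view name, and column list are hypothetical placeholders, not details taken from the question.

```sql
-- Run in a user database on the Synapse serverless SQL pool (not master).
-- Assumption: the login used to query this view has access to the storage account.
CREATE VIEW dbo.SalesCsv AS
SELECT
    r.OrderId,
    r.CustomerId,
    r.Amount,
    -- filepath(1) returns the text matched by the first wildcard (the year folder)
    CAST(r.filepath(1) AS INT) AS [Year]
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/year=*/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE          -- assumption: first row holds the column names
) WITH (
    OrderId    INT,            -- hypothetical columns; use the real CSV schema
    CustomerId INT,
    Amount     DECIMAL(18, 2)
) AS r;
```

Filtering on the [Year] column lets the serverless pool skip folders that do not match, which matters when the full data set is close to 1 TB.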

Azure SQL Database also supports the OPENROWSET(BULK ...) function, which can read CSV files directly from Azure Blob storage.

In Azure SQL Database, follow these steps:

Set up an external data source in Azure SQL that points to the serverless SQL pool.
Create a proxy external table in Azure SQL.
Query the Azure Storage files by joining the proxy table with your SQL tables.
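
The setup steps above might look roughly like this in T-SQL, using the elastic query syntax (TYPE = RDBMS) to point Azure SQL at the serverless endpoint. Treat it as a sketch: the credential, data source, table, and column names are placeholders, and it assumes a SQL login on the Synapse workspace and a remote object named dbo.SalesCsv in the serverless database.

```sql
-- Run in the Azure SQL database.
-- A master key is required before creating a credential; skip if one already exists.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';

-- Assumption: a SQL login that can query the serverless database.
CREATE DATABASE SCOPED CREDENTIAL SynapseSqlCredential
WITH IDENTITY = '<synapse-sql-login>', SECRET = '<login-password>';

-- External data source pointing to the serverless SQL endpoint.
CREATE EXTERNAL DATA SOURCE SynapseSqlDataSource
WITH (
    TYPE = RDBMS,
    LOCATION = '<workspace-name>-ondemand.sql.azuresynapse.net',
    DATABASE_NAME = '<serverless-database>',
    CREDENTIAL = SynapseSqlCredential
);

-- Proxy external table; the column list must match the remote object.
CREATE EXTERNAL TABLE dbo.SalesCsv (
    OrderId    INT,
    CustomerId INT,
    Amount     DECIMAL(18, 2),
    [Year]     INT
)
WITH (
    DATA_SOURCE = SynapseSqlDataSource,
    SCHEMA_NAME = 'dbo',
    OBJECT_NAME = 'SalesCsv'
);
```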

The process is explained in a Microsoft DevBlogs post.

In your case, once the proxy table is ready, you can join your local tables with the proxy external table, which reads the underlying Azure storage files, from any tool connected to your Azure SQL database.
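
For illustration, a reporting query might then look like the one below. It assumes a proxy external table named dbo.SalesCsv with a [Year] column and a local table dbo.Customer; both names and all columns are hypothetical.

```sql
-- Run in the Azure SQL database from any connected tool.
SELECT
    c.CustomerName,
    s.[Year],
    SUM(s.Amount) AS TotalAmount
FROM dbo.Customer AS c            -- local Azure SQL table (hypothetical)
JOIN dbo.SalesCsv AS s            -- proxy external table over the CSV files
    ON s.CustomerId = c.CustomerId
WHERE s.[Year] = 2023             -- restrict to one year to limit the data scanned
GROUP BY c.CustomerName, s.[Year];
```

Whether the year filter is pushed down to the remote side depends on the query plan, so it is worth checking the data processed in Synapse for a test query before running this against the full files.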

I hope this helps. Please let me know if you have any further questions.

If this answers your question, please consider accepting it by selecting Accept answer and up-voting, as it helps the community find answers to similar questions.
