Query Large CSV from Gen2 join with a table Azure SQL DB

Giri, Seshu 26 Reputation points
2023-03-07T17:20:23.79+00:00

Currently we use Azure SQL DB for a small data warehouse and recently we have a requirement to pull data from CSV files. Those CSV's are huge, around 250 gb (a single file) and another one 750 gb (single file).

Instead of storing huge data from CSV to Azure SQL DB, can I store those in Gen2 and try to join with a table in Azure SQL db for reporting?

We have a way to split those those files into "Year", that I can do.

Can someone suggest me best way to query this huge csv files? We don't have budget for "dedicated pool", so trying to look for other options.

Appreciare your help on this.

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,333 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,335 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. BhargavaGunnam-MSFT 25,876 Reputation points Microsoft Employee
    2023-03-08T20:49:04.46+00:00

    Hello @Giri, Seshu,

    Welcome to the MS Q&A platform.

    You can store those CSV files in Azure Data Lake Storage Gen2 and then join them with tables in Azure SQL Database. This approach can help you avoid storing huge CSV data in Azure SQL Database.

    You can use serverless Synapse SQL pools to enable your Azure SQL to read the files from the Azure Data Lake storage.

    Azure SQL supports the OPENROWSET function that can read CSV files directly from Azure Blob storage.

    On the Azure SQL DB please follow the below steps.

    1. set up an external data source in Azure SQL
    2. Create a proxy external table in Azure SQL
    3. Query Azure Storage files by joining your SQL tables.

    The process is explained in this Microsoft devblogs

    In your case, once the proxy table is ready, you can join your table with your remote external table and the underlying Azure storage files from any tool connected to your Azure SQL database.

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider accepting the answer by hitting the Accept answer and up-vote as it helps the community look for answers to similar questions.