Implementing Dynamic Folder Structuring for Historical Data Load from Multiple SQL Server On-Premises Tables to Azure Blob Storage

Lucas Medina 20 Reputation points
2024-01-07T22:50:51.4266667+00:00

I need to extract data from multiple on-premises SQL Server tables to Azure Blob Storage as a historical load containing only records dated before 2023-12-17. However, for each table I need to store the output in dynamically created folders following the structure year/month/day/tablename.parquet.

I have a similar process in which I perform a full load of multiple tables, using a control table that lists all the tables I wish to load. That process uses Lookup and ForEach activities, as shown in the screenshots below:

[Screenshots of the existing pipeline showing the Lookup and ForEach activities]

Has anyone implemented a similar solution, or can you suggest an approach to achieve this?

Cheers,

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2024-01-08T10:15:07.6266667+00:00

    @Lucas Medina

    Thanks for reaching out to Microsoft Q&A.

    Extract Data from SQL Server On-Premises: You can use Azure Data Factory to create a pipeline that copies data from your SQL Server database to Azure Blob storage. You would need to create a self-hosted integration runtime, which moves data between on-premises and cloud data stores.
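
    As a minimal sketch, the SQL Server linked service simply references the self-hosted integration runtime through connectVia. The names OnPremSqlServerLS and SelfHostedIR and the connection string values below are placeholders for your own environment:

    ```json
    {
      "name": "OnPremSqlServerLS",
      "properties": {
        "type": "SqlServer",
        "typeProperties": {
          "connectionString": "Server=myServer;Database=myDb;Integrated Security=True"
        },
        "connectVia": {
          "referenceName": "SelfHostedIR",
          "type": "IntegrationRuntimeReference"
        }
      }
    }
    ```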

    Historical Load: To load data before a specific date, you can add a filter in your query to only select records before the date 2023-12-17.
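
    For example, the Copy activity's source inside the ForEach loop can build the query dynamically from the control-table item. This is a sketch only; SchemaName and TableName are assumed columns of your control table, and LoadDate is a hypothetical date column that you should replace with whichever column drives the cutoff in each table:

    ```json
    "source": {
      "type": "SqlServerSource",
      "sqlReaderQuery": {
        "value": "SELECT * FROM [@{item().SchemaName}].[@{item().TableName}] WHERE [LoadDate] < '2023-12-17'",
        "type": "Expression"
      }
    }
    ```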

    Dynamic Folder Creation: Azure Blob Storage doesn’t natively support folders. However, you can create a virtual directory structure by prefixing the blob name with the desired folder path. For example, you can name your blobs like year/month/day/tablename.parquet. This will give the illusion of them being stored in folders.
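
    One way to build that path is with a parameterized Parquet sink dataset. The sketch below assumes the folder is derived from the pipeline run date via utcnow(); if the partitioning should follow a date column in the data instead, substitute your own expression. The container name, dataset name, and linked service name are placeholders:

    ```json
    {
      "name": "ParquetSinkDS",
      "properties": {
        "type": "Parquet",
        "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
        "parameters": {
          "folderPath": { "type": "string" },
          "fileName": { "type": "string" }
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "container": "historical",
            "folderPath": { "value": "@dataset().folderPath", "type": "Expression" },
            "fileName": { "value": "@dataset().fileName", "type": "Expression" }
          }
        }
      }
    }
    ```

    The folderPath value can then be supplied as @concat(formatDateTime(utcnow(),'yyyy'), '/', formatDateTime(utcnow(),'MM'), '/', formatDateTime(utcnow(),'dd')) and the fileName as @concat(item().TableName, '.parquet'), as shown in the pipeline sketch further below.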

    Full Load with Multiple Tables: You can use the Lookup activity to retrieve a list of table names from your control table. Then, use a ForEach activity to iterate over this list. Inside the ForEach loop, you can place a Copy activity that copies data from the current table in SQL Server to Azure Blob Storage.

    Lookup and ForEach Activities: The Lookup activity can retrieve a dataset from any data source supported by Azure Data Factory and return the content of a configuration file or table. The ForEach activity defines a repeating control flow in your pipeline. It iterates over a collection and executes specified activities in a loop.
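
    Putting the Lookup, ForEach, and Copy pieces together, a trimmed pipeline skeleton could look like the following. ControlTableDS, OnPremSqlTableDS, ParquetSinkDS, and the control-table column names are assumptions to be replaced with your own objects:

    ```json
    {
      "name": "HistoricalLoadPipeline",
      "properties": {
        "activities": [
          {
            "name": "LookupTableList",
            "type": "Lookup",
            "typeProperties": {
              "source": {
                "type": "SqlServerSource",
                "sqlReaderQuery": "SELECT SchemaName, TableName FROM dbo.ControlTable"
              },
              "dataset": { "referenceName": "ControlTableDS", "type": "DatasetReference" },
              "firstRowOnly": false
            }
          },
          {
            "name": "ForEachTable",
            "type": "ForEach",
            "dependsOn": [ { "activity": "LookupTableList", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
              "items": { "value": "@activity('LookupTableList').output.value", "type": "Expression" },
              "activities": [
                {
                  "name": "CopyTableToBlob",
                  "type": "Copy",
                  "inputs": [ { "referenceName": "OnPremSqlTableDS", "type": "DatasetReference" } ],
                  "outputs": [
                    {
                      "referenceName": "ParquetSinkDS",
                      "type": "DatasetReference",
                      "parameters": {
                        "folderPath": {
                          "value": "@concat(formatDateTime(utcnow(),'yyyy'), '/', formatDateTime(utcnow(),'MM'), '/', formatDateTime(utcnow(),'dd'))",
                          "type": "Expression"
                        },
                        "fileName": {
                          "value": "@concat(item().TableName, '.parquet')",
                          "type": "Expression"
                        }
                      }
                    }
                  ],
                  "typeProperties": {
                    "source": {
                      "type": "SqlServerSource",
                      "sqlReaderQuery": {
                        "value": "SELECT * FROM [@{item().SchemaName}].[@{item().TableName}] WHERE [LoadDate] < '2023-12-17'",
                        "type": "Expression"
                      }
                    },
                    "sink": { "type": "ParquetSink" }
                  }
                }
              ]
            }
          }
        ]
      }
    }
    ```

    Note that the Lookup activity returns at most 5,000 rows, which is normally more than enough for a control table listing source tables.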

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?". If you have any further queries, do let us know.

