Reproducing SSIS packages in Azure Data Factory

khouloud Belhaj 91 Reputation points
2021-07-19T09:13:21.397+00:00

Hello,

I'm trying to reproduce SSIS packages in Azure Data Factory. The process consists of searching for rows that have already been inserted into a SQL database and deleting them, loading the new rows into the database, renaming the files once they have been inserted into the table, and moving them from the source folder to another folder.

I have already built the part of the process that copies the data from my source, a blob storage container, to the destination, the SQL database. My blob files are large, over 2 MB per file. I would like to find a way to loop over these blob files and check whether their rows have already been inserted into the SQL database: if there are duplicates, delete them; if there are no duplicates (the rows have not been inserted yet), load the data into my database. After that, I have to move the blob file to another container and rename it.

Any suggestions, please?

Tags: Azure SQL Database, Azure Storage Accounts, Azure Data Factory, SQL Server Integration Services

Accepted answer
  1. ShaikMaheer-MSFT 38,456 Reputation points Microsoft Employee
    2021-07-22T10:07:29.807+00:00

    Hi @khouloud Belhaj ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Below are the steps you should follow to implement your scenario (a JSON sketch of the pipeline wiring follows the list).

    Step 1: Use a Get Metadata activity to get the child items (file names) from your source folder.
    Step 2: Use a ForEach activity. Pass the childItems array to the ForEach activity to loop over each file name.
    Step 3: Inside the ForEach activity, use a Data Flow activity. The data flow should have a parameter that takes the file name dynamically from the ForEach item (see the data flow implementation details below).
    Step 4: Inside the ForEach activity, after the Data Flow activity, use a Copy activity to copy the file to a different folder with a different name.
    Step 5: Inside the ForEach activity, after the Data Flow and Copy activities, use a Delete activity to delete the file from the source folder.
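
    For reference, here is a minimal JSON sketch of that outer pipeline wiring. The activity, dataset, and data flow names (GetFileList, ForEachFile, SourceBlobFolder, SourceBlobFile, ArchiveBlobFile, df_DedupLoad) and the parameterized file-name datasets are assumptions for illustration only; the Data Flow activity is shown as a stub, and its parameter hand-off is sketched after the data flow steps below.

    ```json
    {
      "name": "LoadBlobFilesToSql",
      "properties": {
        "activities": [
          {
            "name": "GetFileList",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": { "referenceName": "SourceBlobFolder", "type": "DatasetReference" },
              "fieldList": [ "childItems" ]
            }
          },
          {
            "name": "ForEachFile",
            "type": "ForEach",
            "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
              "isSequential": true,
              "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
              "activities": [
                {
                  "name": "DedupAndLoad",
                  "type": "ExecuteDataFlow",
                  "typeProperties": {
                    "dataflow": { "referenceName": "df_DedupLoad", "type": "DataFlowReference" }
                  }
                },
                {
                  "name": "CopyAndRenameFile",
                  "type": "Copy",
                  "dependsOn": [ { "activity": "DedupAndLoad", "dependencyConditions": [ "Succeeded" ] } ],
                  "typeProperties": {
                    "source": { "type": "BinarySource" },
                    "sink": { "type": "BinarySink" }
                  },
                  "inputs": [
                    {
                      "referenceName": "SourceBlobFile",
                      "type": "DatasetReference",
                      "parameters": { "fileName": { "value": "@item().name", "type": "Expression" } }
                    }
                  ],
                  "outputs": [
                    {
                      "referenceName": "ArchiveBlobFile",
                      "type": "DatasetReference",
                      "parameters": { "fileName": { "value": "@concat('processed_', item().name)", "type": "Expression" } }
                    }
                  ]
                },
                {
                  "name": "DeleteSourceFile",
                  "type": "Delete",
                  "dependsOn": [ { "activity": "CopyAndRenameFile", "dependencyConditions": [ "Succeeded" ] } ],
                  "typeProperties": {
                    "dataset": {
                      "referenceName": "SourceBlobFile",
                      "type": "DatasetReference",
                      "parameters": { "fileName": { "value": "@item().name", "type": "Expression" } }
                    },
                    "enableLogging": false
                  }
                }
              ]
            }
          }
        ]
      }
    }
    ```

    In this sketch the Copy activity does a binary copy of the current file to the archive container under a new name, and the Delete activity then removes the original file from the source folder.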

    Data flow implementation (a sketch of the parameter hand-off follows the list):
    Step 1: Create a file name parameter inside the data flow to accept the file name from the ForEach activity.
    Step 2: Add a Source transformation that points to the file in your source folder dynamically, based on the parameter value.
    Step 3: Add another Source transformation for your target table.
    Step 4: Use a Lookup transformation to find the non-matching rows between your file and the table.
    Step 5: Use a Filter transformation to keep only the non-matching rows.
    Step 6: Use a Sink transformation to load those non-matching rows into your table.
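
    As a sketch of the parameter hand-off (again with assumed names), the Data Flow activity inside the ForEach would pass the current file name into the data flow's fileName parameter. Note that a string-type data flow parameter set from a pipeline expression needs to be wrapped in single quotes, as shown below. Inside the data flow, the Source transformation can then reference $fileName (for example as the file or wildcard path in its source options), and after the Lookup transformation the Filter condition would typically keep only the rows where the looked-up key column is null.

    ```json
    {
      "name": "DedupAndLoad",
      "type": "ExecuteDataFlow",
      "typeProperties": {
        "dataflow": {
          "referenceName": "df_DedupLoad",
          "type": "DataFlowReference",
          "parameters": {
            "fileName": { "value": "'@{item().name}'", "type": "Expression" }
          }
        },
        "compute": {
          "computeType": "General",
          "coreCount": 8
        }
      }
    }
    ```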

    Hope this helps. Please let us know if you have any further queries. Thank you.

