Integration - Azure Data Factory - How to copy all records from CSV kept in data lake to on-premises SQL database

Jainav Surana 1 Reputation point
2021-09-08T04:46:40.047+00:00

I am trying to pass all records of csv from data lake to on-premises database of SQL using Azure data factory. I know how to pass data one by one using LookUp and Copy activity but I don't know how to pass all records of csv.
Before insertion into database I want to validate all records are correct, not corrupted basically as per our logic so I was thinking to pass all records and validate. If all records are correct then dump into multiple tables of a database otherwise reject the CSV.
Can you please share your inputs if anyone have any idea

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,397 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,914 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Nandan Hegde 30,081 Reputation points MVP
    2021-09-08T04:51:33.41+00:00

    Hey,
    You can use copy activity with your csv file as source and Azure SQL db as the table.
    The copy activity would copy all records from file into table.

    There is no need for a lookup activity.

    There is an open source tool Great_Expectations:
    https://greatexpectations.io/
    that is used to scan the files and have proper data validations before that is ingested into system.

    You can leverage that before the copy activity wherein you can run all your expectation cases and proceed to copy activity only if all are success.

    Else, you can load the data into a staging table in Azure database and via Stored procedure activity validate all your data in staging table and then move data into other tables within database if all cases are verified


  2. HimanshuSinha-msft 19,386 Reputation points Microsoft Employee
    2021-09-10T01:14:09.357+00:00

    Helo @Jainav Surana ,

    Just to add to what @Nandan Hegde said above , you can use the below logic also

    1. Copy all the data into an staging table using copy activiy .
    2. You can use Lookup activity to call a proc on the SQL to check all the validation . The proc must return a datatset
    3. Using the lookup data to loop in to copy activity and copy the data to the right sql table .
    4. delete the staging tables .

    Please do let me know how it goes .
    Thanks
    Himanshu
    Please do consider clicking on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members

    0 comments No comments