Extract files in batches from data lake in Azure Data factory

pankaj chaturvedi 91 Reputation points
2021-05-01T12:18:51.053+00:00

Hi Team,

In my data lake storage has multiple files almost 100 which contains around 27M records and I am loading on the daily basis(Delta Load) in azure SQL DB Stage table (Using copy activity) which is causing performance issue. After stage table I am using stored procedure activity to load data in target table from stage table.

Currently pipeline is taking almost between 12-15 hours to get complete.
files type is CSV, but let me know if it is possible to implement the logic for Snappy.parquet files

COPY ACTVITY(Data lake to STG table)--->STORED PROCEDURE ACTIVITY(STG To Target)

Is there any way to load the files in batches ?

STEP1:-means first time copy only 10---Load in stg table---stage table to target table then truncate stg table.

STEP2:-means first time copy only 10-20---Load in stg table---stage table to target table then truncate stg table

Could you please help me here.?

I am looking forward your response.

Thanks,
Pankaj

Azure SQL Database
Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,562 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,640 questions
{count} votes

2 answers

Sort by: Most helpful
  1. pankaj chaturvedi 91 Reputation points
    2021-05-04T18:25:12.253+00:00

    Hi Team, Could you please help me here.? Thanks!

    0 comments No comments

  2. MartinJaffer-MSFT 26,236 Reputation points
    2021-05-06T15:15:55.48+00:00

    @pankaj chaturvedi

    Three step process:

    1. Get list of files in the folder
    2. Exclude any not-files (subfolders)
    3. ForEach Loop over the list of files. Pass the file name to Copy activity.

    Set up dataset and Get Metadata activity to fetch files in folder
    94327-image.png
    94439-image.png

    Filter activity:
    Items: @activity('Get List of Files').output.childItems
    Condition: @equals(item().type,'File')

    Set up ForEach loop
    94470-image.png
    Batch count: 10, or however many you want to be processed at a time
    Items: @activity('Exclude Folders').output.Value

    Parameterize the dataset for copy activity.
    94409-image.png
    94515-image.png

    Pass file name to copy activity
    94410-image.png


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.