Azure Data Factory pipeline with one source and three destinations based on provider id

Peter M. Florenzano 1 Reputation point
2021-04-01T14:40:22.913+00:00

Hello everyone,

I'm in the process of planning a demo for one of our clients. They are a health care organization with many providers, but for the demo's sake this will be 3-4 records per store type, the store types being Blob Storage, Azure Data Lake, and SFTP.

The source data resides in Azure SQL Database; based on each record's provider ID and store type, the data will land in one of three destinations (Azure Data Lake, Blob Storage, or an SFTP gateway).

I'm not sure how I would go about developing this without separate pipelines.

Any information would be greatly appreciated.

Thank you


7 answers

  1. Kiran-MSFT 691 Reputation points Microsoft Employee
    2021-04-05T16:30:50.103+00:00

    Since this is a data processing problem, it is best handled by a mapping data flow. Read from your source and use the conditional split transform to place the data into three different folders/files in a staging location on the lake (SFTP is not yet available as a sink in data flows). Then use a Copy activity to write the SFTP-bound data to the SFTP location.

    I also assume you want this solution to scale. Iterating over rows with a ForEach is not scalable and will only work for small data loads.
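
    As a rough illustration (the source name, stream names, and exact StoreType values below are placeholders; StoreType is the column used in the query elsewhere in this thread), the conditional split in the data flow script would look something like this:

    PatientSource split(StoreType == 'ADL',
        StoreType == 'BLOB',
        disjoint: false) ~> SplitByStoreType@(adlRows, blobRows, sftpRows)

    Each output stream can then be routed to its own sink. With disjoint: false, the last stream (sftpRows here) receives the rows that match neither condition; those are the rows you stage on the lake and forward to the SFTP location with a Copy activity.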


  2. Peter M. Florenzano 1 Reputation point
    2021-04-02T13:52:54.147+00:00

    @Nandan Hegde That works, thank you very much! The only thing this process still needs to do is create a new container in the Blob destination for each ProviderID listed in the table with the BLOB store type, and do the same for the ADL and SFTP destinations.

    What it's currently doing is creating a container for only the first ProviderID and copying all of the BLOB records into it, instead of creating a new container for each ProviderID.

    84083-adloutput-04022021.png

    I'm sure an adjustment needs to be made on the dataset side.

    Here is a screenshot of how the containers should look, one per unique ProviderID:

    84111-sampledataset-2.png
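
    For reference, this is the dataset-side change I'm experimenting with (the parameter name is just a placeholder): add a parameter to the sink dataset, use it in the container/folder path, and pass the current ForEach item into it from the Copy activity sink.

    Sink dataset (Blob / ADLS) parameter:    ProviderID (String)
    Dataset container / folder path:         @dataset().ProviderID

    Copy activity sink (inside the ForEach), dataset parameter value:
    ProviderID: @item().ProviderID

    That way each iteration should write to a path named after the current ProviderID instead of everything landing under the first one.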

    Any help would be greatly appreciated.

    Thanks again


  3. Peter M. Florenzano 1 Reputation point
    2021-04-02T13:04:22.653+00:00

    @Nandan Hegde

    Hi,

    This message comes up when I enter that string in the Items field of the ForEach activity:

    "The output of activity 'Test' can't be referenced since it is either not an ancestor to the current activity or does not exist"


  4. Peter M. Florenzano 1 Reputation point
    2021-04-02T11:14:52.49+00:00

    Thank you for your response Nandan,

    What I'm struggling with is how to pass the list of distinct ProviderIDs from my Lookup to the ForEach loop. Within the Lookup, I'm using the following query:

    SELECT DISTINCT ProviderID
    FROM [dbo].[PatientInformation]
    WHERE StoreType = 'BLOB'

    How do I pass the above values as an array to the ForEach loop? I did create a pipeline parameter named ProviderID with a type of Array, which I reference as follows:

    @pipeline().parameters.ProviderID

    When I attempt to run the pipeline, it's asking me for a value. Shouldn't the value be passed automatically?
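
    In case it helps clarify what I'm after, this is the wiring I'm trying to get to (the activity name LookupProviders is just a placeholder for my Lookup):

    Lookup activity "LookupProviders" ("First row only" unchecked)
        Query: the SELECT DISTINCT statement above

    ForEach activity (connected to the Lookup's success output)
        Items: @activity('LookupProviders').output.value

    Inside the ForEach, the current value would then be:
        @item().ProviderID

    If that is right, no pipeline parameter would be needed at all.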

    Thanks again


  5. Peter M. Florenzano 1 Reputation point
    2021-04-01T18:51:15.877+00:00

    Thank you,

    I have it set up like that already; the only issue I'm running into is that the ProviderID folders need to be created dynamically at runtime.

    Should I use a Lookup activity and a ForEach loop, or is there another way I can pass the ProviderIDs as an array?
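
    If I go the Lookup route, my understanding is that a Lookup with "First row only" unchecked already returns the rows as an array, roughly in this shape (sample values only):

    {
        "count": 3,
        "value": [
            { "ProviderID": "P001" },
            { "ProviderID": "P002" },
            { "ProviderID": "P003" }
        ]
    }

    so a ForEach should be able to iterate over that value array directly.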