Azure Data Factory pipeline with one source and three destinations based on provider id

Peter M. Florenzano 1 Reputation point
2021-04-01T14:40:22.913+00:00

Hello everyone,

I'm in the process of planning out a demo for one of our clients. They are a health care organization with many providers, but for the demo's sake this will be 3-4 records per store type; the store types are Blob Storage, Azure Data Lake, and SFTP.

Based on the provider ID and store type in the source data, which resides in Azure SQL Database, each record will land in one of three destinations (Azure Data Lake, Blob Storage, or an SFTP gateway).

I'm not sure how I would go about developing this without separate pipelines.

Any information would be greatly appreciated.

Thank you

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

7 answers

  1. Anonymous
    2021-04-05T16:30:50.103+00:00

    Since this is a data processing problem, it is best handled by a Data Flow. Read from the source and use the Conditional Split transformation to place the data into three different folders/files, using a temporary staging destination on the lake for the SFTP-bound records (SFTP is not yet available as a sink in Data Flows). Then use a Copy activity to write that staged data to the SFTP location.
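
    As a minimal sketch of that Conditional Split, here is what the data flow script could look like. All stream, source, and sink names are illustrative, and the column list is assumed from the sample data in this thread:

    ```
    source(output(
            ProviderID as string,
            StoreType as string
        ),
        allowSchemaDrift: true) ~> PatientSource
    PatientSource split(
        StoreType == 'BLOB',
        StoreType == 'ADL',
        disjoint: false) ~> SplitByStoreType@(toBlob, toADL, toSFTP)
    toBlob sink(allowSchemaDrift: true) ~> BlobSink
    toADL sink(allowSchemaDrift: true) ~> LakeSink
    toSFTP sink(allowSchemaDrift: true) ~> SftpStagingSink
    ```

    The third output stream (`toSFTP`) catches everything that did not match the first two conditions; its sink would point at the staging folder on the lake that the follow-up Copy activity reads from.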

    I also assume you want this solution to scale. Iterating over rows with a ForEach activity is not a scalable approach and will work only for small data loads.

    1 person found this answer helpful.

  2. Nandan Hegde 36,716 Reputation points MVP Volunteer Moderator
    2021-04-01T16:21:30.62+00:00

    Hey,
    Can you please share some sample data and an example of your expected output? That would help us suggest a better approach.

    Based on my understanding, there is a config table that indicates what data needs to be copied where.
    You can leverage a combination of the below activities in a single pipeline:

    A Lookup activity to get the provider ID and destination
    The output of the Lookup activity can be fed into an If Condition or Switch activity, with each case containing a Copy activity that has the SQL DB as its source and one of the destinations as its sink
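
    As a rough sketch of that Switch pattern (activity names and case values are illustrative assumptions, and the per-case Copy activities are left as stubs), the pipeline JSON could look like:

    ```json
    {
        "name": "RouteByStoreType",
        "type": "Switch",
        "dependsOn": [
            { "activity": "LookupConfig", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "on": {
                "value": "@activity('LookupConfig').output.firstRow.StoreType",
                "type": "Expression"
            },
            "cases": [
                { "value": "BLOB", "activities": [ { "name": "CopyToBlob", "type": "Copy" } ] },
                { "value": "ADL",  "activities": [ { "name": "CopyToLake", "type": "Copy" } ] },
                { "value": "SFTP", "activities": [ { "name": "CopyToSftp", "type": "Copy" } ] }
            ],
            "defaultActivities": []
        }
    }
    ```

    Note this variant uses the Lookup's `firstRow`; if the Lookup returns multiple config rows, the Switch would sit inside a ForEach and switch on `@item().StoreType` instead.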


  3. Peter M. Florenzano 1 Reputation point
    2021-04-01T16:29:58.917+00:00

    Hi Nandan,

    Yes of course, below is my sample data:

    [attachment: 83744-sampledataset.png]

    The records with StoreType = 'BLOB' need to be placed in Azure Blob Storage, the records with StoreType = 'ADF' need to be placed in Azure Data Lake, and so on.


  4. Peter M. Florenzano 1 Reputation point
    2021-04-01T18:51:15.877+00:00

    Thank you,

    I have it set up like that already; the only issue I'm running into is that the ProviderID folders need to be created dynamically at runtime.

    Should I use a Lookup activity and a ForEach loop, or is there another way I can pass the ProviderIDs as an array?
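
    For the dynamic folders specifically, one common pattern is a parameterized dataset whose folder path is built from the provider ID; the Copy activity then creates the folder on the fly if it does not exist. A minimal sketch of such a Blob dataset (names, container, and format are illustrative assumptions):

    ```json
    {
        "name": "ProviderBlobFolder",
        "properties": {
            "type": "DelimitedText",
            "parameters": { "ProviderID": { "type": "string" } },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "patients",
                    "folderPath": {
                        "value": "@concat('providers/', dataset().ProviderID)",
                        "type": "Expression"
                    }
                }
            }
        }
    }
    ```

    Inside a ForEach, the Copy activity's sink would pass `@item().ProviderID` to the dataset's ProviderID parameter.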


  5. Peter M. Florenzano 1 Reputation point
    2021-04-02T11:14:52.49+00:00

    Thank you for your response Nandan,

    What I'm struggling with is how to pass the list of distinct ProviderIDs from my Lookup to the ForEach loop. Within the Lookup, I'm using the following query:

    SELECT DISTINCT ProviderID
    FROM [dbo].[PatientInformation]
    WHERE StoreType = 'BLOB'

    How do I pass the above values as an array to the ForEach loop? I did create a pipeline parameter named ProviderID with the type Array, which I reference as follows:

    @pipeline().parameters.ProviderID

    When I attempt to run the pipeline, it's asking me for a value. Shouldn't the value be passed automatically?
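
    For reference, a pipeline parameter has no default here, which is why the run prompts for a value; the usual pattern wires the Lookup's output array directly into the ForEach's items property instead (the activity name below is illustrative):

    ```json
    "items": {
        "value": "@activity('LookupDistinctProviders').output.value",
        "type": "Expression"
    }
    ```

    This assumes "First row only" is unchecked on the Lookup; inside the loop, each row is then available as `@item().ProviderID`.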

    Thanks again

