PeterFlorenzano-9808 asked SaurabhSharma-msft commented

Azure Data Factory pipeline with one source and three destinations based on provider id

Hello everyone,

I'm in the process of planning out a demo for one of our clients. They are a health care organization with many providers, but for the demo's sake this will be 3–4 records per store type, the store types being Blob Storage, Azure Data Lake, and SFTP.

Based on the provider ID and store type in the source data, which resides in Azure SQL Database, each record will land in one of three destinations (ADL, Blob Storage, or an SFTP gateway).

I'm not sure how I would go about developing this without separate pipelines.

Any information would be greatly appreciated.

Thank you

azure-data-factory
NandanHegde-7720 answered

Hey,
Can you please share some sample data and an example of your expected output? That would help us suggest a better approach.

Based on my understanding, there is a config table that indicates which data needs to be copied where.
You can leverage a combination of the below activities in a single pipeline:

A Lookup activity to get the provider ID and destination
The output of the Lookup activity can feed an If Condition or Switch activity, with each case containing a Copy activity that uses the SQL DB as source and one of the destinations as sink
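As a rough sketch, the Switch branch of such a pipeline could look like the JSON fragment below. The activity names (LookupConfig, CopyToBlob, etc.) are hypothetical, and the Copy activities are elided to their skeletons:

```json
{
  "name": "SwitchOnStoreType",
  "type": "Switch",
  "dependsOn": [ { "activity": "LookupConfig", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "on": {
      "value": "@activity('LookupConfig').output.firstRow.StoreType",
      "type": "Expression"
    },
    "cases": [
      { "value": "BLOB", "activities": [ { "name": "CopyToBlob", "type": "Copy" } ] },
      { "value": "ADL",  "activities": [ { "name": "CopyToADL",  "type": "Copy" } ] },
      { "value": "SFTP", "activities": [ { "name": "CopyToSFTP", "type": "Copy" } ] }
    ],
    "defaultActivities": []
  }
}
```

Each case carries its own Copy activity, so one pipeline covers all three destinations.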

PeterFlorenzano-9808 answered NandanHegde-7720 commented

Hi Nandan,

Yes of course, below is my sample data:

[Attachment: sampledataset.png — sample dataset]



The records with a StoreType = 'BLOB' need to be placed in Azure Blob Storage, the records with a StoreType = 'ADL' need to be placed in Azure Data Lake, and so on.



Hey,
You can create 3 Copy activities within a pipeline, with each Copy activity writing to SFTP, Blob, or ADL.
The source for each would be Azure SQL DB, with the query filtered as needed.

Sample:
SELECT * FROM table WHERE StoreType = 'BLOB' would be the source query for the Copy activity with Blob as the destination.
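A minimal sketch of one such Copy activity follows; the table name matches the sample data, but the dataset names (AzureSqlPatients, BlobOutput) are hypothetical:

```json
{
  "name": "CopyToBlob",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT * FROM dbo.PatientInformation WHERE StoreType = 'BLOB'"
    },
    "sink": { "type": "DelimitedTextSink" }
  },
  "inputs":  [ { "referenceName": "AzureSqlPatients", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "BlobOutput", "type": "DatasetReference" } ]
}
```

The ADL and SFTP copies would differ only in the WHERE clause and the sink dataset.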

PeterFlorenzano-9808 answered NandanHegde-7720 commented

Thank you,

I have it set up like that already; the only issue I'm running into is that the ProviderID folders need to be dynamically created at runtime.

Should I use a Lookup activity and a ForEach loop, or is there another way I can pass the ProviderID values as an array?


Hey,
You are right.
It is best to use a Lookup activity to get the distinct ProviderIDs, use a ForEach activity to iterate over the resulting array (you can enable parallel execution on the ForEach), and parameterize the output dataset path with the ProviderID.
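A parameterized sink dataset could be sketched as below; the dataset name, linked service name, and container are hypothetical — the key parts are the ProviderID parameter and the folderPath expression:

```json
{
  "name": "BlobOutput",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
    "parameters": { "ProviderID": { "type": "string" } },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "patients",
        "folderPath": { "value": "@dataset().ProviderID", "type": "Expression" }
      }
    }
  }
}
```

Each ForEach iteration then supplies a different ProviderID, producing one folder per provider.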

PeterFlorenzano-9808 answered NandanHegde-7720 edited

Thank you for your response Nandan,

What I'm struggling with is how to pass the list of distinct ProviderIDs from my Lookup to the ForEach loop. Within the Lookup, I'm using the following query:

SELECT DISTINCT ProviderID
FROM [dbo].[PatientInformation]
WHERE StoreType = 'BLOB'

How do I pass the above values as an array to the ForEach loop? I did create a pipeline parameter named ProviderID with type Array, which I reference as follows:

@pipeline().parameters.ProviderID

When I attempt to run the pipeline, it's asking me for a value. Shouldn't the value be passed automatically?

Thanks again


Hey,

You can use:

@activity('Test').output.value

[Attachment: adf.png — ForEach settings screenshot]

Note: you can uncheck Sequential to allow parallel loops.

There is no need to create a parameter; you can pass the array directly to the ForEach activity.
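Putting the pieces together, the ForEach could be sketched as the fragment below. The Lookup is assumed to be named LookupProviders here (in the screenshot it is named Test), and the inner Copy activity is elided to its sink reference:

```json
{
  "name": "ForEachProvider",
  "type": "ForEach",
  "dependsOn": [ { "activity": "LookupProviders", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "isSequential": false,
    "items": {
      "value": "@activity('LookupProviders').output.value",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "CopyToBlob",
        "type": "Copy",
        "outputs": [
          {
            "referenceName": "BlobOutput",
            "type": "DatasetReference",
            "parameters": { "ProviderID": "@item().ProviderID" }
          }
        ]
      }
    ]
  }
}
```

Inside the loop, @item().ProviderID resolves to the current row from the Lookup output.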

PeterFlorenzano-9808 answered NandanHegde-7720 commented

@NandanHegde-7720

Hi,

This message comes up when I enter that expression in the Items field of the ForEach settings:

"The output of activity 'Test' can't be referenced since it is either not an ancestor to the current activity or does not exist"


Ohhh!
Sorry for the confusion.
It should be the name of your Lookup activity, not hardcoded as Test.
In my screenshot, the Lookup activity's name is Test; replace Test with your own Lookup activity's name.

PeterFlorenzano-9808 answered SaurabhSharma-msft commented

@NandanHegde-7720 That works, thank you very much! The only thing this process needs to do now is create a new container in the Blob and ADL destinations for each ProviderID listed in the table, and likewise for SFTP.

What it's currently doing is creating a folder for the first ProviderID only and copying all the records with StoreType = 'BLOB' into it, instead of creating a new container for each ProviderID.

[Attachment: adloutput-04022021.png — current ADL output]

I'm sure an adjustment needs to be made on the dataset side.

Here is a screenshot of how the containers should look, with one per unique ProviderID:

[Attachment: sampledataset-2.png — expected container layout]



Any help would be greatly appreciated.

Thanks again


Kiran-MSFT answered SaurabhSharma-msft commented

Being a data-processing problem, this is best handled by a Data Flow. Read the source and use the Conditional Split transformation to place the data into 3 different folders/files in a temporary destination on the lake (SFTP is not yet available as a sink in Data Flow), then use a Copy activity to write the data to the SFTP location.

I also assume you want this solution to scale. Iterating over rows with a ForEach is not a scalable approach and will work only for small data loads.
