Use data set from one data flow as source in another data flow

David Lang 20 Reputation points
2024-05-13T20:16:10.09+00:00

In Azure Data Factory, I'd like to be able to use a data set that's been output from one data flow activity as the source in another data flow activity, but I don't want to have to write to an external database or file like SQL Server or Blob Storage. I see there's an option to write data flow output to cache. Is there some way to use this, or should I just continue the first data flow and have one long data flow activity? I've seen examples using Lookup but these examples all reference one specific data value and I want to easily access the whole data set.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Amira Bedhiafi 16,146 Reputation points
    2024-05-13T22:23:16.03+00:00

    From what I understand, you need to do the following:

    1. Create the First Data Flow (Data Flow 1):
      • Design your data transformation logic as required
      • Add a Sink transformation to write the output
      • In the Sink settings, instead of writing to an external storage, configure it to use the Cache option
        • Go to the Settings tab of the Sink transformation
        • Enable Staging and select Cache as the staging type
        • Assign a unique name to the cache (MyDataFlowCache)
    2. Create the Second Data Flow (Data Flow 2):
      • Add a Source transformation
      • Configure the source to read from the cache created in the first data flow
        • Select Cache as the source type
        • Provide the cache name used in the first data flow (MyDataFlowCache)
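
    Since the two data flows above would run as two Data Flow activities in one pipeline, here is a minimal sketch of how that ordering might look in the pipeline definition, built as a Python dict and printed as JSON. The names ChainedDataFlows, RunDataFlow1, RunDataFlow2, DataFlow1 and DataFlow2 are placeholders, not anything from your factory. It only sequences the activities; it does not by itself show that a cache written by the first flow is readable from the second, so treat that part as something to verify.

```python
import json


def data_flow_activity(name, data_flow, depends_on=None):
    """Build one Execute Data Flow activity entry for a pipeline definition."""
    return {
        "name": name,
        "type": "ExecuteDataFlow",
        # Run only after every listed upstream activity has succeeded.
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": ["Succeeded"]}
            for upstream in (depends_on or [])
        ],
        "typeProperties": {
            "dataflow": {"referenceName": data_flow, "type": "DataFlowReference"}
        },
    }


def build_pipeline(first_flow, second_flow):
    """Return a pipeline definition that runs two data flows in sequence."""
    # Activity and pipeline names below are hypothetical placeholders.
    first = data_flow_activity("RunDataFlow1", first_flow)
    second = data_flow_activity("RunDataFlow2", second_flow, depends_on=[first["name"]])
    return {
        "name": "ChainedDataFlows",
        "properties": {"activities": [first, second]},
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline("DataFlow1", "DataFlow2"), indent=2))
```

    The dependsOn entry is what makes the second activity wait for the first; without it the two activities could start in parallel.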

    More links:

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-cached-lookup-functions

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-activity


  2. Smaran Thoomu 10,720 Reputation points Microsoft Vendor
    2024-05-15T11:10:41.0133333+00:00

    Hi @David Lang

    To use the output of one data flow activity as the source of another in Azure Data Factory, you can use the "Copy Data" activity. It can reference the output of the previous data flow activity and feed it to the current one as a source.

    Here are the steps to use the "Copy Data" activity:

    1. In the current data flow activity, add a new source and select "Azure Blob Storage" as the source type.
    2. In the "Azure Blob Storage" source settings, select the "Copy Data" activity as the reference data flow.
    3. Select the output of the previous data flow activity that you want to use as the source.
    4. Map the columns from the "Azure Blob Storage" source to the current data flow activity.

    Note that if you want to use the "Cache" option to store the output of the first data flow activity, you can use the "Lookup" activity to reference the cached data set. The Azure Data Factory documentation has more information about the "Cache" option.
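
    As a rough illustration of referencing cached output from the pipeline side: as far as I know, a cache sink can be set to "Write to activity output", and a later activity can then read it with a pipeline expression. The sketch below only builds that expression string. The expression path is my understanding of the Data Flow activity output and should be checked against the documentation; the activity name DataFlow1Activity and sink name CacheSink1 are made up for the example. This route is meant for small result sets rather than large data sets.

```python
def cache_sink_output_expression(activity_name, sink_name):
    """Build a pipeline expression that reads a cache sink's activity output.

    Assumption: the sink is a cache sink with "Write to activity output"
    enabled, and the output is exposed under runStatus.output.<sink>.value.
    """
    return f"@activity('{activity_name}').output.runStatus.output.{sink_name}.value"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(cache_sink_output_expression("DataFlow1Activity", "CacheSink1"))
```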

    I hope this helps. Please let me know if you have any queries.
