Use data set from one data flow as source in another data flow

David Lang 20 Reputation points
2024-05-13T20:16:10.09+00:00

In Azure Data Factory, I'd like to be able to use a data set that's been output from one data flow activity as the source in another data flow activity, but I don't want to have to write to an external database or file like SQL Server or Blob Storage. I see there's an option to write data flow output to cache. Is there some way to use this, or should I just continue the first data flow and have one long data flow activity? I've seen examples using Lookup but these examples all reference one specific data value and I want to easily access the whole data set.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 16,071 Reputation points
    2024-05-13T22:23:16.03+00:00

    From what I understand, you need to do the following:

    1. Create the First Data Flow (Data Flow 1):
      • Design your data transformation logic as required
      • Add a Sink transformation to write the output
      • In the Sink settings, configure the sink to use the Cache option instead of writing to external storage:
        • Go to the Settings tab of the Sink transformation
        • Enable Staging and select Cache as the staging type
        • Assign a unique name to the cache (for example, MyDataFlowCache)
    2. Create the Second Data Flow (Data Flow 2):
      • Add a Source transformation
      • Configure the source to read from the cache created in the first data flow:
        • Select Cache as the source type
        • Provide the cache name used in the first data flow (MyDataFlowCache)
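
The steps above assume that Data Flow 1 runs before Data Flow 2 in the same pipeline. Here is a minimal sketch of that ordering, written as a Python dict that mirrors the pipeline JSON. The pipeline name, activity names, and data flow names are hypothetical placeholders. The sketch only chains the two Data Flow activities with a dependency; it does not configure the cache sink or source, so whether the cached data is visible to the second data flow should be checked against the linked documentation.

```python
import json


def build_pipeline(first_flow: str, second_flow: str) -> dict:
    """Return a pipeline definition that runs two data flows in sequence."""
    # First Data Flow activity: no dependencies, so it starts immediately.
    first_activity = {
        "name": "RunDataFlow1",  # hypothetical activity name
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {"referenceName": first_flow, "type": "DataFlowReference"}
        },
    }
    # Second Data Flow activity: starts only after the first one succeeds.
    second_activity = {
        "name": "RunDataFlow2",  # hypothetical activity name
        "type": "ExecuteDataFlow",
        "dependsOn": [
            {"activity": "RunDataFlow1", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "dataflow": {"referenceName": second_flow, "type": "DataFlowReference"}
        },
    }
    return {
        "name": "ChainedDataFlows",  # hypothetical pipeline name
        "properties": {"activities": [first_activity, second_activity]},
    }


if __name__ == "__main__":
    # Print the JSON so it can be compared with the pipeline's code view.
    print(json.dumps(build_pipeline("DataFlow1", "DataFlow2"), indent=2))
```

The printed JSON can be compared with the code view of a pipeline built in the Data Factory UI; the dependsOn entry is what enforces the run order.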

    More links:

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-cached-lookup-functions

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-activity