How can I filter data in Copy Data Activity using a query? The filter column inputs are coming from different Copy Data Activity.

Sikha Kaushik 0 Reputation points
2024-04-10T09:37:41.64+00:00

I'm creating a pipeline that has two Copy Activities. The first Copy Activity, "consumer_pred," contains consumers for which predictions are to be made. The second Copy Activity, "consumer_master_bills," contains all historical invoices of all consumers ever existed but takes a lot of time to execute. I want to filter "consumer_master_bills" data only for consumers in "consumer_pred" data. How can I filter the data at the source of "consumer_master_bills" itself to reduce runtime?

Azure Functions
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
4,371 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,736 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Amira Bedhiafi 16,146 Reputation points
    2024-04-10T10:03:24.74+00:00

    You can implement an Azure Function that accepts a list of consumer IDs and returns a query or a list that can be used to filter the"consumer_master_bills data.

    You can write it in C#, JavaScript, or Python (the language yourself comfortble) and it would generate a SQL query (or a filter condition) based on your input.

    Then make sure your Azure Function exposes an HTTP endpoint that accepts a payload with consumer IDs and returns a suitable query string or filter criteria.

    How to integrate it with Azure Data Factory ?

    Use a Web Activity in Azure Data Factory to call your Azure Function. Pass the list of consumer IDs from consumer_pred as the input to this Azure Function.

    You can store the output of the first Copy Data Activity in a variable or a database/table, then pass this data to the Web Activity.

    The output of the Azure Function should be captured in a variable within your Data Factory pipeline.

    How to use the Dynamic Content in Copy Data Activity ?

    In the Source settings of your consumer_master_bills Copy Data Activity, use dynamic content to incorporate the filter query or condition.

    This would typically be done in the query option of the source dataset if you're dealing with SQL sources, or in the dataset expression if it's a non-SQL source.

    Then you use the variable that holds the Azure Function output as the filter condition in your query. For example, if you're querying a SQL database, your source dataset query might look something like SELECT * FROM consumer_master_bills WHERE ConsumerID IN (@filterCriteria), where @filterCriteria is dynamically populated from the Azure Function's output.

    0 comments No comments

  2. Pinaki Ghatak 2,400 Reputation points Microsoft Employee
    2024-05-13T08:56:25.7866667+00:00

    Hello @Sikha Kaushik

    You can use the Filter activity in your pipeline to filter the data at the source of consumer_master_bills itself.

    The Filter activity filters the inputs to Azure Data Factory and Synapse Analytics pipelines. You can use a Filter activity in a pipeline to apply a filter expression to an input array.

    Here's an example of how you can use the Filter activity to filter the data in consumer_master_bills for only the consumers in consumer_pred

    1. Create a pipeline with two Copy Activities, consumer_pred and consumer_master_bills.
    2. Add a Filter activity to the pipeline between the two Copy Activities.
    3. In the Filter activity, set the items property to the input array, which is the output of the consumer_master_bills Copy Activity.
    4. Set the condition property to the filter expression that filters the data for only the consumers in consumer_pred. For example, if the consumer_pred data contains a column named consumer_id, you can set the condition property to @contains(activity('consumer_pred').output.firstRow.consumer_id, consumer_id). This will filter the data in consumer_master_bills for only the consumers in consumer_pred, reducing the runtime of the pipeline.

    I hope that this response has addressed your query and helped you overcome your challenges. If so, please mark this response as Answered. This will not only acknowledge our efforts, but also assist other community members who may be looking for similar solutions.

    0 comments No comments