Filter Copied data from Copy Activity in Azure Data Factory

Abinash Tumulu 56 Reputation points Microsoft Employee
2022-06-01T08:02:28.52+00:00

We are using Copy activity in Data Factory to copy data from Source to Destination.
Once data is copied to Destination -> Filtering the destination to get MAX date (DataFlow)

Is there a way we can get direct MAX value without a data flow (any additional activity)?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
0 comments No comments
{count} votes

Answer accepted by question author
  1. AnnuKumari-MSFT 34,561 Reputation points Microsoft Employee Moderator
    2022-06-02T12:19:07.667+00:00

    Hi @Anonymous ,

    Thankyou for sharing the required details about source and sink dataset.

    I understand your sink is ADLS file , and not SQL DB which is why it's not possible to directly write a query and retrieve the maxDate.

    The workaround is to iterate through each of the records and compare with the previous row and get the maxDate. This approach is feasible only if the record count is less than 5000 or 4GB. Click here for more details on Lookup Limitation.

    Here is what you need to do:
    1. Create a variable named LatestDate and provide a default string value 1900-01-01T12:34:50 to it.
    2. After your copy activity, use LookUp activity pointing to the destination .txt file and fetch the records.
    3. Use ForEach activity providing item value as @activity('Lookup1').output.value to iterate through each of the records in the .txt file. Make sure to check the Sequential option as we need to iterate and compare values one by one.
    4. Inside ForEach block, use If condition activity to check if the date coming from .txt file is greater than the date value held by variable using the expression @greater(formatDateTime(item().UpdatedDate,'yyyyMMddHHmmss'),formatDateTime(variables('LatestDate'),'yyyyMMddHHmmss'))
    5. Inside True block , use a Set variable activity to update the value of variable to @item().UpdatedDate .
    6. Come out of ForEach and use a copy activity, where source would be pointing to a dummy .csv/.txt file having one column and add an additional column with name as maxDate and value as @variables('LatestDate'). In the sink provide a new .csv/.txt dataset and in the mapping tab , delete the column coming from Dummy file and execute the pipeline

    207837-image.png

    Please refer to the below video for implementation:

    207923-copymaxdatefinal.gif

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.
    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. AnnuKumari-MSFT 34,561 Reputation points Microsoft Employee Moderator
    2022-06-01T17:21:13.02+00:00

    Hi @Anonymous ,

    Thankyou for posting your query on Microsoft Q&A platform .

    As I understand your ask here, you are trying to fetch max(date) after copying the data from source datastore to destination datastore . I am considering there is some column holding date value . Please let me know if my understanding is incorrect.

    In order to provide the best possible help, could you please share what is the source and sink dataset? If the sink is SQL server, you can use a lookup activity pointing to destination dataset and use Query option to retrieve the required data: Select max(date) from SchemaName.TableName

    Please check the below image for reference:

    207653-image.png

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.