How to read ADLS csv files to get the row count value using ADF activity and should not start successor pipeline when there is no record as we have another pipeline to process data thorugh databricks activity

Abdulla Mahammad Khan Dawood 186 Reputation points
2021-02-03T07:45:49.957+00:00

Hi All,

We have a requirement where we are performing data ingestion through pipeline with incremental copy activity from SQL Server source to Data Lake using Change Tracking feature for each tables. Now after successful data ingestion pipeline execution, How we can read and identify recent Raw csv files in Data lake having row count zero and stop executing subsequent pipeline for data processing which happens through databricks notebook.

Note: All the copied csv files into ADLS are named with TableName_yyyyMMddHHmmss.csv in ADLS Raw folder under TableName/YYYY/MM/DD.

Basically we have to avoid running data processing pipeline when there is changeset records after data ingestion pipeline run.

Thank you in anticipation!!

Appreciate if some can help and suggest a solution.

Best Regards,
Mahammad Khan

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,601 questions
0 comments No comments
{count} votes

Accepted answer
  1. Nasreen Akter 10,751 Reputation points
    2021-02-05T20:38:27.273+00:00

    Hi @Abdulla Mahammad Khan Dawood ,

    In the ADF, before executing main-activities, you can do the following:

    1. do a lookup to the logtxt file
    2. foreach row, add rowsCopied to a temp+main variables
    3. add an IfElse activity, in the If-condition, check if the value of the mainVariable > 0
    4. now, add the main activities under the If true activities

    Please see the screenshots for details. Hope this helps. Thanks!

    @activity('Lookup1').output.value  
    @equals(item().Pipeline_Name, 'PL_SQL_CT_SRC_TO_RAW_LOGGING')  
    @string(add(int(item().rowsCopied), int(variables('numberOfRowChanged'))))  
    @variables('tempVar')  
    @greater(int(variables('numberOfRowChanged')), 0)  
    

    64741-dataset-20210205.jpg
    64727-variables-20210205.jpg
    64724-lookup-20210205.jpg
    64725-foreach-20210205.jpg
    64726-foreach-if-20210205.jpg
    64742-foreach-ifcondition-20210205.jpg64704-final-if-condition-20210205.jpg
    64743-foreach-ifcondition-setvaiables-20210205.jpg
    64658-foreach-ifcondition-setvaiablescondition-20210205.jpg
    64660-foreach-ifcondition-setvaiablescondition2-20210205.jpg
    64705-txtfile-20210205.jpg
    64640-execution-20210205.jpg

    2 people found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. HarithaMaddi-MSFT 10,131 Reputation points
    2021-02-03T16:19:17.017+00:00

    Hi @Abdulla Mahammad Khan Dawood ,

    Welcome to Microsoft Q&A Platform. Thanks for posting the query.

    Count of records in files can be found using lookup activity and IF Condition activity can be used to check if count>0 to identify non-empty files. In the below example, for individual files, I have used conditional check to perform an empty activity only if file is empty and non-empty activity accordingly. We can use variable if we really want to avoid an activity if at least one file is empty in the set of files. Hence, in both cases, this approach works.

    63538-skippingemptyfiles.gif

    IF Condition Activity Expression:

    @greater(activity('Lookup1').output.count,0)  
    

    Hope this helps! Please let us know for further queries and we will be glad to assist.

    --

    • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification.
    1 person found this answer helpful.