question

AlexanderWhitmore-5513 asked bharathn-msft answered

Creating an alert for long running pipelines

I currently have an alert set up for Data Factory that sends an email if a pipeline runs longer than 120 minutes, following this tutorial: https://www.techtalkcorner.com/long-running-azure-data-factory-pipelines/. When a pipeline does run longer than expected, I receive an alert as intended. However, I am also getting additional, unexpected alerts.

My query looks like:

 ADFPipelineRun
 | where Status == "InProgress" // Pipeline is in progress
 | where RunId !in ((ADFPipelineRun | where Status in ("Succeeded", "Failed", "Cancelled") | project RunId)) // Subquery, pipeline hasn't finished
 | where datetime_diff('minute', now(), Start) > 120 // It has been running for more than 120 minutes

I received an alert email on September 28th saying a pipeline had been running longer than 120 minutes, but when I look for that pipeline in the Azure Data Factory pipeline runs, nothing shows up. The alert email has a button labeled "View the alert in Azure monitor"; from there I can press "View Query Results" above the shown query. There I can re-enter the query above and filter the date to show all pipelines running longer than 120 minutes since September 27th, and it returns 3 pipelines.

Something I noticed about these pipelines is the end time column:
[Image: 138635-alert-query-result.png]



My guess is that the UTC time is not being handled properly at some point, and that this is what triggers the alert. Is there something I am doing wrong, or is there a better way to do this that avoids a bunch of false alarms?

azure-data-factory azure-monitor

@AlexanderWhitmore-5513 Welcome to Microsoft Q&A and thanks for your query.

Interesting that you are seeing this. We will explore further and share anything we learn. I will also check with our internal team to understand why you are experiencing this behavior. Thank you.


That would be so helpful, thanks!

bharathn-msft (to AlexanderWhitmore-5513):

Also, if you want only the pipeline runs that are still in progress and have been running for more than 120 minutes, you might have to tweak the query a little more so that it returns all runs with an in-progress status and excludes the runs that have already succeeded.

I will explore the Kusto query above and share what I find once I am able to test it.

Please let us know if you have any additional queries. Thank you.


1 Answer

bharathn-msft answered

@AlexanderWhitmore-5513 After further review on one of our test pipelines, I could see the same End (UTC) value of 1/1/1601, 12:00:00.000 AM. It seems that this is a generic placeholder end time for any pipeline run entry with a status other than Succeeded or Failed.
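
As a rough illustration of that observation, the sketch below lists the non-final entries and flags the ones whose End precedes their Start, which is how the 1601 placeholder shows up. HasPlaceholderEnd is a column name made up for this example, and the check assumes the placeholder is always earlier than any real start time.

    ADFPipelineRun
    | where Status in ("Queued", "InProgress") // entries written before the run finished
    | extend HasPlaceholderEnd = (End < Start) // hypothetical helper column; true when End is not a real timestamp
    | project PipelineName, RunId, Status, Start, End, HasPlaceholderEnd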

Please be aware that any run has at least 3 entries in the ADFPipelineRun table, one each for the Queued, InProgress, and Succeeded statuses (for a successful run). The alert should trigger correctly based on the interval you set. However, when you re-run the query from the alert email after the fact (probably a day or so later), it is evaluated as the current time minus the pipeline's start time. By then that difference is greater than 120 minutes, so the query shows all of those pipelines regardless of how long they actually ran.

  ADFPipelineRun
  | where Status == "InProgress" // Pipeline is in progress
  | where RunId !in ((ADFPipelineRun | where Status in ("Succeeded", "Failed", "Cancelled") | project RunId)) // Subquery, pipeline hasn't finished
  | where datetime_diff('minute', now(), Start) > 120 // It has been running for more than 120 minutes
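
To check what the alert would have seen at the moment it fired, rather than at the time of re-running, one possible variation is to pin the evaluation time and keep only the latest entry per run. This is a sketch under assumptions, not a tested alert rule: evaluatedAt is a hypothetical placeholder timestamp to be replaced with the time from the alert email, and it assumes TimeGenerated reflects when each status entry was logged.

    let evaluatedAt = datetime(2021-09-28 06:00:00); // placeholder value, not taken from the alert; replace with the alert's fire time (UTC)
    ADFPipelineRun
    | where TimeGenerated <= evaluatedAt // ignore entries logged after that moment
    | summarize arg_max(TimeGenerated, *) by RunId // keep the latest entry per run as of evaluatedAt
    | where Status == "InProgress" // the run had not finished yet at that moment
    | where datetime_diff('minute', evaluatedAt, Start) > 120 // and had already been running for more than 120 minutes

Swapping evaluatedAt for now() gives a query that looks only at the most recent status of each run, which may be easier to reason about than the !in subquery.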

You might have to tweak the query to find the runs that were successful:

      ADFPipelineRun
      | where Status == "Succeeded" // Pipeline has succeeded
      | where datetime_diff('minute', End, Start) > 120 // It took more than 120 minutes to run
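
If it helps to see how long those successful runs actually took, a small extension of the query above could compute the duration as a column and sort by it. DurationMinutes is a name chosen for this sketch.

    ADFPipelineRun
    | where Status == "Succeeded" // only finished runs have a real End value
    | extend DurationMinutes = datetime_diff('minute', End, Start) // hypothetical column: run time in whole minutes
    | where DurationMinutes > 120
    | project PipelineName, RunId, Start, End, DurationMinutes
    | order by DurationMinutes desc // longest runs first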

Hope this information helps.




