ADF to schedule databricks streaming job

Question

Hello,

We are using Azure Data Factory (ADF) to schedule a Databricks streaming job.
Requirement: The streaming job may need to run continuously. If it is cancelled or fails, it must be restarted automatically.
Problem: We currently use ADF to maintain the pipeline, but the pipeline settings have a Timeout parameter that can be at most 7 days. The pipeline therefore fails with a timeout every few days and we have to restart it manually, or we have to cancel and restart it manually before it times out. Either way, it takes manual effort to maintain.

So could we automate this process, for example by monitoring the pipeline and restarting it when it fails? Or could we just set the pipeline to cancel and restart automatically?

Accepted Answer

Hi @Tommy Tan ,

Thank you for posting your query on the Microsoft Q&A platform.

There are a couple of ways I can think of in this case.

Option 1: Use a tumbling window trigger with the retry option. The retry option re-runs the pipeline if it fails. See the ADF documentation on tumbling window triggers for details.

Option 2: Create another pipeline that monitors the execution of the original pipeline and decides whether to re-run it when it fails.
This monitoring pipeline should use a Web activity to call the API that returns the status of the original pipeline run. If the status is Failed, re-run the original pipeline, either with another API call or with an Execute Pipeline activity.
Make sure to schedule the monitoring pipeline to run daily or hourly, depending on how often you need to check.

Option 3: You can also consider creating a logic app that runs on a schedule to monitor the pipeline run and re-run it if it failed. The logic app should make one API call to get the status of the pipeline run and, if the run failed, another API call to start a new run.
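
For Option 1, a minimal sketch of what the trigger definition might look like, built here as a Python dictionary and printed as JSON. The pipeline name StreamingJobPipeline, the start time, the 24-hour window, and the retry values are placeholder assumptions, not values from this thread. Check the field names against the tumbling window trigger documentation before deploying.

```python
import json


def tumbling_window_trigger(pipeline_name, start_time,
                            retry_count=3, retry_interval_seconds=60):
    """Build a tumbling window trigger definition with a retry policy.

    All argument values are illustrative assumptions; adjust them to your factory.
    """
    return {
        "properties": {
            "type": "TumblingWindowTrigger",
            "typeProperties": {
                "frequency": "Hour",
                "interval": 24,           # one window per 24 hours (assumed)
                "startTime": start_time,
                "maxConcurrency": 1,      # at most one window running at a time
                "retryPolicy": {
                    "count": retry_count,                        # retries after a failed run
                    "intervalInSeconds": retry_interval_seconds  # wait between retries
                },
            },
            "pipeline": {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": pipeline_name,
                }
            },
        }
    }


# Hypothetical pipeline name and start time, for illustration only.
definition = tumbling_window_trigger("StreamingJobPipeline", "2024-01-01T00:00:00Z")
print(json.dumps(definition, indent=2))
```

With this approach you would probably also want the pipeline timeout to be no longer than the window size, so that each window starts a fresh run.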

Below are a few documentation links for the ADF REST APIs.

Get Pipeline Run API doc link: https://learn.microsoft.com/en-us/rest/api/datafactory/pipeline-runs/get
Create Pipeline Run API doc link: https://learn.microsoft.com/en-us/rest/api/datafactory/pipelines/create-run
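
The check-and-restart logic from Option 2 and Option 3 can be sketched roughly as below, using the two APIs linked above. This is only an illustration under some assumptions: you already have an Azure AD bearer token for the management endpoint, you know the run ID of the last run, and the helper names (factory_url, call_api, restart_if_stopped) are made up for this example. Treating Cancelled as a reason to restart is also my assumption, based on the requirement in the question.

```python
import json
import urllib.request

API_VERSION = "2018-06-01"
BASE = "https://management.azure.com"
RESTART_STATUSES = {"Failed", "Cancelled"}  # assumed: restart on either status


def factory_url(subscription_id, resource_group, factory_name):
    """Build the base URL of a data factory resource."""
    return (f"{BASE}/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.DataFactory/factories/{factory_name}")


def call_api(method, url, token, body=None):
    """Send one authenticated request and return the parsed JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode())


def restart_if_stopped(factory, pipeline_name, run_id, token, send=call_api):
    """Return the run ID to keep tracking: a new one if a restart was needed."""
    # Pipeline Runs - Get: read the status of the last known run.
    run = send("GET",
               f"{factory}/pipelineruns/{run_id}?api-version={API_VERSION}",
               token)
    if run.get("status") not in RESTART_STATUSES:
        return run_id  # still queued, running, or succeeded: nothing to do
    # Pipelines - Create Run: start a new run of the same pipeline.
    new_run = send("POST",
                   f"{factory}/pipelines/{pipeline_name}/createRun"
                   f"?api-version={API_VERSION}",
                   token, {})
    return new_run["runId"]
```

You would call restart_if_stopped on a schedule (hourly or daily, as mentioned above) and store the returned run ID for the next check. The send parameter is there only so the logic can be tried out with a stub instead of real HTTP calls.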

Hope this helps. Please let us know if you have any further queries.

