Thank you for reaching out to the community forum with your query.
Regarding your questions:
Is this behavior normal?
Yes, it's normal behavior in Azure Databricks.
When you execute a Databricks notebook from Azure Data Factory, the Databricks cluster is started automatically if it is not already running. Azure Data Factory submits the run through the Databricks REST API, and a terminated cluster is started before the notebook executes.
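For reference, the cluster-start call that is issued behind the scenes looks roughly like the sketch below. This is illustrative only: the workspace URL, token placeholder, and cluster ID are made-up values, not anything from your environment, and the request is assembled but not sent.

```python
import json

# Placeholder values for illustration only -- substitute your own.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
CLUSTER_ID = "0123-456789-example1"

def build_cluster_start_request(workspace_url: str, cluster_id: str) -> dict:
    """Assemble (but do not send) the Databricks cluster-start request:
    POST /api/2.0/clusters/start with the target cluster ID in the body."""
    return {
        "method": "POST",
        "url": f"{workspace_url}/api/2.0/clusters/start",
        "headers": {"Authorization": "Bearer <personal-access-token>"},
        "body": json.dumps({"cluster_id": cluster_id}),
    }

request = build_cluster_start_request(WORKSPACE_URL, CLUSTER_ID)
print(request["url"])
```

You can send the assembled request with any HTTP client; the point here is just to show which endpoint is involved when a stopped cluster comes back to life for a notebook run.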
However, if the cluster is already running, Azure Data Factory will not stop it after the notebook run completes, because the cluster may be serving other notebooks or jobs, and stopping it could disrupt them.
So, in your case, if the Databricks cluster was already running when the Azure Data Factory pipeline executed the notebook, the cluster would not have been stopped after the notebook execution was complete.
Shouldn’t the cluster need to be running for the notebook to execute successfully?
Yes, the cluster must be running for the notebook to execute. The reason the pipeline completes successfully even when the cluster was stopped is that the cluster is started automatically as part of the run: ADF submits the run through the Databricks REST API, and a terminated cluster is auto-started before the notebook executes.
Does ADF automatically start the cluster when initiating a notebook run?
Yes. When the notebook activity points at an existing interactive cluster that is terminated, submitting the run starts the cluster automatically. Alternatively, you can configure the Azure Databricks linked service to use a new job cluster, in which case Databricks creates a cluster for each run and terminates it when the run completes.
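The choice between the two behaviors is made in the Azure Databricks linked service definition. As a sketch (the linked service name, domain, and cluster ID below are placeholders), pointing at an existing interactive cluster looks like this:

```json
{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "accessToken": {
        "type": "SecureString",
        "value": "<personal-access-token>"
      },
      "existingClusterId": "0123-456789-example1"
    }
  }
}
```

If you instead supply the new-cluster properties (such as newClusterVersion, newClusterNodeType, and newClusterNumOfWorker) in place of existingClusterId, ADF asks Databricks to create a short-lived job cluster per run and terminate it afterwards.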
Could there be a configuration in ADF or Databricks that is allowing this to happen?
No specific configuration in ADF or Databricks is needed to enable this; it is the default behavior when ADF initiates a notebook run. If you want an interactive cluster to shut itself down when idle, you can set the cluster's auto-termination timeout ("Terminate after N minutes of inactivity") in the Databricks cluster settings.
Hope this helps. If this answers your query, do click "Accept Answer" and "Yes" for "Was this answer helpful". And if you have any further queries, do let us know.