How to get the databricks worker nodes used for a particular ADF pipeline

swati sharma 20 Reputation points
2023-08-06T18:51:04.2966667+00:00

Hi, I need to get the worker nodes used from the Azure Databricks cluster for a particular pipeline in ADF.

Suppose I have multiple pipelines running at different intervals; I need to get the Databricks cluster worker nodes already used for a particular pipeline,

or a way to get the pipeline run ID against the cluster events in the cluster event log.

Thanks
Swati Sharma


1 answer

Sort by: Most helpful
  1. PRADEEPCHEEKATLA 90,601 Reputation points
    2023-08-07T06:34:28.4833333+00:00

    @swati sharma - Thanks for the question and using MS Q&A platform.

    To get the worker nodes used for a particular pipeline in Azure Data Factory, you can use the monitoring feature in the Azure portal. Here are the steps:

    1. Open the Azure portal and navigate to your Data Factory instance.
    2. Click on the "Monitor & Manage" tile to open the monitoring experience.
    3. In the monitoring view, select the time range for which you want to view the pipeline runs.
    4. Filter the pipeline runs by selecting the pipeline name for which you want to view the worker nodes.
    5. Click on the pipeline run for which you want to view the worker nodes.
    6. In the pipeline run details view, click on the "Activity Runs" tab.
    7. Find the Databricks Notebook activity in the list of activity runs and click on it.
    8. In the Databricks Notebook activity details view, click on the "Output" tab.
    9. In the output tab, you should see the worker nodes used for the Databricks cluster.
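    The portal steps above can also be scripted. Below is a minimal Python sketch that calls ADF's `queryActivityruns` REST endpoint directly; it assumes you already have an Azure AD bearer token (e.g. from `az account get-access-token`), and all identifiers passed in are placeholders:

```python
import json
import urllib.request

def build_activity_runs_request(subscription, resource_group, factory,
                                run_id, updated_after, updated_before):
    """Build the URL and JSON body for ADF's queryActivityruns endpoint."""
    url = (
        f"https://management.azure.com/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
        f"/factories/{factory}/pipelineruns/{run_id}/queryActivityruns"
        "?api-version=2018-06-01"
    )
    # Timestamps are ISO-8601 strings, e.g. "2023-08-06T00:00:00Z".
    body = {"lastUpdatedAfter": updated_after,
            "lastUpdatedBefore": updated_before}
    return url, body

def fetch_databricks_activity_outputs(token, subscription, resource_group,
                                      factory, run_id, updated_after,
                                      updated_before):
    """Return the 'output' of each Databricks Notebook activity in the run.

    The output field is where ADF surfaces the cluster details for the
    activity, matching what the portal shows on the Output tab.
    """
    url, body = build_activity_runs_request(
        subscription, resource_group, factory, run_id,
        updated_after, updated_before)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [run.get("output") for run in payload.get("value", [])
            if run.get("activityType") == "DatabricksNotebook"]
```

    This lets you collect the cluster information for many pipeline runs without clicking through each one in the portal.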

    If you want to get the pipeline run ID against the cluster events in the cluster event log, you can use the Azure Databricks REST API to query the cluster event log. Here is an example API call:

    POST /api/2.0/clusters/events
    {
      "cluster_id": "<cluster_id>",
      "start_time": <start_time>,
      "end_time": <end_time>,
      "event_types": ["<event_type>"],
      "offset": <offset>,
      "limit": <limit>
    }


    You can replace <cluster_id> with the ID of your Databricks cluster, <start_time> and <end_time> with the time range (in epoch milliseconds) for which you want to query the event log, and <event_type> with the type of event you want to query (e.g. "RUNNING", "RESIZING", "TERMINATED"). The API call will return a list of events that match your query. Note that the cluster event log does not record the ADF pipeline run ID directly; to correlate the two, take the cluster ID from the Databricks Notebook activity's output in ADF and match the events by cluster ID and time range.
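    A minimal Python sketch of that call, assuming a personal access token for authentication; the workspace URL, cluster ID, and token are placeholders:

```python
import json
import urllib.request

def build_cluster_events_request(workspace_url, cluster_id, start_ms, end_ms,
                                 event_types=None, offset=0, limit=50):
    """Build the URL and JSON body for the Databricks Clusters Events API."""
    url = f"{workspace_url}/api/2.0/clusters/events"
    body = {
        "cluster_id": cluster_id,
        "start_time": start_ms,  # epoch milliseconds
        "end_time": end_ms,
        "offset": offset,
        "limit": limit,
    }
    if event_types:
        # e.g. ["RUNNING", "RESIZING", "TERMINATED"]
        body["event_types"] = event_types
    return url, body

def list_cluster_events(workspace_url, token, cluster_id, start_ms, end_ms,
                        event_types=None):
    """POST the query and return the list of matching cluster events."""
    url, body = build_cluster_events_request(
        workspace_url, cluster_id, start_ms, end_ms, event_types)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("events", [])
```

    The RESIZING events in particular show the cluster scaling its worker count up or down, which is what you would match against the pipeline run's time window.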

    For more details, refer to the below links:
    https://docs.databricks.com/api/workspace/clusters/events
    https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook#monitor-the-pipeline-run

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "was this answer helpful". And, if you have any further queries, do let us know.

