How to run Azure Databricks notebooks in parallel using PySpark and print the notebooks that failed during execution

2024-02-08T13:56:54.55+00:00

We need to run Databricks notebooks in parallel using PySpark, and if any notebook fails during execution, we have to print the failed notebooks.

Azure Databricks

Accepted answer
  Bhargava-MSFT (Microsoft Employee, Moderator)
    2024-02-08T18:47:13.3466667+00:00

    Hello SaiSekhar, MahasivaRavi (Philadelphia),

    As described in the documents below, you can use the dbutils.notebook.run() function together with a thread pool to run multiple notebooks in parallel.

    https://www.codesexplorer.com/2020/03/run-databricks-notebooks-in-parallel-python.html

    https://learn.microsoft.com/en-us/azure/databricks/notebooks/notebook-workflows

    I have copied the code from the links above and modified it to print an error message when a notebook fails to run.

    Please try and let me know.

    from concurrent.futures import ThreadPoolExecutor
    
    class NotebookData:
      def __init__(self, path, timeout, parameters=None, retry=0):
        self.path = path
        self.timeout = timeout
        self.parameters = parameters
        self.retry = retry
    
      @staticmethod
      def submitNotebook(notebook):
        print("Running notebook %s" % notebook.path)
        try:
          if notebook.parameters:
            return dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
          else:
            return dbutils.notebook.run(notebook.path, notebook.timeout)
        except Exception as e:
          # Print the failed notebook and its error, as asked in the question
          print(f"Notebook {notebook.path} failed with error: {e}")
          if notebook.retry < 1:
            raise
          print("Retrying notebook %s" % notebook.path)
          notebook.retry = notebook.retry - 1
          return NotebookData.submitNotebook(notebook)
    
    def parallelNotebooks(notebooks, numInParallel):
      # Each dbutils.notebook.run call blocks its worker thread, so
      # max_workers caps how many notebooks run at the same time
      with ThreadPoolExecutor(max_workers=numInParallel) as ec:
        return [ec.submit(NotebookData.submitNotebook, notebook) for notebook in notebooks]
    
    # Array of instances of the NotebookData class (timeout is in seconds)
    notebooks = [
      NotebookData("../path/to/Notebook1", 1200),
      NotebookData("../path/to/Notebook2", 1200, {"Name": "Abhay"}),
      NotebookData("../path/to/Notebook3", 1200, retry=2)
    ]
    
    res = parallelNotebooks(notebooks, 2)
    result = [i.result(timeout=3600) for i in res]  # This is a blocking call.
    print(result)
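
    One caveat with the last two lines: future.result() re-raises the exception from a failed notebook, so the list comprehension stops at the first failure before the summary is printed. If you also want a consolidated list of all failed notebooks at the end, you can collect each future's outcome individually. A minimal sketch (the failed and succeeded names are just for illustration; it relies on parallelNotebooks returning the futures in the same order as notebooks):

    failed, succeeded = [], []
    for notebook, future in zip(notebooks, res):
      try:
        # result() re-raises whatever submitNotebook raised for this notebook
        succeeded.append((notebook.path, future.result(timeout=3600)))
      except Exception as e:
        failed.append((notebook.path, str(e)))
    
    print("Failed notebooks:", [path for path, error in failed])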
    
    
