How to schedule a Databricks notebook in an Azure DevOps pipeline

Vinay5 46 Reputation points
2023-10-18T18:30:40.9466667+00:00

I am trying to schedule a Databricks notebook from an Azure DevOps pipeline, using the code below.


- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      personalAccessToken='dapie7'  
      databricksUrl='https://adb-13946.6.azuredatabricks.net/api/2.0'
      notebookPath='/Users/notebook.py/'
      jobName='ScheduledJobName'

      requestUri="$databricksUrl/jobs/create"

      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2"
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath'"
        },
        "schedule": "@daily",
        "max_retries": 0,
        "timezone_id": "Canada/Eastern",
        "cron_schedule": "45 8 * * *"
      }'

      # Encode PAT to Base64
      patBase64=$(echo -n ":$personalAccessToken" | base64)

      # Make the API request
      curl -X POST -H "Authorization: Basic $patBase64" -H "Content-Type: application/json" -d "$body" "$requestUri"


This is my log.

Starting: Schedule Databricks Notebook
==============================================================================
Task         : Bash
Description  : Run a Bash script on macOS, Linux, or Windows
Version      : 3.229.0
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash
==============================================================================
Generating script.
========================== Starting Command Output ===========================
/usr/bin/bash /home/vsts/work/_temp/d7e8884a-073dfc.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   402  100    54  100   348    277   1789 --:--:-- --:--:-- --:--:--  2072
Finishing: Schedule Databricks Notebook

I do not see the job being created in Databricks. Is there another way to schedule a job from an Azure DevOps pipeline?

Thanks

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

2 answers

  1. Dillon Silzer 57,831 Reputation points Volunteer Moderator
    2023-10-21T03:18:27.2833333+00:00

    Hi Vinay,

    Quite honestly I would recommend using Azure Data Factory to accomplish this task. You can connect your Notebooks quite easily and pass variables into those notebooks.

    Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory

    https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook

    Then within ADF you can create triggers to run pipelines whenever you desire.


    If this is helpful please accept answer.


  2. Luis Arias 8,621 Reputation points Volunteer Moderator
    2023-10-18T20:03:54.6033333+00:00

    Hi @Vinay5 ,

    You can schedule the execution of an Azure DevOps pipeline by adding a schedules block at the top of the pipeline:

    schedules:
      - cron: "0 7 * * 1"  # minute 0, hour 7 (UTC); the two * mean every day of month and every month; 1 means Monday
        displayName: Run each Monday at 7
        branches:
          include:
            - main
    
    
    

    More information: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/scheduled-triggers?view=azure-devops&tabs=yaml

    Cheers,

    Luis Arias


    If the information helped address your question, please Accept the answer.
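
For the original goal of creating a scheduled job inside Databricks itself, the script in the question appears to differ from the Jobs API 2.0 request shape in a few ways: the token is normally sent as a Bearer token rather than Base64-encoded Basic credentials, the schedule is a nested object with a Quartz cron expression (six fields, seconds first) instead of top-level "schedule", "cron_schedule" and "timezone_id" keys, and a new cluster needs a worker count. The following is a minimal sketch under those assumptions, not a verified fix. DATABRICKS_HOST and DATABRICKS_TOKEN are hypothetical environment variable names, and the runtime version and worker count are placeholder values to check against your workspace. The notebook path is the one from the question with the trailing slash dropped.

```shell
#!/usr/bin/env bash
# Sketch: create a scheduled notebook job through the Databricks Jobs API 2.0.
# job_name and notebook_path come from the question; the runtime version and
# worker count are assumptions to adjust for your workspace.

job_name='ScheduledJobName'
notebook_path='/Users/notebook.py'

# Print the JSON body for POST /api/2.0/jobs/create.
build_body() {
  printf '{
  "name": "%s",
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1
  },
  "notebook_task": { "notebook_path": "%s" },
  "schedule": {
    "quartz_cron_expression": "0 45 8 * * ?",
    "timezone_id": "Canada/Eastern"
  },
  "max_retries": 0
}\n' "$job_name" "$notebook_path"
}

# Send the request only when the workspace URL and token are provided,
# for example as secret pipeline variables mapped into the environment.
# Otherwise just print the body so it can be inspected (dry run).
if [ -n "${DATABRICKS_HOST:-}" ] && [ -n "${DATABRICKS_TOKEN:-}" ]; then
  curl -sS -X POST \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(build_body)" \
    -w '\nHTTP status: %{http_code}\n' \
    "${DATABRICKS_HOST}/api/2.0/jobs/create"
else
  build_body
fi
```

The -w option makes curl print the HTTP status after the response body, so a rejected request is visible in the pipeline log. A successful call returns a JSON object containing a job_id; a failed one returns an error_code and a message. The log in the question shows 54 bytes received, so the API did send a response body, and reading it should indicate why the job was not created.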

