Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

Artikkeli
08/08/2024

This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. See What are Databricks Asset Bundles?

You can use GitHub Actions along with Databricks CLI bundle commands to automate, customize, and run your CI/CD workflows from within your GitHub repositories.

You can add GitHub Actions YAML files such as the following to your repo’s .github/workflows directory. The following example GitHub Actions YAML file validates, deploys, and runs the specified job in the bundle within a pre-production target named “qa” as defined within a bundle configuration file. This example GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file’s setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named qa. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: "QA deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

The following GitHub Actions YAML file can exist in the same repo as the preceding file. This file validates, deploys, and runs the specified bundle within a production target named “prod” as defined within a bundle configuration file. This example GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file’s setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.). This bundle configuration file defines an Azure Databricks workflow named my-job and a target named prod. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Production deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is pushed to the repo's
# main branch.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

Jaa

Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

See also

Palaute

Lisäresursseja