Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

Article
05/03/2024

This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. For more information, see What are Databricks Asset Bundles?

You can use GitHub Actions together with the Databricks CLI bundle commands to automate, customize, and run your CI/CD workflows from within your GitHub repositories.

You can add GitHub Actions YAML files such as the following to your repository's .github/workflows directory. The GitHub Actions YAML file in the following example validates, deploys, and runs the bundle's specified job in a pre-production target named "qa", as defined in a bundle configuration file. This example GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is declared explicitly through the working-directory: . setting of the GitHub Actions YAML file. (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named qa. For more information, see Databricks Asset Bundle configurations.
A GitHub secret named SP_TOKEN, which holds the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it is run. For more information, see Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: "QA deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Run the job from the bundle that the "deploy" job validated and deployed.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa
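
The workflow above assumes that a bundle configuration file exists at the root of the repository and defines both my-job and the qa target. The article does not show that file, so the following is only a minimal sketch of what it might look like. The bundle name, task key, cluster ID, notebook path, and workspace URL are all hypothetical placeholders, and a real bundle will typically contain more settings.

```yaml
# databricks.yml (illustrative sketch; every value below is a placeholder)
bundle:
  name: my-bundle                # hypothetical bundle name

resources:
  jobs:
    # The resource key is what "databricks bundle run my-job" refers to.
    my-job:
      name: my-job
      tasks:
        - task_key: main         # hypothetical task key
          existing_cluster_id: 1234-567890-abcde123    # placeholder cluster ID
          notebook_task:
            notebook_path: ./src/my-notebook.py        # hypothetical notebook path

targets:
  # Selected in the workflow through DATABRICKS_BUNDLE_ENV: qa
  qa:
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net   # placeholder URL
```

Because the workflow passes only a token, this sketch assumes the workspace URL comes from the target's workspace.host setting rather than from a separate environment variable.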

The following GitHub Actions YAML file can exist in the same repository as the preceding QA workflow file. This file validates, deploys, and runs the specified bundle in a production target named "prod", as defined in a bundle configuration file. This example GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is declared explicitly through the working-directory: . setting of the GitHub Actions YAML file. (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named prod. For more information, see Databricks Asset Bundle configurations.
A GitHub secret named SP_TOKEN, which holds the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it is run. For more information, see Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Production deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever commits are pushed to the repo's
# main branch, for example when a pull request is merged.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Run the job from the bundle that the "deploy" job validated and deployed.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
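
Both workflows rely on the validation that happens implicitly during databricks bundle deploy. If you would rather have configuration errors surface in a clearly labeled step before anything is deployed, one option is to add an explicit validation step to the deploy job's steps list, ahead of the deploy step. The fragment below is a sketch of that idea, not part of the original example. The step name is hypothetical, and the environment variables simply mirror the ones used by the deploy step.

```yaml
      # Optional extra step (sketch): validate the bundle configuration
      # for the "prod" target before the deploy step runs.
      - name: "Validate bundle"          # hypothetical step name
        run: databricks bundle validate
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
```

The same step can be used in the QA workflow by changing DATABRICKS_BUNDLE_ENV to qa.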
