Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

Article
09/27/2024

This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. See What are Databricks Asset Bundles?

You can use GitHub Actions together with the Databricks CLI bundle commands to automate, customize, and run CI/CD workflows from within your GitHub repositories.

You can add GitHub Actions YAML files, such as the one that follows, to your repository's .github/workflows directory. The following example GitHub Actions YAML file validates, deploys, and runs the specified job in the bundle within a pre-production target named "qa", as defined in a bundle configuration file. This example file relies on the following:

A bundle configuration file at the repository's root, declared explicitly through the GitHub Actions YAML file's working-directory: . setting. (This setting can be omitted if the bundle configuration file is already at the repository's root.) This bundle configuration file defines an Azure Databricks job named my-job and a target named qa. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it is run. See Encrypted secrets.
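
For orientation, a bundle configuration file that meets the first requirement might look roughly like the sketch below. Only the job key my-job and the target name qa come from this article. The bundle name, task key, notebook path, and workspace URL are placeholders chosen for illustration, and compute settings are left out. Because the workflow passes only a token, this sketch assumes that the workspace host is supplied by the target definition.

```yaml
# databricks.yml (illustrative sketch, not the article's actual bundle)
bundle:
  name: my-bundle  # hypothetical bundle name

resources:
  jobs:
    # The resource key "my-job" is the name passed to `databricks bundle run`.
    my-job:
      name: my-job
      tasks:
        - task_key: main  # hypothetical task key
          notebook_task:
            notebook_path: ./src/my_notebook.ipynb  # hypothetical path
          # Compute settings are omitted in this sketch; add the cluster
          # definition that your workspace requires.

targets:
  qa:
    workspace:
      host: https://<your-workspace-url>  # placeholder, replace with your URL
```

With a configuration along these lines in place, the workflow file reads as follows: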

# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: "QA deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Run the job from the bundle that the "deploy" job deployed.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Install the Databricks CLI again; each job runs on a fresh runner.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa
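
The workflow above relies on the validation that happens implicitly during deployment. If you prefer validation failures to appear as their own step in the run log, one option is an explicit validation step placed before the deploy step of the "deploy" job. The following is a sketch rather than part of the original workflow. It selects the target with the CLI's -t (--target) option instead of the DATABRICKS_BUNDLE_ENV variable.

```yaml
      # Optional: validate the bundle explicitly before deploying it.
      # Sketch only; this step is not in the original workflow.
      - run: databricks bundle validate -t qa
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
```

The indentation matches the steps list of the "deploy" job, so the fragment can be pasted directly above the existing deploy step.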

The following GitHub Actions YAML file can exist in the same repository as the preceding workflow file. This file validates, deploys, and runs the specified bundle within a production target named "prod", as defined in a bundle configuration file. This example file relies on the following:

A bundle configuration file at the repository's root, declared explicitly through the GitHub Actions YAML file's working-directory: . setting. (This setting can be omitted if the bundle configuration file is already at the repository's root.) This bundle configuration file defines an Azure Databricks job named my-job and a target named prod. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it is run. See Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Production deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a commit is pushed to the repo's
# main branch, for example when a pull request is merged.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Run the job from the bundle that the "deploy" job deployed.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Install the Databricks CLI again; each job runs on a fresh runner.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
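
Both workflow files set concurrency: 1. Concurrency groups in GitHub Actions are, as far as the GitHub documentation describes them, scoped to the repository, so the two files as written would appear to share a single group and QA and production runs could queue behind each other. If that is not what you want, the mapping form of the concurrency key lets each workflow use its own group. The following is a sketch under that assumption. It derives the group from the workflow's name through the github.workflow context value.

```yaml
# Sketch: give each workflow file its own concurrency group.
concurrency:
  # github.workflow resolves to the workflow's name, for example
  # "QA deployment" or "Production deployment".
  group: ${{ github.workflow }}
  # Let a running deployment finish instead of cancelling it.
  cancel-in-progress: false
```

The same fragment can replace the concurrency: 1 line in either file, because the group name is computed per workflow.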
