Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

08/08/2024

This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. See What are Databricks Asset Bundles?

You can use GitHub Actions together with the Databricks CLI bundle commands to automate, customize, and run your CI/CD workflows from within your GitHub repositories.

You can add GitHub Actions YAML files such as the following to your repository's .github/workflows directory. The following example GitHub Actions YAML file validates, deploys, and runs the specified job in the bundle within a pre-production target named "qa", as defined in a bundle configuration file. In this example, the GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file's setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named qa. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, which holds the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it runs. See Encrypted secrets.
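For reference, a minimal bundle configuration file (databricks.yml) that meets these requirements might look like the following sketch. The bundle name my-bundle, the notebook path, the cluster settings, and the workspace URL are hypothetical placeholders, not values taken from this article; replace them with your own.

```yaml
# databricks.yml at the root of the repository.
# Sketch only: every concrete value below is a hypothetical placeholder.
bundle:
  name: my-bundle  # hypothetical bundle name

resources:
  jobs:
    # The job that the workflow starts with "databricks bundle run my-job".
    my-job:
      name: my-job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/notebook.ipynb  # hypothetical notebook, relative to this file
          new_cluster:
            spark_version: 13.3.x-scala2.12      # placeholder Databricks Runtime version
            node_type_id: Standard_DS3_v2        # placeholder Azure VM type
            num_workers: 1

targets:
  # Pre-production target, selected in the workflow through DATABRICKS_BUNDLE_ENV: qa.
  qa:
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net  # hypothetical workspace URL
```

Because the service principal's token is supplied through DATABRICKS_TOKEN, the file itself contains no credentials; only the workspace host is needed to know where to deploy.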

# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: "QA deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Deploy the bundle. The "pipeline_update" job depends on this job.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Run the bundle after the "deploy" job has deployed it.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI again; each job runs on a fresh runner.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa
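If you want validation to appear as a separate step in the Actions log, rather than relying only on the validation that bundle deploy performs implicitly, you could add a step like the following to the deploy job, between the setup-cli step and the deploy step. This is an optional sketch; it calls databricks bundle validate with the same working directory and environment variables as the deploy step.

```yaml
      # Optional: validate the bundle for the "qa" target before deploying,
      # so configuration errors are reported in their own step.
      - run: databricks bundle validate
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa
```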

The following GitHub Actions YAML file can be stored in the same repository as the preceding file. It validates, deploys, and runs the specified bundle within a production target named "prod", as defined in a bundle configuration file. In this example, the GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file's setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named prod. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, which holds the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it runs. See Encrypted secrets.
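When one databricks.yml serves both workflows, the prod target is declared alongside qa in its targets section. The following is a sketch of that section; both workspace URLs are hypothetical placeholders, and mode: production is an optional setting that enables additional deployment checks (see the Databricks documentation on bundle deployment modes for its requirements).

```yaml
# targets section of databricks.yml (sketch; host values are placeholders).
targets:
  # Pre-production target, selected through DATABRICKS_BUNDLE_ENV: qa.
  qa:
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net  # hypothetical QA workspace URL

  # Production target, selected through DATABRICKS_BUNDLE_ENV: prod.
  prod:
    # Optional: apply the stricter checks Databricks uses for production deployments.
    mode: production
    workspace:
      host: https://adb-6543210987654321.2.azuredatabricks.net  # hypothetical production workspace URL
```

Pointing qa and prod at different workspaces is a design choice, not a requirement; they can also share one workspace.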

# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Production deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a commit is pushed to the repo's
# main branch, for example when a pull request is merged into it.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Run the bundle after the "deploy" job has deployed it.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI again; each job runs on a fresh runner.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
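Instead of selecting the target through the DATABRICKS_BUNDLE_ENV environment variable, you can pass it explicitly with the --target (short form -t) flag of the bundle commands. The following sketch shows the deploy step and the run step written that way; each step stays in its own job, as in the workflow above, and the rest of the file is unchanged.

```yaml
      # In the "deploy" job: deploy to the "prod" target, chosen with --target.
      - run: databricks bundle deploy --target prod
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}

      # In the "pipeline_update" job: run "my-job" in the same target.
      - run: databricks bundle run my-job --refresh-all --target prod
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
```

Putting the target on the command line makes each step show in the Actions log which target it acts on. The environment variable, by contrast, keeps the commands identical between the QA and production files.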
