GitHub Actions

Important

This feature is in Public Preview.

GitHub Actions triggers runs of your CI/CD flows from your GitHub repositories and lets you automate your build, test, and deployment CI/CD pipeline.

This page provides information about the GitHub Actions developed by Databricks and examples for common use cases. For information about other CI/CD features and best practices on Databricks, see CI/CD on Azure Databricks and Best practices and recommended CI/CD workflows on Databricks.

Databricks GitHub Actions

Databricks has developed the following GitHub Actions for your CI/CD workflows on GitHub. Add GitHub Actions YAML files to your repository's .github/workflows directory.

Note

This article covers GitHub Actions, which is developed by a third party. To contact the provider, see GitHub Actions support.

GitHub Action	Description
databricks/setup-cli	A composite action that sets up the Databricks CLI in a GitHub Actions workflow.

Run a CI/CD workflow that updates a Git folder

The following example GitHub Actions YAML file updates a workspace Git folder when a remote branch is updated. For information about the Git folder approach to CI/CD, see Other tools for source control.

Requirements

This example uses workload identity federation for GitHub Actions for enhanced security. It requires that you have added a service principal to your account with a GitHub Actions federation policy. See Enable workload identity federation for GitHub Actions.

Important

The subject of the federation policy (the identity of the federated token) must exactly match the subject of the expected token. For this example, the entity type and name are Environment and Prod. The constructed subject must be in the format repo:my-github-org-or-user/my-repo:environment:Prod.
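
The subject format above can be sketched as a small helper. This is an illustration only: the function name `federation_subject` and its parameters are hypothetical, and lowercasing the entity type simply mirrors the example subject given above.

```python
def federation_subject(repository: str, entity_type: str, entity_name: str) -> str:
    """Build a subject string in the format shown in this article.

    `repository` is "<org-or-user>/<repo>". The entity type is lowercased
    to mirror the example subject; the entity name is kept as-is because
    the policy subject must match the token subject exactly.
    """
    if repository.count("/") != 1 or not all(repository.split("/")):
        raise ValueError("repository must look like '<org-or-user>/<repo>'")
    return f"repo:{repository}:{entity_type.lower()}:{entity_name}"


if __name__ == "__main__":
    # Values taken from the example in this article.
    print(federation_subject("my-github-org-or-user/my-repo", "Environment", "Prod"))
```

Comparing the printed string with the subject configured in the federation policy is a quick way to catch a mismatch in case or spelling before the workflow runs.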

After you create a service principal with a federation policy, set the DATABRICKS_HOST environment variable to your Azure Databricks workspace host and the DATABRICKS_CLIENT_ID environment variable to the UUID of the service principal. The DATABRICKS_AUTH_TYPE environment variable is set in the action. For information about Databricks environment variables, see Environment variables and fields for unified authentication.
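
As a rough preflight check, the three variables named above could be verified before any Databricks CLI command runs. The sketch below is not part of the Databricks tooling; the names `REQUIRED_VARS` and `missing_variables` are hypothetical.

```python
import os
from typing import List, Mapping

# The three environment variables this example relies on.
REQUIRED_VARS = ("DATABRICKS_AUTH_TYPE", "DATABRICKS_HOST", "DATABRICKS_CLIENT_ID")


def missing_variables(environ: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required Databricks environment variables are set.")
```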

Create the action

Now add a file named .github/workflows/sync_git_folder.yml to your repository with the following YAML:

name: Sync Git Folder

concurrency: prod_environment

on:
  push:
    branches:
      # Set your base branch name here
      - git-folder-cicd-example

permissions:
  # These permissions are required for workload identity federation.
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    name: 'Update git folder'
    environment: Prod
    env:
      DATABRICKS_AUTH_TYPE: github-oidc
      DATABRICKS_HOST: ${{ vars.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }} # This is the service principal UUID.

    steps:
      - uses: actions/checkout@v3
      - uses: databricks/setup-cli@main
      - name: Update git folder
        # Set your workspace path and branch name here
        run: databricks repos update /Workspace/<git-folder-path> --branch git-folder-cicd-example

Run a CI/CD workflow with a bundle that runs a pipeline update

The following example GitHub Actions YAML file triggers a test deployment that validates, deploys, and runs the specified job in the bundle within a pre-production target named dev, as defined in a bundle configuration file.

Requirements

This example requires the following:

A user-defined environment variable named DATABRICKS_BUNDLE_ENV.

A bundle configuration file at the root of the repository, explicitly declared through the working-directory: . setting of the GitHub Actions YAML file. This bundle configuration file must define an Azure Databricks workflow named sample_job and a target named dev. For example:

# This is a Databricks asset bundle definition for pipeline_update.
bundle:
  name: pipeline_update

include:
  - resources/*.yml

variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

resources:
  jobs:
    sample_job:
      name: sample_job

      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}

      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.sample_pipeline.id}

      environments:
        - environment_key: default
          spec:
            environment_version: '4'

  pipelines:
    sample_pipeline:
      name: sample_pipeline
      catalog: ${var.catalog}
      schema: ${var.schema}
      serverless: true
      root_path: '../src/sample_pipeline'

      libraries:
        - glob:
            include: ../src/sample_pipeline/transformations/**

      environment:
        dependencies:
          - --editable ${workspace.file_path}

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: <dev-workspace-url>
    variables:
      catalog: my_catalog
      schema: ${workspace.current_user.short_name}
  prod:
    mode: production
    workspace:
      host: <production-workspace-url>
      root_path: /Workspace/Users/someone@example.com/.bundle/${bundle.name}/${bundle.target}
    variables:
      catalog: my_catalog
      schema: prod
    permissions:
      - user_name: someone@example.com
        level: CAN_MANAGE

For more information about bundle configuration, see Databricks Asset Bundle configuration.

A GitHub secret named SP_TOKEN, which represents the Azure Databricks access token for an Azure Databricks service principal associated with the Azure Databricks workspace where this bundle is deployed and run. To create a token:
1. Create a Databricks service principal. See Add service principals to your account.
2. Generate a secret for the service principal. See Step 1: Create an OAuth secret. Copy the secret and client ID values.
3. Manually generate a Databricks access token (account or workspace) using the copied client ID and secret values. See Generate an account-level access token.
4. Copy the access_token value from the JSON response. Add a GitHub secret named SP_TOKEN to Actions in your repository and use the Databricks access token as the secret value. See Encrypted secrets.
The unified authentication environment variable DATABRICKS_TOKEN is set in the action to the configured SP_TOKEN secret.
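
Steps 3 and 4 can be sketched in Python as an OAuth client-credentials request. Treat this as a sketch under assumptions: the token endpoint URL is left as a placeholder to be taken from the linked article, the `all-apis` scope and HTTP Basic authentication are assumptions about the token request, and the function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(token_endpoint: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) a client-credentials token request."""
    # Assumption: the endpoint accepts a form-encoded body with these fields.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # Assumption: the client ID and secret are sent with HTTP Basic auth.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the access_token field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Example (not executed here; replace the placeholders with your own values):
# request = build_token_request("<token-endpoint-url>", "<client-id>", "<client-secret>")
# token = fetch_access_token(request)  # store this value as the SP_TOKEN secret
```

Keeping request construction separate from sending makes it possible to inspect the request without contacting any server or exposing the secret in logs.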

Create the action

Now add a file named .github/workflows/pipeline_update.yml to your repository with the following YAML:

# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "dev".
name: 'Dev deployment'

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: 'Deploy bundle'
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "dev" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: dev

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: 'Run pipeline update'
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "sample_job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run sample_job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: dev

You might also want to trigger production deployments. The following GitHub Actions YAML file can exist in the same repository as the preceding file. This file validates, deploys, and runs the specified bundle within a production target named "prod", as defined in a bundle configuration file.

# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: 'Production deployment'

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a commit is pushed to the repo's
# main branch (for example, when a pull request is merged).
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: 'Deploy bundle'
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: 'Run pipeline update'
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "sample_job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run sample_job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

Run a CI/CD workflow that builds a JAR and deploys a bundle

If you have a Java-based ecosystem, your GitHub Action needs to build and upload a JAR before deploying the bundle. The following example GitHub Actions YAML file triggers a deployment that builds and uploads a JAR to a volume, then validates and deploys the bundle to a production target named "prod", as defined in the bundle configuration file. It compiles a Java-based JAR, but the compilation steps for a Scala-based project are similar.

Requirements

This example requires the following:

A bundle configuration file at the root of the repository, explicitly declared through the working-directory: . setting of the GitHub Actions YAML file.
A DATABRICKS_TOKEN environment variable that represents the Azure Databricks access token associated with the Azure Databricks workspace where this bundle is deployed and run.
A DATABRICKS_HOST environment variable that represents the Azure Databricks workspace host.

Create the action

Now add a file named .github/workflows/build_jar.yml to your repository with the following YAML:

name: Build JAR and deploy with bundles

on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main

jobs:
  build-test-upload:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Java
        uses: actions/setup-java@v4
        with:
          java-version: '17' # Specify the Java version used by your project
          distribution: 'temurin' # Use a reliable JDK distribution

      - name: Cache Maven dependencies
        uses: actions/cache@v4
        with:
          path: ~/.m2/repository
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: |
            ${{ runner.os }}-maven-

      - name: Build and test JAR with Maven
        run: mvn clean verify # Use verify to ensure tests are run

      - name: Databricks CLI Setup
        uses: databricks/setup-cli@v0.9.0 # Pin to a specific version

      - name: Upload JAR to a volume
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }} # Add host for clarity
        run: |
          databricks fs cp target/my-app-1.0.jar dbfs:/Volumes/artifacts/my-app-${{ github.sha }}.jar --overwrite

  validate:
    needs: build-test-upload
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Databricks CLI Setup
        uses: databricks/setup-cli@v0.9.0

      - name: Validate bundle
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
        run: databricks bundle validate

  deploy:
    needs: validate
    if: github.event_name == 'push' && github.ref == 'refs/heads/main' # Only deploy on push to main
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Databricks CLI Setup
        uses: databricks/setup-cli@v0.9.0

      - name: Deploy bundle
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
        run: databricks bundle deploy --target prod
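
The gating in the workflow above can be read as a small predicate: the build and validate jobs run for both triggers, while the deploy job runs only for pushes to main. The following is a simplified Python model of that logic, not something GitHub Actions executes; the function names are hypothetical, and it uses exact string comparison.

```python
from typing import List


def should_deploy(event_name: str, ref: str) -> bool:
    """Mirror the deploy job's `if:` condition from the workflow above."""
    return event_name == "push" and ref == "refs/heads/main"


def jobs_for_event(event_name: str, ref: str) -> List[str]:
    """List the jobs that would run, in dependency order, if each one succeeds."""
    jobs = ["build-test-upload", "validate"]
    if should_deploy(event_name, ref):
        jobs.append("deploy")
    return jobs


if __name__ == "__main__":
    # A pull request targeting main: build and validate only.
    print(jobs_for_event("pull_request", "refs/pull/12/merge"))
    # A push to main: build, validate, then deploy to "prod".
    print(jobs_for_event("push", "refs/heads/main"))
```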


Last updated on 2026-02-01
