Bundle configuration examples

Article
09/03/2024

This article provides example configurations for Databricks Asset Bundles features and common bundle use cases.

Tip

Some of the examples in this article, along with others, can be found in the bundle examples repository.

Job that uses serverless compute

Databricks Asset Bundles support jobs that run on serverless compute. To configure this, either omit the clusters setting for the job or specify an environment, as shown in the examples below.

# A serverless job (no cluster definition)
resources:
  jobs:
    serverless_job_no_cluster:
      name: serverless_job_no_cluster

      email_notifications:
        on_failure:
          - someone@example.com

      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/notebook.ipynb

# A serverless job (environment spec)
resources:
  jobs:
    serverless_job_environment:
      name: serverless_job_environment

      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py

          # The key that references an environment spec in a job.
          environment_key: default

      # A list of task execution environment specifications that can be referenced by tasks of this job.
      environments:
        - environment_key: default

          # Full documentation of this spec can be found at:
          # https://docs.databricks.com/api/workspace/jobs/create#environments-spec
          spec:
            client: "1"
            dependencies:
              - cowsay
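
Entries under dependencies are pip-style requirement lines, so a version pin can be written the same way as in a requirements file. The following is a minimal sketch of that variation, not an official example. The job name and the pinned version are assumptions chosen for illustration.

```yaml
# Sketch only: same structure as the environment spec above, with a pinned dependency.
resources:
  jobs:
    serverless_job_pinned_dependency: # hypothetical job name
      name: serverless_job_pinned_dependency

      tasks:
        - task_key: task
          spark_python_task:
            python_file: ../src/main.py

          environment_key: default

      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              # Assumed version pin, shown for illustration only.
              - cowsay==6.1
```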

Pipeline that uses serverless compute

Databricks Asset Bundles support pipelines that run on serverless compute. To configure this, set the pipeline's serverless setting to true. The following example configuration defines a pipeline that runs on serverless compute and a job that triggers a refresh of the pipeline every hour.

# A pipeline that runs on serverless compute
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      target: ${bundle.environment}
      serverless: true
      catalog: users
      libraries:
        - notebook:
            path: ../src/my_pipeline.ipynb

      configuration:
        bundle.sourcePath: /Workspace/${workspace.file_path}/src

# This defines a job to refresh a pipeline that is triggered every hour
resources:
  jobs:
    my_job:
      name: my_job

      # Run this job once an hour.
      trigger:
        periodic:
          interval: 1
          unit: HOURS

      email_notifications:
        on_failure:
          - someone@example.com

      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
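
If the refresh should happen at a fixed clock time rather than at a rolling interval, the job's schedule mapping with a Quartz cron expression is one alternative to the periodic trigger. The sketch below assumes the same my_pipeline resource as above. The job name my_job_cron and the UTC time zone are illustrative choices.

```yaml
# Sketch only: the same refresh job, scheduled with a Quartz cron expression
# instead of a periodic trigger.
resources:
  jobs:
    my_job_cron: # hypothetical job name
      name: my_job_cron

      schedule:
        # Quartz fields: second minute hour day-of-month month day-of-week.
        # This expression fires at the start of every hour.
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: UTC # assumed time zone

      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
```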

Job with a SQL notebook

The following example configuration defines a job with a SQL notebook.

resources:
  jobs:
    job_with_sql_notebook:
      name: Job to demonstrate using a SQL notebook with a SQL warehouse

      tasks:
        - task_key: notebook
          notebook_task:
            notebook_path: ./select.sql
            warehouse_id: 799f096837fzzzz4
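
A hard-coded warehouse ID ties the job to a single workspace. One way to avoid that is a bundle variable, sketched below. The variable name warehouse_id and its description are assumptions. The default reuses the placeholder ID from the example above.

```yaml
# Sketch only: the SQL notebook job with the warehouse ID moved into a bundle variable.
variables:
  warehouse_id: # hypothetical variable name
    description: ID of the SQL warehouse that runs the notebook
    default: 799f096837fzzzz4

resources:
  jobs:
    job_with_sql_notebook:
      name: Job to demonstrate using a SQL notebook with a SQL warehouse

      tasks:
        - task_key: notebook
          notebook_task:
            notebook_path: ./select.sql
            # Resolved from the variable above when the bundle is validated or deployed.
            warehouse_id: ${var.warehouse_id}
```

With this layout, a different warehouse can be supplied per target or at deploy time without editing the job definition.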

Job with multiple wheel files

The following example configuration defines a bundle that contains a job with multiple *.whl files.

# job.yml
resources:
  jobs:
    example_job:
      name: "Example with multiple wheels"
      tasks:
        - task_key: task

          spark_python_task:
            python_file: ../src/call_wheel.py

          libraries:
            - whl: ../my_custom_wheel1/dist/*.whl
            - whl: ../my_custom_wheel2/dist/*.whl

          new_cluster:
            node_type_id: i3.xlarge
            num_workers: 0
            spark_version: 14.3.x-scala2.12
            spark_conf:
              "spark.databricks.cluster.profile": "singleNode"
              "spark.master": "local[*, 4]"
            custom_tags:
              "ResourceClass": "SingleNode"

# databricks.yml
bundle:
  name: job_with_multiple_wheels

include:
  - ./resources/job.yml

workspace:
  host: https://myworkspace.cloud.databricks.com

artifacts:
  my_custom_wheel1:
    type: whl
    build: poetry build
    path: ./my_custom_wheel1

  my_custom_wheel2:
    type: whl
    build: poetry build
    path: ./my_custom_wheel2

targets:
  dev:
    default: true
    mode: development
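
The job above runs a Python file that imports the wheels. If a wheel exposes an entry point, a python_wheel_task could call it directly instead. The following is a minimal sketch for a resources file such as job.yml. The job name, the package name my_custom_wheel1, and the entry point main are assumptions about how the wheel is packaged.

```yaml
# Sketch only: calling a wheel entry point directly instead of a Python file.
resources:
  jobs:
    example_wheel_entry_point_job: # hypothetical job name
      name: "Example that calls a wheel entry point"
      tasks:
        - task_key: task

          python_wheel_task:
            package_name: my_custom_wheel1 # assumed package name inside the wheel
            entry_point: main # assumed entry point declared by the package

          libraries:
            - whl: ../my_custom_wheel1/dist/*.whl

          new_cluster:
            node_type_id: i3.xlarge
            num_workers: 1
            spark_version: 14.3.x-scala2.12
```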

Job that uses a requirements.txt file

The following example configuration defines a job that uses a requirements.txt file.

resources:
  jobs:
    job_with_requirements_txt:
      name: Example job that uses a requirements.txt file

      tasks:
        - task_key: task
          # Refers to a job cluster that must be defined under this job's job_clusters mapping.
          job_cluster_key: default
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
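
The task above refers to a job cluster with the key default, which the example does not define. The sketch below shows one way the full job might look with that cluster added. The cluster settings are assumptions borrowed from the other examples in this article.

```yaml
# Sketch only: the requirements.txt job with the referenced job cluster defined.
resources:
  jobs:
    job_with_requirements_txt:
      name: Example job that uses a requirements.txt file

      job_clusters:
        - job_cluster_key: default
          new_cluster:
            # Assumed cluster settings, reused from the other examples.
            spark_version: 14.3.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1

      tasks:
        - task_key: task
          job_cluster_key: default
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
```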

Bundle that uploads a JAR file to Unity Catalog

You can specify Unity Catalog volumes as an artifact path so that all artifacts, such as JAR files and wheel files, are uploaded to Unity Catalog volumes. The following example bundle uploads a JAR file to Unity Catalog. For information on the artifact_path mapping, see artifact_path.

bundle:
  name: jar-bundle

workspace:
  host: https://myworkspace.cloud.databricks.com
  artifact_path: /Volumes/main/default/my_volume

artifacts:
  my_java_code:
    path: ./sample-java
    build: "javac PrintArgs.java && jar cvfm PrintArgs.jar META-INF/MANIFEST.MF PrintArgs.class"
    files:
      - source: ./sample-java/PrintArgs.jar

resources:
  jobs:
    jar_job:
      name: "Spark Jar Job"
      tasks:
        - task_key: SparkJarTask
          new_cluster:
            num_workers: 1
            spark_version: "14.3.x-scala2.12"
            node_type_id: "i3.xlarge"
          spark_jar_task:
            main_class_name: PrintArgs
          libraries:
            - jar: ./sample-java/PrintArgs.jar
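
Because workspace settings can be overridden per target, each target could in principle upload artifacts to its own volume. The fragment below is a sketch of that idea. The target names and the volume names my_volume_dev and my_volume_prod are hypothetical.

```yaml
# Sketch only: per-target Unity Catalog volumes for uploaded artifacts.
targets:
  dev:
    default: true
    workspace:
      artifact_path: /Volumes/main/default/my_volume_dev # hypothetical volume

  prod:
    workspace:
      artifact_path: /Volumes/main/default/my_volume_prod # hypothetical volume
```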
