Stack CLI (legacy)

Article
03/01/2024

Important

This documentation has been retired and might not be updated.

This information applies to legacy Databricks CLI versions 0.18 and below. Databricks recommends that you use the newer Databricks CLI version 0.205 or above instead. See What is the Databricks CLI?. To find your version of the Databricks CLI, run databricks -v.

To migrate from Databricks CLI version 0.18 or below to Databricks CLI version 0.205 or above, see Databricks CLI migration.

Databricks CLI versions 0.205 and above do not support the stack CLI. Databricks recommends that you use the Databricks Terraform provider instead.

Note

The stack CLI requires Databricks CLI 0.8.3 or above.

The stack CLI provides a way to manage a stack of Azure Databricks resources, such as jobs, notebooks, and DBFS files. You can store notebooks and DBFS files locally and create a stack configuration JSON template that defines mappings from your local files to paths in your Azure Databricks workspace, along with configurations of jobs that run the notebooks.

Use the stack CLI with the stack configuration JSON template to deploy and manage your stack.

You run Databricks stack CLI subcommands by appending them to databricks stack.

databricks stack --help

Usage: databricks stack [OPTIONS] COMMAND [ARGS]...

  [Beta] Utility to deploy and download Databricks resource stacks.

Options:
  -v, --version   [VERSION]
  --debug         Debug Mode. Shows full stack trace on error.
  --profile TEXT  CLI connection profile to use. The default profile is
                  "DEFAULT".
  -h, --help      Show this message and exit.

Commands:
  deploy    Deploy a stack of resources given a JSON configuration of the stack
    Usage: databricks stack deploy [OPTIONS] CONFIG_PATH
    Options:
       -o, --overwrite  Include to overwrite existing workspace notebooks and DBFS
                        files  [default: False]
  download  Download workspace notebooks of a stack to the local filesystem
            given a JSON stack configuration template.
    Usage: databricks stack download [OPTIONS] CONFIG_PATH
    Options:
       -o, --overwrite  Include to overwrite existing workspace notebooks in the
                        local filesystem   [default: False]

Deploy a stack to a workspace

This subcommand deploys a stack. See Stack setup to learn how to set up a stack.

databricks stack deploy ./config.json

For an example of config.json, see Stack configuration JSON template.

Download stack notebook changes

This subcommand downloads the notebooks of a stack.

databricks stack download ./config.json

Examples

Stack setup

File structure of an example stack

tree

.
├── notebooks
|   ├── common
|   |   └── notebook.scala
|   └── config
|       ├── environment.scala
|       └── setup.sql
├── lib
|   └── library.jar
└── config.json

This example stack contains a main notebook in notebooks/common/notebook.scala along with configuration notebooks in the notebooks/config folder. The stack has a JAR library dependency in lib/library.jar. config.json is the stack configuration JSON template; you pass it to the stack CLI to deploy the stack.

Stack configuration JSON template

The stack configuration template describes the stack configuration.

cat config.json

{
  "name": "example-stack",
  "resources": [
  {
    "id": "example-workspace-notebook",
    "service": "workspace",
    "properties": {
      "source_path": "notebooks/common/notebook.scala",
      "path": "/Users/example@example.com/dev/notebook",
      "object_type": "NOTEBOOK"
    }
  },
  {
    "id": "example-workspace-config-dir",
    "service": "workspace",
    "properties": {
      "source_path": "notebooks/config",
      "path": "/Users/example@example.com/dev/config",
      "object_type": "DIRECTORY"
    }
  },
  {
    "id": "example-dbfs-library",
    "service": "dbfs",
    "properties": {
      "source_path": "lib/library.jar",
      "path": "dbfs:/tmp/lib/library.jar",
      "is_dir": false
    }
  },
  {
    "id": "example-job",
    "service": "jobs",
    "properties": {
      "name": "Example Stack CLI Job",
      "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 3
      },
      "timeout_seconds": 7200,
      "max_retries": 1,
      "notebook_task": {
        "notebook_path": "/Users/example@example.com/dev/notebook"
      },
      "libraries": [
        {
          "jar": "dbfs:/tmp/lib/library.jar"
        }
      ]
    }
  }
  ]
}

Each job, workspace notebook, workspace directory, DBFS file, or DBFS directory is defined as a ResourceConfig. Each ResourceConfig that represents a workspace or DBFS asset contains a mapping from the file or directory where it exists locally (source_path) to where it would exist in the workspace or DBFS (path).
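The source_path to path mapping can be illustrated with a short Python sketch. The helper name local_to_remote_mappings is hypothetical and is not part of the Databricks CLI; the sketch only reads the fields described above from an already-parsed template.

```python
from typing import Any, Dict, List, Tuple


def local_to_remote_mappings(stack_config: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (id, source_path, path) for each workspace or DBFS resource.

    Hypothetical helper for illustration; jobs are skipped because they
    have no local source file.
    """
    mappings = []
    for resource in stack_config["resources"]:
        if resource["service"] in ("workspace", "dbfs"):
            props = resource["properties"]
            mappings.append((resource["id"], props["source_path"], props["path"]))
    return mappings


# Trimmed copy of the example config.json shown above.
example_config = {
    "name": "example-stack",
    "resources": [
        {
            "id": "example-workspace-notebook",
            "service": "workspace",
            "properties": {
                "source_path": "notebooks/common/notebook.scala",
                "path": "/Users/example@example.com/dev/notebook",
                "object_type": "NOTEBOOK",
            },
        },
        {
            "id": "example-dbfs-library",
            "service": "dbfs",
            "properties": {
                "source_path": "lib/library.jar",
                "path": "dbfs:/tmp/lib/library.jar",
                "is_dir": False,
            },
        },
        {
            "id": "example-job",
            "service": "jobs",
            "properties": {"name": "Example Stack CLI Job"},
        },
    ],
}

for resource_id, source, target in local_to_remote_mappings(example_config):
    print(f"{resource_id}: {source} -> {target}")
```

To run it against a real template, load the file with json.load and pass the resulting dictionary to the helper.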

For the schema of the stack configuration template, see Stack configuration template schema.

Deploy a stack

You deploy a stack using the databricks stack deploy <configuration-file> command.

databricks stack deploy ./config.json

During stack deployment, the DBFS and workspace assets are uploaded to your Azure Databricks workspace and jobs are created.

At stack deploy time, a StackStatus JSON file for the deployment is saved in the same directory as the stack configuration template. Its name is the template's file name with deployed inserted immediately before the .json extension (for example, ./config.deployed.json). The stack CLI uses this file to keep track of resources previously deployed to your workspace.
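The naming rule can be sketched in Python as follows. The function status_file_path is a hypothetical name, and the sketch assumes the rule is exactly as described here; the CLI's own implementation may handle edge cases, such as templates without a .json extension, differently.

```python
import os


def status_file_path(config_path: str) -> str:
    """Insert 'deployed' immediately before the template's file extension.

    Hypothetical helper: an assumption about the naming rule, not the
    stack CLI's actual code.
    """
    root, extension = os.path.splitext(config_path)
    return f"{root}.deployed{extension}"


print(status_file_path("./config.json"))  # ./config.deployed.json
```

A helper like this can be handy for locating the status file to delete when a deployment reports errors about it.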

For the schema of the stack status file, see Stack status schema.

Important

Do not attempt to edit or move the stack status file. If you get any errors regarding the stack status file, delete the file and try the deployment again.

./config.deployed.json

{
  "cli_version": "0.8.3",
  "deployed_output": [
    {
      "id": "example-workspace-notebook",
      "databricks_id": {
        "path": "/Users/example@example.com/dev/notebook"
      },
      "service": "workspace"
    },
    {
      "id": "example-workspace-config-dir",
      "databricks_id": {
        "path": "/Users/example@example.com/dev/config"
      },
      "service": "workspace"
    },
    {
      "id": "example-dbfs-library",
      "databricks_id": {
        "path": "dbfs:/tmp/lib/library.jar"
      },
      "service": "dbfs"
    },
    {
      "id": "example-job",
      "databricks_id": {
        "job_id": 123456
      },
      "service": "jobs"
    }
  ],
  "name": "example-stack"
}

Data structures

Stack configuration template schema

StackConfig

These are the outer fields of a stack configuration template. All fields are required.

Field Name	Type	Description
name	`STRING`	The name of the stack.
resources	List of ResourceConfig	The assets in Azure Databricks that make up the stack. Each resource belongs to one of three services (REST API namespaces): workspace, jobs, or dbfs.

ResourceConfig

The fields for each ResourceConfig. All fields are required.

Field Name	Type	Description
id	`STRING`	A unique ID for the resource. Uniqueness of the ID across ResourceConfig entries is enforced.
service	ResourceService	The REST API service that the resource operates on. One of: `jobs`, `workspace`, or `dbfs`.
properties	ResourceProperties	The fields in this object differ depending on the `ResourceConfig` service.
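As a rough illustration of these rules, the following Python sketch checks a list of ResourceConfig entries before deployment. The names check_resources, ALLOWED_SERVICES, and REQUIRED_FIELDS are hypothetical; the stack CLI performs its own validation, which may report problems differently.

```python
from typing import Any, Dict, List

# Assumed from the table above: every ResourceConfig needs these fields,
# and service must be one of the three listed values.
REQUIRED_FIELDS = ("id", "service", "properties")
ALLOWED_SERVICES = {"jobs", "workspace", "dbfs"}


def check_resources(resources: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, resource in enumerate(resources):
        missing = [field for field in REQUIRED_FIELDS if field not in resource]
        if missing:
            problems.append(f"resource {index}: missing {', '.join(missing)}")
            continue
        if resource["service"] not in ALLOWED_SERVICES:
            problems.append(f"{resource['id']}: unknown service {resource['service']!r}")
        if resource["id"] in seen_ids:
            problems.append(f"{resource['id']}: duplicate id")
        seen_ids.add(resource["id"])
    return problems


print(check_resources([
    {"id": "example-job", "service": "jobs", "properties": {}},
    {"id": "example-job", "service": "jobs", "properties": {}},
]))
```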

ResourceProperties

The properties of a resource by ResourceService. The fields are classified as those used or not used in an Azure Databricks REST API. All the fields listed are required.

service	Fields from the REST API used in the Stack CLI	Fields used only in the Stack CLI
workspace	path: `STRING` - Remote workspace path of the notebook or directory (for example, `/Users/example@example.com/notebook`). object_type: Workspace API - Notebook object type. Can only be `NOTEBOOK` or `DIRECTORY`.	source_path: `STRING` - Local source path of the workspace notebook or directory. Either a path relative to the stack configuration template file or an absolute path in your filesystem.
jobs	Any field in the settings or new_settings structure. The only field that is not required in the settings or new_settings structure but is required by the stack CLI is name: `STRING` - Name of the job to be deployed. To avoid creating duplicate jobs, the stack CLI enforces unique names among stack-deployed jobs.	None.
dbfs	path: `STRING` - Matching remote DBFS path. Must start with `dbfs:/` (for example, `dbfs:/this/is/a/sample/path`). is_dir: `BOOL` - Whether the DBFS path is a directory or a file.	source_path: `STRING` - Local source path of the DBFS file or directory. Either a path relative to the stack configuration template file or an absolute path in your filesystem.
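The per-service requirements in this table can be expressed as a small Python check. This is a minimal sketch under the assumption that the table is complete; check_properties is a hypothetical name, and job settings other than name are not validated here.

```python
from typing import Any, Dict, List


def check_properties(service: str, properties: Dict[str, Any]) -> List[str]:
    """Return problems found in a ResourceProperties object for one service."""
    problems: List[str] = []

    def require(*fields: str) -> None:
        for field in fields:
            if field not in properties:
                problems.append(f"missing {field}")

    if service == "workspace":
        require("source_path", "path", "object_type")
        object_type = properties.get("object_type")
        if object_type is not None and object_type not in ("NOTEBOOK", "DIRECTORY"):
            problems.append(f"object_type must be NOTEBOOK or DIRECTORY, got {object_type!r}")
    elif service == "dbfs":
        require("source_path", "path", "is_dir")
        path = properties.get("path")
        if isinstance(path, str) and not path.startswith("dbfs:/"):
            problems.append("path must start with dbfs:/")
        if "is_dir" in properties and not isinstance(properties["is_dir"], bool):
            problems.append("is_dir must be a boolean")
    elif service == "jobs":
        # Only name is checked; other job settings are passed through untouched.
        require("name")
    else:
        problems.append(f"unknown service {service!r}")
    return problems


print(check_properties("dbfs", {
    "source_path": "lib/library.jar",
    "path": "/tmp/lib/library.jar",  # missing the dbfs:/ prefix on purpose
    "is_dir": False,
}))
```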

ResourceService

Each resource belongs to a specific service that aligns with the Databricks REST API. These are the services that are supported by the Stack CLI.

Service	Description
workspace	A workspace notebook or directory.
jobs	An Azure Databricks job.
dbfs	A DBFS file or directory.

Stack status schema

StackStatus

A stack status file is created after a stack is deployed using the CLI. The top-level fields are:

Field Name	Type	Description
name	`STRING`	The name of the stack. This field is the same field as in StackConfig.
cli_version	`STRING`	The version of the Databricks CLI used to deploy the stack.
deployed_resources	List of ResourceStatus	The status of each deployed resource. For each resource defined in StackConfig, a corresponding ResourceStatus is generated here.

ResourceStatus

Field Name	Type	Description
id	`STRING`	A stack-unique ID for the resource.
service	ResourceService	The REST API service that the resource operates on. One of: `jobs`, `workspace`, or `dbfs`.
databricks_id	DatabricksId	The physical ID of the deployed resource. The actual schema depends on the type (service) of the resource.

DatabricksId

A JSON object whose field depends on the service.

Service	Field in JSON	Type	Description
workspace	path	`STRING`	The absolute path of the notebook or directory in an Azure Databricks workspace. Naming is consistent with the Workspace API.
jobs	job_id	`STRING`	The job ID as shown in an Azure Databricks workspace. This can be used to update jobs already deployed.
dbfs	path	`STRING`	The absolute path of the file or directory in DBFS. Naming is consistent with the DBFS API.
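Because the field inside databricks_id depends on the service, code that reads a ResourceStatus entry has to look up the field name first. A minimal Python sketch, with the hypothetical names ID_FIELD_BY_SERVICE and physical_id, might look like this:

```python
from typing import Any, Dict

# Field name inside databricks_id for each service, taken from the table above.
ID_FIELD_BY_SERVICE = {"workspace": "path", "jobs": "job_id", "dbfs": "path"}


def physical_id(resource_status: Dict[str, Any]) -> Any:
    """Return the deployed resource's physical ID from one ResourceStatus entry."""
    field = ID_FIELD_BY_SERVICE[resource_status["service"]]
    return resource_status["databricks_id"][field]


# Entries copied from the example ./config.deployed.json shown earlier.
job_status = {
    "id": "example-job",
    "databricks_id": {"job_id": 123456},
    "service": "jobs",
}
library_status = {
    "id": "example-dbfs-library",
    "databricks_id": {"path": "dbfs:/tmp/lib/library.jar"},
    "service": "dbfs",
}

print(physical_id(job_status))
print(physical_id(library_status))
```

The helper takes a single entry rather than the whole status file, so it does not depend on the name of the top-level list field.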
