Clusters CLI

You run Databricks clusters CLI subcommands by appending them to databricks clusters.

databricks clusters -h
Usage: databricks clusters [OPTIONS] COMMAND [ARGS]...

  Utility to interact with Databricks clusters.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  create           Creates a Databricks cluster.
    Options:
      --json-file PATH  File containing JSON request to POST to /api/2.0/clusters/create.
      --json JSON       JSON string to POST to /api/2.0/clusters/create.
  delete           Removes a Databricks cluster.
    Options:
      --cluster-id CLUSTER_ID Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  edit             Edits a Databricks cluster.
    Options:
      --json-file PATH  File containing JSON request to POST to /api/2.0/clusters/edit.
      --json JSON       JSON string to POST to /api/2.0/clusters/edit.
  events           Gets events for a Spark cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/#/setting/clusters/$CLUSTER_ID/configuration.  [required]
      --start-time TEXT        The start time in epoch milliseconds. If
                               unprovided, returns events starting from the
                               beginning of time.
      --end-time TEXT          The end time in epoch milliseconds. If unprovided,
                               returns events up to the current time
      --order TEXT             The order to list events in; either ASC or DESC.
                               Defaults to DESC (most recent first).
      --event-type TEXT        An event types to filter on (specify multiple event
                               types by passing the --event-type option multiple
                               times). If empty, all event types are returned.
      --offset TEXT            The offset in the result set. Defaults to 0 (no
                               offset). When an offset is specified and the
                               results are requested in descending order, the
                               end_time field is required.
      --limit TEXT             The maximum number of events to include in a page
                               of events. Defaults to 50, and maximum allowed
                               value is 500.
      --output FORMAT          can be "JSON" or "TABLE". Set to TABLE by default.
  get              Retrieves metadata about a cluster.
    Options:
      --cluster-id CLUSTER_ID Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  list             Lists active and recently terminated clusters.
    Options:
      --output FORMAT          JSON or TABLE. Set to TABLE by default.
  list-node-types  Lists node types for a cluster.
  list-zones       Lists zones where clusters can be created.
  permanent-delete Permanently deletes a cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  resize           Resizes a Databricks cluster given its ID.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
      --num-workers INTEGER    Number of workers. [required]
  restart          Restarts a Databricks cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  spark-versions   Lists possible Databricks Runtime versions.
  start            Starts a terminated Databricks cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.

Create a cluster

To display usage documentation, run databricks clusters create --help

databricks clusters create --json-file create-cluster.json

create-cluster.json:

{
  "cluster_name": "my-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "Standard_D3_v2",
  "spark_conf": {
    "spark.speculation": true
  },
  "num_workers": 25
}

{
  "cluster_id": "1234-567890-batch123"
}
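
The request file can also be generated instead of written by hand. The following Python sketch is an illustration only: the helper name write_cluster_spec is hypothetical and not part of the Databricks CLI, and it simply writes the same fields that create-cluster.json contains above.

```python
import json
import tempfile
from pathlib import Path


def write_cluster_spec(path, cluster_name, spark_version, node_type_id,
                       num_workers, spark_conf=None):
    """Write a cluster create request to `path` and return the dict written.

    Hypothetical helper: it only mirrors the fields shown in
    create-cluster.json and performs a basic sanity check.
    """
    if num_workers < 0:
        raise ValueError("num_workers must not be negative")
    spec = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }
    if spark_conf:
        spec["spark_conf"] = dict(spark_conf)
    Path(path).write_text(json.dumps(spec, indent=2) + "\n", encoding="utf-8")
    return spec


# Write the file to a temporary directory, then print the command to run.
target = Path(tempfile.mkdtemp()) / "create-cluster.json"
write_cluster_spec(target, "my-cluster", "7.3.x-scala2.12", "Standard_D3_v2",
                   25, {"spark.speculation": True})
print(f"databricks clusters create --json-file {target}")
```

Generating the file this way keeps the JSON syntactically valid, which is easy to get wrong when editing by hand.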

Delete a cluster

To display usage documentation, run databricks clusters delete --help

databricks clusters delete --cluster-id 1234-567890-batch123

If successful, no output is displayed.

Change a cluster's configuration

To display usage documentation, run databricks clusters edit --help

databricks clusters edit --json-file edit-cluster.json

edit-cluster.json:

{
  "cluster_id": "1234-567890-batch123",
  "num_workers": 10,
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "Standard_D3_v2"
}

If successful, no output is displayed.

List events for a cluster

To display usage documentation, run databricks clusters events --help

databricks clusters events \
--cluster-id 1234-567890-batch123 \
--start-time 1617238800000 \
--end-time 1619485200000 \
--order DESC \
--limit 5 \
--event-type RUNNING \
--output JSON \
| jq .

{
  "events": [
    {
      "cluster_id": "1234-567890-batch123",
      "timestamp": 1619214150232,
      "type": "RUNNING",
      "details": {
        "current_num_workers": 2,
        "target_num_workers": 2
      }
    },
    ...
    {
      "cluster_id": "1234-567890-batch123",
      "timestamp": 1617895221986,
      "type": "RUNNING",
      "details": {
        "current_num_workers": 2,
        "target_num_workers": 2
      }
    }
  ],
  "next_page": {
    "cluster_id": "1234-567890-batch123",
    "start_time": 1617238800000,
    "end_time": 1619485200000,
    "order": "DESC",
    "event_types": [
      "RUNNING"
    ],
    "offset": 5,
    "limit": 5
  },
  "total_count": 11
}
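
The --start-time and --end-time options take epoch milliseconds, and next_page shows the offset for the following page. As a rough Python sketch (the helper names to_epoch_ms and page_offsets are hypothetical, not part of the CLI), the two timestamps used above and the offsets needed to page through total_count events could be computed like this:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to whole epoch milliseconds.

    Passing a naive datetime raises TypeError, which avoids silently
    interpreting it in the local time zone.
    """
    return (moment - EPOCH) // timedelta(milliseconds=1)


def page_offsets(total_count, limit):
    """Return the --offset value of each page needed to cover total_count events."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, total_count, limit))


start = to_epoch_ms(datetime(2021, 4, 1, 1, 0, tzinfo=timezone.utc))
end = to_epoch_ms(datetime(2021, 4, 27, 1, 0, tzinfo=timezone.utc))
print(f"--start-time {start} --end-time {end}")

# With total_count 11 and --limit 5, three pages cover the result set.
for offset in page_offsets(11, 5):
    print(f"--offset {offset} --limit 5")
```

As the help text notes, when an offset is combined with descending order, the end time must also be supplied, so each paged call would keep the same --end-time value.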

Get information about a cluster

To display usage documentation, run databricks clusters get --help

databricks clusters get --cluster-id 1234-567890-batch123

Or:

databricks clusters get --cluster-name my-cluster

{
  "cluster_id": "1234-567890-batch123",
  "spark_context_id": 3124308392469747564,
  "cluster_name": "my-cluster",
  "spark_version": "7.5.x-scala2.12",
  "spark_conf": {
    "spark.databricks.delta.preview.enabled": "true"
  },
  "node_type_id": "Standard_DS3_v2",
  "driver_node_type_id": "Standard_DS3_v2",
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 0,
  "enable_elastic_disk": true,
  "disk_spec": {},
  "cluster_source": "JOB",
  "enable_local_disk_encryption": false,
  "azure_attributes": {
    "first_on_demand": 1,
    "availability": "ON_DEMAND_AZURE",
    "spot_bid_max_price": -1.0
  },
  "instance_source": {
    "node_type_id": "Standard_DS3_v2"
  },
  "driver_instance_source": {
    "node_type_id": "Standard_DS3_v2"
  },
  "state": "TERMINATED",
  "state_message": "",
  "start_time": 1619563745373,
  "terminated_time": 1619563822867,
  "last_state_loss_time": 0,
  "num_workers": 8,
  "default_tags": {
    "Vendor": "Databricks",
    "Creator": "someone@example.com",
    "ClusterName": "my-cluster",
    "ClusterId": "1234-567890-batch123",
    "JobId": "1268284",
    "RunName": "Normal job"
  },
  "creator_user_name": "someone@example.com",
  "termination_reason": {
    "code": "JOB_FINISHED",
    "type": "SUCCESS"
  },
  "init_scripts_safe_mode": false
}
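
The start_time and terminated_time fields are epoch milliseconds, so a script can derive how long the cluster ran. The Python sketch below is only an illustration: run_summary is a hypothetical helper, and treating a missing or zero terminated_time as "not terminated" is an assumption rather than documented behavior.

```python
import json
from datetime import datetime, timezone


def run_summary(info):
    """Summarize a parsed `databricks clusters get` response.

    Assumption: a missing or zero terminated_time means the cluster has
    not terminated, in which case uptime_ms is None.
    """
    started_ms = info["start_time"]
    ended_ms = info.get("terminated_time")
    started_at = datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc)
    return {
        "cluster_id": info["cluster_id"],
        "state": info["state"],
        "started_at_utc": started_at.isoformat(),
        "uptime_ms": (ended_ms - started_ms) if ended_ms else None,
    }


# Only the fields used here, copied from the sample output above.
raw = """
{
  "cluster_id": "1234-567890-batch123",
  "state": "TERMINATED",
  "start_time": 1619563745373,
  "terminated_time": 1619563822867
}
"""
print(run_summary(json.loads(raw)))
```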

List information about all available clusters

To display usage documentation, run databricks clusters list --help

databricks clusters list --output JSON | jq .

{
  "clusters": [
    {
      "cluster_id": "1234-567890-batch123",
      "spark_context_id": 3124308392469747564,
      "cluster_name": "my-cluster",
      "spark_version": "7.5.x-scala2.12",
      "spark_conf": {
        "spark.databricks.delta.preview.enabled": "true"
      },
      "node_type_id": "Standard_DS3_v2",
      "driver_node_type_id": "Standard_DS3_v2",
      "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
      },
      "autotermination_minutes": 0,
      "enable_elastic_disk": true,
      "disk_spec": {},
      "cluster_source": "JOB",
      "enable_local_disk_encryption": false,
      "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1.0
      },
      "instance_source": {
        "node_type_id": "Standard_DS3_v2"
      },
      "driver_instance_source": {
        "node_type_id": "Standard_DS3_v2"
      },
      "state": "TERMINATED",
      "state_message": "",
      "start_time": 1619563745373,
      "terminated_time": 1619563822867,
      "last_state_loss_time": 0,
      "num_workers": 8,
      "default_tags": {
        "Vendor": "Databricks",
        "Creator": "someone@example.com",
        "ClusterName": "my-cluster",
        "ClusterId": "1234-567890-batch123",
        "JobId": "1268284",
        "RunName": "Normal job"
      },
      "creator_user_name": "someone@example.com",
      "termination_reason": {
        "code": "JOB_FINISHED",
        "type": "SUCCESS"
      },
      "init_scripts_safe_mode": false
    },
    ...
  ]
}
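
Because the JSON output is a single object with a clusters array, it is straightforward to filter in a script. The following Python sketch assumes the output has already been parsed into a dict; clusters_in_state is a hypothetical helper, and the second entry in the sample data is invented purely for illustration.

```python
def clusters_in_state(listing, state):
    """Return (cluster_id, cluster_name) pairs for clusters in the given state.

    `listing` is the parsed output of `databricks clusters list --output JSON`.
    A missing "clusters" key is treated as an empty list (an assumption).
    """
    return [
        (cluster["cluster_id"], cluster.get("cluster_name", ""))
        for cluster in listing.get("clusters", [])
        if cluster.get("state") == state
    ]


# Trimmed sample data; the second cluster is hypothetical.
sample_listing = {
    "clusters": [
        {"cluster_id": "1234-567890-batch123", "cluster_name": "my-cluster",
         "state": "TERMINATED"},
        {"cluster_id": "0000-000000-example1", "cluster_name": "example-cluster",
         "state": "RUNNING"},
    ]
}

for cluster_id, name in clusters_in_state(sample_listing, "TERMINATED"):
    print(cluster_id, name)
```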

List available cluster node types

To display usage documentation, run databricks clusters list-node-types --help

databricks clusters list-node-types

{
  "node_types": [
    {
      "node_type_id": "Standard_L80s_v2",
      "memory_mb": 655360,
      "num_cores": 80.0,
      "description": "Standard_L80s_v2",
      "instance_type_id": "Standard_L80s_v2",
      "is_deprecated": false,
      "category": "Storage Optimized",
      "support_ebs_volumes": true,
      "support_cluster_tags": true,
      "num_gpus": 0,
      "node_instance_type": {
        "instance_type_id": "Standard_L80s_v2",
        "local_disks": 1,
        "local_disk_size_gb": 800,
        "instance_family": "Standard LSv2 Family vCPUs",
        "local_nvme_disk_size_gb": 1788,
        "local_nvme_disks": 10,
        "swap_size": "10g"
      },
      "is_hidden": false,
      "support_port_forwarding": true,
      "display_order": 0,
      "is_io_cache_enabled": true,
      "node_info": {
        "available_core_quota": 350.0,
        "total_core_quota": 350.0
      }
    },
    ...
  ]
}
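
The node type list can be long, so it may help to narrow it down by the num_cores and memory_mb fields before choosing a node_type_id. The Python sketch below is one possible approach; pick_node_types is a hypothetical helper, and the sample keeps only the fields it reads from the entry shown above.

```python
def pick_node_types(listing, min_cores, min_memory_mb):
    """Return node_type_id values meeting the minimums, smallest memory first.

    `listing` is the parsed output of `databricks clusters list-node-types`.
    Entries flagged is_deprecated are skipped.
    """
    matches = [
        node for node in listing.get("node_types", [])
        if not node.get("is_deprecated", False)
        and node["num_cores"] >= min_cores
        and node["memory_mb"] >= min_memory_mb
    ]
    matches.sort(key=lambda node: (node["memory_mb"], node["node_type_id"]))
    return [node["node_type_id"] for node in matches]


# Only the fields used by the helper, taken from the sample output above.
sample_node_types = {
    "node_types": [
        {"node_type_id": "Standard_L80s_v2", "memory_mb": 655360,
         "num_cores": 80.0, "is_deprecated": False},
    ]
}

print(pick_node_types(sample_node_types, min_cores=64, min_memory_mb=500000))
```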

List available zones for creating clusters

Note

This command does not apply to Azure Databricks.

To display usage documentation, run databricks clusters list-zones --help

databricks clusters list-zones

Permanently delete a cluster

To display usage documentation, run databricks clusters permanent-delete --help

databricks clusters permanent-delete --cluster-id 1234-567890-batch123

If successful, no output is displayed.

Resize a cluster

To display usage documentation, run databricks clusters resize --help

databricks clusters resize --cluster-id 1234-567890-batch123 --num-workers 10

If successful, no output is displayed.
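
Since a successful resize prints nothing, a script calling it has to rely on the process exit status. The Python sketch below is an illustration under that assumption (that the CLI exits with a non-zero status on failure); resize_argv and resize are hypothetical wrapper names, and only the --cluster-id and --num-workers options shown above are used.

```python
import subprocess


def resize_argv(cluster_id, num_workers):
    """Build the argument list for `databricks clusters resize`."""
    if num_workers < 0:
        raise ValueError("num_workers must not be negative")
    return [
        "databricks", "clusters", "resize",
        "--cluster-id", cluster_id,
        "--num-workers", str(num_workers),
    ]


def resize(cluster_id, num_workers):
    """Run the resize command; raises CalledProcessError on a non-zero exit.

    Requires the databricks CLI to be installed and configured.
    """
    subprocess.run(resize_argv(cluster_id, num_workers), check=True)


# Show the command that would run, without executing it.
print(" ".join(resize_argv("1234-567890-batch123", 10)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a value ever contains spaces or shell metacharacters.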

Restart a cluster

To display usage documentation, run databricks clusters restart --help

databricks clusters restart --cluster-id 1234-567890-batch123

If successful, no output is displayed.

List available Spark runtime versions

To display usage documentation, run databricks clusters spark-versions --help

databricks clusters spark-versions

{
  "versions": [
    {
      "key": "8.2.x-scala2.12",
      "name": "8.2 (includes Apache Spark 3.1.1, Scala 2.12)"
    },
    ...
  ]
}

Start a cluster

To display usage documentation, run databricks clusters start --help

databricks clusters start --cluster-id 1234-567890-batch123

If successful, no output is displayed.