Improve compute launch reliability using flexible node types

Important

This feature is in Public Preview.

Classic compute resources in Azure Databricks use flexible node types, which allow a compute resource to fall back to alternative, compatible instance types when the specified instance type is unavailable.

This behavior improves launch reliability by reducing capacity failures (stockout errors) during compute launches. For spot instances with fallback, flexible node types can make multiple acquisition attempts across different instance types before falling back to on-demand instances. As a result, a higher percentage of instances run as spot rather than on-demand, which reduces your total compute costs.

How flexible node types work

When you launch a compute resource, your cloud provider sometimes runs out of capacity for your specified instance type. This results in a stockout error:

CLOUD_PROVIDER_RESOURCE_STOCKOUT

While these errors are more common for spot instances, they can occur for on-demand instances as well.

With flexible node types enabled, Azure Databricks uses a fallback list of compatible instance types: either one it generates automatically or one that you specify. If your preferred instance type is unavailable, Azure Databricks attempts to acquire the fallback instance types instead of failing immediately.
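
The fallback behavior described above can be pictured with a small simulation. This is only an illustrative sketch, not how Azure Databricks implements acquisition: the function name acquire_with_fallback, the try_acquire callback, and the instance type names are all hypothetical, and the real service may order or retry attempts differently.

```python
from typing import Callable, Optional, Sequence


def acquire_with_fallback(
    preferred: str,
    fallbacks: Sequence[str],
    try_acquire: Callable[[str], bool],
) -> Optional[str]:
    """Return the first instance type that could be acquired, or None.

    try_acquire is a hypothetical stand-in for a cloud capacity request.
    """
    for node_type in (preferred, *fallbacks):
        if try_acquire(node_type):
            return node_type
    return None  # every candidate was stocked out


# Simulated capacity: the preferred type is stocked out, one fallback is not.
available = {"fallback-a"}
chosen = acquire_with_fallback(
    "preferred-type", ["fallback-a", "fallback-b"], lambda t: t in available
)
print(chosen)
```

Without a fallback list, the same request would return None, which corresponds to the launch failing with a stockout error.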

Enable flexible node types in your workspace

Workspace admins can enable flexible node types in their workspace admin settings. When enabled, all new classic compute resources in the workspace will use flexible node types unless explicitly disabled:

  1. As a workspace admin, go to the settings page.
  2. Click the Compute tab.
  3. Toggle the Enable auto flexible node types setting:
    • Enabled: All new classic compute resources automatically use flexible node types unless explicitly disabled.
    • Disabled: Classic compute resources only use flexible node types if you explicitly configure the worker_node_type_flexibility or driver_node_type_flexibility fields in the compute resource configuration.

This workspace-wide setting does not affect existing compute resources. When disabled, users can still configure flexible node types for individual compute resources by explicitly configuring the worker_node_type_flexibility or driver_node_type_flexibility fields with custom fallback lists. To prevent users from configuring these fields, workspace admins can use compute policies. See Flexible node type policy examples.
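
The interaction between the workspace setting and the per-resource fields can be summarized as a small decision function. This is a reading of the rules on this page, not an official algorithm: the function name and the mode labels "auto", "custom", and "off" are hypothetical, and the logic would apply separately to the worker and driver fields of a new compute resource.

```python
from typing import Optional, Sequence


def effective_fallback_mode(
    workspace_toggle_enabled: bool,
    alternate_node_type_ids: Optional[Sequence[str]],
) -> str:
    """Classify how a new compute resource would behave.

    alternate_node_type_ids is None when the flexibility field is not set
    on the compute resource at all.
    """
    if alternate_node_type_ids is None:
        # No explicit configuration: the workspace setting decides.
        return "auto" if workspace_toggle_enabled else "off"
    if len(alternate_node_type_ids) == 0:
        return "off"  # an explicitly empty list disables fallback
    return "custom"  # an explicit list is used as the fallback list


print(effective_fallback_mode(True, None))
print(effective_fallback_mode(False, ["Standard_L8s_v2"]))
print(effective_fallback_mode(True, []))
```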

Specify a custom fallback list

When flexible node types are enabled in your workspace, Azure Databricks automatically generates a fallback list of compatible instance types for new compute resources.

If you don't want to use the automatically generated fallback list, you can specify your own fallback list instead. Additionally, if flexible node types are disabled in your workspace, you can still specify a custom fallback list for your compute resource. Only certain instance types are compatible. See Fallback instance type requirements. For a reference of compatible instance types, see the flexible node type compatibility reference.

Custom fallback lists are only supported when configuring compute using the API. See the Clusters API reference documentation.

For example, the following configuration specifies which instance type the compute resource will fall back to if needed:

"worker_node_type_flexibility": {
  "alternate_node_type_ids": [
    "Standard_L8s_v2"
  ]
},
"driver_node_type_flexibility": {
  "alternate_node_type_ids": [
    "Standard_L8s_v2"
  ]
}
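
Because these fields are set through the API, it can help to assemble the request body in code. The following is a minimal sketch that only builds and prints the JSON; it does not call the API. The helper name cluster_spec_with_fallbacks and every value in base are placeholders chosen for illustration, so replace them with settings that are valid for your workspace.

```python
import json


def cluster_spec_with_fallbacks(base_spec, worker_fallbacks, driver_fallbacks):
    """Return a copy of base_spec with the flexibility fields added."""
    spec = dict(base_spec)  # shallow copy so the caller's dict is untouched
    spec["worker_node_type_flexibility"] = {
        "alternate_node_type_ids": list(worker_fallbacks)
    }
    spec["driver_node_type_flexibility"] = {
        "alternate_node_type_ids": list(driver_fallbacks)
    }
    return spec


# Placeholder values for illustration only.
base = {
    "cluster_name": "example-cluster",
    "spark_version": "<spark-version>",
    "node_type_id": "<preferred-node-type>",
    "num_workers": 2,
}
spec = cluster_spec_with_fallbacks(base, ["Standard_L8s_v2"], ["Standard_L8s_v2"])
print(json.dumps(spec, indent=2))
```

The printed JSON could then be used as the body of a cluster create request. See the Clusters API reference documentation for the full set of required fields.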

Fallback instance type requirements

Fallback instance types must be compatible with the compute's preferred instance type. Your list of fallback instance types must meet the following requirements:

  • Same vCPU count as the preferred instance type, and between 100% and 110% of the preferred instance type's memory
  • Same number of local disks and disk size as the preferred instance type
  • Same CPU architecture as the preferred instance type (all ARM or all x86)
  • Same OS image and Photon support as the preferred instance type
  • No GPU instance types (GPUs are not supported)
  • Maximum of 5 unique fallback instance types
  • Consistent storage support across all instance types (either all support PREMIUM_LRS storage or none of them do)
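
To check a candidate list before submitting it, the requirements above can be expressed as a small validator. This is a sketch under stated assumptions: NodeSpec and check_fallbacks are hypothetical names, the specs are made-up values rather than real Azure instance data, and the OS image and Photon support rules are not modeled because they depend on information not covered here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of an instance type (not a Databricks type)."""
    name: str
    vcpus: int
    memory_gb: float
    local_disks: int
    disk_size_gb: int
    arch: str              # "x86" or "arm"
    has_gpu: bool
    premium_storage: bool  # supports PREMIUM_LRS storage


def check_fallbacks(preferred: NodeSpec, fallbacks: Sequence[NodeSpec]) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems: List[str] = []
    if len({f.name for f in fallbacks}) > 5:
        problems.append("more than 5 unique fallback instance types")
    for f in [preferred, *fallbacks]:
        if f.has_gpu:
            problems.append(f"{f.name}: GPU instance types are not supported")
    for f in fallbacks:
        if f.vcpus != preferred.vcpus:
            problems.append(f"{f.name}: vCPU count differs")
        # Memory must fall between 100% and 110% of the preferred type's memory.
        if not preferred.memory_gb <= f.memory_gb <= preferred.memory_gb * 1.10:
            problems.append(f"{f.name}: memory outside 100%-110% of preferred")
        if (f.local_disks, f.disk_size_gb) != (preferred.local_disks, preferred.disk_size_gb):
            problems.append(f"{f.name}: local disk count or size differs")
        if f.arch != preferred.arch:
            problems.append(f"{f.name}: CPU architecture differs")
        if f.premium_storage != preferred.premium_storage:
            problems.append(f"{f.name}: PREMIUM_LRS support is inconsistent")
    return problems


# Illustrative, made-up specs.
preferred = NodeSpec("preferred-type", 8, 64.0, 1, 1920, "x86", False, True)
ok = NodeSpec("fallback-a", 8, 68.0, 1, 1920, "x86", False, True)
bad = NodeSpec("fallback-b", 8, 80.0, 1, 1920, "arm", False, True)
print(check_fallbacks(preferred, [ok]))
print(check_fallbacks(preferred, [ok, bad]))
```

A local check like this does not replace validation by the service. For the authoritative list of compatible types, see the flexible node type compatibility reference.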

Use flexible node types with pools

You can also customize a fallback list for pools. In the Pools API, set the node_type_flexibility field to specify the fallback instance types. For example:

"node_type_flexibility": {
  "alternate_node_type_ids": ["Standard_L8s_v2"]
}

Pools do not use flexible node types to maintain the minimum idle count. Pre-warming instances to satisfy the minimum idle count only uses the preferred instance type. The pool launches VMs using the fallback instance types only when a compute resource attempts to launch from the pool.

View the acquired instance types

When using flexible node types, your compute resource may consist of a mix of different instance types. All fallback instance types are compatible with your preferred type, maintaining the same vCPU count, memory, disk layout, CPU architecture, and OS image to ensure your workload runs correctly.

You can view which instance types were acquired for your compute resource:

  1. On the compute details page, click the three dots next to the Terminate button and select View JSON.
  2. Review the node_type_id field for each executor to see which instance types are running.

You can also use the Get clusters info API to retrieve this information programmatically. Additionally, users with permission to access system tables can query the node_timelines table. See Node timeline table schema.
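
Once you have the cluster JSON, a short script can tally the acquired instance types. This is a sketch that assumes the response shape described in the steps above, where each entry under executors carries a node_type_id field. The sample payload is trimmed and made up, and count_node_types is a hypothetical helper, so adjust the field access if your actual response differs.

```python
import json
from collections import Counter

# Trimmed, made-up example of a cluster info response (assumed shape).
sample = """
{
  "cluster_id": "example-cluster-id",
  "executors": [
    {"node_id": "a", "node_type_id": "preferred-type"},
    {"node_id": "b", "node_type_id": "preferred-type"},
    {"node_id": "c", "node_type_id": "fallback-a"}
  ]
}
"""


def count_node_types(cluster_info: dict) -> Counter:
    """Count executors per instance type, skipping entries without the field."""
    return Counter(
        executor["node_type_id"]
        for executor in cluster_info.get("executors", [])
        if "node_type_id" in executor
    )


counts = count_node_types(json.loads(sample))
for node_type, n in counts.items():
    print(f"{node_type}: {n}")
```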

Disable flexible node types on a compute resource

Note

Databricks recommends keeping flexible node types enabled unless you have strict requirements for a specific instance type.

If you would prefer the compute launch to fail rather than fall back to an alternative instance type, you can disable flexible node types at the individual compute-resource level. This is only supported when using the Clusters API. To disable flexible node types, set alternate_node_type_ids to an empty list in each flexibility field of the compute configuration. For example:

"worker_node_type_flexibility": {
  "alternate_node_type_ids": []
},
"driver_node_type_flexibility": {
  "alternate_node_type_ids": []
}