Create a pool
This article describes how to create a pool using the UI. To learn how to use the Databricks CLI to create a pool, see Instance Pools CLI. To learn how to use the REST API to create a pool, see Instance Pools API 2.0.
You must have permission to create a pool; see Pool access control.
Create a pool using the UI
To create a pool using the UI:
- Click Compute in the sidebar.
- Click the Pools tab.
- Click the Create Pool button.
- Specify the pool configuration.
- Click the Create button.
Attach a cluster to a pool
To attach a cluster to a pool using the cluster creation UI, select the pool from the Driver Type or Worker Type dropdown when you configure the cluster. Available pools are listed at the top of each dropdown list. You can use the same pool or different pools for the driver node and worker nodes.
If you use the Clusters API, you must specify driver_instance_pool_id for the driver node and instance_pool_id for the worker nodes.
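As a rough illustration, the two pool fields might sit in a cluster request body as sketched below. Only driver_instance_pool_id and instance_pool_id come from this article. The helper name, the other field names (cluster_name, spark_version, num_workers), and the placeholder values are assumptions that should be checked against the Clusters API reference.

```python
import json


def pool_backed_cluster_spec(name, spark_version, num_workers,
                             worker_pool_id, driver_pool_id=None):
    """Build a request body for a pool-backed cluster (hypothetical helper).

    If no driver pool is given, the worker pool is reused for the driver,
    so both pool fields are always set explicitly.
    """
    return {
        # Assumed field names; verify against the Clusters API reference.
        "cluster_name": name,
        "spark_version": spark_version,
        "num_workers": num_workers,
        # Field names taken from this article.
        "instance_pool_id": worker_pool_id,
        "driver_instance_pool_id": driver_pool_id or worker_pool_id,
    }


# Placeholder values, not real identifiers.
spec = pool_backed_cluster_spec(
    "nightly-etl", "<runtime-version>", 4,
    worker_pool_id="<worker-pool-id>",
    driver_pool_id="<driver-pool-id>",
)
print(json.dumps(spec, indent=2))
```

Passing only worker_pool_id corresponds to using the same pool for the driver node and worker nodes.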
Pool size and auto termination
When you create a pool, you can control its size by setting three parameters: minimum idle instances, maximum capacity, and idle instance auto termination.
Minimum Idle Instances
The minimum number of instances the pool keeps idle. These instances do not terminate, regardless of the auto termination settings. If a cluster consumes idle instances from the pool, Azure Databricks provisions additional instances to maintain the minimum.
Maximum Capacity
The maximum number of instances the pool can provision. If set, this value constrains all instances (idle + used). If a cluster using the pool requests more instances than this number during autoscaling, the request fails with an error.
This configuration is optional. Azure Databricks recommends setting a value only in the following circumstances:
- You have an instance quota you must stay under.
- You want to protect one set of work from impacting another set of work. For example, suppose your instance quota is 100 and you have teams A and B that need to run jobs. You can create pool A with a maximum capacity of 50 and pool B with a maximum capacity of 50 so that the two teams share the quota of 100 fairly.
- You need to cap cost.
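The quota-splitting example above comes down to simple arithmetic: the maximum capacities of all pools should add up to no more than the instance quota. A minimal sketch of that check follows. The function name is hypothetical and it is not part of any Databricks tooling.

```python
def check_pool_caps(instance_quota, pool_caps):
    """Return (fits, headroom) for a set of per-pool maximum capacities.

    fits is True when the caps together stay within the quota;
    headroom is the unallocated part of the quota (negative if over).
    """
    total = sum(pool_caps.values())
    return total <= instance_quota, instance_quota - total


# The example from the text: quota of 100 split evenly between two teams.
fits, headroom = check_pool_caps(100, {"pool-A": 50, "pool-B": 50})
print(fits, headroom)  # True 0
```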
Idle Instance Auto Termination
The time in minutes that instances above the value set in Minimum Idle Instances can be idle before being terminated by the pool.
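To make the interaction of the three sizing parameters concrete, here is a toy in-memory model. It is a sketch of the behavior described in this section, not the service's actual implementation. The class and method names are hypothetical, the idle timeout is reduced to a single expire_idle step rather than tracked per instance, and capping replenishment at the maximum capacity is an assumption drawn from "idle + used".

```python
from dataclasses import dataclass
from typing import Optional


class PoolCapacityError(RuntimeError):
    """Raised when a request would exceed the pool's maximum capacity."""


@dataclass
class PoolModel:
    min_idle: int
    max_capacity: Optional[int] = None  # None means no cap is set
    idle: int = 0
    used: int = 0

    def __post_init__(self):
        self._replenish()

    def _replenish(self):
        # Keep at least min_idle idle instances, without letting
        # idle + used exceed max_capacity (assumed behavior).
        target = self.min_idle
        if self.max_capacity is not None:
            target = min(target, self.max_capacity - self.used)
        self.idle = max(self.idle, target)

    def acquire(self, n):
        """A cluster takes n instances, idle ones first."""
        if self.max_capacity is not None and self.used + n > self.max_capacity:
            raise PoolCapacityError("request exceeds maximum capacity")
        self.idle -= min(n, self.idle)
        self.used += n
        self._replenish()

    def release(self, n):
        """A cluster returns n instances; they become idle."""
        n = min(n, self.used)
        self.used -= n
        self.idle += n

    def expire_idle(self):
        """Idle timeout elapsed: terminate idle instances above min_idle."""
        terminated = max(self.idle - self.min_idle, 0)
        self.idle -= terminated
        return terminated


pool = PoolModel(min_idle=2, max_capacity=10)
pool.acquire(4)            # 2 idle instances consumed, 2 new idle provisioned
pool.release(4)            # all 4 return to the pool as idle
print(pool.expire_idle())  # 4 surplus idle instances are terminated
```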
Instance types
A pool consists of both idle instances kept ready for new clusters and instances in use by running clusters. All of these instances are of the same instance provider type, selected when creating a pool.
A pool’s instance type cannot be edited. Clusters attached to a pool use the same instance type for the driver and worker nodes. Different families of instance types fit different use cases, such as memory-intensive or compute-intensive workloads.
Azure Databricks always provides one year’s deprecation notice before ceasing support for an instance type.
If your security requirements include compute isolation, select a Standard_F72s_V2 instance as your worker type. These instance types represent isolated virtual machines that consume the entire physical host and provide the necessary level of isolation required to support, for example, US Department of Defense Impact Level 5 (IL5) workloads.
Preloaded Databricks Runtime version
You can speed up cluster launches by selecting a Databricks Runtime version to be loaded on idle instances in the pool. If a user selects that runtime when they create a cluster backed by the pool, that cluster will launch even more quickly than a pool-backed cluster that doesn’t use a preloaded Databricks Runtime version.
Setting this option to None slows down cluster launches, as it causes the Databricks Runtime version to download on demand to idle instances in the pool. When the cluster releases the instances in the pool, the Databricks Runtime version remains cached on those instances. The next cluster creation operation that uses the same Databricks Runtime version might benefit from this caching behavior, but it is not guaranteed.
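For readers who create pools through the REST API instead of the UI, the settings covered so far might be combined into a request body along these lines. This is a sketch: the field names are believed to match the Instance Pools API 2.0 but should be verified against its reference, and the helper name and placeholder values are hypothetical.

```python
import json


def build_pool_request(name, node_type_id, min_idle=0, max_capacity=None,
                       idle_minutes=None, preloaded_runtime=None):
    """Assemble a create-pool request body, omitting unset optional fields."""
    body = {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle,
    }
    if max_capacity is not None:
        body["max_capacity"] = max_capacity
    if idle_minutes is not None:
        body["idle_instance_autotermination_minutes"] = idle_minutes
    if preloaded_runtime is not None:
        # The preloaded runtime is passed as a list of version strings.
        body["preloaded_spark_versions"] = [preloaded_runtime]
    return body


# Placeholder values, not real identifiers.
body = build_pool_request(
    "team-a-pool", "<node-type-id>",
    min_idle=2, max_capacity=50, idle_minutes=30,
    preloaded_runtime="<runtime-version>",
)
print(json.dumps(body, indent=2))
```

Leaving preloaded_runtime unset corresponds to choosing None in the UI, in which case the runtime is downloaded on demand.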
Pool tags
Pool tags allow you to easily monitor the cost of cloud resources used by various groups in your organization. You can specify tags as key-value pairs when you create a pool, and Azure Databricks applies these tags to cloud resources like VMs and disk volumes, as well as DBU usage reports.
For convenience, Azure Databricks applies three default tags to each pool, including DatabricksInstancePoolCreatorId. You can also add up to 41 custom tags when you create a pool.
To add tags to the pool, navigate to the Tags tab at the bottom of the Create Pool page. Click the + Add button, then enter the key-value pair.
Pool-backed clusters inherit default and custom tags from the pool configuration. For detailed information about how pool tags and cluster tags work together, see Monitor usage using cluster, pool, and workspace tags.
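If you generate pool configurations from a script, a small pre-flight check can catch a tag set that exceeds the limit before the request is sent. The sketch below enforces only the limit of 41 custom tags stated above. The helper name is hypothetical, and the list-of-pairs output shape is an assumption, not a documented API format.

```python
MAX_CUSTOM_TAGS = 41  # limit stated in this article


def prepare_custom_tags(tags):
    """Validate a dict of custom tags and return them as key-value pairs."""
    if len(tags) > MAX_CUSTOM_TAGS:
        raise ValueError(
            f"{len(tags)} custom tags given; at most {MAX_CUSTOM_TAGS} allowed"
        )
    # Assumed output shape: one {"key": ..., "value": ...} entry per tag.
    return [{"key": key, "value": value} for key, value in tags.items()]


pairs = prepare_custom_tags({"team": "data-eng", "cost-center": "1234"})
print(pairs)
```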
Autoscaling local storage
It can often be difficult to estimate how much disk space a particular job will take. To save you from having to estimate how many gigabytes of managed disk to attach to your pool at creation time, Azure Databricks automatically enables autoscaling local storage on all Azure Databricks pools.
With autoscaling local storage, Azure Databricks monitors the amount of free disk space available on your pool’s instances. If an instance runs too low on disk, a new managed disk is attached automatically before it runs out of disk space. Disks are attached up to a limit of 5 TB of total disk space per virtual machine (including the virtual machine’s initial local storage).
The managed disks attached to a virtual machine are detached only when the virtual machine is returned to Azure. That is, managed disks are never detached from a virtual machine as long as it is part of a pool.
Spot instances
To save cost, you can choose to use spot instances by selecting the All Spot radio button.
Clusters in the pool will launch with spot instances for all nodes, driver and worker (as opposed to the hybrid on-demand driver and spot instance workers for non-pool clusters).
If spot instances are evicted due to unavailability, on-demand instances do not replace evicted instances.
Delete a pool
Deleting a pool terminates the pool’s idle instances and removes its configuration. To delete a pool, click the delete icon in the actions on the Pools page. If you delete a pool:
- Running clusters attached to the pool continue to run, but cannot allocate instances during resize or up-scaling.
- Terminated clusters attached to the pool will fail to start.
You cannot undo this action.
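Deletion can also be scripted against the Instance Pools API 2.0. The sketch below only builds the HTTP request and does not send it, because the action cannot be undone. The endpoint path and the instance_pool_id field are believed to match the API but should be confirmed in its reference. The helper name, workspace URL, token, and pool ID are placeholders.

```python
import json
import urllib.request


def build_delete_pool_request(workspace_url, token, pool_id):
    """Build (but do not send) a POST request that deletes a pool."""
    url = workspace_url.rstrip("/") + "/api/2.0/instance-pools/delete"
    payload = json.dumps({"instance_pool_id": pool_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
req = build_delete_pool_request("https://example.invalid", "<token>", "<pool-id>")
print(req.get_method(), req.full_url)
```

Sending the request would be a separate, deliberate step, for example with urllib.request.urlopen(req).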