Configure SQL warehouses
This article explains how to configure and manage SQL warehouses using the Databricks SQL UI.
What is a SQL Warehouse?
A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL. Compute resources are infrastructure resources that provide processing capabilities in the cloud.
To navigate to the SQL warehouse dashboard, click SQL Warehouses in the sidebar. By default, warehouses are sorted by state (running warehouses first), then in alphabetical order.
To help you get started quickly, Azure Databricks creates a small SQL warehouse called Starter Warehouse automatically. You can edit or delete this SQL warehouse. If serverless is enabled, a starter serverless SQL warehouse is created automatically.
SQL warehouses have the following requirements:
- To create a SQL warehouse you must be a workspace admin.
- To manage a SQL warehouse you must be a workspace admin or have the Can Manage permission on the SQL warehouse.
Before you can create a serverless SQL warehouse, there may be required steps:
- Before you can create a serverless SQL warehouse, an account admin must first enable serverless SQL warehouses for the workspace.
- You must have adequate vCPU quota in your Azure account. The default vCPU quota is usually adequate to create the starter warehouse that is created automatically on new workspaces. However, your Azure account may not have adequate vCPU quota to scale the starter warehouse or to create additional warehouses. See Required Azure vCPU quota. You can request additional vCPU quota. Your Azure account may have limitations on how much vCPU quota you can request. Contact your Azure account representative for more information.
View SQL warehouses
To navigate to the SQL warehouse dashboard, click SQL Warehouses in the sidebar.
By default, warehouses are sorted by state (running warehouses first), then in alphabetical order.
To help you get started quickly, Azure Databricks creates a SQL warehouse called Starter Warehouse automatically. This SQL warehouse is sized Small. You can edit or delete this SQL warehouse.
If serverless compute is enabled, a starter serverless SQL warehouse is created automatically.
Create a SQL warehouse
You can create a SQL warehouse using the New SQL Warehouse page in the web UI or using the SQL Warehouse API.
To create a SQL warehouse with an API, see Using Warehouses API with serverless SQL warehouses.
You can create a SQL warehouse using the Create SQL Warehouse button in the web UI or using the SQL Warehouse API or Terraform. The default settings create an efficient and high-performing SQL warehouse, but you can override any settings to fit your workload needs.
To create a SQL warehouse using the web UI:
- Click SQL Warehouses in the sidebar.
- Click Create SQL Warehouse.
- Enter a Name for the warehouse.
- Accept the default warehouse settings or edit them.
Cluster Size represents the number of clusters and size of compute resources available to run your queries and dashboards. The default is X-Large. To reduce query latency, increase the size. See Cluster size.
Auto Stop determines whether the warehouse stops if it’s idle for the specified number of minutes. Idle SQL warehouses continue to accumulate DBU and cloud instance charges until they are stopped.
- Pro and classic SQL warehouses: The default is 45 minutes, which is recommended for typical use. The minimum is 10 minutes.
- Serverless SQL warehouses: The default is 10 minutes, which is recommended for typical use. The minimum is 5 minutes when you use the UI. Note that you can create a serverless SQL warehouse using the SQL warehouses API, in which case you can set the Auto Stop value as low as 1 minute.
Scaling sets the minimum and maximum number of clusters that will be used for a query. The default is a mininum and a maximum of one cluster. You can increase the maximum clusters if you want to handle more concurrent users for a given query. Azure Databricks recommends a cluster for every 10 concurrent queries. For more information, see Queueing and autoscaling.
In order to maintain optimal performance, Databricks periodically recycles clusters. During a recycle period, you may temporarily see a cluster count that exceeds the maximum as Databricks transitions new workloads to the new cluster and waits to recycle the old cluster until all open workloads have completed.
Type determines the type of warehouse. See What are the available warehouse types? for the list. See What are the warehouse type defaults? for the defaults.
- (Optional) Configure advanced options. See Advanced options.
- Click Create.
- You can then configure warehouse permissions if you’d like.
Your SQL warehouse is now created and started. You can also create a SQL warehouse with Terraform using the databricks_sql_endpoint.
You can configure the following advanced options by expanding the Advanced options area when you create a new SQL warehouse or edit an existing SQL warehouse. You can also configure these options using the SQL Warehouses APIs 2.0.
Tags: Tags allow you to easily monitor the cost of cloud resources used by users and groups in your organization. You specify tags as key-value pairs.
Unity Catalog: If Unity Catalog is enabled for the workspace, it is the default for all new warehouses in the workspace. If Unity Catalog is not enabled for your workspace, you do not see this option. For more information about Unity Catalog, see Unity Catalog.
Channel: Use the Preview channel to test upcoming features. The preview version lets you try out functionality before it becomes the Databricks SQL standard. You can use it to test your queries and dashboards against upcoming changes.
Use the release notes to learn what’s in the latest preview version.
Databricks recommends against using a preview version for production workloads. Because only admins can view a warehouse’s properties, including its channel, consider indicating that an SQL warehouse uses a preview version in the warehouse’s name so that users do not inadvertently use it for production workloads.
Manage SQL warehouses
You can choose to manage a SQL warehouses using the web UI or the SQL Warehouse API.
- To stop a running warehouse, click Stop in the Actions column.
- To start a stopped warehouse, click Start in the Actions column.
- To delete a warehouse, click the kebab menu , then click Delete.
- To edit a warehouse, click the kebab menu then click Edit.
- To add and edit permissions, click the kebab menu then click Permissions. To learn about permission levels, see SQL warehouse access control.
Upgrade a pro or classic SQL warehouse to a serverless SQL warehouse
To learn about serverless SQL warehouse, see Serverless compute.
Serverless SQL warehouses are available in Public Preview.
- Click SQL Warehouses in the sidebar.
- In the Actions column, click the vertical ellipsis then click Upgrade to Serverless.
Monitor a SQL warehouse
You can view the number of queries handled by the warehouse and the number of clusters allocated to the warehouse by clicking the name of a SQL warehouse then the Monitoring tab.
The chart shows the number of queries handled by the warehouse and the number of clusters allocated to the warehouse. Click the timescale buttons above the chart to change the displayed period.
The Cluster Count can be greater than one only if scaling is enabled and configured.
The table in this section maps SQL warehouse cluster sizes to Azure Databricks cluster driver size and worker counts. The driver size only applies to pro and classic SQL warehouses.
For serverless SQL warehouses, the cluster sizes may in some cases use different instance types than the ones listed in the documentation for pro and classic SQL warehouses for an equivalent cluster size. In general, the price/performance ratio of the cluster sizes for serverless SQL warehouses is similar to those for pro and classic SQL warehouses.
|Cluster size||Instance type for driver (applies only to pro and classic SQL warehouses)||Worker count|
|2X-Small||Standard_E8ds_v4||1 x Standard_E8ds_v4|
|X-Small||Standard_E8ds_v4||2 x Standard_E8ds_v4|
|Small||Standard_E16ds_v4||4 x Standard_E8ds_v4|
|Medium||Standard_E32ds_v4||8 x Standard_E8ds_v4|
|Large||Standard_E32ds_v4||16 x Standard_E8ds_v4|
|X-Large||Standard_E64ds_v4||32 x Standard_E8ds_v4|
|2X-Large||Standard_E64ds_v4||64 x Standard_E8ds_v4|
|3X-Large||Standard_E64ds_v4||128 x Standard_E8ds_v4|
|4X-Large||Standard_E64ds_v4||256 x Standard_E8ds_v4|
The instance size of all workers is Standard_E8ds_v4.
Each driver and worker has eight 128 GB Standard LRS managed disks attached. Attached disks are charged hourly.
Required Azure vCPU quota
To start a SQL warehouse, you must have adequate Azure vCPU quota for Standard_E8ds_v4 instances in your Azure account. Use the following guidelines to determine the required vCPU quota:
- If you have only one or two SQL warehouses, ensure that you have 8 Azure vCPU available for each core in the cluster. This ensures that you have adequate Azure vCPU to account for the re-provisioning of your warehouse that happens roughly every 24 hours. If your SQL warehouses use auto-scaling or multi-cluster load balancing, you may need to increase the multiplier.
- As the number of SQL warehouses increases, allow for between 4 and 8 Azure vCPU for each core in the cluster. Databricks recommends starting with a larger number and monitoring for stability.
- Azure vCPUs used by SQL warehouses is in addition to Azure vCPUs used by clusters used by Data Science & Engineering or by non-Databricks workloads.
To request additional Azure vCPU quota, see Standard quota: Increase limits by VM series in the Azure documentation.
Queueing and autoscaling
Azure Databricks limits the number of queries on a cluster assigned to a SQL warehouse based on the cost to compute their results. Upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size.
Azure Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes.
- If less than 2 minutes, don’t upscale.
- If 2 to 6 minutes, add 1 cluster.
- If 6 to 12 minutes, add 2 clusters.
- If 12 to 22 minutes, add 3 clusters.
Otherwise, Azure Databricks adds 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.
In addition, a warehouse is always upscaled if a query waits for 5 minutes in the queue.
If the load is low for 15 minutes, Azure Databricks downscales the SQL warehouse. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Azure Databricks keeps 3 clusters.
Azure Databricks queues queries when all clusters assigned to the warehouse are executing queries at full capacity or when the warehouse is in the
Metadata queries (for example,
DESCRIBE <table>) and state modifying queries (for example
SET) are never queued, unless the warehouse is in the
Azure Databricks routes queries as follows:
- New session: to the cluster with the least load.
- Existing session: to the cluster that ran the previous query for that session. If that cluster does not have available capacity, the query is routed to the cluster with the least load.
Prześlij i wyświetl opinię dla