What is a Databricks SQL warehouse?

This article introduces SQL warehouses (formerly SQL endpoints) and describes how to work with them using the Databricks SQL UI. A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL. Compute resources are infrastructure resources that provide processing capabilities in the cloud.

SQL endpoints renamed to SQL warehouses

Databricks changed the name from SQL endpoint to SQL warehouse because a SQL warehouse is more than just an API entry point for running SQL commands. It is a compute resource for all your data warehousing needs and an integral part of the Databricks Lakehouse Platform.

Other compute resource types include Azure Databricks clusters. To work with SQL warehouses using the API, see SQL Warehouses APIs 2.0.

Before you begin

When you create your first SQL warehouse, Databricks recommends that you accept the defaults as they appear on the New SQL Warehouse page. However, many options are available to configure for your specific needs. Among those options, be aware of:

  • Serverless SQL warehouses (Public Preview), which use compute resources that are managed by Azure Databricks, unlike a classic SQL warehouse or pro SQL warehouse, whose compute resources are in your Azure subscription. Serverless SQL warehouses simplify SQL warehouse management and accelerate launch times. For general information about serverless compute, see Serverless compute. Before you can create serverless SQL warehouses, an Azure Databricks account administrator must enable serverless SQL warehouses for your workspace.
  • Channels, which let you choose whether to use the current SQL warehouse compute version or the preview version. Preview versions let you try out functionality before it becomes the Databricks SQL standard. Take advantage of preview versions to test your queries and dashboards against upcoming changes. Typically, preview versions are promoted to the current version two weeks after initial preview release, but some previews may last longer. You can learn about the features in the latest preview version by reviewing the release notes. Databricks does not recommend using preview versions for production workloads.

Requirements

  • To create and manage a SQL warehouse, you must be a workspace admin and have the Databricks SQL entitlement.
  • To manage a SQL warehouse if you are not a workspace admin, you must have Can Manage permission in Databricks SQL.
  • Before you can create a serverless SQL warehouse, an account admin must first enable serverless SQL warehouses for the workspace.
  • You must have adequate vCPU quota in your Azure account. The default vCPU quota is usually adequate to create the starter warehouse that is created automatically on new workspaces. However, your Azure account may not have adequate vCPU quota to scale the starter warehouse or to create additional warehouses. See Required Azure vCPU quota. You can request additional vCPU quota. Your Azure account may have limitations on how much vCPU quota you can request. Contact your Azure account representative for more information.

View SQL warehouses

Click SQL Warehouses in the sidebar.

By default, warehouses are sorted by state (running warehouses first), then in alphabetical order. You can reorder the list by clicking the column headings.

To help you get started quickly, such as when you follow the Databricks SQL quickstart, Azure Databricks creates a SQL warehouse called Starter Warehouse automatically. This SQL warehouse is sized Small. You can edit or delete this SQL warehouse.

Note

If serverless is enabled, a starter serverless SQL warehouse is created automatically.

To filter the list of warehouses, enter text in the search box.

To see more about a warehouse’s configuration, including connection details, monitoring, and advanced settings, click the name of the warehouse.

Create a SQL warehouse

You can create a SQL warehouse using the New SQL Warehouse page in the web UI or using the SQL Warehouse API. For more information about managing serverless SQL warehouses with the API, see Using Warehouses API with serverless SQL warehouses.

By accepting all the defaults on the New SQL Warehouse page, you can create an efficient and high-performing SQL warehouse quickly and easily. You can override those defaults if your workload or environment requires it.
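As a minimal sketch, a create request to the SQL Warehouse API carries the same properties you set in the UI. The field names below follow the public API reference, but verify them against your workspace's API version before relying on them; the warehouse name is hypothetical.

```python
import json

def build_create_warehouse_request(name, cluster_size="Small",
                                   auto_stop_mins=45,
                                   min_clusters=1, max_clusters=1):
    """Assemble a request body for POST /api/2.0/sql/warehouses."""
    return {
        "name": name,
        "cluster_size": cluster_size,      # "2X-Small" through "4X-Large"
        "auto_stop_mins": auto_stop_mins,  # idle minutes before auto stop
        "min_num_clusters": min_clusters,  # scaling lower bound
        "max_num_clusters": max_clusters,  # scaling upper bound
    }

payload = build_create_warehouse_request("analytics-wh", cluster_size="Medium")
print(json.dumps(payload, indent=2))
```

The payload mirrors the defaults described below: one cluster minimum and maximum, and a 45-minute Auto Stop for pro and classic warehouses.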

To create a SQL warehouse using the web UI:

  1. Click SQL Warehouses in the sidebar.

  2. Click Create SQL Warehouse to open the New SQL Warehouse dialog.

  3. Enter a name for the warehouse.

  4. Accept the default warehouse properties or edit them.

    • Cluster Size represents the number of cluster workers and size of compute resources available to run your queries and dashboards. The default is X-Large. To reduce query latency, increase the size. For details, see Cluster size.

    • Auto Stop determines whether the warehouse stops if it’s idle for the specified number of minutes. Idle SQL warehouses continue to accumulate DBU and cloud instance charges until they are stopped.

      • Pro and classic SQL warehouses: The default is 45 minutes, which is recommended for typical use. The minimum is 10 minutes.
      • Serverless SQL warehouses: The default is 10 minutes, which is recommended for typical use. The minimum is 5 minutes when you use the UI. Note that you can create a serverless SQL warehouse using the SQL warehouses API, in which case you can set the Auto Stop value as low as 1 minute.

      If you switch between a pro or classic SQL warehouse and a serverless SQL warehouse within this editor and have not yet modified the Auto Stop value, Azure Databricks adjusts the value to match the default for the new warehouse type.

    • Scaling sets the minimum and maximum number of clusters over which queries sent to the warehouse are distributed.

      The default is a minimum of one and maximum of one cluster.

      To handle more concurrent users, increase the maximum cluster count. To learn how Azure Databricks adds clusters to and removes clusters from a warehouse, see Queueing and autoscaling.

  5. (Optional) Configure advanced options.

    If you want to do any of the following, expand Advanced options:

    • Add tags to help monitor SQL warehouse usage.
    • Choose the warehouse type:
      • If serverless is enabled for the workspace, the serverless SQL warehouse type is the default. To enable it, see enable serverless SQL warehouses. If serverless is not enabled for your workspace, you do not see this option.
      • If serverless is not enabled for the workspace, the classic SQL warehouse type is the default.
    • If Unity Catalog is enabled for the workspace, it is the default for all new warehouses in the workspace. If Unity Catalog is not enabled for your workspace, you do not see this option. For more information about Unity Catalog, see Unity Catalog.
    • Configure the spot instance policy for the SQL warehouse.
    • Use the Preview channel to test upcoming features.

    For details, see Advanced options.

  6. Click Create.

    The warehouse is created and started.

You can also create a SQL warehouse with the Databricks Terraform provider and databricks_sql_endpoint.

Advanced options

You can configure the following advanced options by expanding the Advanced options area when you create a SQL warehouse using the New SQL warehouse dialog or edit an existing SQL warehouse using the edit dialog. You can also configure these options using the SQL Warehouses APIs 2.0.

  • Add tags to help monitor SQL warehouse usage.
  • Choose the warehouse type:
    • If serverless is enabled for the workspace, the serverless SQL warehouse type is the default. To enable it, see enable serverless SQL warehouses. If serverless is not enabled for your workspace, you do not see this option.
    • If serverless is not enabled for the workspace, the classic SQL warehouse type is the default.
  • If Unity Catalog is enabled for the workspace, it is the default for all new warehouses in the workspace. If Unity Catalog is not enabled for your workspace, you do not see this option. For more information about Unity Catalog, see Unity Catalog.
  • Configure the spot instance policy for the SQL warehouse.
  • Use the Preview channel to test upcoming features.
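As a hedged sketch, the same advanced options can be expressed as extra fields on a SQL Warehouse API request body. The field names and enum values below follow the public API reference; confirm them against your API version, and note the tag key/value shown are hypothetical.

```python
# Advanced options as API request fields (names assumed from the
# SQL Warehouses API reference; verify before use).
advanced_options = {
    "tags": {"custom_tags": [{"key": "team", "value": "analytics"}]},
    "warehouse_type": "PRO",                   # or "CLASSIC"
    "enable_serverless_compute": False,        # True requires serverless enabled
    "spot_instance_policy": "COST_OPTIMIZED",  # or "RELIABILITY_OPTIMIZED"
    "channel": {"name": "CHANNEL_NAME_PREVIEW"},  # default is CHANNEL_NAME_CURRENT
}
print(advanced_options["warehouse_type"])
```

Merging this dictionary into a create or edit request body corresponds to expanding Advanced options in the UI.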

Configure advanced options

Add tags for usage monitoring

Tags allow you to easily monitor the cost of cloud resources used by users and groups in your organization. When you create or edit a SQL warehouse, expand the Advanced options area to specify tags as key-value pairs. Azure Databricks applies these tags to cloud resources.

Switch the SQL warehouse type (Pro, Classic, or Serverless)

Pro and classic SQL warehouses use compute resources in your Azure subscription. Serverless SQL warehouses (Public Preview) use compute resources in the Azure subscription for Azure Databricks. Serverless SQL warehouses simplify SQL warehouse configuration and usage and accelerate launch times. The Serverless option is available only if it has been enabled for the workspace.

  • If serverless SQL warehouses are enabled for your workspace, they are the default whenever you create a SQL warehouse. If you prefer a pro or classic SQL warehouse, expand Advanced options and select the appropriate warehouse type. To switch back to Serverless, turn the toggle on.
  • If serverless SQL warehouses are not enabled, the SQL warehouse is always a pro SQL warehouse. In this case, the Serverless option does not appear, and you cannot change the warehouse type. To use a classic SQL warehouse instead, an admin must enable that option for the workspace.

Configure spot instance policy (pro and classic SQL warehouses only)

If serverless SQL warehouses are not enabled for your workspace, or if you turn off the Serverless option in Advanced options, the Spot Instance Policy option appears, with the default value Cost Optimized.

The spot instance policy determines whether workers use only on-demand instances or a combination of on-demand and spot instances. Cost Optimized uses mostly spot instances and one on-demand instance. Reliability Optimized uses only on-demand instances.

Use the preview channel

Channels let you choose whether to use the Current SQL warehouse compute version or the Preview version. A preview version lets you try out functionality before it becomes the Databricks SQL standard. Take advantage of the preview channel to test your queries and dashboards against upcoming changes. You can learn about what’s in the latest preview version in the release notes.

Current is the default. To switch a SQL warehouse to the preview channel, expand Advanced options and select Preview.

Important

Databricks recommends against using a preview version for production workloads. Because only admins can view a warehouse’s properties, including its channel, consider indicating in the warehouse’s name that it uses a preview version so that users do not inadvertently use it for production workloads.

Configure a SQL warehouse to capture data lineage

Important

Data lineage is in Public Preview.

Note

You must be a SQL admin to enable data lineage.

To create a SQL warehouse that captures data lineage, perform the following steps. You need to follow these steps only once per workspace; they apply to all SQL warehouses launched in that workspace.

  1. Go to your Azure Databricks landing page, click the icon below the Databricks logo in the sidebar, and select the SQL persona.

  2. Click Settings at the bottom of the sidebar and select SQL Admin Console.

  3. Click the SQL Warehouse settings tab.

  4. In the Data Access Configuration text box, enter the following configuration:

    spark.databricks.dataLineage.enabled true
    
  5. Click Save changes.

  6. Click SQL Warehouses in the sidebar.

  7. Click Create SQL Warehouse to open the New SQL Warehouse dialog.

  8. Enter a name for the warehouse.

  9. Click Advanced options and select the Preview channel.

  10. (Optional) Unity Catalog and the lineage service also support serverless SQL warehouses. To create a serverless SQL warehouse, click the Serverless toggle.

  11. Click Create.

Start, stop, or delete a SQL warehouse

  1. Click SQL Warehouses in the sidebar.
  2. To stop a running warehouse, click Stop.
  3. To start a stopped warehouse, click Start.
  4. To delete a warehouse, click the vertical ellipsis in the Actions column, then click Delete.
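These lifecycle actions also map to REST calls. The sketch below assumes the path layout of the SQL Warehouse API (start and stop as POST sub-resources, delete as a DELETE on the warehouse itself); verify the paths against the API reference before use.

```python
def warehouse_action_path(warehouse_id: str, action: str) -> str:
    """Return the assumed REST path for a warehouse lifecycle action."""
    base = f"/api/2.0/sql/warehouses/{warehouse_id}"
    if action in ("start", "stop"):
        return f"{base}/{action}"   # invoked with POST
    if action == "delete":
        return base                 # invoked with DELETE
    raise ValueError(f"unknown action: {action}")

print(warehouse_action_path("abc123", "start"))
```

Pair each path with the indicated HTTP verb and a bearer token for your workspace when issuing the actual request.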

Edit a SQL warehouse

You can edit a SQL warehouse using the web UI or the SQL Warehouse API.

To edit a SQL warehouse using the web UI:

  1. Click SQL Warehouses in the sidebar.

  2. In the Actions column, click the vertical ellipsis and click Edit.

  3. Edit the warehouse properties.

    For information about each editable property, see Create a SQL warehouse and Configure a SQL warehouse to capture data lineage.

  4. Click Save or Save and restart.

Upgrade a pro or classic SQL warehouse to a serverless SQL warehouse

To learn about serverless SQL warehouses, see Serverless compute. The serverless SQL warehouse option is available only if it has been enabled for the workspace.

Important

Serverless SQL warehouses are available in Public Preview. To start using Databricks SQL Serverless on Azure, please submit your request using this form.

  1. Click SQL Warehouses in the sidebar.
  2. Click a pro or classic SQL warehouse.
  3. Click the Edit button.
  4. Click the Serverless toggle to change its type.
  5. Click Save or Save and restart.

Configure SQL warehouse permissions

To configure permissions for a SQL warehouse:

  1. Click SQL Warehouses in the sidebar.

  2. Click a warehouse.

  3. Click the Permissions button.

    The SQL Warehouse Permissions dialog appears. The warehouse creator and Azure Databricks admins have Can Manage permission by default.

  4. Select a user or group and a permission.

  5. Click Add.

  6. Click Save.

To learn about permission levels, see SQL warehouse access control.
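The same grant can be made programmatically. The sketch below assumes the access-control-list request shape used by the Databricks Permissions API; the user name is hypothetical, and field names should be checked against the API reference.

```python
def grant_permission(user_name: str, permission_level: str = "CAN_USE") -> dict:
    """Build an assumed request body granting one user a permission
    level on a SQL warehouse (e.g. for a PATCH to the Permissions API)."""
    return {
        "access_control_list": [
            {"user_name": user_name, "permission_level": permission_level}
        ]
    }

body = grant_permission("someone@example.com")
print(body)
```

Selecting a user and permission in the dialog and clicking Add corresponds to appending one entry to `access_control_list`.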

Monitor a SQL warehouse

You can view the number of queries handled by the warehouse and the number of clusters allocated to the warehouse.

  1. Click SQL Warehouses in the sidebar.

  2. Click a warehouse.

  3. Click Monitoring.

    A chart displays the number of queries handled by the warehouse and the number of clusters allocated to the warehouse over the last 6 hours.

    Click a timescale button at the top right of the chart to change the displayed period. For example, the following screenshot shows these statistics over 7 days:

    Monitor warehouse

    Note

    The Cluster Count can be greater than one only if scaling is enabled and configured.

Cluster size

The table in this section maps SQL warehouse cluster sizes to Azure Databricks cluster driver size and worker counts. The driver size only applies to pro and classic SQL warehouses.

Note

For serverless SQL warehouses, the cluster sizes may in some cases use different instance types than the ones listed in the documentation for pro and classic SQL warehouses for an equivalent cluster size. In general, the price/performance ratio of the cluster sizes for serverless SQL warehouses is similar to those for pro and classic SQL warehouses.

Cluster size | Instance type for driver (pro and classic SQL warehouses only) | Worker count
2X-Small | Standard_E8ds_v4 | 1
X-Small | Standard_E8ds_v4 | 2
Small | Standard_E16ds_v4 | 4
Medium | Standard_E32ds_v4 | 8
Large | Standard_E32ds_v4 | 16
X-Large | Standard_E64ds_v4 | 32
2X-Large | Standard_E64ds_v4 | 64
3X-Large | Standard_E64ds_v4 | 128
4X-Large | Standard_E64ds_v4 | 256

The instance size of all workers is Standard_E8ds_v4.

Each driver and worker has eight 128 GB Standard LRS managed disks attached. Attached disks are charged hourly.

Required Azure vCPU quota

To start a SQL warehouse, you must have adequate Azure vCPU quota for Standard_E8ds_v4 instances in your Azure account. Use the following guidelines to determine the required vCPU quota:

  • If you have only one or two SQL warehouses, ensure that you have 8 Azure vCPU available for each core in the cluster. This ensures that you have adequate Azure vCPU to account for the re-provisioning of your warehouse that happens roughly every 24 hours. If your SQL warehouses use auto-scaling or multi-cluster load balancing, you may need to increase the multiplier.
  • As the number of SQL warehouses increases, allow for between 4 and 8 Azure vCPU for each core in the cluster. Databricks recommends starting with a larger number and monitoring for stability.
  • Azure vCPUs used by SQL warehouses are in addition to Azure vCPUs used by clusters in Data Science & Engineering or by non-Databricks workloads.

To request additional Azure vCPU quota, see Standard quota: Increase limits by VM series in the Azure documentation.
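As an illustration of the arithmetic only, the worker vCPU footprint of a warehouse can be derived from the cluster size table above: each worker is a Standard_E8ds_v4, which has 8 vCPUs. Driver vCPUs for pro and classic warehouses vary by size and are omitted here, so treat this as a lower bound, not a quota calculator.

```python
# Worker counts per cluster size, taken from the table above.
WORKER_COUNT = {
    "2X-Small": 1, "X-Small": 2, "Small": 4, "Medium": 8,
    "Large": 16, "X-Large": 32, "2X-Large": 64,
    "3X-Large": 128, "4X-Large": 256,
}

VCPUS_PER_WORKER = 8  # Standard_E8ds_v4

def worker_vcpus(cluster_size: str, clusters: int = 1) -> int:
    """vCPUs consumed by workers alone, excluding the driver."""
    return WORKER_COUNT[cluster_size] * VCPUS_PER_WORKER * clusters

print(worker_vcpus("Small"))       # 4 workers x 8 vCPUs = 32
print(worker_vcpus("Medium", 2))   # 8 workers x 8 vCPUs x 2 clusters = 128
```

Multiply the result by the headroom guideline above when estimating the quota to request.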

Queueing and autoscaling

Azure Databricks limits the number of queries on a cluster assigned to a SQL warehouse based on the cost to compute their results. Upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size. Azure Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes as follows:

  • Less than 2 minutes: don’t upscale.
  • 2 to 6 minutes: add 1 cluster.
  • 6 to 12 minutes: add 2 clusters.
  • 12 to 22 minutes: add 3 clusters.
  • More than 22 minutes: add 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.

In addition, a warehouse is always upscaled if a query waits in the queue for 5 minutes.

If the load is low for 15 minutes, Azure Databricks downscales the SQL warehouse. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Azure Databricks keeps 3 clusters.
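The upscaling schedule above can be sketched as a function. Treating the "additional 15 minutes" as rounded up is an assumption; the boundary behavior is not specified precisely, so this is one plausible reading, not the exact Databricks algorithm.

```python
import math

def clusters_to_add(expected_minutes: float) -> int:
    """Clusters added for a given expected processing time (minutes)
    covering running, queued, and imminent queries."""
    if expected_minutes < 2:
        return 0
    if expected_minutes < 6:
        return 1
    if expected_minutes < 12:
        return 2
    if expected_minutes < 22:
        return 3
    # Beyond 22 minutes: 3 clusters plus 1 per additional 15 minutes
    # (rounding up is assumed here).
    return 3 + math.ceil((expected_minutes - 22) / 15)

print(clusters_to_add(10))
```

For example, a 52-minute expected load would add 3 + 2 = 5 clusters under this reading.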

Query queuing

Azure Databricks queues queries when all clusters assigned to the warehouse are executing queries at full capacity or when the warehouse is in the STARTING state.

Metadata queries (for example, DESCRIBE <table>) and state-modifying queries (for example, SET) are never queued, unless the warehouse is in the STARTING state.

Query routing

Azure Databricks routes queries as follows:

  • New session: to the cluster with the least load.
  • Existing session: to the cluster that ran the previous query for that session. If that cluster does not have available capacity, the query is routed to the cluster with the least load.
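The routing rules above can be illustrated with a toy function. This is not Databricks code; the cluster representation (a load number plus a capacity flag) is invented purely for illustration.

```python
def route_query(session_cluster, clusters):
    """Pick a cluster for a query.

    clusters: dict mapping cluster_id -> (load, has_capacity).
    session_cluster: cluster that ran the session's previous query,
    or None for a new session.
    """
    least_loaded = min(clusters, key=lambda c: clusters[c][0])
    if session_cluster is None:
        return least_loaded              # new session: least load wins
    if clusters[session_cluster][1]:
        return session_cluster           # sticky: reuse the session's cluster
    return least_loaded                  # no capacity: fall back to least load

pool = {"a": (5, False), "b": (1, True)}
print(route_query(None, pool))
```

A new session lands on the least-loaded cluster, while an existing session sticks to its previous cluster only while that cluster has capacity.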