Compute

Azure Databricks compute refers to the selection of computing resources you can provision in your Azure Databricks workspace. Azure Databricks compute includes all-purpose and job compute (also called clusters), instance pools, serverless SQL warehouses, and classic SQL warehouses.

You need compute to run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. You can create and manage your workspace’s compute resources using the Compute section of the workspace:

[Figure: All-purpose compute page in Databricks workspace]

These are the types of compute available in Azure Databricks:

  • All-purpose compute: Used to analyze data collaboratively in interactive notebooks. You can create, terminate, and restart this compute using the UI, CLI, or REST API.

  • Job compute: Used to run fast and robust automated jobs. The Azure Databricks job scheduler creates job compute when you run a job on new compute, and the compute terminates when the job completes. You cannot restart job compute. See Use Azure Databricks compute with your jobs.

  • Instance pools: Compute with idle, ready-to-use instances, used to reduce start and autoscaling times. You can create this compute using the UI, CLI, or REST API.

  • Serverless SQL warehouses: On-demand elastic compute used to run SQL commands on data objects in the SQL editor or interactive notebooks. You can create SQL warehouses using the UI, CLI, or REST API.

  • Classic SQL warehouses: Provisioned compute used to run SQL commands on data objects in the SQL editor or interactive notebooks. You can create SQL warehouses using the UI, CLI, or REST API.

The articles in this section describe how to work with compute resources using the Azure Databricks UI. For other methods, see Use the command line and the Databricks REST API reference.
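As a rough illustration of the REST API route, the sketch below builds (but does not send) two requests: one that creates all-purpose compute directly, and one that defines a job whose compute is created by the job scheduler at run time. The workspace URL, token, cluster name, job name, notebook path, runtime version string, and node type are all placeholder assumptions, and the endpoint paths and field names should be checked against the Databricks REST API reference for your workspace before use.

```python
import json
from urllib.request import Request

# Hypothetical values: replace with your own workspace URL and access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "example-token"


def compute_spec(spark_version="13.3.x-scala2.12",
                 node_type_id="Standard_DS3_v2",
                 num_workers=2):
    """Return the fields shared by all-purpose and job compute definitions."""
    return {
        "spark_version": spark_version,  # assumed runtime version string
        "node_type_id": node_type_id,    # assumed Azure VM type
        "num_workers": num_workers,
    }


def build_request(path, payload):
    """Build an authenticated POST request without sending it."""
    return Request(
        url=f"{WORKSPACE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All-purpose compute: created explicitly and kept until terminated
# (here, automatically after 60 idle minutes).
all_purpose = dict(
    compute_spec(),
    cluster_name="shared-notebooks",
    autotermination_minutes=60,
)
create_cluster = build_request("/api/2.0/clusters/create", all_purpose)

# Job compute: declared inline as "new_cluster"; the job scheduler creates it
# when the job runs and terminates it when the job completes.
job = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Shared/etl"},
            "new_cluster": compute_spec(),
        }
    ],
}
create_job = build_request("/api/2.1/jobs/create", job)
```

To actually submit either request, pass it to `urllib.request.urlopen` with real credentials. The two payloads share the same compute fields; what differs is who owns the lifecycle.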

Databricks Runtime

Databricks Runtime is the set of core components that run on your compute. Each Databricks Runtime version includes updates that improve the usability, performance, and security of big data analytics. The Databricks Runtime on your compute adds many features, including:

  • Delta Lake, a next-generation storage layer built on top of Apache Spark that provides ACID transactions, optimized layouts and indexes, and execution engine improvements for building data pipelines. See What is Delta Lake?
  • Installed Java, Scala, Python, and R libraries.
  • Ubuntu and its accompanying system libraries.
  • GPU libraries for GPU-enabled clusters.
  • Azure Databricks services that integrate with other components of the platform, such as notebooks, jobs, and cluster management.

For information about the contents of each runtime version, see the release notes.
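Code that behaves differently depending on the runtime can check which version it is running on. The following is a minimal sketch, assuming the compute exposes its version through a `DATABRICKS_RUNTIME_VERSION` environment variable; the helper name `runtime_version` is hypothetical.

```python
import os


def runtime_version(environ=os.environ):
    """Return the Databricks Runtime version string, or None if it is not set.

    Assumes the compute sets DATABRICKS_RUNTIME_VERSION; outside of
    Databricks the variable is normally absent, so None is returned.
    """
    return environ.get("DATABRICKS_RUNTIME_VERSION")


version = runtime_version()
if version is None:
    print("Not running on Databricks Runtime (variable not set).")
else:
    print(f"Databricks Runtime version: {version}")
```

Passing the environment as a parameter keeps the helper easy to test without a running cluster.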

Runtime versioning

Databricks Runtime versions are released on a regular basis:

  • Long Term Support versions are represented by an LTS qualifier (for example, 3.5 LTS). For each major release, we declare a “canonical” feature version, for which we provide three full years of support. See Databricks runtime support lifecycles for more information.
  • Major versions are represented by an increment to the version number that precedes the decimal point (the jump from 3.5 to 4.0, for example). They are released when there are major changes, some of which may not be backwards-compatible.
  • Feature versions are represented by an increment to the version number that follows the decimal point (the jump from 3.4 to 3.5, for example). Each major release includes multiple feature releases. Feature releases are always backwards-compatible with previous releases within their major release.
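The versioning scheme above can be sketched in a few lines of code. This example assumes version labels of the form `MAJOR.FEATURE` with an optional ` LTS` suffix, as in the examples in the list; the names `RuntimeVersion`, `parse_version`, and `classify_upgrade` are illustrative and not part of any Databricks API.

```python
import re
from typing import NamedTuple


class RuntimeVersion(NamedTuple):
    major: int    # number before the decimal point
    feature: int  # number after the decimal point
    lts: bool     # True when the label carries the LTS qualifier


# Matches labels such as "3.5" or "3.5 LTS".
_PATTERN = re.compile(r"^(\d+)\.(\d+)(\s+LTS)?$")


def parse_version(label: str) -> RuntimeVersion:
    """Split a version label into major number, feature number, and LTS flag."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized version label: {label!r}")
    return RuntimeVersion(
        major=int(match.group(1)),
        feature=int(match.group(2)),
        lts=match.group(3) is not None,
    )


def classify_upgrade(old: str, new: str) -> str:
    """Return "major" or "feature" for an upgrade from old to new."""
    a, b = parse_version(old), parse_version(new)
    if (b.major, b.feature) <= (a.major, a.feature):
        raise ValueError(f"{new!r} is not newer than {old!r}")
    # A change before the decimal point is a major release and may break
    # compatibility; a change only after it is a feature release.
    return "major" if b.major > a.major else "feature"


print(parse_version("3.5 LTS"))
print(classify_upgrade("3.5", "4.0"))
print(classify_upgrade("3.4", "3.5"))
```

A check like `classify_upgrade` can flag upgrades that cross a major version, which are the ones that may include changes that are not backwards-compatible.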