Apache Spark on Azure Databricks

This article describes how Apache Spark is related to Azure Databricks and the Azure Databricks Lakehouse Platform.

Apache Spark is at the heart of the Azure Databricks Lakehouse Platform and is the technology powering compute clusters and SQL warehouses on the platform. Azure Databricks is an optimized platform for Apache Spark, providing an efficient and simple platform for running Apache Spark workloads.

What is the relationship of Apache Spark to Azure Databricks?

The Databricks company was founded by the original creators of Apache Spark. As an open source software project, Apache Spark has committers from many top companies, including Databricks.

Databricks continues to develop and release features to Apache Spark. The Databricks Runtime includes additional optimizations and proprietary features that build upon and extend Apache Spark, including Photon, an optimized version of Apache Spark rewritten in C++.

How does Apache Spark work on Azure Databricks?

Whenever you deploy a compute cluster or SQL warehouse on Azure Databricks, Apache Spark is configured and deployed to virtual machines. You don’t need to worry about configuring or initializing a Spark context or Spark session, as these are managed for you by Azure Databricks.
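As a minimal sketch of what this means in practice: in an Azure Databricks notebook, a ready-made session is exposed as the variable `spark`, so code can use it directly. The helper below, `get_spark`, is a hypothetical name introduced only for illustration and is not part of any Databricks or Spark API. It returns the session that the notebook already provides and, as an assumption for code that may also run outside a notebook, falls back to `SparkSession.builder.getOrCreate()` from PySpark.

```python
def get_spark(namespace):
    """Return the Spark session found in `namespace`, or obtain one.

    `get_spark` is a hypothetical helper for illustration only.
    `namespace` is a dict of variables, typically `globals()`.
    """
    # In a Databricks notebook, a session is predefined as `spark`.
    session = namespace.get("spark")
    if session is not None:
        return session

    # Assumed fallback for code running outside a notebook.
    # This path requires PySpark to be installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()
```

In a notebook, `spark = get_spark(globals())` simply hands back the managed session, and a call such as `spark.range(5).count()` then runs on the attached cluster without any setup code.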

Can I use Azure Databricks without using Apache Spark?

Azure Databricks supports a variety of workloads and includes a number of other open source libraries in the Databricks Runtime. Databricks SQL uses Apache Spark under the hood, but end users use standard SQL syntax to create and query database objects.

Databricks Runtime for Machine Learning is optimized for ML workloads, and many data scientists rely primarily on open source libraries such as TensorFlow and scikit-learn while working on Azure Databricks. You can use workflows to schedule arbitrary workloads against compute resources deployed and managed by Azure Databricks.

Why use Apache Spark on Azure Databricks?

The Databricks Lakehouse Platform provides a secure, collaborative environment for developing and deploying enterprise solutions that scale with your business. Databricks employees represent many of the most knowledgeable Apache Spark maintainers and users in the world, and the company continuously develops and releases new optimizations to ensure that users have access to the fastest environment for running Apache Spark.