SQL Server Big Data Clusters runtime for Apache Spark Guide
Applies to: SQL Server 2019 (15.x)
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
Introducing the SQL Server Big Data Clusters runtime for Apache Spark
The SQL Server Big Data Clusters runtime for Apache Spark is a standardized specification for Apache Spark that enables seamless interoperability between distributions. This Spark runtime is a consistent, versioned block of programming language distributions, engine optimizations, core libraries, and packages.
Every product that uses this runtime specification contains the same versions of Apache Spark Core, PySpark, Scala Spark, SparkR, sparklyr, and .NET for Spark.
All the distributed packages and libraries are also the same. One of the primary goals of the specification is to give Data Engineers and Data Scientists a first-class experience by providing a continuously curated and updated list of packages and connectors out of the box.
Benefits of the SQL Server Big Data Clusters runtime for Apache Spark:
- Spark engine optimizations and features available on all products and services
- Established release cadence
- Seamless interoperability between Spark products and services
- Curated packages for Data Engineers and Data Scientists
- Consistent package management story
Release cadence and naming standards
The SQL Server Big Data Clusters runtime for Apache Spark specification defines the following naming standard for runtime releases:
"PRODUCT_NAME.SPARK_MAJOR_VERSION.CALENDAR_YEAR.RELEASE#"
For example, "BDC.3.2021.1".
RELEASE# is a sequential semantic number. It is not bound to months or any other standard. Once a runtime release is created, it is immutable. Each release of SQL Server Big Data Clusters ships with one version of the runtime.
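As an illustration of the naming standard, the following minimal Python sketch splits a runtime name into its four components. It is not part of the product. The names `RuntimeName` and `parse_runtime_name` are hypothetical, and the regular expression assumes that the product name is alphanumeric, the calendar year has four digits, and the remaining fields are plain integers, none of which the specification text above states explicitly.

```python
import re
from typing import NamedTuple


class RuntimeName(NamedTuple):
    """Components of a runtime name (hypothetical helper type)."""
    product_name: str
    spark_major_version: int
    calendar_year: int
    release: int


# Assumed shape: PRODUCT_NAME.SPARK_MAJOR_VERSION.CALENDAR_YEAR.RELEASE#
# The character classes below are assumptions made for this sketch.
_RUNTIME_NAME = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)\.(\d+)\.(\d{4})\.(\d+)")


def parse_runtime_name(name: str) -> RuntimeName:
    """Split a runtime name such as "BDC.3.2021.1" into typed fields."""
    match = _RUNTIME_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognized runtime name: {name!r}")
    product, spark_major, year, release = match.groups()
    return RuntimeName(product, int(spark_major), int(year), int(release))


runtime = parse_runtime_name("BDC.3.2021.1")
print(runtime.product_name, runtime.spark_major_version,
      runtime.calendar_year, runtime.release)
```

For the example name above, the fields are the product `BDC`, Spark major version 3, calendar year 2021, and release 1. A name that does not match the assumed pattern raises `ValueError` instead of being partially parsed.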
What's in the current runtime release?
The SQL Server Big Data Clusters platform release notes have the runtime name and complete contents of the release.
Next steps
For more information, see Introducing SQL Server Big Data Clusters.