Applies to:
SQL Server 2019 (15.x)
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
The SQL Server Big Data Clusters runtime for Apache Spark is a standardized specification for Apache Spark that enables seamless interoperability between distributions. This Spark runtime is a consistent, versioned block of programming language distributions, engine optimizations, core libraries, and packages.
Every product that uses this runtime specification contains the same versions of Apache Spark Core, PySpark, Scala Spark, SparkR, sparklyr, and .NET for Spark.
All distributed packages and libraries are also the same. One of the primary goals of the specification is to give data engineers and data scientists a first-class experience by providing a continually curated and updated list of packages and connectors out of the box.
Benefits of the SQL Server Big Data Clusters runtime for Apache Spark:
The SQL Server Big Data Clusters runtime for Apache Spark specification defines the following:
The runtime naming standard is as follows:
"PRODUCT_NAME.SPARK_MAJOR_VERSION.CALENDAR_YEAR.RELEASE#"
For example: "BDC.3.2021.1".
RELEASE# is a sequential semantic number. It is not bound to months or any other standard. Once a runtime release is created, it is immutable. Each release of SQL Server Big Data Clusters ships with one version of the runtime.
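The naming standard above can be split into its four fields with a small parser. The following is a minimal sketch, not part of the product: the names `RuntimeName` and `parse_runtime_name` are hypothetical, and it assumes that the product name is alphanumeric, the calendar year has four digits, and `RELEASE#` is a plain non-negative integer.

```python
import re
from typing import NamedTuple


class RuntimeName(NamedTuple):
    """Parsed runtime name. A NamedTuple is immutable, like a runtime release."""
    product: str       # PRODUCT_NAME, for example "BDC"
    spark_major: int   # SPARK_MAJOR_VERSION, for example 3
    year: int          # CALENDAR_YEAR, for example 2021
    release: int       # RELEASE#, sequential within the year (assumed integer)


# Assumption: four dot-separated fields, with no surrounding quotes or whitespace.
_RUNTIME_NAME = re.compile(
    r"^(?P<product>[A-Za-z][A-Za-z0-9_]*)"
    r"\.(?P<spark>\d+)"
    r"\.(?P<year>\d{4})"
    r"\.(?P<release>\d+)$"
)


def parse_runtime_name(name: str) -> RuntimeName:
    """Split a runtime name such as 'BDC.3.2021.1' into its components."""
    match = _RUNTIME_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid runtime name: {name!r}")
    return RuntimeName(
        product=match["product"],
        spark_major=int(match["spark"]),
        year=int(match["year"]),
        release=int(match["release"]),
    )


if __name__ == "__main__":
    print(parse_runtime_name("BDC.3.2021.1"))
```

Because the fields are stored as integers, two parsed names for the same product can be compared by `(spark_major, year, release)` to tell which runtime is newer.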
The SQL Server Big Data Clusters platform release notes have the runtime name and complete contents of the release.
For more information, see Introducing SQL Server Big Data Clusters.