Machine Learning guide for SQL Server Big Data Clusters

Applies to: SQL Server 2019 (15.x)

This article explains how to use SQL Server Big Data Clusters for machine learning scenarios.

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.

Introduction to Machine Learning in SQL Server Big Data Clusters

SQL Server Big Data Clusters enable machine learning scenarios and solutions through two technology stacks: SQL Server Machine Learning Services and Apache Spark ML.

SQL Server Big Data Clusters offer machine learning capabilities inside the SQL Server engine through the established SQL Server Machine Learning Services technology stack. This enables high-performance, in-database machine learning inference and scoring.

For machine learning on big data, hosting the data in HDFS and using Apache Spark ML capabilities is more cost-effective, scalable, and powerful.

Machine Learning Scenarios

These machine learning capabilities enable applications and solutions such as fraud detection, forecasting, churn prediction, and general classification and regression tasks. It is important, however, to choose the technology that best fits each scenario.

| Aspect | SQL Server Machine Learning Services | Apache Spark ML |
|---|---|---|
| Data placement | Leverages tabular data locality on SQL Server. Premium data tier. | Scalable big data tier using HDFS, for unstructured, semi-structured, or structured data. |
| Best for | Low-latency inference and scoring scenarios | 1. Distributed batch training and scoring of machine learning models on top of big data; 2. ETL sinks and large-scale data preparation and featurization for ML |
| Feeds | ML-powered BI dashboards, reports, and applications. Low latency required | Batch-scored data may be promoted to SQL Server to drive ML-powered scenarios |
| Latency | Low latency required | Higher latency acceptable |
| Read more | Run Python and R scripts with Machine Learning Services on SQL Server Big Data Clusters | Introducing Spark Machine Learning on SQL Server Big Data Clusters |
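
The guidance in the table above can be summarized as a small decision helper. The following is a minimal sketch in Python, not part of any SQL Server or Spark API: the names `Stack` and `recommend_stack` and the two input flags are hypothetical, and the rule is a deliberate simplification of the table that only considers latency and data placement.

```python
from enum import Enum


class Stack(Enum):
    """The two technology stacks compared in the table (labels only)."""
    SQL_SERVER_ML_SERVICES = "SQL Server Machine Learning Services"
    APACHE_SPARK_ML = "Apache Spark ML"


def recommend_stack(low_latency_required: bool, data_on_hdfs: bool) -> Stack:
    """Hypothetical helper: map a scenario to a stack using the table's rows.

    Assumption: latency takes priority over data placement, because the
    table lists low latency as a requirement only for Machine Learning Services.
    """
    # "Latency" row: low-latency inference and scoring runs in-database.
    if low_latency_required:
        return Stack.SQL_SERVER_ML_SERVICES
    # "Data placement" row: data hosted on HDFS is processed with Spark ML
    # when higher latency is acceptable (batch training and scoring).
    if data_on_hdfs:
        return Stack.APACHE_SPARK_ML
    # Tabular data already in SQL Server benefits from data locality.
    return Stack.SQL_SERVER_ML_SERVICES


if __name__ == "__main__":
    # Example: nightly batch scoring over files stored in HDFS.
    print(recommend_stack(low_latency_required=False, data_on_hdfs=True).value)
```

A scenario that needs both, such as models trained on HDFS data but served with low latency, is not captured by a single return value. In that case the "Feeds" row applies: batch-scored data may be promoted to SQL Server.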

Next steps

For more information, see Introducing SQL Server Big Data Clusters.