Machine Learning guide for SQL Server Big Data Clusters
Applies to: SQL Server 2019 (15.x)
This article explains how to use SQL Server Big Data Clusters for machine learning scenarios.
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
Introduction to Machine Learning in SQL Server Big Data Clusters
SQL Server Big Data Clusters enable machine learning scenarios and solutions using two technology stacks: SQL Server Machine Learning Services and Apache Spark ML.
SQL Server Big Data Clusters offer machine learning capabilities inside the SQL Server engine through the established SQL Server Machine Learning Services technology stack, which enables high-performance, in-database machine learning inference and scoring.
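As a rough illustration of in-database scoring, the sketch below submits a T-SQL batch that calls `sp_execute_external_script` with an embedded Python script. It is a minimal sketch under assumptions, not a reference implementation: the tables `dbo.models` and `dbo.customers`, the model name `churn_model`, and the column names are hypothetical, the stored model is assumed to be a pickled scikit-learn style classifier, and external scripts are assumed to be enabled on the SQL Server master instance.

```python
"""Sketch: in-database scoring through SQL Server Machine Learning Services."""

# T-SQL batch sent to the SQL Server master instance. The table names
# (dbo.models, dbo.customers), the model name, and the column names are
# hypothetical. The stored model is assumed to be a pickled scikit-learn
# style classifier that exposes predict_proba.
SCORING_BATCH = """
SET NOCOUNT ON;
DECLARE @model varbinary(max) =
    (SELECT model FROM dbo.models WHERE name = N'churn_model');

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pickle
import pandas as pd

model = pickle.loads(model_blob)
scores = model.predict_proba(InputDataSet)[:, 1]
OutputDataSet = pd.DataFrame({"score": scores})
',
    @input_data_1 = N'SELECT tenure, monthly_charges FROM dbo.customers',
    @params = N'@model_blob varbinary(max)',
    @model_blob = @model
WITH RESULT SETS ((score float));
"""


def score_in_database(cursor):
    """Run the scoring batch on an open DB-API cursor and return the scores."""
    cursor.execute(SCORING_BATCH)
    # Each result row holds a single column: the score computed inside SQL Server.
    return [row[0] for row in cursor.fetchall()]
```

The embedded script reads its rows from `InputDataSet` and returns results through `OutputDataSet`, so the data being scored never leaves the SQL Server engine. Only the scores travel back to the client, which can be any application holding a DB-API cursor (for example, one opened with an ODBC driver).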
For machine learning on big data, hosting the data in HDFS and using the Apache Spark ML capabilities is more cost-effective, scalable, and powerful.
Machine Learning Scenarios
The machine learning capabilities enable applications and solutions such as fraud detection, forecasting, churn prediction, and general classification and regression tasks. Even so, it is important to choose the technology that best fits each scenario.
Aspect | SQL Server Machine Learning Services | Apache Spark ML |
---|---|---|
Data placement | Takes advantage of tabular data locality in SQL Server. Premium data tier. | Scalable big data tier using HDFS for unstructured, semi-structured, and structured data. |
Best for | Low-latency inference and scoring scenarios | 1. Distributed batch training and scoring of machine learning models on big data 2. ETL sinks and large-scale data preparation and featurization for ML |
Feeds | ML-powered BI dashboards, reports, and applications that require low latency | Batch-scored data can be promoted to SQL Server to drive ML-powered scenarios |
Latency | Low latency required | Higher latency acceptable |
Read more | Run Python and R scripts with Machine Learning Services on SQL Server Big Data Clusters | Introducing Spark Machine Learning on SQL Server Big Data Clusters |
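The comparison above can be condensed into a small decision helper. This is only a simplified sketch of the table, not an official selection rule: the `Workload` class, its field names, and the `recommend_stack` function are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass

SQL_ML_SERVICES = "SQL Server Machine Learning Services"
SPARK_ML = "Apache Spark ML"


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description; the field names are illustrative."""
    needs_low_latency: bool   # "Latency" row of the table
    data_in_hdfs: bool        # "Data placement" row of the table
    distributed_batch: bool   # "Best for" row of the table


def recommend_stack(workload: Workload) -> str:
    """Return the stack the comparison table suggests for this workload."""
    # Low-latency inference and scoring is the case the table assigns to
    # in-database machine learning, even when the data originated in HDFS
    # (batch-scored data can be promoted to SQL Server first).
    if workload.needs_low_latency:
        return SQL_ML_SERVICES
    # Data in HDFS, or distributed batch training, scoring, and data
    # preparation, points to Spark ML.
    if workload.data_in_hdfs or workload.distributed_batch:
        return SPARK_ML
    # Otherwise assume tabular data that already lives in SQL Server.
    return SQL_ML_SERVICES


print(recommend_stack(Workload(needs_low_latency=False,
                               data_in_hdfs=True,
                               distributed_batch=True)))
```

Real workloads often combine both stacks, for example by training and batch scoring with Spark ML and then serving the promoted results from SQL Server.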
Next steps
For more information, see Introducing SQL Server Big Data Clusters.