MicrosoftML (R package in SQL Server Machine Learning Services)
Applies to: SQL Server 2016 (13.x) and later versions
MicrosoftML is an R package from Microsoft that provides high-performance machine learning algorithms. It includes functions for training and transformations, scoring, text and image analysis, and feature extraction for deriving values from existing data. The package is included in SQL Server Machine Learning Services and SQL Server 2016 R Services, and it achieves high performance on big data through multicore processing and fast data streaming. MicrosoftML also includes numerous transformations for text and image processing.
Full reference documentation
The MicrosoftML package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or another product. Because the functions are the same, documentation for individual MicrosoftML functions is published to just one location under the R reference. If any product-specific behaviors exist, the discrepancies are noted in the function help page.
Versions and platforms
The MicrosoftML package is based on R 3.5.2 and available only when you install one of the following Microsoft products or downloads:
Full product release versions are Windows-only in SQL Server 2017. Both Windows and Linux are supported for MicrosoftML in SQL Server 2019.
Algorithms in MicrosoftML depend on RevoScaleR for:
- Data source objects. The data source objects consumed by MicrosoftML functions are created using RevoScaleR functions.
- Remote computing (shifting function execution to a remote SQL Server instance). The RevoScaleR package provides functions for creating and activating a remote compute context for SQL Server.
In most cases, you will load the packages together whenever you are using MicrosoftML.
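The two dependencies above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name `trainInSqlServer`, the connection string you pass to it, the table `dbo.training_data`, and the columns `label`, `feature1`, and `feature2` are all hypothetical, and `label` is assumed to be a column that MicrosoftML accepts as a binary label. The code is wrapped in a function so that nothing contacts a server until you call it.

```r
library(RevoScaleR)
library(MicrosoftML)

# Hypothetical helper: trains a model inside a SQL Server instance.
# connStr, the table name, and the column names are placeholders.
trainInSqlServer <- function(connStr) {
  # RevoScaleR data source object that MicrosoftML functions can consume.
  trainData <- RxSqlServerData(
    sqlQuery = "SELECT label, feature1, feature2 FROM dbo.training_data",
    connectionString = connStr
  )

  # Remember the active compute context, switch to the remote one,
  # and restore the original context when the function exits.
  previous <- rxGetComputeContext()
  on.exit(rxSetComputeContext(previous), add = TRUE)
  rxSetComputeContext(RxInSqlServer(connectionString = connStr))

  # MicrosoftML training now runs in the SQL Server compute context.
  rxLogisticRegression(label ~ feature1 + feature2,
                       data = trainData, type = "binary")
}
```

You would call it with your own connection string, for example `model <- trainInSqlServer(myConnectionString)`.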
Functions by category
This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.
1-Machine learning algorithms
|rxFastTrees||An implementation of FastRank, an efficient implementation of the MART gradient boosting algorithm.|
|rxFastForest||A random forest and quantile regression forest implementation using rxFastTrees.|
|rxLogisticRegression||Logistic regression using L-BFGS.|
|rxOneClassSvm||One class support vector machines.|
|rxNeuralNet||Binary, multi-class, and regression neural net.|
|rxFastLinear||Stochastic dual coordinate ascent optimization for linear binary classification and regression.|
|rxEnsemble||Trains a number of models of various kinds to obtain better predictive performance than could be obtained from a single model.|
2-Transformation functions
|concat||Transformation to create a single vector-valued column from multiple columns.|
|categorical||Create indicator vector using categorical transform with dictionary.|
|categoricalHash||Converts the categorical value into an indicator array by hashing.|
|featurizeText||Produces a bag of counts of sequences of consecutive words, called n-grams, from a given corpus of text. It offers language detection, tokenization, stopword removal, text normalization, and feature generation.|
|getSentiment||Scores natural language text and creates a column that contains probabilities that the sentiments in the text are positive.|
|ngram||Allows defining arguments for count-based and hash-based feature extraction.|
|selectColumns||Selects a set of columns to retain, dropping all others.|
|selectFeatures||Selects features from the specified variables using a specified mode.|
|loadImage||Loads image data.|
|resizeImage||Resizes an image to a specified dimension using a specified resizing method.|
|extractPixels||Extracts the pixel values from an image.|
|featurizeImage||Featurizes an image using a pre-trained deep neural network model.|
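Transformation functions are typically passed to a training function through its `mlTransforms` argument. As a minimal sketch, assuming MicrosoftML is installed locally, the following applies `featurizeText` to a small made-up set of reviews and trains a logistic regression model on the resulting n-gram features. The data frame and the column names `review`, `like`, and `reviewFeatures` are illustrative only.

```r
library(MicrosoftML)

# Small illustrative data set: free text and a logical label.
reviews <- data.frame(
  review = c("This is great", "I hate it", "Love it", "Do not like it",
             "Really like it", "I hate it", "I like it a lot",
             "I kind of hate it", "I do like it", "I really hate it"),
  like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
           FALSE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# featurizeText turns the "review" column into a numeric feature
# vector named "reviewFeatures", which the formula then references.
model <- rxLogisticRegression(
  like ~ reviewFeatures,
  data = reviews,
  mlTransforms = list(featurizeText(vars = c(reviewFeatures = "review")))
)

# Score the same rows; the transform is stored with the model and
# is reapplied during scoring.
scores <- rxPredict(model, data = reviews, extraVarsToWrite = "review")
head(scores)
```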
3-Scoring and training functions
|rxPredict.mlModel||Runs the scoring library either from SQL Server, using a stored procedure, or from R code, enabling real-time scoring for much faster prediction performance.|
|rxFeaturize||Transforms data from an input data set to an output data set.|
|mlModel||Provides a summary of a Microsoft R Machine Learning model.|
4-Loss functions for classification and regression
|expLoss||Specifications for exponential classification loss function.|
|logLoss||Specifications for log classification loss function.|
|hingeLoss||Specifications for hinge classification loss function.|
|smoothHingeLoss||Specifications for smooth hinge classification loss function.|
|poissonLoss||Specifications for Poisson regression loss function.|
|squaredLoss||Specifications for squared regression loss function.|
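A loss function specification is passed to a learner rather than called on its own. As a minimal sketch, assuming MicrosoftML is installed locally, the following trains `rxFastLinear` on the built-in `infert` data set with hinge loss. The derived label `isCase` is an illustrative name.

```r
library(MicrosoftML)

# Binary classifier trained with hinge loss.
# "isCase" is a logical label derived from the "case" column of infert.
hingeModel <- rxFastLinear(
  isCase ~ age + parity + education + spontaneous + induced,
  data = infert,
  transforms = list(isCase = case == 1),
  type = "binary",
  lossFunction = hingeLoss()
)

summary(hingeModel)
```

Swapping `hingeLoss()` for `logLoss()` or `smoothHingeLoss()` changes the objective without changing the rest of the call.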
5-Feature selection functions
|minCount||Specification for feature selection in count mode.|
|mutualInformation||Specification for feature selection in mutual information mode.|
6-Ensemble modeling functions
|fastTrees||Creates a list containing the function name and arguments to train a Fast Tree model with rxEnsemble.|
|fastForest||Creates a list containing the function name and arguments to train a Fast Forest model with rxEnsemble.|
|fastLinear||Creates a list containing the function name and arguments to train a Fast Linear model with rxEnsemble.|
|logisticRegression||Creates a list containing the function name and arguments to train a Logistic Regression model with rxEnsemble.|
|oneClassSvm||Creates a list containing the function name and arguments to train a OneClassSvm model with rxEnsemble.|
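The ensemble functions above only describe trainers; `rxEnsemble` does the training. As a minimal sketch, assuming MicrosoftML is installed locally, the following combines three differently configured trainers on the built-in `infert` data set. The choice of trainers, the `numTrees` value, and the derived column `isCase` are illustrative.

```r
library(MicrosoftML)

# Add a logical label column to a copy of the built-in infert data.
infertData <- transform(infert, isCase = case == 1)

# Each helper returns a trainer specification that rxEnsemble consumes.
trainers <- list(
  fastTrees(),
  fastTrees(numTrees = 60),
  logisticRegression()
)

ensemble <- rxEnsemble(
  formula = isCase ~ age + parity + education + spontaneous + induced,
  data = infertData,
  type = "binary",
  trainers = trainers
)
```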
7-Neural networking functions
|optimizer||Specifies optimization algorithms for the rxNeuralNet machine learning algorithm.|
8-Package state functions
|rxHashEnv||An environment object used to store package-wide state.|
How to use MicrosoftML
Functions in MicrosoftML are callable in R code encapsulated in stored procedures. Most developers build MicrosoftML solutions locally, and then migrate finished R code to stored procedures as a deployment exercise.
The MicrosoftML package for R is installed "out-of-the-box" in SQL Server 2017.
The package is not loaded by default. As a first step, load the MicrosoftML package, and then load RevoScaleR if you need to use remote compute contexts or related connectivity or data source objects. Then, reference the individual functions you need.
library(MicrosoftML)
library(RevoScaleR)
rxLogisticRegression(args)
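For a fuller picture of the local workflow, here is a minimal sketch that trains and scores a model on the built-in `infert` data set, assuming MicrosoftML and RevoScaleR are installed. The split size, the seed, and the derived label `isCase` are arbitrary illustrative choices.

```r
library(MicrosoftML)
library(RevoScaleR)

# Hold out 50 rows of the built-in infert data for scoring.
set.seed(42)
testRows  <- sample(nrow(infert), 50)
trainData <- infert[-testRows, ]
testData  <- infert[testRows, ]

# Train a binary logistic regression model.
# "isCase" is a logical label derived from the "case" column.
model <- rxLogisticRegression(
  isCase ~ age + parity + spontaneous + induced,
  data = trainData,
  transforms = list(isCase = case == 1),
  type = "binary"
)

# Score the held-out rows and keep the original "case" column
# alongside the predictions for comparison.
scores <- rxPredict(model, data = testData, extraVarsToWrite = "case")
head(scores)
```

Once code like this works locally, the same R script can be moved into a stored procedure for deployment.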