MicrosoftML (R package in SQL Server Machine Learning Services)

Applies to: SQL Server 2016 (13.x) and later versions

MicrosoftML is an R package from Microsoft that provides high-performance machine learning algorithms. It provides functions for training, scoring, and feature extraction (deriving values from existing data), along with numerous transformations for text and image analysis. The package is included in SQL Server Machine Learning Services and SQL Server 2016 R Services. It is designed for high performance on big data through multicore processing and fast data streaming.

Full reference documentation

The MicrosoftML package is distributed in multiple Microsoft products, but usage is the same whether you get the package in SQL Server or in another product. Because the functions are the same, documentation for individual MicrosoftML functions is published in just one location, under the R reference. If any product-specific behavior exists, it is noted on the function's help page.

Versions and platforms

The MicrosoftML package is based on R 3.5.2 and available only when you install one of the following Microsoft products or downloads:

Note

Full product release versions are Windows-only in SQL Server 2017. Both Windows and Linux are supported for MicrosoftML in SQL Server 2019.

Package dependencies

Algorithms in MicrosoftML depend on RevoScaleR for:

  • Data source objects. Data sources consumed by MicrosoftML functions are created using RevoScaleR functions.
  • Remote computing (shifting function execution to a remote SQL Server instance). The RevoScaleR package provides functions for creating and activating a remote compute context for SQL Server.

In most cases, you load both packages whenever you use MicrosoftML.
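The two RevoScaleR dependencies above can be sketched as follows. This is a minimal example that assumes a reachable SQL Server instance. The connection string, the database `MyDatabase`, and the table `dbo.MyTrainingData` are hypothetical placeholders. `RxSqlServerData`, `RxInSqlServer`, and `rxSetComputeContext` are RevoScaleR functions.

```r
library(MicrosoftML)
library(RevoScaleR)

# Hypothetical connection string: substitute your own server and database.
connStr <- "Driver=SQL Server;Server=localhost;Database=MyDatabase;Trusted_Connection=True"

# RevoScaleR data source object; MicrosoftML training functions accept it as `data`.
# dbo.MyTrainingData is a hypothetical table name.
trainData <- RxSqlServerData(
  sqlQuery = "SELECT * FROM dbo.MyTrainingData",
  connectionString = connStr)

# Create a remote compute context and activate it so that later
# rx* calls execute on the SQL Server instance instead of locally.
sqlContext <- RxInSqlServer(connectionString = connStr)
rxSetComputeContext(sqlContext)

# ... call MicrosoftML functions here, e.g. rxLogisticRegression(..., data = trainData) ...

# Switch back to local execution when done.
rxSetComputeContext("local")
```

Creating the data source object does not read any rows. Data is pulled only when a training or scoring function consumes it.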

Functions by category

This section lists the functions by category to give you an idea of how each one is used. You can also use the table of contents to find functions in alphabetical order.

1-Machine learning algorithms

Function name Description
rxFastTrees An implementation of FastRank, an efficient implementation of the MART gradient boosting algorithm.
rxFastForest A random forest and quantile regression forest implementation using rxFastTrees.
rxLogisticRegression Logistic regression using L-BFGS.
rxOneClassSvm One-class support vector machines, used for anomaly detection.
rxNeuralNet Neural networks for binary classification, multiclass classification, and regression.
rxFastLinear Stochastic dual coordinate ascent optimization for linear binary classification and regression.
rxEnsemble Trains a number of models of various kinds to obtain better predictive performance than could be obtained from a single model.
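As a rough sketch of how the algorithm functions are called, the following trains two binary classifiers on `infert`, a dataset that ships with base R. Deriving the logical label `isCase` from `case` is an illustrative choice, not a requirement of the package.

```r
library(MicrosoftML)
library(RevoScaleR)

# infert is a built-in R dataset (package datasets); derive a logical label.
data(infert)
infert$isCase <- infert$case == 1

modelFormula <- isCase ~ age + parity + education + spontaneous + induced

# Boosted decision trees (FastRank/MART) for binary classification.
treeModel <- rxFastTrees(modelFormula, data = infert, type = "binary")

# L-BFGS logistic regression on the same formula.
logitModel <- rxLogisticRegression(modelFormula, data = infert)
summary(logitModel)
```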

2-Transformation functions

Function name Description
concat Transformation to create a single vector-valued column from multiple columns.
categorical Converts a categorical value into an indicator array by using a dictionary.
categoricalHash Converts the categorical value into an indicator array by hashing.
featurizeText Produces a bag of counts of sequences of consecutive words, called n-grams, from a given corpus of text. It offers language detection, tokenization, stopword removal, text normalization, and feature generation.
getSentiment Scores natural language text and creates a column that contains the probability that the sentiment in the text is positive.
ngram Defines arguments for count-based and hash-based n-gram feature extraction.
selectColumns Selects a set of columns to retain, dropping all others.
selectFeatures Selects features from the specified variables using a specified mode.
loadImage Loads image data.
resizeImage Resizes an image to a specified dimension using a specified resizing method.
extractPixels Extracts the pixel values from an image.
featurizeImage Featurizes an image using a pre-trained deep neural network model.
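Transformations are passed as a list, for example through the `mlTransforms` argument. `rxFeaturize` applies them without training a model. The sketch below assumes a small hypothetical data frame. The output names `reviewTran` and `colorInd` are arbitrary labels chosen for illustration.

```r
library(MicrosoftML)
library(RevoScaleR)

# Hypothetical toy data with a free-text column and a categorical column.
reviews <- data.frame(
  review = c("good product", "bad product", "good value", "bad value"),
  color  = c("red", "blue", "red", "green"),
  stringsAsFactors = FALSE)

# Apply transforms only (no model training).
# The names given in `vars` (reviewTran, colorInd) name the new output variables.
features <- rxFeaturize(
  data = reviews,
  mlTransforms = list(
    featurizeText(vars = c(reviewTran = "review"), keepPunctuations = FALSE),
    categorical(vars = c(colorInd = "color"))))

str(features)
```

You can pass the same transform list to a training function's `mlTransforms` argument, so the features are computed as part of training.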

3-Scoring and training functions

Function name Description
rxPredict.mlModel Runs the scoring library either from SQL Server (using a stored procedure) or from R code, enabling real-time scoring for much faster prediction performance.
rxFeaturize Transforms data from an input data set to an output data set.
mlModel Provides a summary of a Microsoft R Machine Learning model.

4-Loss functions for classification and regression

Function name Description
expLoss Specification for the exponential classification loss function.
logLoss Specification for the log classification loss function.
hingeLoss Specification for the hinge classification loss function.
smoothHingeLoss Specification for the smooth hinge classification loss function.
poissonLoss Specification for the Poisson regression loss function.
squaredLoss Specification for the squared regression loss function.
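A loss specification is passed to a learner rather than called on its own. The sketch below uses `rxFastLinear`'s `lossFunction` argument with `hingeLoss()` for binary classification and `squaredLoss()` for regression. Predicting `age` from `infert` is purely illustrative.

```r
library(MicrosoftML)
library(RevoScaleR)

data(infert)
infert$isCase <- infert$case == 1

# Binary SDCA linear model optimized with the hinge loss (SVM-style objective).
hingeModel <- rxFastLinear(isCase ~ age + parity + spontaneous + induced,
                           data = infert, type = "binary",
                           lossFunction = hingeLoss())

# Linear regression with the squared loss; predicting age is only an illustration.
sqModel <- rxFastLinear(age ~ parity + spontaneous + induced,
                        data = infert, type = "regression",
                        lossFunction = squaredLoss())
```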

5-Feature selection functions

Function name Description
minCount Specification for feature selection in count mode.
mutualInformation Specification for feature selection in mutual information mode.
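Feature selection modes are passed as the `mode` of `selectFeatures`. The sketch below shows count mode on a hypothetical toy corpus. `minCount(count = 2)` keeps only the features that have non-default values in at least two rows. Mutual-information mode would additionally need a label column and is not shown.

```r
library(MicrosoftML)
library(RevoScaleR)

# Hypothetical toy corpus in which every word appears in at least two rows.
docs <- data.frame(
  text = c("good product", "bad product", "good value", "bad value"),
  stringsAsFactors = FALSE)

# Turn the text into n-gram features, then drop features
# that are non-zero in fewer than 2 rows (count mode).
selected <- rxFeaturize(
  data = docs,
  mlTransforms = list(
    featurizeText(vars = c(textFeatures = "text")),
    selectFeatures("textFeatures", mode = minCount(count = 2))))

str(selected)
```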

6-Ensemble modeling functions

Function name Description
fastTrees Creates a list containing the function name and arguments to train a Fast Tree model with rxEnsemble.
fastForest Creates a list containing the function name and arguments to train a Fast Forest model with rxEnsemble.
fastLinear Creates a list containing the function name and arguments to train a Fast Linear model with rxEnsemble.
logisticRegression Creates a list containing the function name and arguments to train a Logistic Regression model with rxEnsemble.
oneClassSvm Creates a list containing the function name and arguments to train a OneClassSvm model with rxEnsemble.
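These helpers only record a learner and its arguments. `rxEnsemble` trains each one and combines their predictions. The sketch below assumes three default-configured binary learners on `infert`.

```r
library(MicrosoftML)
library(RevoScaleR)

data(infert)
infert$isCase <- infert$case == 1

# fastTrees(), fastForest(), and logisticRegression() each return a learner
# specification; rxEnsemble trains one model per entry and combines them.
ensembleModel <- rxEnsemble(
  isCase ~ age + parity + education + spontaneous + induced,
  data = infert,
  type = "binary",
  trainers = list(fastTrees(), fastForest(), logisticRegression()))
```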

7-Neural networking functions

Function name Description
optimizer Specifies optimization algorithms for the rxNeuralNet machine learning algorithm.
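The optimizer help page covers the `sgd()` and `adaDeltaSgd()` constructors. Their result is passed to `rxNeuralNet` through its `optimizer` argument. The network size, iteration count, learning rate, and momentum below are illustrative values, not recommendations.

```r
library(MicrosoftML)
library(RevoScaleR)

data(infert)
infert$isCase <- infert$case == 1

# Binary neural net trained with stochastic gradient descent plus momentum.
nnModel <- rxNeuralNet(isCase ~ age + parity + education + spontaneous + induced,
                       data = infert, type = "binary",
                       numHiddenNodes = 10, numIterations = 50,
                       optimizer = sgd(learningRate = 0.01, momentum = 0.9))
```

To use an adaptive learning rate instead, swap in `optimizer = adaDeltaSgd()`.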

8-Package state functions

Function name Description
rxHashEnv An environment object used to store package-wide state.

How to use MicrosoftML

You can call MicrosoftML functions from R code encapsulated in stored procedures. Most developers build MicrosoftML solutions locally and then migrate the finished R code to stored procedures as a deployment exercise.

The MicrosoftML package for R is installed out of the box with SQL Server 2017 Machine Learning Services.

The package is not loaded by default. As a first step, load the MicrosoftML package, and then load RevoScaleR if you need to use remote compute contexts or related connectivity or data source objects. Then, reference the individual functions you need.

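library(MicrosoftML)   # note the capitalization of the package name
library(RevoScaleR)    # data sources and remote compute contexts
# Then call individual functions, for example a logistic regression on the built-in infert data:
logitModel <- rxLogisticRegression(isCase ~ age + parity + education + spontaneous + induced,
    transforms = list(isCase = case == 1), data = infert)
summary(logitModel)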
