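Data transformations are used to prepare data for model training, apply an imported model in TensorFlow or ONNX format, and post-process data after it has been passed through a model.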
The transformations in this guide return classes that implement the IEstimator interface. Data transformations can be chained together. Each transformation both expects and produces data of specific types and formats, which are specified in the linked reference documentation.
Some data transformations require training data to calculate their parameters. For example, the NormalizeMeanVariance transformer calculates the mean and variance of the training data during the Fit() operation and uses those parameters in the Transform() operation.
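The Fit()/Transform() split can be illustrated with a minimal Python sketch. The class names `MeanVarianceEstimator` and `MeanVarianceTransformer` are hypothetical stand-ins, not ML.NET types. The sketch assumes the population standard deviation is used as the scale.

```python
import statistics
from dataclasses import dataclass


@dataclass
class MeanVarianceTransformer:
    """Holds the parameters learned during fit() and applies them in transform()."""
    mean: float
    std: float

    def transform(self, values: list[float]) -> list[float]:
        # Subtract the training mean and divide by the training standard deviation.
        return [(v - self.mean) / self.std for v in values]


class MeanVarianceEstimator:
    """Hypothetical estimator: fit() learns statistics and returns a transformer."""

    def fit(self, training_values: list[float]) -> MeanVarianceTransformer:
        mean = statistics.fmean(training_values)
        std = statistics.pstdev(training_values)  # assumed: population std dev
        if std == 0:
            std = 1.0  # avoid division by zero for a constant column
        return MeanVarianceTransformer(mean, std)


transformer = MeanVarianceEstimator().fit([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(transformer.transform([5.0, 9.0]))  # [0.0, 2.0]
```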
Other data transformations don't require training data. For example, the ConvertToGrayscale transformation can perform the Transform() operation without having seen any training data during the Fit() operation.
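A transformation without trainable parameters, such as grayscale conversion, can be written as a pure function of its input. The sketch below assumes pixels are (R, G, B) tuples and uses the common ITU-R BT.601 luma weights. The exact weighting that ML.NET's ConvertToGrayscale applies is not stated here and may differ, and `convert_to_grayscale` is a hypothetical name.

```python
def convert_to_grayscale(pixels):
    """Map (r, g, b) tuples with 0-255 channels to single 0-255 intensities.

    There is no fit step: each output depends only on its own input pixel,
    so the function can run without ever seeing training data.
    """
    # Assumed BT.601 luma weights; they sum to 1, so white stays 255.
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


print(convert_to_grayscale([(255, 255, 255), (0, 0, 0), (255, 0, 0)]))  # [255, 0, 76]
```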

Column mapping and groupings

Transform | Definition | ONNX Exportable |
---|---|---|
Concatenate | Concatenate one or more input columns into a new output column | Yes |
CopyColumns | Copy and rename one or more input columns | Yes |
DropColumns | Drop one or more input columns | Yes |
SelectColumns | Select one or more columns to keep from the input data | Yes |
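
To make the column operations concrete, here is a minimal Python sketch that treats a dataset as a list of dictionaries, one per row. The function names are hypothetical and only mirror the transform names in the table. ML.NET's IDataView does not store data this way.

```python
def concatenate(rows, output, *inputs):
    """Append a vector column built from the listed input columns, in order."""
    result = []
    for row in rows:
        vector = []
        for name in inputs:
            value = row[name]
            # Vector columns are flattened; scalar columns contribute one slot.
            vector.extend(value if isinstance(value, list) else [value])
        result.append({**row, output: vector})
    return result


def copy_columns(rows, output, source):
    """Add a copy of the source column under a new name."""
    return [{**row, output: row[source]} for row in rows]


def drop_columns(rows, *names):
    """Remove the named columns and keep everything else."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def select_columns(rows, *names):
    """Keep only the named columns."""
    return [{name: row[name] for name in names} for row in rows]


rows = [{"Size": 1.5, "Rooms": 3.0, "Extras": [0.0, 1.0], "Id": 7}]
rows = concatenate(rows, "Features", "Size", "Rooms", "Extras")
rows = copy_columns(rows, "Key", "Id")
print(select_columns(drop_columns(rows, "Id"), "Key", "Features"))
# [{'Key': 7, 'Features': [1.5, 3.0, 0.0, 1.0]}]
```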

Normalization and scaling

Transform | Definition | ONNX Exportable |
---|---|---|
NormalizeMeanVariance | Subtract the mean and divide by the standard deviation (the square root of the variance), both computed from the training data | Yes |
NormalizeLogMeanVariance | Normalize based on the logarithm of the training data | Yes |
NormalizeLpNorm | Scale input vectors by their Lp norm, where p is 1, 2, or infinity. Defaults to the L2 (Euclidean) norm | Yes |
NormalizeGlobalContrast | Scale each value in a row by subtracting the mean of the row data, dividing by either the standard deviation or the L2 norm (of the row data), and multiplying by a configurable scale factor (default 2) | Yes |
NormalizeBinning | Assign the input value to a bin index and divide by the number of bins to produce a float value between 0 and 1. The bin boundaries are calculated to evenly distribute the training data across bins | Yes |
NormalizeSupervisedBinning | Assign the input value to a bin based on its correlation with the label column | Yes |
NormalizeMinMax | Scale the input by the difference between the minimum and maximum values in the training data | Yes |
NormalizeRobustScaling | Scale each value using statistics that are robust to outliers: center the data around 0 and scale it according to the quantile range | Yes |
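
Several of the normalizers above differ mainly in which training statistics they learn. The Python sketch below contrasts a textbook min-max scaler, (x - min) / (max - min), with a median and interquartile-range robust scaler on data that contains one outlier. Both functions are assumptions about the general techniques, not reproductions of ML.NET's implementations, whose extra options can change the exact formulas. The quartiles come from Python's `statistics.quantiles` with the "inclusive" method.

```python
import statistics


def fit_min_max(train):
    """Learn min and max from training data; return the scaling function."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant column
    return lambda x: (x - lo) / span


def fit_robust(train):
    """Learn the median and interquartile range; return the scaling function."""
    q1, median, q3 = statistics.quantiles(train, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero quartile range
    return lambda x: (x - median) / iqr


train = [1.0, 2.0, 3.0, 4.0, 100.0]
min_max = fit_min_max(train)
robust = fit_robust(train)

# The outlier squeezes the min-max outputs for 1..4 into [0, ~0.03].
print([round(min_max(x), 3) for x in train])  # [0.0, 0.01, 0.02, 0.03, 1.0]
# The robust scaler keeps 1..4 spread out around 0.
print([robust(x) for x in train])             # [-1.0, -0.5, 0.0, 0.5, 48.5]
```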

Conversions between data types

Transform | Definition | ONNX Exportable |
---|---|---|
ConvertType | Convert the type of an input column to a new type | Yes |
MapValue | Map values to keys (categories) based on the supplied dictionary of mappings | No |
MapValueToKey | Map values to keys (categories) by creating the mapping from the input data | Yes |
MapKeyToValue | Convert keys back to their original values | Yes |
MapKeyToVector | Convert keys back to vectors of original values | Yes |
MapKeyToBinaryVector | Convert keys back to a binary vector of original values | No |
Hash | Hash the value in the input column | Yes |
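
MapValueToKey learns its dictionary from the data, while MapKeyToValue inverts it. The following is a minimal Python sketch. It assumes ML.NET's convention that keys are 1-based and that 0 stands for a value not seen during training. It also assumes keys are assigned in order of first occurrence. The function names are hypothetical.

```python
def fit_value_to_key(values):
    """Build a value -> key dictionary in order of first occurrence."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # keys start at 1; 0 means "unknown"
    return mapping


def map_value_to_key(mapping, values):
    """Look up each value; unseen values map to key 0."""
    return [mapping.get(value, 0) for value in values]


def map_key_to_value(mapping, keys):
    """Invert the dictionary; key 0 (or any unknown key) maps to None."""
    reverse = {key: value for value, key in mapping.items()}
    return [reverse.get(key) for key in keys]


mapping = fit_value_to_key(["red", "green", "red", "blue"])
keys = map_value_to_key(mapping, ["blue", "red", "purple"])
print(keys)                             # [3, 1, 0]
print(map_key_to_value(mapping, keys))  # ['blue', 'red', None]
```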

Text transformations

Transform | Definition | ONNX Exportable |
---|---|---|
FeaturizeText | Transform a text column into a float array of normalized ngram and char-gram counts | No |
TokenizeIntoWords | Split one or more text columns into individual words | Yes |
TokenizeIntoCharactersAsKeys | Split one or more text columns into individual characters | Yes |
NormalizeText | Change case, remove diacritical marks, punctuation marks, and numbers | Yes |
ProduceNgrams | Transform text column into a bag of counts of ngrams (sequences of consecutive words) | Yes |
ProduceWordBags | Transform a text column into a vector of ngram counts (a bag of words) | Yes |
ProduceHashedNgrams | Transform text column into a vector of hashed ngram counts | No |
ProduceHashedWordBags | Transform text column into a bag of hashed ngram counts | Yes |
RemoveDefaultStopWords | Remove default stop words for the specified language from input columns | Yes |
RemoveStopWords | Removes specified stop words from input columns | Yes |
LatentDirichletAllocation | Transform a document (represented as a vector of floats) into a vector of floats over a set of topics | Yes |
ApplyWordEmbedding | Convert vectors of text tokens into sentence vectors using a pretrained model | Yes |
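
The core of ProduceNgrams-style featurization is counting runs of consecutive tokens. The sketch below is a hypothetical Python illustration. It splits on whitespace as a simplified stand-in for TokenizeIntoWords and counts both unigrams and bigrams. ML.NET's own options for n-gram length, weighting, and vocabulary limits are not modeled.

```python
from collections import Counter


def tokenize_into_words(text):
    """Simplified tokenizer: split on runs of whitespace."""
    return text.split()


def produce_ngrams(tokens, max_length=2):
    """Count every run of 1..max_length consecutive tokens."""
    counts = Counter()
    for size in range(1, max_length + 1):
        for start in range(len(tokens) - size + 1):
            counts[" ".join(tokens[start:start + size])] += 1
    return counts


bag = produce_ngrams(tokenize_into_words("the cat saw the cat"))
print(bag["the cat"], bag["saw the"], bag["cat"])  # 2 1 2
```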

Image transformations

Transform | Definition | ONNX Exportable |
---|---|---|
ConvertToGrayscale | Convert an image to grayscale | No |
ConvertToImage | Convert a vector of pixels to ImageDataViewType | No |
ExtractPixels | Convert pixels from input image into a vector of numbers | No |
LoadImages | Load images from a folder into memory | No |
LoadRawImageBytes | Load images as raw bytes into a new column | No |
ResizeImages | Resize images | No |
DnnFeaturizeImage | Applies a pretrained deep neural network (DNN) model to transform an input image into a feature vector | No |

Categorical data transformations

Transform | Definition | ONNX Exportable |
---|---|---|
OneHotEncoding | Convert one or more text columns into one-hot encoded vectors | Yes |
OneHotHashEncoding | Convert one or more text columns into hash-based one-hot encoded vectors | No |
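
The two categorical encoders differ in where the vector slot comes from: a learned dictionary or a hash of the value. In this Python sketch, `zlib.crc32` is a stand-in chosen only because it is deterministic. ML.NET uses its own hash function, so slot positions will not match. With hashing, different values can collide in the same bucket, and more bits make that less likely. The function names and the `number_of_bits` parameter are hypothetical.

```python
import zlib


def one_hot(categories, value):
    """One slot per known category; unknown values produce all zeros."""
    return [1.0 if value == category else 0.0 for category in categories]


def one_hot_hash(value, number_of_bits=3):
    """One slot per hash bucket; no dictionary has to be learned or stored."""
    size = 1 << number_of_bits
    slot = zlib.crc32(value.encode("utf-8")) % size
    vector = [0.0] * size
    vector[slot] = 1.0
    return vector


print(one_hot(["red", "green", "blue"], "green"))  # [0.0, 1.0, 0.0]
print(len(one_hot_hash("green")))                  # 8
```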

Time series data transformations

Transform | Definition | ONNX Exportable |
---|---|---|
DetectAnomalyBySrCnn | Detect anomalies in the input time series data using the Spectral Residual (SR) algorithm | No |
DetectChangePointBySsa | Detect change points in time series data using singular spectrum analysis (SSA) | No |
DetectIidChangePoint | Detect change points in independent and identically distributed (IID) time series data using adaptive kernel density estimations and martingale scores | No |
ForecastBySsa | Forecast time series data using singular spectrum analysis (SSA) | No |
DetectSpikeBySsa | Detect spikes in time series data using singular spectrum analysis (SSA) | No |
DetectIidSpike | Detect spikes in independent and identically distributed (IID) time series data using adaptive kernel density estimations and martingale scores | No |
DetectEntireAnomalyBySrCnn | Detect anomalies for the entire input data using the SRCNN algorithm. | No |
DetectSeasonality | Detect seasonality using Fourier analysis. | No |
LocalizeRootCause | Localizes root cause from time series input using a decision tree algorithm. | No |
LocalizeRootCauses | Localizes root causes from time series input. | No |

Missing values

Transform | Definition | ONNX Exportable |
---|---|---|
IndicateMissingValues | Create a new boolean output column, the value of which is true when the value in the input column is missing | Yes |
ReplaceMissingValues | Create a new output column, the value of which is set to a default value if the value is missing from the input column, and the input value otherwise | Yes |
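
The following is a minimal Python sketch of both transforms. It assumes missing floating-point values are represented as NaN. `indicate_missing` and `replace_missing` are hypothetical names. The "mean" option of `replace_missing` illustrates filling with a statistic of the training data; by default the sketch uses the input list itself as the training data.

```python
import math
import statistics


def indicate_missing(values):
    """True where the input value is missing (NaN)."""
    return [math.isnan(v) for v in values]


def replace_missing(values, mode="default", training=None):
    """Fill NaNs with 0.0 ("default") or with the mean of the training values."""
    training = values if training is None else training
    if mode == "mean":
        fill = statistics.fmean(v for v in training if not math.isnan(v))
    else:
        fill = 0.0
    return [fill if math.isnan(v) else v for v in values]


data = [1.0, float("nan"), 3.0]
print(indicate_missing(data))         # [False, True, False]
print(replace_missing(data))          # [1.0, 0.0, 3.0]
print(replace_missing(data, "mean"))  # [1.0, 2.0, 3.0]
```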

Feature selection

Transform | Definition | ONNX Exportable |
---|---|---|
SelectFeaturesBasedOnCount | Select features whose count of non-default values is greater than a threshold | Yes |
SelectFeaturesBasedOnMutualInformation | Select the features on which the data in the label column is most dependent | Yes |
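
Count-based selection learns which slots to keep from the training data and then drops the rest. This is a hypothetical Python sketch. It treats 0.0 as the default value and keeps slots whose non-zero count is strictly greater than the threshold, following the table's description. Boundary handling in ML.NET itself may differ.

```python
def fit_select_by_count(rows, threshold):
    """Return the indices of slots with more than `threshold` non-zero values."""
    slot_count = len(rows[0])
    counts = [sum(1 for row in rows if row[j] != 0.0) for j in range(slot_count)]
    return [j for j in range(slot_count) if counts[j] > threshold]


def apply_selection(row, keep):
    """Project a feature vector onto the kept slots."""
    return [row[j] for j in keep]


training = [
    [1.0, 0.0, 5.0],
    [2.0, 0.0, 4.0],
    [3.0, 1.0, 0.0],
]
keep = fit_select_by_count(training, threshold=1)
print(keep)                                    # [0, 2]
print(apply_selection([7.0, 8.0, 9.0], keep))  # [7.0, 9.0]
```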

Feature transformations

Transform | Definition | ONNX Exportable |
---|---|---|
ApproximatedKernelMap | Map each input vector onto a lower-dimensional feature space, where inner products approximate a kernel function, so that the features can be used as inputs to linear algorithms | No |
ProjectToPrincipalComponents | Reduce the dimensions of the input feature vector by applying the Principal Component Analysis algorithm |

Explainability transformations

Transform | Definition | ONNX Exportable |
---|---|---|
CalculateFeatureContribution | Calculate contribution scores for each element of a feature vector | No |

Calibration transformations

Transform | Definition | ONNX Exportable |
---|---|---|
Platt(String, String, String) | Transforms a binary classifier raw score into a class probability using logistic regression with parameters estimated using the training data | Yes |
Platt(Double, Double, String) | Transforms a binary classifier raw score into a class probability using logistic regression with fixed parameters | Yes |
Naive | Transforms a binary classifier raw score into a class probability by assigning scores to bins, and calculating the probability based on the distribution among the bins | Yes |
Isotonic | Transforms a binary classifier raw score into a class probability by assigning scores to bins, where the position of boundaries and the size of bins are estimated using the training data | No |
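
Platt scaling maps a raw score s to a probability with a sigmoid, p = 1 / (1 + exp(A·s + B)). Here A, usually negative, and B are either fitted on training data or supplied directly, as in the fixed-parameter Platt row above. The following is a minimal Python sketch of the fixed-parameter case. The function name is hypothetical, and fitting A and B is not shown.

```python
import math


def platt_probability(score, slope, offset):
    """Sigmoid calibration: 1 / (1 + exp(slope * score + offset))."""
    return 1.0 / (1.0 + math.exp(slope * score + offset))


# With slope=-1 and offset=0 this reduces to the logistic function of the score.
print(platt_probability(0.0, slope=-1.0, offset=0.0))            # 0.5
print(round(platt_probability(2.0, slope=-1.0, offset=0.0), 4))  # 0.8808
```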

Deep learning transformations

Transform | Definition | ONNX Exportable |
---|---|---|
ApplyOnnxModel | Transform the input data with an imported ONNX model | No |
LoadTensorFlowModel | Transform the input data with an imported TensorFlow model | No |

Custom transformations

Transform | Definition | ONNX Exportable |
---|---|---|
FilterByCustomPredicate | Drops rows where a specified predicate returns true. | No |
FilterByStatefulCustomPredicate | Drops rows where a specified predicate returns true, but allows for a specified state. | No |
CustomMapping | Transform existing columns onto new ones with a user-defined mapping | No |
Expression | Apply an expression to transform columns into new ones | No |
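
The custom transforms run user-supplied code. In this hypothetical Python sketch, which does not follow the ML.NET signatures, a row filter drops rows for which a predicate returns true. A stateful variant threads a mutable state object through the predicate calls, and a custom mapping adds columns computed from each row.

```python
def filter_by_predicate(rows, predicate):
    """Drop every row for which predicate(row) is True."""
    return [row for row in rows if not predicate(row)]


def filter_by_stateful_predicate(rows, predicate, state):
    """Like filter_by_predicate, but predicate(row, state) may update state."""
    return [row for row in rows if not predicate(row, state)]


def custom_mapping(rows, mapping):
    """Add the columns returned by mapping(row) to each row."""
    return [{**row, **mapping(row)} for row in rows]


def seen_before(row, seen):
    # Stateful predicate: True for an Id that has already appeared.
    duplicate = row["Id"] in seen
    seen.add(row["Id"])
    return duplicate


rows = [{"Id": 1, "Price": 10.0}, {"Id": 2, "Price": -1.0}, {"Id": 1, "Price": 10.0}]
rows = filter_by_predicate(rows, lambda row: row["Price"] < 0)
rows = filter_by_stateful_predicate(rows, seen_before, set())
rows = custom_mapping(rows, lambda row: {"PriceWithTax": row["Price"] * 1.5})
print(rows)  # [{'Id': 1, 'Price': 10.0, 'PriceWithTax': 15.0}]
```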