NormalizationCatalog.NormalizeRobustScaling Method



NormalizeRobustScaling(TransformsCatalog, InputOutputColumnPair[], Int64, Boolean, UInt32, UInt32)

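Creates a NormalizingEstimator that normalizes using statistics that are robust to outliers: it centers the data around 0 (by removing the median) and scales the data according to the quantile range (by default, the interquartile range).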

NormalizeRobustScaling(TransformsCatalog, String, String, Int64, Boolean, UInt32, UInt32)

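Creates a NormalizingEstimator that normalizes using statistics that are robust to outliers: it centers the data around 0 (by removing the median) and scales the data according to the quantile range (by default, the interquartile range).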

NormalizeRobustScaling(TransformsCatalog, InputOutputColumnPair[], Int64, Boolean, UInt32, UInt32)

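Creates a NormalizingEstimator that normalizes using statistics that are robust to outliers: it centers the data around 0 (by removing the median) and scales the data according to the quantile range (by default, the interquartile range).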
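Read literally, the description above corresponds to the following transformation of each value x in a column. This is a sketch of the arithmetic under that reading, not a statement of ML.NET's internal implementation; the symbols m, Q_min, and Q_max are notation introduced here for the column median and for the column values at the quantileMin and quantileMax percentiles.

```latex
y =
\begin{cases}
\dfrac{x - m}{Q_{\max} - Q_{\min}} & \text{if centerData is true} \\[2ex]
\dfrac{x}{Q_{\max} - Q_{\min}} & \text{if centerData is false}
\end{cases}
```

For example, if a column had m = 3, Q_min = 2, and Q_max = 4, the value x = 100 would map to y = (100 - 3) / (4 - 2) = 48.5 with centering, while a value at the median would map to 0. Because m, Q_min, and Q_max change little when a single extreme value is added, one outlier does not compress the rest of the column the way min-max scaling would.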

public static Microsoft.ML.Transforms.NormalizingEstimator NormalizeRobustScaling (this Microsoft.ML.TransformsCatalog catalog, Microsoft.ML.InputOutputColumnPair[] columns, long maximumExampleCount = 1000000000, bool centerData = true, uint quantileMin = 25, uint quantileMax = 75);
static member NormalizeRobustScaling : Microsoft.ML.TransformsCatalog * Microsoft.ML.InputOutputColumnPair[] * int64 * bool * uint32 * uint32 -> Microsoft.ML.Transforms.NormalizingEstimator
Public Function NormalizeRobustScaling (catalog As TransformsCatalog, columns As InputOutputColumnPair(), Optional maximumExampleCount As Long = 1000000000, Optional centerData As Boolean = true, Optional quantileMin As UInteger = 25, Optional quantileMax As UInteger = 75) As NormalizingEstimator





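columns: The pairs of input and output columns. The input columns must be of data type Single, Double, or a known-sized vector of those types. The data type of each output column will be the same as that of the associated input column.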




centerData: Whether to center the data around 0 by removing the median. Defaults to true.


quantileMin: The minimum quantile used to scale the data. Defaults to 25.


quantileMax: The maximum quantile used to scale the data. Defaults to 75.
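To make the roles of centerData, quantileMin, and quantileMax concrete, here is a minimal sketch in plain C# with no ML.NET dependency. The class and method names (RobustScalingSketch, Quantile, Scale) are hypothetical, and the quantile is computed by linear interpolation between sorted values, which is an assumption; ML.NET's own quantile estimation may differ, so its results need not match these numbers exactly.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class RobustScalingSketch
{
    // Percentile by linear interpolation between neighboring sorted values.
    // Assumption: this is one common definition, not necessarily ML.NET's.
    public static double Quantile(double[] sorted, double percent)
    {
        if (sorted.Length == 0)
            throw new ArgumentException("At least one value is required.");

        double position = percent / 100.0 * (sorted.Length - 1);
        int lower = (int)Math.Floor(position);
        int upper = (int)Math.Ceiling(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    // Subtracts the median (when centerData is true) and divides by the
    // range between the quantileMin and quantileMax percentiles.
    public static double[] Scale(double[] values, bool centerData = true,
        uint quantileMin = 25, uint quantileMax = 75)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        double offset = centerData ? Quantile(sorted, 50) : 0.0;
        double range = Quantile(sorted, quantileMax)
            - Quantile(sorted, quantileMin);

        if (range == 0)
            throw new ArgumentException("The quantile range is zero.");

        return values.Select(v => (v - offset) / range).ToArray();
    }

    public static void Main()
    {
        // Median = 3, 25th percentile = 2, 75th percentile = 4, range = 2.
        double[] scaled = Scale(new double[] { 1, 2, 3, 4, 100 });
        Console.WriteLine(string.Join(", ", scaled.Select(
            v => v.ToString("0.##", CultureInfo.InvariantCulture))));
        // Prints: -1, -0.5, 0, 0.5, 48.5
    }
}
```

The outlier 100 stays far from the other values after scaling, but it does not change how the values 1 through 4 are spread, because the median and the quartiles barely depend on it.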



using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.NormalizingTransformer;

namespace Samples.Dynamic
{
    public class NormalizeBinningMulticolumn
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[4] { 8, 1, 3, 0},
                    Features2 = 1 },

                new DataPoint(){ Features = new float[4] { 6, 2, 2, 0},
                    Features2 = 4 },

                new DataPoint(){ Features = new float[4] { 4, 0, 1, 0},
                    Features2 = 1 },

                new DataPoint(){ Features = new float[4] { 2,-1,-1, 1},
                    Features2 = 2 }
            };
            // Convert training data to IDataView, the general data type used in
            // ML.NET.
            var data = mlContext.Data.LoadFromEnumerable(samples);
            // NormalizeBinning normalizes the data by constructing equidensity bins
            // and produces output based on which bin the original value belongs to.
            var normalize = mlContext.Transforms.NormalizeBinning(new[]{
                new InputOutputColumnPair("Features"),
                new InputOutputColumnPair("Features2"),
                },
                maximumBinCount: 4, fixZero: false);

            // Now we can transform the data and look at the output to confirm the
            // behavior of the estimator. This operation doesn't actually evaluate
            // data until we read the data below.
            var normalizeTransform = normalize.Fit(data);
            var transformedData = normalizeTransform.Transform(data);
            var column = transformedData.GetColumn<float[]>("Features").ToArray();
            var column2 = transformedData.GetColumn<float>("Features2").ToArray();

            for (int i = 0; i < column.Length; i++)
                Console.WriteLine(string.Join(", ", column[i].Select(x => x
                .ToString("f4"))) + "\t\t" + column2[i]);
            // Expected output:
            //  Features                            Features2
            //  1.0000, 0.6667, 1.0000, 0.0000          0
            //  0.6667, 1.0000, 0.6667, 0.0000          1
            //  0.3333, 0.3333, 0.3333, 0.0000          0
            //  0.0000, 0.0000, 0.0000, 1.0000          0.5
        }

        private class DataPoint
        {
            // The normalizer requires a vector of known size.
            [VectorType(4)]
            public float[] Features { get; set; }

            public float Features2 { get; set; }
        }
    }
}

NormalizeRobustScaling(TransformsCatalog, String, String, Int64, Boolean, UInt32, UInt32)

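Creates a NormalizingEstimator that normalizes using statistics that are robust to outliers: it centers the data around 0 (by removing the median) and scales the data according to the quantile range (by default, the interquartile range).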

public static Microsoft.ML.Transforms.NormalizingEstimator NormalizeRobustScaling (this Microsoft.ML.TransformsCatalog catalog, string outputColumnName, string inputColumnName = default, long maximumExampleCount = 1000000000, bool centerData = true, uint quantileMin = 25, uint quantileMax = 75);
static member NormalizeRobustScaling : Microsoft.ML.TransformsCatalog * string * string * int64 * bool * uint32 * uint32 -> Microsoft.ML.Transforms.NormalizingEstimator
Public Function NormalizeRobustScaling (catalog As TransformsCatalog, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional maximumExampleCount As Long = 1000000000, Optional centerData As Boolean = true, Optional quantileMin As UInteger = 25, Optional quantileMax As UInteger = 75) As NormalizingEstimator





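outputColumnName: Name of the column resulting from the transformation of inputColumnName. The data type of this column is the same as that of the input column.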


inputColumnName: Name of the column to transform. If set to null, the value of outputColumnName will be used as the source. The data type of this column should be Single, Double, or a known-sized vector of those types.




centerData: Whether to center the data around 0 by removing the median. Defaults to true.


quantileMin: The minimum quantile used to scale the data. Defaults to 25.


quantileMax: The maximum quantile used to scale the data. Defaults to 75.
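As a minimal sketch of calling this overload, the following fits the estimator on a small in-memory data set and reads back the normalized column. It assumes a project that references a Microsoft.ML package version that includes NormalizeRobustScaling; the DataPoint class, the sample values, and the NormalizeRobustScalingSketch and Run names are hypothetical and chosen only for illustration. No expected output is shown because the exact values depend on how ML.NET estimates the median and quantiles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class NormalizeRobustScalingSketch
{
    // Hypothetical input type. The normalizer requires a vector of known
    // size, hence the VectorType attribute.
    private class DataPoint
    {
        [VectorType(4)]
        public float[] Features { get; set; }
    }

    public static float[][] Run()
    {
        var mlContext = new MLContext();
        var samples = new List<DataPoint>
        {
            new DataPoint { Features = new float[] { 8, 1, 3, 0 } },
            new DataPoint { Features = new float[] { 6, 2, 2, 0 } },
            new DataPoint { Features = new float[] { 4, 0, 1, 0 } },
            new DataPoint { Features = new float[] { 2, -1, -1, 1 } }
        };
        var data = mlContext.Data.LoadFromEnumerable(samples);

        // inputColumnName is omitted, so "Features" is both the source and
        // the output column. The remaining parameters keep their defaults:
        // centerData = true, quantileMin = 25, quantileMax = 75.
        var normalize = mlContext.Transforms.NormalizeRobustScaling("Features");

        // Fit learns the statistics; Transform applies them lazily.
        var transformedData = normalize.Fit(data).Transform(data);
        return transformedData.GetColumn<float[]>("Features").ToArray();
    }

    public static void Main()
    {
        foreach (var row in Run())
            Console.WriteLine(string.Join(", ",
                row.Select(x => x.ToString("f4"))));
    }
}
```

To scale by a wider or narrower range, pass quantileMin and quantileMax explicitly, for example quantileMin: 10, quantileMax: 90; to scale without shifting the data, pass centerData: false.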



using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.NormalizingTransformer;

namespace Samples.Dynamic
{
    public class NormalizeSupervisedBinning
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[4] { 8, 1, 3, 0},
                    Bin ="Bin1" },

                new DataPoint(){ Features = new float[4] { 6, 2, 2, 1},
                    Bin ="Bin2" },

                new DataPoint(){ Features = new float[4] { 5, 3, 0, 2},
                    Bin ="Bin2" },

                new DataPoint(){ Features = new float[4] { 4,-8, 1, 3},
                    Bin ="Bin3" },

                new DataPoint(){ Features = new float[4] { 2,-5,-1, 4},
                    Bin ="Bin3" }
            };
            // Convert training data to IDataView, the general data type used in
            // ML.NET.
            var data = mlContext.Data.LoadFromEnumerable(samples);
            // Let's transform the "Bin" column from string to key.
            data = mlContext.Transforms.Conversion.MapValueToKey("Bin").Fit(data)
                .Transform(data);
            // NormalizeSupervisedBinning normalizes the data by constructing bins
            // based on correlation with the label column and produces output based
            // on which bin the original value belongs to.
            var normalize = mlContext.Transforms.NormalizeSupervisedBinning(
                "Features", labelColumnName: "Bin", mininimumExamplesPerBin: 1,
                fixZero: false);

            // This variant does the same, but makes sure zero values remain zero
            // after normalization. This helps preserve sparsity.
            var normalizeFixZero = mlContext.Transforms.NormalizeSupervisedBinning(
                "Features", labelColumnName: "Bin", mininimumExamplesPerBin: 1,
                fixZero: true);

            // Now we can transform the data and look at the output to confirm the
            // behavior of the estimator. This operation doesn't actually evaluate
            // data until we read the data below.
            var normalizeTransform = normalize.Fit(data);
            var transformedData = normalizeTransform.Transform(data);
            var normalizeFixZeroTransform = normalizeFixZero.Fit(data);
            var fixZeroData = normalizeFixZeroTransform.Transform(data);
            var column = transformedData.GetColumn<float[]>("Features").ToArray();
            foreach (var row in column)
                Console.WriteLine(string.Join(", ", row.Select(x => x.ToString(
                    "f4"))));
            // Expected output:
            //  1.0000, 0.5000, 1.0000, 0.0000
            //  0.5000, 1.0000, 0.0000, 0.5000
            //  0.5000, 1.0000, 0.0000, 0.5000
            //  0.0000, 0.0000, 0.0000, 1.0000
            //  0.0000, 0.0000, 0.0000, 1.0000

            var columnFixZero = fixZeroData.GetColumn<float[]>("Features")
                .ToArray();

            foreach (var row in columnFixZero)
                Console.WriteLine(string.Join(", ", row.Select(x => x.ToString(
                    "f4"))));
            // Expected output:
            //  1.0000, 0.0000, 1.0000, 0.0000
            //  0.5000, 0.5000, 0.0000, 0.5000
            //  0.5000, 0.5000, 0.0000, 0.5000
            //  0.0000,-0.5000, 0.0000, 1.0000
            //  0.0000,-0.5000, 0.0000, 1.0000

            // Let's get the transformation parameters. Since we work with only
            // one column, we pass 0 to GetNormalizerModelParameters. With
            // multiple column transformations, pass the index of the
            // InputOutputColumnPair instead.
            var transformParams = normalizeTransform.GetNormalizerModelParameters(0)
                as BinNormalizerModelParameters<ImmutableArray<float>>;

            Console.WriteLine("The 0-index value in resulting array would be " +
                "produced by:");

            Console.WriteLine("y = (Index(x) / " + transformParams.Density[0] +
                ") - " + (transformParams.Offset.Length == 0 ? 0 : transformParams
                .Offset[0]));

            Console.WriteLine("Where Index(x) is the index of the bin to which " +
                "x belongs");

            Console.WriteLine("Bins upper bounds are: " + string.Join(" ",
                transformParams.UpperBounds[0]));
            // Expected output:
            //  The 0-index value in resulting array would be produced by:
            //  y = (Index(x) / 2) - 0
            //  Where Index(x) is the index of the bin to which x belongs
            //  Bins upper bounds are: 4.5 7 ∞

            var fixZeroParams = normalizeFixZeroTransform
                .GetNormalizerModelParameters(0) as BinNormalizerModelParameters<
                ImmutableArray<float>>;

            Console.WriteLine("The 1-index value in resulting array would be " +
                "produced by:");

            Console.WriteLine("y = (Index(x) / " + fixZeroParams.Density[1] +
                ") - " + (fixZeroParams.Offset.Length == 0 ? 0 : fixZeroParams
                .Offset[1]));

            Console.WriteLine("Where Index(x) is the index of the bin to which x " +
                "belongs");

            Console.WriteLine("Bins upper bounds are: " + string.Join(" ",
                fixZeroParams.UpperBounds[1]));
            // Expected output:
            //  The 1-index value in resulting array would be produced by:
            //  y = (Index(x) / 2) - 0.5
            //  Where Index(x) is the index of the bin to which x belongs
            //  Bins upper bounds are: -2 1.5 ∞
        }

        private class DataPoint
        {
            // The normalizer requires a vector of known size.
            [VectorType(4)]
            public float[] Features { get; set; }

            public string Bin { get; set; }
        }
    }
}