NormalizationCatalog.NormalizeRobustScaling Method

Definition

Overloads

NormalizeRobustScaling(TransformsCatalog, InputOutputColumnPair[], Int64, Boolean, UInt32, UInt32)

Create a NormalizingEstimator, which normalizes using statistics that are robust to outliers by centering the data around 0 (removing the median) and scaling the data according to the quantile range (defaults to the interquartile range).

NormalizeRobustScaling(TransformsCatalog, String, String, Int64, Boolean, UInt32, UInt32)

Create a NormalizingEstimator, which normalizes using statistics that are robust to outliers by centering the data around 0 (removing the median) and scaling the data according to the quantile range (defaults to the interquartile range).
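
For both overloads, the resulting transformation can informally be read as y = (x - median(x)) / (quantileMax(x) - quantileMin(x)) for each column (or each slot of a vector column), with the median subtraction skipped when centerData is false. This is a reading of the summary above; the exact quantile estimation, computed over at most maximumExampleCount examples, is internal to the estimator.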

NormalizeRobustScaling(TransformsCatalog, InputOutputColumnPair[], Int64, Boolean, UInt32, UInt32)

Create a NormalizingEstimator, which normalizes using statistics that are robust to outliers by centering the data around 0 (removing the median) and scaling the data according to the quantile range (defaults to the interquartile range).

public static Microsoft.ML.Transforms.NormalizingEstimator NormalizeRobustScaling (this Microsoft.ML.TransformsCatalog catalog, Microsoft.ML.InputOutputColumnPair[] columns, long maximumExampleCount = 1000000000, bool centerData = true, uint quantileMin = 25, uint quantileMax = 75);
static member NormalizeRobustScaling : Microsoft.ML.TransformsCatalog * Microsoft.ML.InputOutputColumnPair[] * int64 * bool * uint32 * uint32 -> Microsoft.ML.Transforms.NormalizingEstimator
<Extension()>
Public Function NormalizeRobustScaling (catalog As TransformsCatalog, columns As InputOutputColumnPair(), Optional maximumExampleCount As Long = 1000000000, Optional centerData As Boolean = true, Optional quantileMin As UInteger = 25, Optional quantileMax As UInteger = 75) As NormalizingEstimator

Parameters

catalog
TransformsCatalog

The transform catalog.

columns
InputOutputColumnPair[]

The input and output column pairs. The input columns must be of data type Single, Double, or a known-sized vector of those types. The data type of the output columns will be the same as the associated input columns.

maximumExampleCount
Int64

Maximum number of examples used to train the normalizer.

centerData
Boolean

Whether to center the data around 0 by removing the median. Defaults to true.

quantileMin
UInt32

Quantile min used to scale the data. Defaults to 25.

quantileMax
UInt32

Quantile max used to scale the data. Defaults to 75.

Returns

NormalizingEstimator
Examples

using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.NormalizingTransformer;

namespace Samples.Dynamic
{
    public class NormalizeBinningMulticolumn
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[4] { 8, 1, 3, 0},
                    Features2 = 1 },

                new DataPoint(){ Features = new float[4] { 6, 2, 2, 0},
                    Features2 = 4 },

                new DataPoint(){ Features = new float[4] { 4, 0, 1, 0},
                    Features2 = 1 },

                new DataPoint(){ Features = new float[4] { 2,-1,-1, 1},
                    Features2 = 2 }
            };
            // Convert training data to IDataView, the general data type used in
            // ML.NET.
            var data = mlContext.Data.LoadFromEnumerable(samples);
            // NormalizeBinning normalizes the data by constructing equidensity bins
            // and produces output based on which bin the original value belongs to.
            var normalize = mlContext.Transforms.NormalizeBinning(new[]{
                new InputOutputColumnPair("Features"),
                new InputOutputColumnPair("Features2"),
                },
                maximumBinCount: 4, fixZero: false);

            // Now we can transform the data and look at the output to confirm the
            // behavior of the estimator. This operation doesn't actually evaluate
            // data until we read the data below.
            var normalizeTransform = normalize.Fit(data);
            var transformedData = normalizeTransform.Transform(data);
            var column = transformedData.GetColumn<float[]>("Features").ToArray();
            var column2 = transformedData.GetColumn<float>("Features2").ToArray();

            for (int i = 0; i < column.Length; i++)
                Console.WriteLine(string.Join(", ", column[i].Select(x => x
                .ToString("f4"))) + "\t\t" + column2[i]);
            // Expected output:
            //
            //  Features                            Features2
            //  1.0000, 0.6667, 1.0000, 0.0000          0
            //  0.6667, 1.0000, 0.6667, 0.0000          1
            //  0.3333, 0.3333, 0.3333, 0.0000          0
            //  0.0000, 0.0000, 0.0000, 1.0000          0.5
        }

        private class DataPoint
        {
            [VectorType(4)]
            public float[] Features { get; set; }

            public float Features2 { get; set; }
        }
    }
}
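
The sample above illustrates the related NormalizeBinning transform applied to multiple columns. A minimal sketch of calling NormalizeRobustScaling itself through the InputOutputColumnPair[] overload follows; the DataPoint class and sample values are illustrative assumptions, and no expected output is listed because it has not been verified.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    public class NormalizeRobustScalingMulticolumn
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations.
            var mlContext = new MLContext();

            // Illustrative sample data; values chosen arbitrarily.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[4] { 8, 1, 3, 0}, Features2 = 1 },
                new DataPoint(){ Features = new float[4] { 6, 2, 2, 0}, Features2 = 4 },
                new DataPoint(){ Features = new float[4] { 4, 0, 1, 0}, Features2 = 1 },
                new DataPoint(){ Features = new float[4] { 2,-1,-1, 1}, Features2 = 2 }
            };
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // Center each column around 0 (remove the median) and scale it by the
            // interquartile range (quantileMin 25, quantileMax 75, the defaults).
            var normalize = mlContext.Transforms.NormalizeRobustScaling(new[]{
                new InputOutputColumnPair("Features"),
                new InputOutputColumnPair("Features2")
            });

            // Fit the estimator to the data and transform it.
            var normalizeTransform = normalize.Fit(data);
            var transformedData = normalizeTransform.Transform(data);

            // Inspect the normalized columns.
            var column = transformedData.GetColumn<float[]>("Features").ToArray();
            var column2 = transformedData.GetColumn<float>("Features2").ToArray();
            for (int i = 0; i < column.Length; i++)
                Console.WriteLine(string.Join(", ", column[i].Select(x => x
                    .ToString("f4"))) + "\t\t" + column2[i]);
        }

        private class DataPoint
        {
            [VectorType(4)]
            public float[] Features { get; set; }

            public float Features2 { get; set; }
        }
    }
}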

Applies to

NormalizeRobustScaling(TransformsCatalog, String, String, Int64, Boolean, UInt32, UInt32)

Create a NormalizingEstimator, which normalizes using statistics that are robust to outliers by centering the data around 0 (removing the median) and scaling the data according to the quantile range (defaults to the interquartile range).

public static Microsoft.ML.Transforms.NormalizingEstimator NormalizeRobustScaling (this Microsoft.ML.TransformsCatalog catalog, string outputColumnName, string inputColumnName = default, long maximumExampleCount = 1000000000, bool centerData = true, uint quantileMin = 25, uint quantileMax = 75);
static member NormalizeRobustScaling : Microsoft.ML.TransformsCatalog * string * string * int64 * bool * uint32 * uint32 -> Microsoft.ML.Transforms.NormalizingEstimator
<Extension()>
Public Function NormalizeRobustScaling (catalog As TransformsCatalog, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional maximumExampleCount As Long = 1000000000, Optional centerData As Boolean = true, Optional quantileMin As UInteger = 25, Optional quantileMax As UInteger = 75) As NormalizingEstimator

Parameters

catalog
TransformsCatalog

The transform catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnName. The data type on this column is the same as that of the input column.

inputColumnName
String

Name of the column to transform. If set to null, the value of outputColumnName will be used as source. The data type on this column should be Single, Double, or a known-sized vector of those types.

maximumExampleCount
Int64

Maximum number of examples used to train the normalizer.

centerData
Boolean

Whether to center the data around 0 by removing the median. Defaults to true.

quantileMin
UInt32

Quantile min used to scale the data. Defaults to 25.

quantileMax
UInt32

Quantile max used to scale the data. Defaults to 75.

Returns

NormalizingEstimator

Examples

using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.NormalizingTransformer;

namespace Samples.Dynamic
{
    public class NormalizeSupervisedBinning
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[4] { 8, 1, 3, 0},
                    Bin ="Bin1" },

                new DataPoint(){ Features = new float[4] { 6, 2, 2, 1},
                    Bin ="Bin2" },

                new DataPoint(){ Features = new float[4] { 5, 3, 0, 2},
                    Bin ="Bin2" },

                new DataPoint(){ Features = new float[4] { 4,-8, 1, 3},
                    Bin ="Bin3" },

                new DataPoint(){ Features = new float[4] { 2,-5,-1, 4},
                    Bin ="Bin3" }
            };
            // Convert training data to IDataView, the general data type used in
            // ML.NET.
            var data = mlContext.Data.LoadFromEnumerable(samples);
            // Let's transform "Bin" column from string to key.
            data = mlContext.Transforms.Conversion.MapValueToKey("Bin").Fit(data)
                .Transform(data);
            // NormalizeSupervisedBinning normalizes the data by constructing bins
            // based on correlation with the label column and produces output based
            // on which bin the original value belongs to.
            var normalize = mlContext.Transforms.NormalizeSupervisedBinning(
                "Features", labelColumnName: "Bin", mininimumExamplesPerBin: 1,
                fixZero: false);

            // NormalizeSupervisedBinning normalizes the data by constructing bins
            // based on correlation with the label column and produces output based
            // on which bin the original value belongs to, but makes sure zero values
            // remain zero after normalization. This helps preserve sparsity.
            var normalizeFixZero = mlContext.Transforms.NormalizeSupervisedBinning(
                "Features", labelColumnName: "Bin", mininimumExamplesPerBin: 1,
                fixZero: true);

            // Now we can transform the data and look at the output to confirm the
            // behavior of the estimator. This operation doesn't actually evaluate
            // data until we read the data below.
            var normalizeTransform = normalize.Fit(data);
            var transformedData = normalizeTransform.Transform(data);
            var normalizeFixZeroTransform = normalizeFixZero.Fit(data);
            var fixZeroData = normalizeFixZeroTransform.Transform(data);
            var column = transformedData.GetColumn<float[]>("Features").ToArray();
            foreach (var row in column)
                Console.WriteLine(string.Join(", ", row.Select(x => x.ToString(
                    "f4"))));
            // Expected output:
            //  1.0000, 0.5000, 1.0000, 0.0000
            //  0.5000, 1.0000, 0.0000, 0.5000
            //  0.5000, 1.0000, 0.0000, 0.5000
            //  0.0000, 0.0000, 0.0000, 1.0000
            //  0.0000, 0.0000, 0.0000, 1.0000

            var columnFixZero = fixZeroData.GetColumn<float[]>("Features")
                .ToArray();

            foreach (var row in columnFixZero)
                Console.WriteLine(string.Join(", ", row.Select(x => x.ToString(
                    "f4"))));
            // Expected output:
            //  1.0000, 0.0000, 1.0000, 0.0000
            //  0.5000, 0.5000, 0.0000, 0.5000
            //  0.5000, 0.5000, 0.0000, 0.5000
            //  0.0000,-0.5000, 0.0000, 1.0000
            //  0.0000,-0.5000, 0.0000, 1.0000

            // Let's get transformation parameters. Since we work with only one
            // column we need to pass 0 as parameter for
            // GetNormalizerModelParameters.
            // If we have multiple columns transformations we need to pass index of
            // InputOutputColumnPair.
            var transformParams = normalizeTransform.GetNormalizerModelParameters(0)
                as BinNormalizerModelParameters<ImmutableArray<float>>;

            Console.WriteLine($"The 1-index value in resulting array would be " +
                $"produce by:");

            Console.WriteLine("y = (Index(x) / " + transformParams.Density[0] +
                ") - " + (transformParams.Offset.Length == 0 ? 0 : transformParams
                .Offset[0]));

            Console.WriteLine("Where Index(x) is the index of the bin to which " +
                "x belongs");

            Console.WriteLine("Bins upper borders are: " + string.Join(" ",
                transformParams.UpperBounds[0]));
            // Expected output:
            //  The 1-index value in the resulting array would be produced by:
            //  y = (Index(x) / 2) - 0
            //  Where Index(x) is the index of the bin to which x belongs
            //  Bins upper borders are: 4.5 7 ∞

            var fixZeroParams = normalizeFixZeroTransform
                .GetNormalizerModelParameters(0) as BinNormalizerModelParameters<
                ImmutableArray<float>>;

            Console.WriteLine($"The 1-index value in resulting array would be " +
                $"produce by:");

            Console.WriteLine(" y = (Index(x) / " + fixZeroParams.Density[1] +
                ") - " + (fixZeroParams.Offset.Length == 0 ? 0 : fixZeroParams
                .Offset[1]));

            Console.WriteLine("Where Index(x) is the index of the bin to which x " +
                "belongs");

            Console.WriteLine("Bins upper borders are: " + string.Join(" ",
                fixZeroParams.UpperBounds[1]));
            // Expected output:
            //  The 1-index value in the resulting array would be produced by:
            //  y = (Index(x) / 2) - 0.5
            //  Where Index(x) is the index of the bin to which x belongs
            //  Bins upper borders are: -2 1.5 ∞
        }

        private class DataPoint
        {
            [VectorType(4)]
            public float[] Features { get; set; }

            public string Bin { get; set; }
        }
    }
}
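
The sample above demonstrates the related NormalizeSupervisedBinning transform. A minimal sketch of the single-column NormalizeRobustScaling overload follows; the DataPoint class, the sample values, and the output column name "FeaturesScaled" are illustrative assumptions, and no expected output is listed because it has not been verified.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    public class NormalizeRobustScalingSingleColumn
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations.
            var mlContext = new MLContext();

            // Illustrative sample data; values chosen arbitrarily.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[4] { 8, 1, 3, 0} },
                new DataPoint(){ Features = new float[4] { 6, 2, 2, 1} },
                new DataPoint(){ Features = new float[4] { 4,-8, 1, 3} },
                new DataPoint(){ Features = new float[4] { 2,-5,-1, 4} }
            };
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // Write the scaled values to a new column and leave "Features" intact.
            // centerData: false scales by the quantile range without removing the median.
            var normalize = mlContext.Transforms.NormalizeRobustScaling(
                "FeaturesScaled", "Features", centerData: false);

            // Fit the estimator to the data and transform it.
            var normalizeTransform = normalize.Fit(data);
            var transformedData = normalizeTransform.Transform(data);

            // Inspect the normalized column.
            var column = transformedData.GetColumn<float[]>("FeaturesScaled").ToArray();
            foreach (var row in column)
                Console.WriteLine(string.Join(", ", row.Select(x => x.ToString("f4"))));
        }

        private class DataPoint
        {
            [VectorType(4)]
            public float[] Features { get; set; }
        }
    }
}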

Applies to