ExtensionsCatalog.ReplaceMissingValues 方法
定义
重要
一些信息与预发行产品相关,相应产品在发行之前可能会进行重大修改。 对于此处提供的信息,Microsoft 不作任何明示或暗示的担保。
重载
ReplaceMissingValues(TransformsCatalog, InputOutputColumnPair[], MissingValueReplacingEstimator+ReplacementMode, Boolean) |
创建一个 ColumnCopyingEstimator,它将数据从指定的 InputColumnName 列复制到新列中: OutputColumnName 并替换其中缺失的值,并按该 |
ReplaceMissingValues(TransformsCatalog, String, String, MissingValueReplacingEstimator+ReplacementMode, Boolean) |
创建一个 MissingValueReplacingEstimator,它将数据从指定的 |
ReplaceMissingValues(TransformsCatalog, InputOutputColumnPair[], MissingValueReplacingEstimator+ReplacementMode, Boolean)
创建一个 ColumnCopyingEstimator,它将数据从指定的 InputColumnName 列复制到新列中: OutputColumnName 并替换其中缺失的值,并按该 replacementMode
列替换该列中的缺失值。
public static Microsoft.ML.Transforms.MissingValueReplacingEstimator ReplaceMissingValues (this Microsoft.ML.TransformsCatalog catalog, Microsoft.ML.InputOutputColumnPair[] columns, Microsoft.ML.Transforms.MissingValueReplacingEstimator.ReplacementMode replacementMode = Microsoft.ML.Transforms.MissingValueReplacingEstimator+ReplacementMode.DefaultValue, bool imputeBySlot = true);
static member ReplaceMissingValues : Microsoft.ML.TransformsCatalog * Microsoft.ML.InputOutputColumnPair[] * Microsoft.ML.Transforms.MissingValueReplacingEstimator.ReplacementMode * bool -> Microsoft.ML.Transforms.MissingValueReplacingEstimator
<Extension()>
Public Function ReplaceMissingValues (catalog As TransformsCatalog, columns As InputOutputColumnPair(), Optional replacementMode As MissingValueReplacingEstimator.ReplacementMode = Microsoft.ML.Transforms.MissingValueReplacingEstimator+ReplacementMode.DefaultValue, Optional imputeBySlot As Boolean = true) As MissingValueReplacingEstimator
参数
- catalog
- TransformsCatalog
转换的目录。
- columns
- InputOutputColumnPair[]
输入和输出列对。 此估算器对浮点数或双精度的标量或向量进行操作。
- replacementMode
- MissingValueReplacingEstimator.ReplacementMode
要按指定方式使用的替换类型 MissingValueReplacingEstimator.ReplacementMode
- imputeBySlot
- Boolean
如果 true
执行每个槽的替换插补。
否则,将对整个向量列进行替换值。 对于标量和变量向量,将忽略此设置,其中插补始终针对整个列。
返回
示例
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
namespace Samples.Dynamic
{
class ReplaceMissingValuesMultiColumn
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext();
// Get a small dataset as an IEnumerable and convert it to an IDataView.
var samples = new List<DataPoint>()
{
new DataPoint(){ Features1 = new float[3] {1, 1, 0}, Features2 =
new float[2] {1, 1} },
new DataPoint(){ Features1 = new float[3] {0, float.NaN, 1},
Features2 = new float[2] {0, 1} },
new DataPoint(){ Features1 = new float[3] {-1, float.NaN, -3},
Features2 = new float[2] {-1, float.NaN} },
new DataPoint(){ Features1 = new float[3] {-1, 6, -3}, Features2 =
new float[2] {0, float.PositiveInfinity} },
};
var data = mlContext.Data.LoadFromEnumerable(samples);
// Here we use the default replacement mode, which replaces the value
// with the default value for its type.
var defaultPipeline = mlContext.Transforms.ReplaceMissingValues(new[] {
new InputOutputColumnPair("MissingReplaced1", "Features1"),
new InputOutputColumnPair("MissingReplaced2", "Features2")
},
MissingValueReplacingEstimator.ReplacementMode.DefaultValue);
// Now we can transform the data and look at the output to confirm the
// behavior of the estimator. This operation doesn't actually evaluate
// data until we read the data below.
var defaultTransformer = defaultPipeline.Fit(data);
var defaultTransformedData = defaultTransformer.Transform(data);
// We can extract the newly created column as an IEnumerable of
// SampleDataTransformed, the class we define below.
var defaultRowEnumerable = mlContext.Data.CreateEnumerable<
SampleDataTransformed>(defaultTransformedData, reuseRowObject:
false);
// And finally, we can write out the rows of the dataset, looking at the
// columns of interest.
foreach (var row in defaultRowEnumerable)
Console.WriteLine("Features1: [" + string.Join(", ", row
.Features1) + "]\t MissingReplaced1: [" + string.Join(", ", row
.MissingReplaced1) + "]\t Features2: [" + string.Join(", ", row
.Features2) + "]\t MissingReplaced2: [" + string.Join(", ", row
.MissingReplaced2) + "]");
// Expected output:
// Features1: [1, 1, 0] MissingReplaced1: [1, 1, 0] Features2: [1, 1] MissingReplaced2: [1, 1]
// Features1: [0, NaN, 1] MissingReplaced1: [0, 0, 1] Features2: [0, 1] MissingReplaced2: [0, 1]
// Features1: [-1, NaN, -3] MissingReplaced1: [-1, 0, -3] Features2: [-1, NaN] MissingReplaced2: [-1, 0]
// Features1: [-1, 6, -3] MissingReplaced1: [-1, 6, -3] Features2: [0, ∞] MissingReplaced2: [0, ∞]
// Here we use the mean replacement mode, which replaces the value with
// the mean of the non values that were not missing.
var meanPipeline = mlContext.Transforms.ReplaceMissingValues(new[] {
new InputOutputColumnPair("MissingReplaced1", "Features1"),
new InputOutputColumnPair("MissingReplaced2", "Features2")
},
MissingValueReplacingEstimator.ReplacementMode.Mean);
// Now we can transform the data and look at the output to confirm the
// behavior of the estimator.
// This operation doesn't actually evaluate data until we read the data
// below.
var meanTransformer = meanPipeline.Fit(data);
var meanTransformedData = meanTransformer.Transform(data);
// We can extract the newly created column as an IEnumerable of
// SampleDataTransformed, the class we define below.
var meanRowEnumerable = mlContext.Data.CreateEnumerable<
SampleDataTransformed>(meanTransformedData, reuseRowObject: false);
// And finally, we can write out the rows of the dataset, looking at the
// columns of interest.
foreach (var row in meanRowEnumerable)
Console.WriteLine("Features1: [" + string.Join(", ", row
.Features1) + "]\t MissingReplaced1: [" + string.Join(", ", row
.MissingReplaced1) + "]\t Features2: [" + string.Join(", ", row
.Features2) + "]\t MissingReplaced2: [" + string.Join(", ", row
.MissingReplaced2) + "]");
// Expected output:
// Features1: [1, 1, 0] MissingReplaced1: [1, 1, 0] Features2: [1, 1] MissingReplaced2: [1, 1]
// Features1: [0, NaN, 1] MissingReplaced1: [0, 3.5, 1] Features2: [0, 1] MissingReplaced2: [0, 1]
// Features1: [-1, NaN, -3] MissingReplaced1: [-1, 3.5, -3] Features2: [-1, NaN] MissingReplaced2: [-1, 1]
// Features1: [-1, 6, -3] MissingReplaced1: [-1, 6, -3] Features2: [0, ∞] MissingReplaced2: [0, ∞]
}
private class DataPoint
{
[VectorType(3)]
public float[] Features1 { get; set; }
[VectorType(2)]
public float[] Features2 { get; set; }
}
private sealed class SampleDataTransformed : DataPoint
{
[VectorType(3)]
public float[] MissingReplaced1 { get; set; }
[VectorType(2)]
public float[] MissingReplaced2 { get; set; }
}
}
}
注解
此转换可以对多个列进行操作。
适用于
ReplaceMissingValues(TransformsCatalog, String, String, MissingValueReplacingEstimator+ReplacementMode, Boolean)
创建一个 MissingValueReplacingEstimator,它将数据从指定的 inputColumnName
列复制到新列中: outputColumnName
并替换其中缺失的值,并按该 replacementMode
列替换该列中的缺失值。
public static Microsoft.ML.Transforms.MissingValueReplacingEstimator ReplaceMissingValues (this Microsoft.ML.TransformsCatalog catalog, string outputColumnName, string inputColumnName = default, Microsoft.ML.Transforms.MissingValueReplacingEstimator.ReplacementMode replacementMode = Microsoft.ML.Transforms.MissingValueReplacingEstimator+ReplacementMode.DefaultValue, bool imputeBySlot = true);
static member ReplaceMissingValues : Microsoft.ML.TransformsCatalog * string * string * Microsoft.ML.Transforms.MissingValueReplacingEstimator.ReplacementMode * bool -> Microsoft.ML.Transforms.MissingValueReplacingEstimator
<Extension()>
Public Function ReplaceMissingValues (catalog As TransformsCatalog, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional replacementMode As MissingValueReplacingEstimator.ReplacementMode = Microsoft.ML.Transforms.MissingValueReplacingEstimator+ReplacementMode.DefaultValue, Optional imputeBySlot As Boolean = true) As MissingValueReplacingEstimator
参数
- catalog
- TransformsCatalog
转换的目录。
- outputColumnName
- String
由转换 inputColumnName
生成的列的名称。
此列的数据类型将与输入列的数据类型相同。
- replacementMode
- MissingValueReplacingEstimator.ReplacementMode
要按指定方式使用的替换类型 MissingValueReplacingEstimator.ReplacementMode
- imputeBySlot
- Boolean
如果为 true,则执行替换的每个槽插补。 否则,将对整个向量列进行替换值。 对于标量和变量向量,将忽略此设置,其中插补始终针对整个列。
返回
示例
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
namespace Samples.Dynamic
{
class ReplaceMissingValues
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext();
// Get a small dataset as an IEnumerable and convert it to an IDataView.
var samples = new List<DataPoint>()
{
new DataPoint(){ Features = new float[3] {float.PositiveInfinity, 1,
0 } },
new DataPoint(){ Features = new float[3] {0, float.NaN, 1} },
new DataPoint(){ Features = new float[3] {-1, 2, -3} },
new DataPoint(){ Features = new float[3] {-1, float.NaN, -3} },
};
var data = mlContext.Data.LoadFromEnumerable(samples);
// Here we use the default replacement mode, which replaces the value
// with the default value for its type.
var defaultPipeline = mlContext.Transforms.ReplaceMissingValues(
"MissingReplaced", "Features", MissingValueReplacingEstimator
.ReplacementMode.DefaultValue);
// Now we can transform the data and look at the output to confirm the
// behavior of the estimator. This operation doesn't actually evaluate
// data until we read the data below.
var defaultTransformer = defaultPipeline.Fit(data);
var defaultTransformedData = defaultTransformer.Transform(data);
// We can extract the newly created column as an IEnumerable of
// SampleDataTransformed, the class we define below.
var defaultRowEnumerable = mlContext.Data.CreateEnumerable<
SampleDataTransformed>(defaultTransformedData, reuseRowObject:
false);
// And finally, we can write out the rows of the dataset, looking at the
// columns of interest.
foreach (var row in defaultRowEnumerable)
Console.WriteLine("Features: [" + string.Join(", ", row.Features) +
"]\t MissingReplaced: [" + string.Join(", ", row
.MissingReplaced) + "]");
// Expected output:
// Features: [∞, 1, 0] MissingReplaced: [∞, 1, 0]
// Features: [0, NaN, 1] MissingReplaced: [0, 0, 1]
// Features: [-1, 2, -3] MissingReplaced: [-1, 2, -3]
// Features: [-1, NaN, -3] MissingReplaced: [-1, 0, -3]
// Here we use the mean replacement mode, which replaces the value with
// the mean of the non values that were not missing.
var meanPipeline = mlContext.Transforms.ReplaceMissingValues(
"MissingReplaced", "Features", MissingValueReplacingEstimator
.ReplacementMode.Mean);
// Now we can transform the data and look at the output to confirm the
// behavior of the estimator. This operation doesn't actually evaluate
// data until we read the data below.
var meanTransformer = meanPipeline.Fit(data);
var meanTransformedData = meanTransformer.Transform(data);
// We can extract the newly created column as an IEnumerable of
// SampleDataTransformed, the class we define below.
var meanRowEnumerable = mlContext.Data.CreateEnumerable<
SampleDataTransformed>(meanTransformedData, reuseRowObject: false);
// And finally, we can write out the rows of the dataset, looking at the
// columns of interest.
foreach (var row in meanRowEnumerable)
Console.WriteLine("Features: [" + string.Join(", ", row.Features) +
"]\t MissingReplaced: [" + string.Join(", ", row
.MissingReplaced) + "]");
// Expected output:
// Features: [∞, 1, 0] MissingReplaced: [∞, 1, 0]
// Features: [0, NaN, 1] MissingReplaced: [0, 1.5, 1]
// Features: [-1, 2, -3] MissingReplaced: [-1, 2, -3]
// Features: [-1, NaN, -3] MissingReplaced: [-1, 1.5, -3]
}
private class DataPoint
{
[VectorType(3)]
public float[] Features { get; set; }
}
private sealed class SampleDataTransformed : DataPoint
{
[VectorType(3)]
public float[] MissingReplaced { get; set; }
}
}
}