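CategoricalCatalog.OneHotHashEncoding Method
Definition
Important
Some information relates to a prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads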
OneHotHashEncoding(TransformsCatalog+CategoricalTransforms, InputOutputColumnPair[], OneHotEncodingEstimator+OutputKind, Int32, UInt32, Boolean, Int32)
Creates a OneHotHashEncodingEstimator, which converts one or more input text columns specified by columns into as many columns of hash-based one-hot encoded vectors.
OneHotHashEncoding(TransformsCatalog+CategoricalTransforms, String, String, OneHotEncodingEstimator+OutputKind, Int32, UInt32, Boolean, Int32)
Creates a OneHotHashEncodingEstimator, which converts a text column specified by inputColumnName into a hash-based one-hot encoded vector column named outputColumnName.
OneHotHashEncoding(TransformsCatalog+CategoricalTransforms, InputOutputColumnPair[], OneHotEncodingEstimator+OutputKind, Int32, UInt32, Boolean, Int32)
Creates a OneHotHashEncodingEstimator, which converts one or more input text columns specified by columns into as many columns of hash-based one-hot encoded vectors.
public static Microsoft.ML.Transforms.OneHotHashEncodingEstimator OneHotHashEncoding (this Microsoft.ML.TransformsCatalog.CategoricalTransforms catalog, Microsoft.ML.InputOutputColumnPair[] columns, Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind outputKind = Microsoft.ML.Transforms.OneHotEncodingEstimator+OutputKind.Indicator, int numberOfBits = 16, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0);
static member OneHotHashEncoding : Microsoft.ML.TransformsCatalog.CategoricalTransforms * Microsoft.ML.InputOutputColumnPair[] * Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind * int * uint32 * bool * int -> Microsoft.ML.Transforms.OneHotHashEncodingEstimator
<Extension()>
Public Function OneHotHashEncoding (catalog As TransformsCatalog.CategoricalTransforms, columns As InputOutputColumnPair(), Optional outputKind As OneHotEncodingEstimator.OutputKind = Microsoft.ML.Transforms.OneHotEncodingEstimator+OutputKind.Indicator, Optional numberOfBits As Integer = 16, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0) As OneHotHashEncodingEstimator
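Parameters
- catalog
- TransformsCatalog.CategoricalTransforms
The transform catalog.
- columns
- InputOutputColumnPair[]
The pairs of input and output columns. If outputKind is Bag, Indicator, or Binary, the output columns' data type is a vector of Single.
If outputKind is Key, the output columns' data type is a key for a scalar input column, or a vector of keys for a vector input column.
- outputKind
- OneHotEncodingEstimator.OutputKind
The conversion mode.
- numberOfBits
- Int32
The number of bits to hash into. Must be between 1 and 30, inclusive.
- seed
- UInt32
The hashing seed.
- useOrderedHashing
- Boolean
Whether the position of each term should be included in the hash.
- maximumNumberOfInverts
- Int32
During hashing, a mapping is built between the original values and the hash values they produce.
The text representation of the original values is stored in the slot names of the new column's metadata. Because hashing can map many initial values to a single hash value,
maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained.
0 retains no input values. -1 retains all input values mapping to each hash.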
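The retention rule for maximumNumberOfInverts can be sketched outside ML.NET. The following is only a conceptual model under stated assumptions: InvertCapSketch, ToyHash, and RetainedValues are hypothetical names, and the FNV-1a hash is a stand-in for ML.NET's own hash function, so the slot assignments will not match what the estimator produces.

```csharp
using System;
using System.Collections.Generic;

public static class InvertCapSketch
{
    // Stand-in hash (32-bit FNV-1a). This is NOT the hash ML.NET uses;
    // it only makes the sketch deterministic across runs.
    private static uint ToyHash(string value)
    {
        uint hash = 2166136261u;
        foreach (char c in value)
        {
            hash ^= c;
            hash *= 16777619u;
        }
        return hash;
    }

    // Maps each value to one of 2^numberOfBits slots and keeps, per slot,
    // at most maximumNumberOfInverts distinct original values
    // (0 keeps none, -1 keeps all).
    public static Dictionary<int, List<string>> RetainedValues(
        IEnumerable<string> values, int numberOfBits, int maximumNumberOfInverts)
    {
        var retained = new Dictionary<int, List<string>>();
        if (maximumNumberOfInverts == 0)
            return retained;

        uint mask = (1u << numberOfBits) - 1;
        foreach (string value in values)
        {
            int slot = (int)(ToyHash(value) & mask);
            if (!retained.TryGetValue(slot, out List<string> kept))
                retained[slot] = kept = new List<string>();

            bool underCap = maximumNumberOfInverts < 0
                || kept.Count < maximumNumberOfInverts;
            if (underCap && !kept.Contains(value))
                kept.Add(value);
        }
        return retained;
    }

    public static void Main()
    {
        var values = new[] { "0-5yrs", "6-11yrs", "11-15yrs", "0-5yrs" };
        // One bit gives only two slots, so three distinct values must collide.
        foreach (var pair in RetainedValues(values, numberOfBits: 1,
            maximumNumberOfInverts: -1))
        {
            Console.WriteLine($"slot {pair.Key}: {string.Join(", ", pair.Value)}");
        }
    }
}
```

Retaining values costs memory in proportion to the number of distinct inputs kept, which is presumably why the default of 0 keeps none.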
Returns
OneHotHashEncodingEstimator
Examples
using System;
using Microsoft.ML;

namespace Samples.Dynamic.Transforms.Categorical
{
    public static class OneHotHashEncodingMultiColumn
    {
        public static void Example()
        {
            // Create a new ML context for ML.NET operations. It can be used for
            // exception tracking and logging as well as the source of randomness.
            var mlContext = new MLContext();

            // Get a small dataset as an IEnumerable.
            var samples = new[]
            {
                new DataPoint {Education = "0-5yrs", ZipCode = "98005"},
                new DataPoint {Education = "0-5yrs", ZipCode = "98052"},
                new DataPoint {Education = "6-11yrs", ZipCode = "98005"},
                new DataPoint {Education = "6-11yrs", ZipCode = "98052"},
                new DataPoint {Education = "11-15yrs", ZipCode = "98005"}
            };

            // Convert training data to IDataView.
            IDataView data = mlContext.Data.LoadFromEnumerable(samples);

            // Multi column example: A pipeline for one hot hash encoding two
            // columns 'Education' and 'ZipCode'.
            var multiColumnKeyPipeline =
                mlContext.Transforms.Categorical.OneHotHashEncoding(
                    new[]
                    {
                        new InputOutputColumnPair("Education"),
                        new InputOutputColumnPair("ZipCode")
                    },
                    numberOfBits: 3);

            // Fit and Transform the data.
            IDataView transformedData =
                multiColumnKeyPipeline.Fit(data).Transform(data);

            var convertedData =
                mlContext.Data.CreateEnumerable<TransformedData>(transformedData,
                    true);

            Console.WriteLine(
                "One Hot Hash Encoding of two columns 'Education' and 'ZipCode'.");

            // One Hot Hash Encoding of two columns 'Education' and 'ZipCode'.
            foreach (TransformedData item in convertedData)
                Console.WriteLine("{0}\t\t\t{1}", string.Join(" ", item.Education),
                    string.Join(" ", item.ZipCode));

            // Each column has 8 slots, because we used numberOfBits = 3.
            // Education           ZipCode
            // 0 0 0 1 0 0 0 0     0 0 0 0 0 0 0 1
            // 0 0 0 1 0 0 0 0     1 0 0 0 0 0 0 0
            // 0 0 0 0 1 0 0 0     0 0 0 0 0 0 0 1
            // 0 0 0 0 1 0 0 0     1 0 0 0 0 0 0 0
            // 0 0 0 0 0 0 0 1     0 0 0 0 0 0 0 1
        }

        private class DataPoint
        {
            public string Education { get; set; }
            public string ZipCode { get; set; }
        }

        private class TransformedData
        {
            public float[] Education { get; set; }
            public float[] ZipCode { get; set; }
        }
    }
}
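Remarks
If multiple columns are passed to the estimator, all of the columns are processed in a single pass over the data. It is therefore more efficient to specify one estimator with many columns than to specify many estimators, each with a single column.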
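The difference between the two shapes described in this remark can be sketched as follows. This is a minimal illustration, assuming the Microsoft.ML package is referenced and that the data has 'Education' and 'ZipCode' columns as in the sample above. SinglePassSketch and Build are hypothetical names, and the pipelines are only constructed, not fitted.

```csharp
using Microsoft.ML;

public static class SinglePassSketch
{
    public static void Build(MLContext mlContext)
    {
        // Preferred: one estimator that covers both columns, so both are
        // encoded in the same pass over the data.
        var combined = mlContext.Transforms.Categorical.OneHotHashEncoding(
            new[]
            {
                new InputOutputColumnPair("Education"),
                new InputOutputColumnPair("ZipCode")
            },
            numberOfBits: 3);

        // The form the remark describes as less efficient: two
        // single-column estimators chained with Append.
        var chained = mlContext.Transforms.Categorical
            .OneHotHashEncoding("Education", numberOfBits: 3)
            .Append(mlContext.Transforms.Categorical
                .OneHotHashEncoding("ZipCode", numberOfBits: 3));
    }
}
```

Both pipelines should yield the same output columns. The choice matters mainly for large datasets, where each additional pass over the data is costly.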
OneHotHashEncoding(TransformsCatalog+CategoricalTransforms, String, String, OneHotEncodingEstimator+OutputKind, Int32, UInt32, Boolean, Int32)
Creates a OneHotHashEncodingEstimator, which converts a text column specified by inputColumnName into a hash-based one-hot encoded vector column named outputColumnName.
public static Microsoft.ML.Transforms.OneHotHashEncodingEstimator OneHotHashEncoding (this Microsoft.ML.TransformsCatalog.CategoricalTransforms catalog, string outputColumnName, string inputColumnName = default, Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind outputKind = Microsoft.ML.Transforms.OneHotEncodingEstimator+OutputKind.Indicator, int numberOfBits = 16, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0);
static member OneHotHashEncoding : Microsoft.ML.TransformsCatalog.CategoricalTransforms * string * string * Microsoft.ML.Transforms.OneHotEncodingEstimator.OutputKind * int * uint32 * bool * int -> Microsoft.ML.Transforms.OneHotHashEncodingEstimator
<Extension()>
Public Function OneHotHashEncoding (catalog As TransformsCatalog.CategoricalTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional outputKind As OneHotEncodingEstimator.OutputKind = Microsoft.ML.Transforms.OneHotEncodingEstimator+OutputKind.Indicator, Optional numberOfBits As Integer = 16, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0) As OneHotHashEncodingEstimator
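Parameters
- catalog
- TransformsCatalog.CategoricalTransforms
The transform catalog.
- outputColumnName
- String
The name of the column resulting from the transformation of inputColumnName.
If outputKind is Bag, Indicator, or Binary, this column's data type is a vector of Single.
If outputKind is Key, this column's data type is a key for a scalar input column, or a vector of keys for a vector input column.
- inputColumnName
- String
The name of the column to transform. If set to null, the value of outputColumnName is used as the source.
This column's data type can be a scalar or vector of numeric, text, Boolean, DateTime, or DateTimeOffset values.
- outputKind
- OneHotEncodingEstimator.OutputKind
The conversion mode.
- numberOfBits
- Int32
The number of bits to hash into. Must be between 1 and 30, inclusive.
- seed
- UInt32
The hashing seed.
- useOrderedHashing
- Boolean
Whether the position of each term should be included in the hash.
- maximumNumberOfInverts
- Int32
During hashing, a mapping is built between the original values and the hash values they produce.
The text representation of the original values is stored in the slot names of the new column's metadata. Because hashing can map many initial values to a single hash value,
maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained.
0 retains no input values. -1 retains all input values mapping to each hash.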
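For the numberOfBits parameter above, the size of the encoded vector grows as a power of two, which is worth checking before choosing a value. The helper below is a small sketch of that arithmetic only. HashSlotSketch and SlotCount are hypothetical names rather than part of ML.NET, and the range check simply mirrors the documented limits.

```csharp
using System;

public static class HashSlotSketch
{
    // Number of slots in the encoded vector for a given numberOfBits.
    public static int SlotCount(int numberOfBits)
    {
        if (numberOfBits < 1 || numberOfBits > 30)
            throw new ArgumentOutOfRangeException(nameof(numberOfBits),
                "numberOfBits must be between 1 and 30, inclusive.");
        return 1 << numberOfBits;
    }

    public static void Main()
    {
        Console.WriteLine(SlotCount(3));   // 8, as in the samples on this page
        Console.WriteLine(SlotCount(16));  // 65536, the default numberOfBits
    }
}
```

Fewer bits give a smaller vector but make it more likely that distinct categories share a slot.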
Returns
OneHotHashEncodingEstimator
Examples
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace Samples.Dynamic.Transforms.Categorical
{
    public static class OneHotHashEncoding
    {
        public static void Example()
        {
            // Create a new ML context for ML.NET operations. It can be used for
            // exception tracking and logging as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new[]
            {
                new DataPoint {Education = "0-5yrs"},
                new DataPoint {Education = "0-5yrs"},
                new DataPoint {Education = "6-11yrs"},
                new DataPoint {Education = "6-11yrs"},
                new DataPoint {Education = "11-15yrs"}
            };

            // Convert training data to an IDataView.
            IDataView data = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for one hot hash encoding the 'Education' column.
            var pipeline = mlContext.Transforms.Categorical.OneHotHashEncoding(
                "EducationOneHotHashEncoded", "Education", numberOfBits: 3);

            // Fit and transform the data.
            IDataView hashEncodedData = pipeline.Fit(data).Transform(data);

            PrintDataColumn(hashEncodedData, "EducationOneHotHashEncoded");

            // We have 8 slots, because we used numberOfBits = 3.
            // 0 0 0 1 0 0 0 0
            // 0 0 0 1 0 0 0 0
            // 0 0 0 0 1 0 0 0
            // 0 0 0 0 1 0 0 0
            // 0 0 0 0 0 0 0 1

            // A pipeline for one hot hash encoding the 'Education' column
            // (using keying strategy).
            var keyPipeline = mlContext.Transforms.Categorical.OneHotHashEncoding(
                "EducationOneHotHashEncoded", "Education",
                OneHotEncodingEstimator.OutputKind.Key, 3);

            // Fit and transform the data.
            IDataView hashKeyEncodedData = keyPipeline.Fit(data).Transform(data);

            // Get the data of the newly created column for inspecting.
            var keyEncodedColumn =
                hashKeyEncodedData.GetColumn<uint>("EducationOneHotHashEncoded");

            Console.WriteLine(
                "One Hot Hash Encoding of single column 'Education', with key " +
                "type output.");

            // One Hot Hash Encoding of single column 'Education', with key type output.
            foreach (uint element in keyEncodedColumn)
                Console.WriteLine(element);

            // 4
            // 4
            // 5
            // 5
            // 8
        }

        // Prints every row of the given vector column, one slot per tab stop.
        private static void PrintDataColumn(IDataView transformedData,
            string columnName)
        {
            var column = transformedData.GetColumn<float[]>(
                transformedData.Schema[columnName]);

            foreach (var row in column)
            {
                for (var i = 0; i < row.Length; i++)
                    Console.Write($"{row[i]}\t");

                Console.WriteLine();
            }
        }

        private class DataPoint
        {
            public string Education { get; set; }
        }
    }
}