ConversionsExtensionsCatalog.Hash Metode
Definisi
Penting
Beberapa informasi terkait produk prarilis yang dapat diubah secara signifikan sebelum dirilis. Microsoft tidak memberikan jaminan, tersirat maupun tersurat, sehubungan dengan informasi yang diberikan di sini.
Overload
Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[]) |
Buat HashingEstimator, yang hash jenis InputColumnName data kolom input ke kolom baru: Name. |
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32) |
Buat HashingEstimator, yang hash data dari kolom yang ditentukan ke |
Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])
Buat HashingEstimator, yang hash jenis InputColumnName data kolom input ke kolom baru: Name.
public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, params Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] columns);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, ParamArray columns As HashingEstimator.ColumnOptions()) As HashingEstimator
Parameter
Katalog transformasi.
- columns
- HashingEstimator.ColumnOptions[]
Opsi tingkat lanjut untuk estimator yang juga berisi nama kolom input dan output. Estimator ini beroperasi melalui jenis teks, numerik, boolean, kunci, dan DataViewRowId data. Jenis data kolom baru akan menjadi vektor UInt32, atau UInt32 berdasarkan apakah jenis data kolom input adalah vektor atau skalar.
Mengembalikan
Contoh
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
namespace Samples.Dynamic
{
// This example demonstrates hashing of categorical string and integer data types by using Hash transform's
// advanced options API.
public static class HashWithOptions
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext(seed: 1);
// Get a small dataset as an IEnumerable.
var rawData = new[] {
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "NFL" , Age = 14 },
new DataPoint() { Category = "NFL" , Age = 15 },
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "MLS" , Age = 14 },
};
var data = mlContext.Data.LoadFromEnumerable(rawData);
// Construct the pipeline that would hash the two columns and store the
// results in new columns. The first transform hashes the string column
// and the second transform hashes the integer column.
//
// Hashing is not a reversible operation, so there is no way to retrieve
// the original value from the hashed value. Sometimes, for debugging,
// or model explainability, users will need to know what values in the
// original columns generated the values in the hashed columns, since
// the algorithms will mostly use the hashed values for further
// computations. The Hash method will preserve the mapping from the
// original values to the hashed values in the Annotations of the newly
// created column (column populated with the hashed values).
//
// Setting the maximumNumberOfInverts parameters to -1 will preserve the
// full map. If that parameter is left to the default 0 value, the
// mapping is not preserved.
var pipeline = mlContext.Transforms.Conversion.Hash(
new[]
{
new HashingEstimator.ColumnOptions(
"CategoryHashed",
"Category",
16,
useOrderedHashing: false,
maximumNumberOfInverts: -1),
new HashingEstimator.ColumnOptions(
"AgeHashed",
"Age",
8,
useOrderedHashing: false)
});
// Let's fit our pipeline, and then apply it to the same data.
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);
// Convert the post transformation from the IDataView format to an
// IEnumerable <TransformedData> for easy consumption.
var convertedData = mlContext.Data.CreateEnumerable<
TransformedDataPoint>(transformedData, true);
Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
foreach (var item in convertedData)
Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t " +
$"{item.Age}\t {item.AgeHashed}");
// Expected data after the transformation.
//
// Category CategoryHashed Age AgeHashed
// MLB 36206 18 127
// NFL 19015 14 62
// NFL 19015 15 43
// MLB 36206 18 127
// MLS 6013 14 62
// For the Category column, where we set the maximumNumberOfInverts
// parameter, the names of the original categories, and their
// correspondence with the generated hash values is preserved in the
// Annotations in the format of indices and values.the indices array
// will have the hashed values, and the corresponding element,
// position -wise, in the values array will contain the original value.
//
// See below for an example on how to retrieve the mapping.
var slotNames = new VBuffer<ReadOnlyMemory<char>>();
transformedData.Schema["CategoryHashed"].Annotations.GetValue(
"KeyValues", ref slotNames);
var indices = slotNames.GetIndices();
var categoryNames = slotNames.GetValues();
for (int i = 0; i < indices.Length; i++)
Console.WriteLine($"The original value of the {indices[i]} " +
$"category is {categoryNames[i]}");
// Output Data
//
// The original value of the 6012 category is MLS
// The original value of the 19014 category is NFL
// The original value of the 36205 category is MLB
}
public class DataPoint
{
public string Category { get; set; }
public uint Age { get; set; }
}
public class TransformedDataPoint : DataPoint
{
public uint CategoryHashed { get; set; }
public uint AgeHashed { get; set; }
}
}
}
Keterangan
Transformasi ini dapat beroperasi di beberapa kolom.
Berlaku untuk
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)
Buat HashingEstimator, yang hash data dari kolom yang ditentukan ke inputColumnName
kolom baru: outputColumnName
.
public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 31, int maximumNumberOfInverts = 0);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * string * string * int * int -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 31, Optional maximumNumberOfInverts As Integer = 0) As HashingEstimator
Parameter
Katalog transformasi konversi.
- outputColumnName
- String
Nama kolom yang dihasilkan dari transformasi inputColumnName
.
Jenis data kolom ini akan menjadi vektor kunci, atau skalar kunci berdasarkan apakah jenis data kolom input adalah vektor atau skalar.
- inputColumnName
- String
Nama kolom yang datanya akan di-hash.
Jika diatur ke null
, nilai outputColumnName
akan digunakan sebagai sumber.
Estimator ini beroperasi di atas vektor atau skalar teks, numerik, boolean, kunci atau DataViewRowId jenis data.
- numberOfBits
- Int32
Jumlah bit yang akan di-hash. Harus antara 1 dan 31, inklusif.
- maximumNumberOfInverts
- Int32
Selama hashing, kami membangun pemetaan antara nilai asli dan nilai hash yang dihasilkan.
Representasi teks nilai asli disimpan dalam nama slot anotasi untuk kolom baru. Hashing, dengan demikian, dapat memetakan banyak nilai awal menjadi satu.
maximumNumberOfInverts
Menentukan batas atas jumlah pemetaan nilai input yang berbeda ke hash yang harus dipertahankan.
0 tidak mempertahankan nilai input apa pun. -1 mempertahankan semua pemetaan nilai input ke setiap hash.
Mengembalikan
Contoh
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
namespace Samples.Dynamic
{
// This example demonstrates hashing of categorical string and integer data types.
public static class Hash
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext(seed: 1);
// Get a small dataset as an IEnumerable.
var rawData = new[] {
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "NFL" , Age = 14 },
new DataPoint() { Category = "NFL" , Age = 15 },
new DataPoint() { Category = "MLB" , Age = 18 },
new DataPoint() { Category = "MLS" , Age = 14 },
};
var data = mlContext.Data.LoadFromEnumerable(rawData);
// Construct the pipeline that would hash the two columns and store the
// results in new columns. The first transform hashes the string column
// and the second transform hashes the integer column.
//
// Hashing is not a reversible operation, so there is no way to retrieve
// the original value from the hashed value. Sometimes, for debugging,
// or model explainability, users will need to know what values in the
// original columns generated the values in the hashed columns, since
// the algorithms will mostly use the hashed values for further
// computations. The Hash method will preserve the mapping from the
// original values to the hashed values in the Annotations of the newly
// created column (column populated with the hashed values).
//
// Setting the maximumNumberOfInverts parameters to -1 will preserve the
// full map. If that parameter is left to the default 0 value, the
// mapping is not preserved.
var pipeline = mlContext.Transforms.Conversion.Hash("CategoryHashed",
"Category", numberOfBits: 16, maximumNumberOfInverts: -1)
.Append(mlContext.Transforms.Conversion.Hash("AgeHashed", "Age",
numberOfBits: 8));
// Let's fit our pipeline, and then apply it to the same data.
var transformer = pipeline.Fit(data);
var transformedData = transformer.Transform(data);
// Convert the post transformation from the IDataView format to an
// IEnumerable <TransformedData> for easy consumption.
var convertedData = mlContext.Data.CreateEnumerable<
TransformedDataPoint>(transformedData, true);
Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
foreach (var item in convertedData)
Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t " +
$"{item.Age}\t {item.AgeHashed}");
// Expected data after the transformation.
//
// Category CategoryHashed Age AgeHashed
// MLB 36206 18 127
// NFL 19015 14 62
// NFL 19015 15 43
// MLB 36206 18 127
// MLS 6013 14 62
// For the Category column, where we set the maximumNumberOfInverts
// parameter, the names of the original categories, and their
// correspondence with the generated hash values is preserved in the
// Annotations in the format of indices and values.the indices array
// will have the hashed values, and the corresponding element,
// position -wise, in the values array will contain the original value.
//
// See below for an example on how to retrieve the mapping.
var slotNames = new VBuffer<ReadOnlyMemory<char>>();
transformedData.Schema["CategoryHashed"].Annotations.GetValue(
"KeyValues", ref slotNames);
var indices = slotNames.GetIndices();
var categoryNames = slotNames.GetValues();
for (int i = 0; i < indices.Length; i++)
Console.WriteLine($"The original value of the {indices[i]} " +
$"category is {categoryNames[i]}");
// Output Data
//
// The original value of the 6012 category is MLS
// The original value of the 19014 category is NFL
// The original value of the 36205 category is MLB
}
public class DataPoint
{
public string Category { get; set; }
public uint Age { get; set; }
}
public class TransformedDataPoint : DataPoint
{
public uint CategoryHashed { get; set; }
public uint AgeHashed { get; set; }
}
}
}