Compartilhar via


ConversionsExtensionsCatalog.Hash Método

Definição

Sobrecargas

Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])

Crie um HashingEstimator, que hashes o tipo InputColumnName de dados da coluna de entrada para uma nova coluna: Name.

Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)

Crie um HashingEstimator, que hashes os dados da coluna especificada em inputColumnName uma nova coluna: outputColumnName.

Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])

Crie um HashingEstimator, que hashes o tipo InputColumnName de dados da coluna de entrada para uma nova coluna: Name.

public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, params Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] columns);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, ParamArray columns As HashingEstimator.ColumnOptions()) As HashingEstimator

Parâmetros

catalog
TransformsCatalog.ConversionTransforms

O catálogo da transformação.

columns
HashingEstimator.ColumnOptions[]

Opções avançadas para o estimador que também contêm os nomes de coluna de entrada e saída. Esse estimador opera nos tipos de texto, numérico, booliano, chave e DataViewRowId dados. O tipo de dados da nova coluna será um vetor ou baseado UInt32 em se os tipos de UInt32dados de coluna de entrada são vetores ou escalares.

Retornos

Exemplos

using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer data types by using Hash transform's 
    // advanced options API.
    public static class HashWithOptions
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "NFL" , Age = 14 },
                new DataPoint() { Category = "NFL" , Age = 15 },
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "MLS" , Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that would hash the two columns and store the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging,
            // or model explainability, users will need to know what values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms will mostly use the hashed values for further
            // computations. The Hash method will preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (column populated with the hashed values). 
            //
            // Setting the maximumNumberOfInverts parameters to -1 will preserve the
            // full map. If that parameter is left to the default 0 value, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash(
                    new[]
                    {
                            new HashingEstimator.ColumnOptions(
                                "CategoryHashed",
                                "Category",
                                16,
                                useOrderedHashing: false,
                                maximumNumberOfInverts: -1),

                            new HashingEstimator.ColumnOptions(
                                "AgeHashed",
                                "Age",
                                8,
                                useOrderedHashing: false)
                    });

            // Let's fit our pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the post transformation from the IDataView format to an
            // IEnumerable <TransformedData> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t  " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age     AgeHashed
            // MLB      36206            18      127
            // NFL      19015            14      62
            // NFL      19015            15      43
            // MLB      36206            18      127
            // MLS      6013             14      62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories, and their
            // correspondence with the generated hash values is preserved in the
            // Annotations in the format of indices and values.the indices array
            // will have the hashed values, and the corresponding element,
            // position -wise, in the values array will contain the original value. 
            //
            // See below for an example on how to retrieve the mapping. 
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            // 
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }

    }
}

Comentários

Essa transformação pode operar em várias colunas.

Aplica-se a

Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)

Crie um HashingEstimator, que hashes os dados da coluna especificada em inputColumnName uma nova coluna: outputColumnName.

public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 31, int maximumNumberOfInverts = 0);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * string * string * int * int -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 31, Optional maximumNumberOfInverts As Integer = 0) As HashingEstimator

Parâmetros

catalog
TransformsCatalog.ConversionTransforms

O catálogo da transformação de conversão.

outputColumnName
String

Nome da coluna resultante da transformação de inputColumnName. O tipo de dados dessa coluna será um vetor de chaves ou um escalar de chave com base em se os tipos de dados de coluna de entrada são vetores ou escalares.

inputColumnName
String

Nome da coluna cujos dados serão hash. Se definido como null, o valor do outputColumnName será usado como origem. Esse estimador opera em vetores ou escalares de texto, numérico, booliano, chave ou DataViewRowId tipos de dados.

numberOfBits
Int32

Número de bits para usar com o hash. Deve estar entre 1 e 31, inclusive.

maximumNumberOfInverts
Int32

Durante o hash, construímos mapeamentos entre valores originais e os valores de hash produzidos. A representação de texto de valores originais é armazenada nos nomes de slot das anotações da nova coluna. O hash, como tal, pode mapear muitos valores iniciais para um. maximumNumberOfInvertsEspecifica o limite superior do número de valores de entrada distintos mapeados para um hash que deve ser retido. 0 não retém nenhum valor de entrada. -1 mantém todos os valores de entrada mapeando para cada hash.

Retornos

Exemplos

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer data types.
    public static class Hash
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "NFL" , Age = 14 },
                new DataPoint() { Category = "NFL" , Age = 15 },
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "MLS" , Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that would hash the two columns and store the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging,
            // or model explainability, users will need to know what values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms will mostly use the hashed values for further
            // computations. The Hash method will preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (column populated with the hashed values). 
            //
            // Setting the maximumNumberOfInverts parameters to -1 will preserve the
            // full map. If that parameter is left to the default 0 value, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash("CategoryHashed",
                "Category", numberOfBits: 16, maximumNumberOfInverts: -1)
                .Append(mlContext.Transforms.Conversion.Hash("AgeHashed", "Age",
                numberOfBits: 8));

            // Let's fit our pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the post transformation from the IDataView format to an
            // IEnumerable <TransformedData> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t  " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age     AgeHashed
            // MLB      36206            18      127
            // NFL      19015            14      62
            // NFL      19015            15      43
            // MLB      36206            18      127
            // MLS      6013             14      62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories, and their
            // correspondence with the generated hash values is preserved in the
            // Annotations in the format of indices and values.the indices array
            // will have the hashed values, and the corresponding element,
            // position -wise, in the values array will contain the original value. 
            //
            // See below for an example on how to retrieve the mapping. 
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            // 
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }

    }
}

Aplica-se a