ConversionsExtensionsCatalog.Hash Method

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Data.dll

Package: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.1, v3.0.1, v4.0.1, v5.0.0-preview.1.25125.4

Important

Some information relates to a prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Overloads

Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])	Creates a HashingEstimator, which hashes the data from the column specified in InputColumnName into a new column: Name.
Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)	Creates a HashingEstimator, which hashes the data from the column specified in `inputColumnName` into a new column: `outputColumnName`.

Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])

Source: ConversionsExtensionsCatalog.cs

Creates a HashingEstimator, which hashes the data from the column specified in InputColumnName into a new column: Name.

public static Microsoft.ML.Transforms.HashingEstimator Hash(this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, params Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] columns);

static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] -> Microsoft.ML.Transforms.HashingEstimator

<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, ParamArray columns As HashingEstimator.ColumnOptions()) As HashingEstimator

Parameters

catalog: TransformsCatalog.ConversionTransforms

The transform's catalog.

columns: HashingEstimator.ColumnOptions[]

Advanced options for the estimator, which also contain the input and output column names. This estimator operates over text, numeric, boolean, key, and DataViewRowId data types. The new column's data type will be a vector of UInt32 or a scalar UInt32, depending on whether the input column's data type is a vector or a scalar.

Returns

HashingEstimator

Examples

using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer data types by using Hash transform's 
    // advanced options API.
    public static class HashWithOptions
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "NFL" , Age = 14 },
                new DataPoint() { Category = "NFL" , Age = 15 },
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "MLS" , Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that would hash the two columns and store the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging,
            // or model explainability, users will need to know what values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms will mostly use the hashed values for further
            // computations. The Hash method will preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (column populated with the hashed values). 
            //
            // Setting the maximumNumberOfInverts parameter to -1 will preserve the
            // full map. If that parameter is left at the default value of 0, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash(
                    new[]
                    {
                            new HashingEstimator.ColumnOptions(
                                "CategoryHashed",
                                "Category",
                                16,
                                useOrderedHashing: false,
                                maximumNumberOfInverts: -1),

                            new HashingEstimator.ColumnOptions(
                                "AgeHashed",
                                "Age",
                                8,
                                useOrderedHashing: false)
                    });

            // Let's fit our pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the transformed data from the IDataView format to an
            // IEnumerable<TransformedDataPoint> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t  " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age     AgeHashed
            // MLB      36206            18      127
            // NFL      19015            14      62
            // NFL      19015            15      43
            // MLB      36206            18      127
            // MLS      6013             14      62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories and their
            // correspondence with the generated hash values are preserved in the
            // Annotations in the format of indices and values. The indices array
            // will have the hashed values, and the corresponding element,
            // position-wise, in the values array will contain the original value.
            //
            // See below for an example on how to retrieve the mapping.
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            // 
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }

    }
}
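
In the expected output of the sample, each printed CategoryHashed value (for example, 36206 for MLB) is one greater than the matching index in the KeyValues annotation (36205). This is consistent with key-typed values being 1-based when read as `uint`, with 0 standing for a missing value, while the annotation is indexed from 0. The sketch below only illustrates that offset; `KeyIndexSketch` and `ToAnnotationIndex` are hypothetical names and are not part of ML.NET.

```csharp
using System;

public static class KeyIndexSketch
{
    // Hypothetical helper: converts a key value read as uint
    // (assumed 1-based, with 0 meaning "missing") into the
    // zero-based index used by the KeyValues annotation.
    public static int? ToAnnotationIndex(uint keyValue)
    {
        if (keyValue == 0)
            return null; // A missing key has no annotation slot.
        return (int)(keyValue - 1);
    }

    public static void Main()
    {
        // 36206 is the CategoryHashed value printed for "MLB" in the sample.
        Console.WriteLine(ToAnnotationIndex(36206)); // 36205
    }
}
```

When looking up the original value of a hashed row, subtract one from the `uint` value before comparing it with the indices returned by `GetIndices()`.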

Remarks

This transform can operate over several columns.

Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)

Source: ConversionsExtensionsCatalog.cs

Creates a HashingEstimator, which hashes the data from the column specified in inputColumnName into a new column: outputColumnName.

public static Microsoft.ML.Transforms.HashingEstimator Hash(this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 31, int maximumNumberOfInverts = 0);

static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * string * string * int * int -> Microsoft.ML.Transforms.HashingEstimator

<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 31, Optional maximumNumberOfInverts As Integer = 0) As HashingEstimator

Parameters

catalog: TransformsCatalog.ConversionTransforms

The conversion transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a vector of keys or a scalar key, depending on whether the input column's data type is a vector or a scalar.

inputColumnName: String

Name of the column whose data will be hashed. If set to null, the value of outputColumnName will be used as the source. This estimator operates over vectors or scalars of text, numeric, boolean, key, or DataViewRowId data types.

numberOfBits: Int32

Number of bits to hash into. Must be between 1 and 31, inclusive.
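
To make the effect of numberOfBits concrete, the sketch below computes how many distinct hash values are available for a given setting, assuming the hash space holds 2^numberOfBits values. `HashRangeSketch` and `BucketCount` are hypothetical names used only for illustration; they are not part of ML.NET.

```csharp
using System;

public static class HashRangeSketch
{
    // Hypothetical helper: number of distinct hash values for a bit count,
    // assuming the hash space holds 2^numberOfBits values.
    public static long BucketCount(int numberOfBits)
    {
        if (numberOfBits < 1 || numberOfBits > 31)
            throw new ArgumentOutOfRangeException(
                nameof(numberOfBits), "Must be between 1 and 31, inclusive.");
        return 1L << numberOfBits;
    }

    public static void Main()
    {
        Console.WriteLine(BucketCount(8));  // 256
        Console.WriteLine(BucketCount(16)); // 65536
        Console.WriteLine(BucketCount(31)); // 2147483648
    }
}
```

A smaller bit count gives a more compact output space but makes it more likely that different input values share a hash.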

maximumNumberOfInverts: Int32

During hashing, mappings are constructed between the original values and the produced hash values. The text representation of the original values is stored in the slot names of the annotations for the new column. Hashing can map many initial values to one. maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained. 0 does not retain any input values. -1 retains all input values mapping to each hash.

Returns

HashingEstimator

Examples
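
The retention rule described above can be sketched conceptually as follows. This is not ML.NET's implementation: `InvertMapSketch`, `ToyHash`, and `BuildInvertMap` are hypothetical names, and the toy hash function merely stands in for the real one so that collisions are easy to produce.

```csharp
using System;
using System.Collections.Generic;

public static class InvertMapSketch
{
    // Toy stand-in for a hash function (an assumption for illustration;
    // ML.NET uses its own hashing, not this).
    public static int ToyHash(string value, int numberOfBits)
    {
        uint h = 0;
        foreach (char c in value)
            h = unchecked(h * 31 + c);
        // Keep only the lowest numberOfBits bits.
        return (int)(h & ((1u << numberOfBits) - 1));
    }

    // Collects, per hash value, at most maximumNumberOfInverts distinct
    // original values. 0 keeps nothing; -1 keeps everything.
    public static Dictionary<int, List<string>> BuildInvertMap(
        IEnumerable<string> values, int numberOfBits, int maximumNumberOfInverts)
    {
        var map = new Dictionary<int, List<string>>();
        if (maximumNumberOfInverts == 0)
            return map;

        bool unlimited = maximumNumberOfInverts == -1;
        foreach (var value in values)
        {
            int hash = ToyHash(value, numberOfBits);
            if (!map.TryGetValue(hash, out var originals))
                map[hash] = originals = new List<string>();

            if (!originals.Contains(value) &&
                (unlimited || originals.Count < maximumNumberOfInverts))
                originals.Add(value);
        }
        return map;
    }

    public static void Main()
    {
        var values = new[] { "MLB", "NFL", "NFL", "MLB", "MLS" };
        // One bit leaves two hash values, so three categories must collide.
        foreach (var entry in BuildInvertMap(values, 1, -1))
            Console.WriteLine($"{entry.Key}: {string.Join(", ", entry.Value)}");
    }
}
```

With a positive limit, values beyond the limit for a given hash are simply not recorded in this sketch, so the stored mapping may be incomplete when collisions occur.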

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer data types.
    public static class Hash
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "NFL" , Age = 14 },
                new DataPoint() { Category = "NFL" , Age = 15 },
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "MLS" , Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that would hash the two columns and store the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging,
            // or model explainability, users will need to know what values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms will mostly use the hashed values for further
            // computations. The Hash method will preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (column populated with the hashed values). 
            //
            // Setting the maximumNumberOfInverts parameter to -1 will preserve the
            // full map. If that parameter is left at the default value of 0, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash("CategoryHashed",
                "Category", numberOfBits: 16, maximumNumberOfInverts: -1)
                .Append(mlContext.Transforms.Conversion.Hash("AgeHashed", "Age",
                numberOfBits: 8));

            // Let's fit our pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the transformed data from the IDataView format to an
            // IEnumerable<TransformedDataPoint> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t  " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age     AgeHashed
            // MLB      36206            18      127
            // NFL      19015            14      62
            // NFL      19015            15      43
            // MLB      36206            18      127
            // MLS      6013             14      62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories and their
            // correspondence with the generated hash values are preserved in the
            // Annotations in the format of indices and values. The indices array
            // will have the hashed values, and the corresponding element,
            // position-wise, in the values array will contain the original value.
            //
            // See below for an example on how to retrieve the mapping.
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            // 
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }

    }
}
