TextCatalog.LatentDirichletAllocation Método

Referencia

Definición

Espacio de nombres:: Microsoft.ML

Ensamblado:: Microsoft.ML.Transforms.dll

Paquete:: Microsoft.ML v3.0.1

Paquete:: Microsoft.ML v1.0.0

Paquete:: Microsoft.ML v1.1.0

Paquete:: Microsoft.ML v1.2.0

Paquete:: Microsoft.ML v1.3.1

Paquete:: Microsoft.ML v1.4.0

Paquete:: Microsoft.ML v1.5.5

Paquete:: Microsoft.ML v1.6.0

Paquete:: Microsoft.ML v1.7.0

Paquete:: Microsoft.ML v2.0.0

Importante

Parte de la información hace referencia a la versión preliminar del producto, que puede haberse modificado sustancialmente antes de lanzar la versión definitiva. Microsoft no otorga ninguna garantía, explícita o implícita, con respecto a la información proporcionada aquí.

Cree un LatentDirichletAllocationEstimatorobjeto , que usa LightLDA para transformar texto (representado como vector de floats) en un vector de Single que indique la similitud del texto con cada tema identificado.

public static Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator LatentDirichletAllocation (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfTopics = 100, float alphaSum = 100, float beta = 0.01, int samplingStepCount = 4, int maximumNumberOfIterations = 200, int likelihoodInterval = 5, int numberOfThreads = 0, int maximumTokenCountPerDocument = 512, int numberOfSummaryTermsPerTopic = 10, int numberOfBurninIterations = 10, bool resetRandomGenerator = false);

static member LatentDirichletAllocation : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * single * single * int * int * int * int * int * int * int * bool -> Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator

<Extension()>
Public Function LatentDirichletAllocation (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfTopics As Integer = 100, Optional alphaSum As Single = 100, Optional beta As Single = 0.01, Optional samplingStepCount As Integer = 4, Optional maximumNumberOfIterations As Integer = 200, Optional likelihoodInterval As Integer = 5, Optional numberOfThreads As Integer = 0, Optional maximumTokenCountPerDocument As Integer = 512, Optional numberOfSummaryTermsPerTopic As Integer = 10, Optional numberOfBurninIterations As Integer = 10, Optional resetRandomGenerator As Boolean = false) As LatentDirichletAllocationEstimator

Parámetros

catalog: TransformsCatalog.TextTransforms

Catálogo de la transformación.

outputColumnName: String

Nombre de la columna resultante de la transformación de inputColumnName. Este estimador genera un vector de Single.

inputColumnName: String

Nombre de la columna que se va a transformar. Si se establece en null, el valor de outputColumnName se usará como origen. Este estimador funciona sobre un vector de Single.

numberOfTopics: Int32

Número de temas.

alphaSum: Single

Dirichlet anterior en vectores de tema de documento.

beta: Single

Dirichlet anterior en vectores vocab-topic.

samplingStepCount: Int32

Número de pasos de Hasting de Metrópolis.

maximumNumberOfIterations: Int32

Número de iteraciones.

likelihoodInterval: Int32

Probabilidad del registro de proceso en el conjunto de datos local en este intervalo de iteración.

numberOfThreads: Int32

Número de subprocesos de entrenamiento. El valor predeterminado depende del número de procesadores lógicos.

maximumTokenCountPerDocument: Int32

Umbral de recuento máximo de tokens por documento.

numberOfSummaryTermsPerTopic: Int32

Número de palabras que se resumen en el tema.

numberOfBurninIterations: Int32

Número de iteraciones que se queman.

resetRandomGenerator: Boolean

Restablezca el generador de números aleatorios para cada documento.

Devoluciones

LatentDirichletAllocationEstimator

Ejemplos

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class LatentDirichletAllocation
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "computes topic models." },

                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "is the best for topic models." },

                new TextData(){ Text = "I like to eat broccoli and bananas." },
                new TextData(){ Text = "I eat bananas for breakfast." },
                new TextData(){ Text = "This car is expensive compared to last " +
                "week's price." },

                new TextData(){ Text = "This car was $X last week." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for featurizing the text/string using 
            // LatentDirichletAllocation API. o be more accurate in computing the
            // LDA features, the pipeline first normalizes text and removes stop
            // words before passing tokens (the individual words, lower cased, with
            // common words removed) to LatentDirichletAllocation.
            var pipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText",
                "Text")
                .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens",
                    "NormalizedText"))
                .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("Tokens"))
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
                .Append(mlContext.Transforms.Text.ProduceNgrams("Tokens"))
                .Append(mlContext.Transforms.Text.LatentDirichletAllocation(
                    "Features", "Tokens", numberOfTopics: 3));

            // Fit to data.
            var transformer = pipeline.Fit(dataview);

            // Create the prediction engine to get the LDA features extracted from
            // the text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(transformer);

            // Convert the sample text into LDA features and print it.
            PrintLdaFeatures(predictionEngine.Predict(samples[0]));
            PrintLdaFeatures(predictionEngine.Predict(samples[1]));

            // Features obtained post-transformation.
            // For LatentDirichletAllocation, we had specified numTopic:3. Hence
            // each prediction has been featurized as a vector of floats with length
            // 3.

            //  Topic1  Topic2  Topic3
            //  0.6364  0.2727  0.0909
            //  0.5455  0.1818  0.2727
        }

        private static void PrintLdaFeatures(TransformedTextData prediction)
        {
            for (int i = 0; i < prediction.Features.Length; i++)
                Console.Write($"{prediction.Features[i]:F4}  ");
            Console.WriteLine();
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
        }
    }
}

Se aplica a