TextCatalog.LatentDirichletAllocation Metodo

Riferimento

Definizione

Spazio dei nomi:: Microsoft.ML

Assembly:: Microsoft.ML.Transforms.dll

Pacchetto:: Microsoft.ML v3.0.1

Pacchetto:: Microsoft.ML v1.0.0

Pacchetto:: Microsoft.ML v1.1.0

Pacchetto:: Microsoft.ML v1.2.0

Pacchetto:: Microsoft.ML v1.3.1

Pacchetto:: Microsoft.ML v1.4.0

Pacchetto:: Microsoft.ML v1.5.5

Pacchetto:: Microsoft.ML v1.6.0

Pacchetto:: Microsoft.ML v1.7.0

Pacchetto:: Microsoft.ML v2.0.0

Importante

Alcune informazioni sono relative alla release non definitiva del prodotto, che potrebbe subire modifiche significative prima della release definitiva. Microsoft non riconosce alcuna garanzia, espressa o implicita, in merito alle informazioni qui fornite.

Creare un LatentDirichletAllocationEstimatoroggetto , che usa LightLDA per trasformare il testo (rappresentato come vettore di float) in un vettore di Single che indica la somiglianza del testo con ogni argomento identificato.

public static Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator LatentDirichletAllocation (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfTopics = 100, float alphaSum = 100, float beta = 0.01, int samplingStepCount = 4, int maximumNumberOfIterations = 200, int likelihoodInterval = 5, int numberOfThreads = 0, int maximumTokenCountPerDocument = 512, int numberOfSummaryTermsPerTopic = 10, int numberOfBurninIterations = 10, bool resetRandomGenerator = false);

static member LatentDirichletAllocation : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * single * single * int * int * int * int * int * int * int * bool -> Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator

<Extension()>
Public Function LatentDirichletAllocation (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfTopics As Integer = 100, Optional alphaSum As Single = 100, Optional beta As Single = 0.01, Optional samplingStepCount As Integer = 4, Optional maximumNumberOfIterations As Integer = 200, Optional likelihoodInterval As Integer = 5, Optional numberOfThreads As Integer = 0, Optional maximumTokenCountPerDocument As Integer = 512, Optional numberOfSummaryTermsPerTopic As Integer = 10, Optional numberOfBurninIterations As Integer = 10, Optional resetRandomGenerator As Boolean = false) As LatentDirichletAllocationEstimator

Parametri

catalog: TransformsCatalog.TextTransforms

Catalogo della trasformazione.

outputColumnName: String

Nome della colonna risultante dalla trasformazione di inputColumnName. Questo strumento di stima restituisce un vettore di Single.

inputColumnName: String

Nome della colonna da trasformare. Se impostato su null, il valore dell'oggetto outputColumnName verrà usato come origine. Questo strumento di stima opera su un vettore di Single.

numberOfTopics: Int32

Numero di argomenti.

alphaSum: Single

Dirichlet precedente sui vettori dell'argomento di documento.

beta: Single

Dirichlet precedente su vettori vocab-topic.

samplingStepCount: Int32

Numero di passaggio di Hasting di Metropolis.

maximumNumberOfIterations: Int32

Numero di iterazioni.

likelihoodInterval: Int32

Probabilità del log di calcolo su set di dati locali in questo intervallo di iterazione.

numberOfThreads: Int32

Numero di thread di training. Il valore predefinito dipende dal numero di processori logici.

maximumTokenCountPerDocument: Int32

Soglia del numero massimo di token per documento.

numberOfSummaryTermsPerTopic: Int32

Numero di parole per riepilogare l'argomento.

numberOfBurninIterations: Int32

Numero di iterazioni burn-in.

resetRandomGenerator: Boolean

Reimpostare il generatore di numeri casuali per ogni documento.

Restituisce

LatentDirichletAllocationEstimator

Esempio

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class LatentDirichletAllocation
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "computes topic models." },

                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "is the best for topic models." },

                new TextData(){ Text = "I like to eat broccoli and bananas." },
                new TextData(){ Text = "I eat bananas for breakfast." },
                new TextData(){ Text = "This car is expensive compared to last " +
                "week's price." },

                new TextData(){ Text = "This car was $X last week." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for featurizing the text/string using 
            // LatentDirichletAllocation API. o be more accurate in computing the
            // LDA features, the pipeline first normalizes text and removes stop
            // words before passing tokens (the individual words, lower cased, with
            // common words removed) to LatentDirichletAllocation.
            var pipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText",
                "Text")
                .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens",
                    "NormalizedText"))
                .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("Tokens"))
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
                .Append(mlContext.Transforms.Text.ProduceNgrams("Tokens"))
                .Append(mlContext.Transforms.Text.LatentDirichletAllocation(
                    "Features", "Tokens", numberOfTopics: 3));

            // Fit to data.
            var transformer = pipeline.Fit(dataview);

            // Create the prediction engine to get the LDA features extracted from
            // the text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(transformer);

            // Convert the sample text into LDA features and print it.
            PrintLdaFeatures(predictionEngine.Predict(samples[0]));
            PrintLdaFeatures(predictionEngine.Predict(samples[1]));

            // Features obtained post-transformation.
            // For LatentDirichletAllocation, we had specified numTopic:3. Hence
            // each prediction has been featurized as a vector of floats with length
            // 3.

            //  Topic1  Topic2  Topic3
            //  0.6364  0.2727  0.0909
            //  0.5455  0.1818  0.2727
        }

        private static void PrintLdaFeatures(TransformedTextData prediction)
        {
            for (int i = 0; i < prediction.Features.Length; i++)
                Console.Write($"{prediction.Features[i]:F4}  ");
            Console.WriteLine();
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
        }
    }
}

Si applica a

Condividi tramite