TextCatalog.LatentDirichletAllocation Method


Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Transforms.dll

Packages: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.0, v3.0.1

Important

Some information relates to a prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Creates a LatentDirichletAllocationEstimator, which uses LightLDA to transform text (represented as a vector of floats) into a vector of Single values indicating the similarity of the text to each identified topic.

public static Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator LatentDirichletAllocation (
    this Microsoft.ML.TransformsCatalog.TextTransforms catalog,
    string outputColumnName,
    string inputColumnName = default,
    int numberOfTopics = 100,
    float alphaSum = 100,
    float beta = 0.01f,
    int samplingStepCount = 4,
    int maximumNumberOfIterations = 200,
    int likelihoodInterval = 5,
    int numberOfThreads = 0,
    int maximumTokenCountPerDocument = 512,
    int numberOfSummaryTermsPerTopic = 10,
    int numberOfBurninIterations = 10,
    bool resetRandomGenerator = false);

static member LatentDirichletAllocation : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * single * single * int * int * int * int * int * int * int * bool -> Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator

<Extension()>
Public Function LatentDirichletAllocation (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfTopics As Integer = 100, Optional alphaSum As Single = 100, Optional beta As Single = 0.01, Optional samplingStepCount As Integer = 4, Optional maximumNumberOfIterations As Integer = 200, Optional likelihoodInterval As Integer = 5, Optional numberOfThreads As Integer = 0, Optional maximumTokenCountPerDocument As Integer = 512, Optional numberOfSummaryTermsPerTopic As Integer = 10, Optional numberOfBurninIterations As Integer = 10, Optional resetRandomGenerator As Boolean = false) As LatentDirichletAllocationEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This estimator outputs a vector of Single.

inputColumnName: String

Name of the column to transform. If set to null, the value of outputColumnName is used as the source. This estimator operates over a vector of Single.

numberOfTopics: Int32

The number of topics.

alphaSum: Single

Dirichlet prior on document-topic vectors.

beta: Single

Dirichlet prior on vocab-topic vectors.

samplingStepCount: Int32

The number of Metropolis-Hastings steps.

maximumNumberOfIterations: Int32

The number of iterations.

likelihoodInterval: Int32

Compute the log-likelihood over the local dataset at this iteration interval.

numberOfThreads: Int32

The number of training threads. The default value depends on the number of logical processors.

maximumTokenCountPerDocument: Int32

The threshold for the maximum number of tokens per document.

numberOfSummaryTermsPerTopic: Int32

The number of words used to summarize each topic.

numberOfBurninIterations: Int32

The number of burn-in iterations.

resetRandomGenerator: Boolean

Reset the random number generator for each document.
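All of the parameters above have defaults, so in practice you usually override only a few of them as named arguments. The following is a minimal sketch, assuming the Microsoft.ML NuGet package is referenced; the column names "Features" and "Tokens" are borrowed from the example in the Examples section, and the specific values (20 topics, alphaSum of 20, 500 iterations, one thread) are illustrative assumptions, not tuning recommendations. The estimator is only constructed here; it still has to be appended to a pipeline that produces a vector of Single in the input column and then fitted.

```csharp
using System;
using Microsoft.ML;

// Sketch only: the values below are illustrative, not recommended settings.
var mlContext = new MLContext(seed: 1);

// Every optional parameter of LatentDirichletAllocation can be passed by name;
// the ones not listed keep their documented defaults.
var lda = mlContext.Transforms.Text.LatentDirichletAllocation(
    outputColumnName: "Features",
    inputColumnName: "Tokens",
    numberOfTopics: 20,
    alphaSum: 20,
    beta: 0.01f,
    maximumNumberOfIterations: 500,
    numberOfThreads: 1,
    resetRandomGenerator: true);

// Nothing is trained yet; the estimator is fitted later as part of a pipeline.
Console.WriteLine(lda.GetType().Name);
```

Using a single thread together with resetRandomGenerator is an assumption made here in the hope of more repeatable runs; verify on your own data whether it matters.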

Returns

LatentDirichletAllocationEstimator

Examples

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class LatentDirichletAllocation
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "computes topic models." },

                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                "is the best for topic models." },

                new TextData(){ Text = "I like to eat broccoli and bananas." },
                new TextData(){ Text = "I eat bananas for breakfast." },
                new TextData(){ Text = "This car is expensive compared to last " +
                "week's price." },

                new TextData(){ Text = "This car was $X last week." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for featurizing the text/string using the
            // LatentDirichletAllocation API. To compute more accurate LDA
            // features, the pipeline first normalizes the text, splits it into
            // tokens (the individual words, lower cased) and removes common stop
            // words. It then maps the tokens to keys and counts n-grams; the
            // resulting vector of counts is passed to LatentDirichletAllocation.
            var pipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText",
                "Text")
                .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens",
                    "NormalizedText"))
                .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("Tokens"))
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
                .Append(mlContext.Transforms.Text.ProduceNgrams("Tokens"))
                .Append(mlContext.Transforms.Text.LatentDirichletAllocation(
                    "Features", "Tokens", numberOfTopics: 3));

            // Fit to data.
            var transformer = pipeline.Fit(dataview);

            // Create the prediction engine to get the LDA features extracted from
            // the text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(transformer);

            // Convert the sample text into LDA features and print it.
            PrintLdaFeatures(predictionEngine.Predict(samples[0]));
            PrintLdaFeatures(predictionEngine.Predict(samples[1]));

            // Features obtained after the transformation.
            // For LatentDirichletAllocation, we specified numberOfTopics: 3, so
            // each prediction is featurized as a vector of floats of length 3.
            // The exact values may differ between runs and ML.NET versions.

            //  Topic1  Topic2  Topic3
            //  0.6364  0.2727  0.0909
            //  0.5455  0.1818  0.2727
        }

        private static void PrintLdaFeatures(TransformedTextData prediction)
        {
            for (int i = 0; i < prediction.Features.Length; i++)
                Console.Write($"{prediction.Features[i]:F4}  ");
            Console.WriteLine();
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
        }
    }
}
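The feature vector holds one weight per topic, so a common follow-up step is to label each document with its highest-weighted topic. The sketch below uses a hypothetical helper, DominantTopic, which is not part of ML.NET; it only needs System and reuses the two sample vectors printed in the example above. Ties go to the lower topic index, which is an arbitrary choice made for this sketch.

```csharp
using System;

// Hypothetical helper (not an ML.NET API): returns the index of the topic with
// the largest weight in an LDA feature vector, or -1 if the vector is empty.
static int DominantTopic(float[] features)
{
    if (features == null || features.Length == 0)
        return -1;

    int best = 0;
    for (int i = 1; i < features.Length; i++)
    {
        // Strictly greater: on ties the lower topic index is kept.
        if (features[i] > features[best])
            best = i;
    }
    return best;
}

// Feature vectors copied from the sample output shown in the example above.
var first = new float[] { 0.6364f, 0.2727f, 0.0909f };
var second = new float[] { 0.5455f, 0.1818f, 0.2727f };

Console.WriteLine($"Sample 0 -> topic {DominantTopic(first)}");   // topic 0
Console.WriteLine($"Sample 1 -> topic {DominantTopic(second)}");  // topic 0
```

In the full example, you would call DominantTopic(prediction.Features) on each TransformedTextData returned by the prediction engine.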
