TextCatalog.TokenizeIntoWords Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Creates a WordTokenizingEstimator that tokenizes the input text, using separators as the separator characters.
public static Microsoft.ML.Transforms.Text.WordTokenizingEstimator TokenizeIntoWords (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, char[] separators = default);
static member TokenizeIntoWords : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * char[] -> Microsoft.ML.Transforms.Text.WordTokenizingEstimator
<Extension()>
Public Function TokenizeIntoWords (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional separators As Char() = Nothing) As WordTokenizingEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The text-related transforms catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
This column's data type is a variable-size vector of text.
- inputColumnName
- String
Name of the column to transform.
If set to null, the value of outputColumnName is used as the source.
This estimator operates on scalar text and on vectors of text.
- separators
- Char[]
The separators to use (the space character is used by default).
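The effect of the separators parameter, and of passing a vector of text instead of a single text value, can be approximated in plain C# without ML.NET. This is a minimal sketch under stated assumptions, not the estimator's actual implementation: the helpers TokenizeText and TokenizeVector are hypothetical, and the sketch assumes that empty tokens (for example, from consecutive separators) are dropped, that surrounding spaces are trimmed from each token, and that for a vector-of-text input the tokens of every element are appended, in order, to one output vector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of ML.NET): approximates how one text value is
// assumed to be split. Empty tokens are dropped and spaces around tokens trimmed.
static string[] TokenizeText(string text, char[] separators)
{
    // Mirror the documented default: a single space when no separators are given.
    var seps = separators is { Length: > 0 } ? separators : new[] { ' ' };
    return text
        .Split(seps)
        .Select(token => token.Trim(' '))
        .Where(token => token.Length > 0)
        .ToArray();
}

// Hypothetical helper for a vector-of-text column: the tokens of each element
// are assumed to be concatenated into a single output vector.
static string[] TokenizeVector(IEnumerable<string> texts, char[] separators) =>
    texts.SelectMany(text => TokenizeText(text, separators)).ToArray();

var sentence = "ML.NET's TokenizeIntoWords API splits text/string into words";

// Default separator (space only): "text/string" stays a single token.
var bySpace = TokenizeText(sentence, null);

// Space and slash as separators: "text/string" becomes "text" and "string".
var bySpaceAndSlash = TokenizeText(sentence, new[] { ' ', '/' });

// Vector input: the double space in "first  value" yields no empty token.
var fromVector = TokenizeVector(new[] { "first  value", "second value" }, new[] { ' ' });

Console.WriteLine(string.Join(",", bySpace));
Console.WriteLine(string.Join(",", bySpaceAndSlash));
Console.WriteLine(string.Join(",", fromVector));
```

With only the space separator, the sketch keeps text/string as one token (7 tokens in total); adding '/' to the separators splits it into text and string (8 tokens).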
Returns
- WordTokenizingEstimator
Examples
using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class TokenizeIntoWords
    {
        public static void Example()
        {
            // Create a new ML context for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create an empty list as the dataset. 'TokenizeIntoWords' does not
            // require training data, because the estimator
            // ('WordTokenizingEstimator') created by the 'TokenizeIntoWords' API
            // is not a trainable estimator. The empty list is only needed to pass
            // the input schema to the pipeline.
            var emptySamples = new List<TextData>();

            // Convert the sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for converting text into a vector of words.
            // The following call to 'TokenizeIntoWords' tokenizes text/string into
            // words using space as a separator. Space is also the default value
            // for the 'separators' argument if it is not specified.
            var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Words",
                "Text", separators: new[] { ' ' });

            // Fit to data.
            var textTransformer = textPipeline.Fit(emptyDataView);

            // Create the prediction engine to get the word vector from the input
            // text/string.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(textTransformer);

            // Call the prediction API to convert the text into words.
            var data = new TextData()
            {
                Text = "ML.NET's TokenizeIntoWords API " +
                    "splits text/string into words using the list of characters " +
                    "provided as separators."
            };

            var prediction = predictionEngine.Predict(data);

            // Print the length of the word vector.
            Console.WriteLine($"Number of words: {prediction.Words.Length}");

            // Print the word vector.
            Console.WriteLine($"\nWords: {string.Join(",", prediction.Words)}");

            // Expected output:
            //  Number of words: 15
            //  Words: ML.NET's,TokenizeIntoWords,API,splits,text/string,into,words,using,the,list,of,characters,provided,as,separators.
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public string[] Words { get; set; }
        }
    }
}