TextCatalog.TokenizeIntoWords Metode
Definisi
Penting
Beberapa informasi terkait produk prarilis yang dapat diubah secara signifikan sebelum dirilis. Microsoft tidak memberikan jaminan, tersirat maupun tersurat, sehubungan dengan informasi yang diberikan di sini.
Buat WordTokenizingEstimator, yang membuat tokenisasi teks input menggunakan separators
sebagai pemisah.
public static Microsoft.ML.Transforms.Text.WordTokenizingEstimator TokenizeIntoWords (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, char[] separators = default);
static member TokenizeIntoWords : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * char[] -> Microsoft.ML.Transforms.Text.WordTokenizingEstimator
<Extension()>
Public Function TokenizeIntoWords (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional separators As Char() = Nothing) As WordTokenizingEstimator
Parameter
- catalog
- TransformsCatalog.TextTransforms
Katalog transformasi terkait teks.
- outputColumnName
- String
Nama kolom yang dihasilkan dari transformasi inputColumnName
.
Jenis data kolom ini akan menjadi vektor ukuran variabel teks.
- inputColumnName
- String
Nama kolom yang akan diubah. Jika diatur ke null
, nilai outputColumnName
akan digunakan sebagai sumber.
Estimator ini beroperasi pada skalar teks dan vektor jenis data teks.
- separators
- Char[]
Pemisah yang akan digunakan (menggunakan karakter spasi secara default).
Mengembalikan
Contoh
using System;
using System.Collections.Generic;
using Microsoft.ML;
namespace Samples.Dynamic
{
public static class TokenizeIntoWords
{
public static void Example()
{
// Create a new ML context, for ML.NET operations. It can be used for
// exception tracking and logging, as well as the source of randomness.
var mlContext = new MLContext();
// Create an empty list as the dataset. The 'TokenizeIntoWords' does
// not require training data as the estimator
// ('WordTokenizingEstimator') created by 'TokenizeIntoWords' API is not
// a trainable estimator. The empty list is only needed to pass input
// schema to the pipeline.
var emptySamples = new List<TextData>();
// Convert sample list to an empty IDataView.
var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);
// A pipeline for converting text into vector of words.
// The following call to 'TokenizeIntoWords' tokenizes text/string into
// words using space as a separator. Space is also a default value for
// the 'separators' argument if it is not specified.
var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Words",
"Text", separators: new[] { ' ' });
// Fit to data.
var textTransformer = textPipeline.Fit(emptyDataView);
// Create the prediction engine to get the word vector from the input
// text /string.
var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
TransformedTextData>(textTransformer);
// Call the prediction API to convert the text into words.
var data = new TextData()
{
Text = "ML.NET's TokenizeIntoWords API " +
"splits text/string into words using the list of characters " +
"provided as separators."
};
var prediction = predictionEngine.Predict(data);
// Print the length of the word vector.
Console.WriteLine($"Number of words: {prediction.Words.Length}");
// Print the word vector.
Console.WriteLine($"\nWords: {string.Join(",", prediction.Words)}");
// Expected output:
// Number of words: 15
// Words: ML.NET's,TokenizeIntoWords,API,splits,text/string,into,words,using,the,list,of,characters,provided,as,separators.
}
private class TextData
{
public string Text { get; set; }
}
private class TransformedTextData : TextData
{
public string[] Words { get; set; }
}
}
}