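TextCatalog.RemoveStopWords Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, express or implied, with respect to the information provided here.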
Creates a CustomStopWordsRemovingEstimator, which copies the data from the column specified in inputColumnName to a new column, outputColumnName, and removes the words specified in stopwords from it.
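The behavior described above can be pictured with a plain C# sketch that does not use ML.NET at all. The `StopWordSketch` class and its `Filter` helper are hypothetical names introduced only for illustration, and the choice of `StringComparer.OrdinalIgnoreCase` is an assumption based on the sample's note that casing is ignored; this is not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordSketch
{
    // Hypothetical helper (not part of ML.NET): returns the tokens that are
    // not listed in stopwords, comparing case-insensitively (an assumption).
    public static string[] Filter(IEnumerable<string> tokens,
        params string[] stopwords)
    {
        var stopSet = new HashSet<string>(stopwords,
            StringComparer.OrdinalIgnoreCase);
        return tokens.Where(token => !stopSet.Contains(token)).ToArray();
    }

    public static void Main()
    {
        // The input is already tokenized, mirroring the vector-of-text input
        // that the estimator expects.
        var tokens = "removes stop words from tHe text".Split(' ');
        var kept = Filter(tokens, "a", "the", "from", "by");
        Console.WriteLine(string.Join(",", kept));
        // prints: removes,stop,words,text
    }
}
```

The sketch copies rather than mutates its input, which matches the description of the estimator writing its result to a new column.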
public static Microsoft.ML.Transforms.Text.CustomStopWordsRemovingEstimator RemoveStopWords (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, params string[] stopwords);
static member RemoveStopWords : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * string[] -> Microsoft.ML.Transforms.Text.CustomStopWordsRemovingEstimator
<Extension()>
Public Function RemoveStopWords (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, ParamArray stopwords As String()) As CustomStopWordsRemovingEstimator
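Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
This column's data type will be a variable-sized vector of text.
- inputColumnName
- String
Name of the column to copy the data from. This estimator operates over a vector of text.
- stopwords
- String[]
Array of words to remove.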
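Because stopwords is declared as a params array after an optional inputColumnName, there are two common ways to pass it. The sketch below shows both, assuming the Microsoft.ML package is referenced; the class name `RemoveStopWordsCallShapes` and the column names are illustrative only.

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public static class RemoveStopWordsCallShapes
{
    // Stop words supplied as separate trailing arguments. The second
    // positional argument always binds to inputColumnName, so the input
    // column must be given explicitly before the stop words.
    public static CustomStopWordsRemovingEstimator ViaParams(MLContext mlContext)
    {
        return mlContext.Transforms.Text.RemoveStopWords(
            "WordsWithoutStopWords", "Words", "a", "the", "from", "by");
    }

    // Stop words supplied as one array through the named parameter.
    public static CustomStopWordsRemovingEstimator ViaArray(MLContext mlContext)
    {
        return mlContext.Transforms.Text.RemoveStopWords(
            "WordsWithoutStopWords", "Words",
            stopwords: new[] { "a", "the", "from", "by" });
    }
}
```

Naming the stopwords argument, as in the second method, avoids accidentally passing the first stop word as inputColumnName.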
Returns
CustomStopWordsRemovingEstimator
Examples
using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class RemoveStopWords
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used
            // for exception tracking and logging, as well as the source of
            // randomness.
            var mlContext = new MLContext();

            // Create an empty list as the dataset. 'RemoveStopWords' does not
            // require training data because the estimator it creates
            // ('CustomStopWordsRemovingEstimator') is not a trainable
            // estimator. The empty list is only needed to pass the input
            // schema to the pipeline.
            var emptySamples = new List<TextData>();

            // Convert the sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for removing stop words from input text/string.
            // The pipeline first tokenizes text into words, then removes stop
            // words. The 'RemoveStopWords' API ignores casing of the
            // text/string, e.g. 'tHe' and 'the' are considered the same stop
            // word.
            var textPipeline = mlContext.Transforms.Text.TokenizeIntoWords("Words",
                "Text")
                .Append(mlContext.Transforms.Text.RemoveStopWords(
                "WordsWithoutStopWords", "Words", stopwords:
                new[] { "a", "the", "from", "by" }));

            // Fit to data.
            var textTransformer = textPipeline.Fit(emptyDataView);

            // Create the prediction engine to remove the stop words from the
            // input text/string.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(textTransformer);

            // Call the prediction API to remove stop words.
            var data = new TextData()
            {
                Text = "ML.NET's RemoveStopWords API " +
                "removes stop words from tHe text/string using a list of stop " +
                "words provided by the user."
            };

            var prediction = predictionEngine.Predict(data);

            // Print the length of the word vector after the stop words are
            // removed.
            Console.WriteLine("Number of words: " + prediction.WordsWithoutStopWords
                .Length);

            // Print the word vector without stop words.
            Console.WriteLine("\nWords without stop words: " + string.Join(",",
                prediction.WordsWithoutStopWords));

            // Expected output:
            // Number of words: 14
            // Words without stop words: ML.NET's,RemoveStopWords,API,removes,stop,words,text/string,using,list,of,stop,words,provided,user.
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public string[] WordsWithoutStopWords { get; set; }
        }
    }
}
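The sample above processes one row at a time through a prediction engine. As a complementary sketch, the same pipeline can be applied to a whole IDataView with Fit and Transform, reading the result back with CreateEnumerable. The class names `TextRow`, `CleanedRow`, and `BatchStopWordRemoval` and the sample sentences are made up for illustration, and the code assumes the Microsoft.ML package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row types for this sketch.
public class TextRow
{
    public string Text { get; set; }
}

public class CleanedRow : TextRow
{
    public string[] WordsWithoutStopWords { get; set; }
}

public static class BatchStopWordRemoval
{
    public static List<string[]> Run(IEnumerable<TextRow> rows)
    {
        var mlContext = new MLContext();
        var dataView = mlContext.Data.LoadFromEnumerable(rows);

        // Tokenize first, because the stop word remover works on a vector
        // of text rather than on a single string.
        var pipeline = mlContext.Transforms.Text
            .TokenizeIntoWords("Words", "Text")
            .Append(mlContext.Transforms.Text.RemoveStopWords(
                "WordsWithoutStopWords", "Words",
                stopwords: new[] { "a", "the", "from", "by" }));

        // Fit and transform the whole data view in one pass.
        var transformed = pipeline.Fit(dataView).Transform(dataView);

        var results = new List<string[]>();
        foreach (var row in mlContext.Data.CreateEnumerable<CleanedRow>(
            transformed, reuseRowObject: false))
        {
            results.Add(row.WordsWithoutStopWords);
        }
        return results;
    }

    public static void Main()
    {
        var rows = new List<TextRow>
        {
            new TextRow { Text = "a walk by the river" },
            new TextRow { Text = "notes from the field" }
        };

        foreach (var words in Run(rows))
            Console.WriteLine(string.Join(",", words));
    }
}
```

A prediction engine suits single, ad hoc inputs, while Transform suits data that is already loaded as an IDataView.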