TextCatalog.NormalizeText 方法

定义

创建一个TextNormalizingEstimator,它通过根据需要更改大小写来规范传入文本inputColumnName,删除音调标记、标点符号、数字和输出新文本。outputColumnName

public static Microsoft.ML.Transforms.Text.TextNormalizingEstimator NormalizeText (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, Microsoft.ML.Transforms.Text.TextNormalizingEstimator.CaseMode caseMode = Microsoft.ML.Transforms.Text.TextNormalizingEstimator+CaseMode.Lower, bool keepDiacritics = false, bool keepPunctuations = true, bool keepNumbers = true);
static member NormalizeText : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * Microsoft.ML.Transforms.Text.TextNormalizingEstimator.CaseMode * bool * bool * bool -> Microsoft.ML.Transforms.Text.TextNormalizingEstimator
<Extension()>
Public Function NormalizeText (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional caseMode As TextNormalizingEstimator.CaseMode = Microsoft.ML.Transforms.Text.TextNormalizingEstimator+CaseMode.Lower, Optional keepDiacritics As Boolean = false, Optional keepPunctuations As Boolean = true, Optional keepNumbers As Boolean = true) As TextNormalizingEstimator

参数

catalog
TransformsCatalog.TextTransforms

与文本相关的转换的目录。

outputColumnName
String

由转换 inputColumnName生成的列的名称。 此列的数据类型是一个标量或文本矢量,具体取决于输入列数据类型。

inputColumnName
String

要转换的列的名称。 If set to null, the value of the outputColumnName will be used as source. 此估算器对文本或文本数据类型的向量进行操作。

caseMode
TextNormalizingEstimator.CaseMode

使用固定区域性规则的大小写文本。

keepDiacritics
Boolean

是保留音调标记还是删除它们。

keepPunctuations
Boolean

是保留标点符号还是删除标点符号。

keepNumbers
Boolean

是保留数字还是将其删除。

返回

示例

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

namespace Samples.Dynamic
{
    public static class NormalizeText
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create an empty list as the dataset. The 'NormalizeText' API does not
            // require training data as the estimator ('TextNormalizingEstimator')
            // created by 'NormalizeText' API is not a trainable estimator. The
            // empty list is only needed to pass input schema to the pipeline.
            var emptySamples = new List<TextData>();

            // Convert sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for normalizing text.
            var normTextPipeline = mlContext.Transforms.Text.NormalizeText(
                "NormalizedText", "Text", TextNormalizingEstimator.CaseMode.Lower,
                keepDiacritics: false,
                keepPunctuations: false,
                keepNumbers: false);

            // Fit to data.
            var normTextTransformer = normTextPipeline.Fit(emptyDataView);

            // Create the prediction engine to get the normalized text from the
            // input text/string.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(normTextTransformer);

            // Call the prediction API.
            var data = new TextData()
            {
                Text = "ML.NET's NormalizeText API " +
                "changes the case of the TEXT and removes/keeps diâcrîtîcs, " +
                "punctuations, and/or numbers (123)."
            };

            var prediction = predictionEngine.Predict(data);

            // Print the normalized text.
            Console.WriteLine($"Normalized Text: {prediction.NormalizedText}");

            //  Expected output:
            //   Normalized Text: mlnets normalizetext api changes the case of the text and removeskeeps diacritics punctuations andor numbers
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public string NormalizedText { get; set; }
        }
    }
}

适用于