

BpeTrainer Class

Definition

The BPE trainer, which is responsible for training the BPE model.

public sealed class BpeTrainer : Microsoft.ML.Tokenizers.Trainer
type BpeTrainer = class
    inherit Trainer
Public NotInheritable Class BpeTrainer
Inherits Trainer
Inheritance
Object → Trainer → BpeTrainer

Constructors

BpeTrainer()

Constructs a new BpeTrainer object using the default values.

BpeTrainer(IEnumerable<AddedToken>, Int32, Int32, ReportProgress, Nullable<Int32>, HashSet<Char>, String, String)

Constructs a new BpeTrainer object using the specified values.

Properties

ContinuingSubwordPrefix

Gets the prefix to be used for every sub-word that is not at the beginning of a word.

EndOfWordSuffix

Gets the suffix to be used for every sub-word that is at the end of a word.
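
To make the two marker properties concrete, the following minimal sketch shows how a continuing-sub-word prefix and an end-of-word suffix are conventionally attached to the pieces of a single word. The `SubwordMarkers` class, the `Decorate` helper, and the `"##"` and `"</w>"` marker values are hypothetical illustrations and are not part of Microsoft.ML.Tokenizers; the library's own handling may differ in detail.

```csharp
using System;
using System.Collections.Generic;

public static class SubwordMarkers
{
    // Hypothetical helper (not a library API): decorates the sub-word pieces of ONE word.
    // The prefix goes on every piece except the first; the suffix goes on the last piece.
    public static List<string> Decorate(IReadOnlyList<string> pieces, string prefix, string suffix)
    {
        var result = new List<string>(pieces.Count);
        for (int i = 0; i < pieces.Count; i++)
        {
            string piece = pieces[i];
            if (i > 0 && prefix != null)
            {
                piece = prefix + piece;      // not a beginning-of-word piece
            }
            if (i == pieces.Count - 1 && suffix != null)
            {
                piece = piece + suffix;      // end-of-word piece
            }
            result.Add(piece);
        }
        return result;
    }

    public static void Main()
    {
        // "lower" split into two assumed pieces, with assumed marker strings.
        List<string> decorated = Decorate(new[] { "low", "er" }, "##", "</w>");
        Console.WriteLine(string.Join(" ", decorated)); // prints: low ##er</w>
    }
}
```

Passing `null` for either marker leaves the corresponding pieces unchanged, which mirrors the idea that both markers are optional.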

InitialAlphabet

Gets the list of characters to include in the initial alphabet, even if not seen in the training dataset. If the strings contain more than one character, only the first one is kept.

LimitAlphabet

Gets the maximum number of distinct characters to keep in the alphabet.

MinFrequency

Gets the minimum frequency a pair should have in order to be merged.

Progress

Set this property when progress needs to be reported during training.

(Inherited from Trainer)

SpecialTokens

Gets the list of special tokens the model should know of.

VocabSize

Gets the size of the final vocabulary, including all tokens and the alphabet.
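
As a rough illustration of how a minimum pair frequency and a target vocabulary size typically bound BPE training, the sketch below implements a simplified merge loop. It is an assumption-laden teaching example, not the library's implementation: `BpeSketch` and `LearnMerges` are hypothetical names, and special tokens, alphabet limits, and the sub-word markers are all ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BpeSketch
{
    // Hypothetical helper (not part of Microsoft.ML.Tokenizers).
    // Learns merges until the vocabulary reaches vocabSize or the best pair
    // occurs fewer than minFrequency times.
    public static List<(string Left, string Right)> LearnMerges(
        IDictionary<string, int> wordCounts, int minFrequency, int vocabSize)
    {
        // Each word starts as a sequence of single-character symbols.
        var words = wordCounts
            .Select(kv => (Symbols: kv.Key.Select(c => c.ToString()).ToList(), Count: kv.Value))
            .ToList();
        var vocab = new HashSet<string>(words.SelectMany(w => w.Symbols));
        var merges = new List<(string Left, string Right)>();

        while (vocab.Count < vocabSize)
        {
            // Count adjacent symbol pairs, weighted by word frequency.
            var pairCounts = new Dictionary<(string, string), int>();
            foreach (var (symbols, count) in words)
            {
                for (int i = 0; i + 1 < symbols.Count; i++)
                {
                    var pair = (symbols[i], symbols[i + 1]);
                    pairCounts.TryGetValue(pair, out int seen);
                    pairCounts[pair] = seen + count;
                }
            }
            if (pairCounts.Count == 0) break;

            // Most frequent pair wins; ties are broken by ordinal order for determinism.
            var best = pairCounts
                .OrderByDescending(kv => kv.Value)
                .ThenBy(kv => kv.Key.Item1, StringComparer.Ordinal)
                .ThenBy(kv => kv.Key.Item2, StringComparer.Ordinal)
                .First();
            if (best.Value < minFrequency) break; // pair is too rare to merge

            // Replace every occurrence of the pair with the merged symbol.
            string merged = best.Key.Item1 + best.Key.Item2;
            foreach (var (symbols, _) in words)
            {
                for (int i = 0; i + 1 < symbols.Count; i++)
                {
                    if (symbols[i] == best.Key.Item1 && symbols[i + 1] == best.Key.Item2)
                    {
                        symbols[i] = merged;
                        symbols.RemoveAt(i + 1);
                    }
                }
            }
            vocab.Add(merged);
            merges.Add(best.Key);
        }
        return merges;
    }

    public static void Main()
    {
        var counts = new Dictionary<string, int>
        {
            ["low"] = 5, ["lower"] = 2, ["newest"] = 6, ["widest"] = 3
        };
        // 10 distinct characters plus 2 merges reaches the target size of 12.
        foreach (var (left, right) in LearnMerges(counts, minFrequency: 2, vocabSize: 12))
        {
            Console.WriteLine(left + " + " + right);
        }
        // prints:
        // e + s
        // es + t
    }
}
```

With the same input, raising `minFrequency` to 10 yields no merges, because the most frequent pair occurs only 9 times.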

Methods

Feed(IEnumerable<String>, Func<String,IEnumerable<String>>)

Processes the input sequences and feeds the result to the model.

Train(Model)

Performs the actual training and updates the input model with the new vocabulary and merge data.
