Bpe Class

Definition

Represents the Byte Pair Encoding (BPE) model.

public sealed class Bpe : Microsoft.ML.Tokenizers.Model

type Bpe = class
    inherit Model

Public NotInheritable Class Bpe
Inherits Model

Inheritance

Object → Model → Bpe

Constructors

Bpe()

Constructs a new Bpe model object with no tokenization vocabulary. This constructor is useful only for training scenarios.

Bpe(String, String, String, String, String)

Constructs a new Bpe model object to use for sentence tokenization and tokenizer training.
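A minimal sketch of how the five-string constructor might be used. The argument order (vocabulary file, merges file, unknown token, continuing sub-word prefix, end-of-word suffix) is an assumption based on the property names on this page, the file paths are hypothetical, and `Token.Value` and `Token.Id` are assumed member names on the items returned by `Tokenize`.

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Hypothetical paths: point these at your own vocabulary and merges files.
const string vocabPath = "vocab.json";
const string mergesPath = "merges.txt";

// Assumed argument order: vocabulary file, merges file, unknown token,
// continuing sub-word prefix, end-of-word suffix. The last two are left unset here.
var bpe = new Bpe(vocabPath, mergesPath, "<unk>", null, null);

// Tokenize(String) returns a list of tokens; Value and Id are assumed property names.
foreach (var token in bpe.Tokenize("hello"))
{
    Console.WriteLine($"{token.Value} -> {token.Id}");
}
```

The parameterless constructor is not shown because, as described above, it creates a model without a vocabulary and is intended only for training.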

Properties

ContinuingSubwordPrefix

An optional prefix applied to any sub-word that appears only after another sub-word, that is, a sub-word that does not start a word.

Decoder

Gets the Bpe decoder object.

EndOfWordSuffix

An optional suffix that marks a sub-word as the end of a word.
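To make the roles of the continuing sub-word prefix and the end-of-word suffix concrete, the following self-contained sketch decorates the pieces of a single word. It does not call the library: `Decorate` is a hypothetical helper, and the `##` and `</w>` markers are example values chosen for illustration, not defaults confirmed by this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: marks the pieces of one word the way the two properties describe.
static List<string> Decorate(IReadOnlyList<string> pieces, string prefix, string suffix)
{
    var result = new List<string>(pieces.Count);
    for (int i = 0; i < pieces.Count; i++)
    {
        string piece = pieces[i];
        if (i > 0) piece = prefix + piece;               // piece follows another piece
        if (i == pieces.Count - 1) piece += suffix;      // piece ends the word
        result.Add(piece);
    }
    return result;
}

var decorated = Decorate(new[] { "token", "iz", "er" }, "##", "</w>");
Console.WriteLine(string.Join(" ", decorated));
// prints: token ##iz ##er</w>
```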

FuseUnknownTokens

Gets or sets a value indicating whether multiple unknown tokens are fused together.
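As an illustration of what fusing unknown tokens can mean, the self-contained sketch below collapses each run of adjacent unknown tokens into one. It does not call the library, `FuseUnknowns` is a hypothetical helper, and treating "fused" as "adjacent unknowns become a single token" is an assumption rather than behavior confirmed by this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps only the first token of each run of unknown tokens.
static List<string> FuseUnknowns(IEnumerable<string> tokens, string unknownToken)
{
    var result = new List<string>();
    foreach (var token in tokens)
    {
        bool isUnknown = token == unknownToken;
        bool previousIsUnknown = result.Count > 0 && result[result.Count - 1] == unknownToken;
        if (isUnknown && previousIsUnknown) continue;    // skip repeats within a run
        result.Add(token);
    }
    return result;
}

var sample = new[] { "hel", "<unk>", "<unk>", "lo", "<unk>" };
Console.WriteLine(string.Join(" ", FuseUnknowns(sample, "<unk>")));
// prints: hel <unk> lo <unk>
```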

UnknownToken

Gets or sets the unknown token, which is used when an unknown character is encountered.

Methods

GetTrainer()

Gets a trainer object to use for training the model and generating the vocabulary and merges data.

GetVocab()

Gets the dictionary mapping tokens to Ids.

GetVocabSize()

Gets the size of the dictionary that maps tokens to Ids.

IdToString(Int32, Boolean)

Maps the tokenized Id to the token.

IdToToken(Int32, Boolean)

Maps the tokenized Id to the token.

IsValidChar(Char)

Save(String, String)

Saves the model data into the vocabulary and merges files.

Tokenize(String)

Tokenizes a sequence string into a list of tokens.

TokenToId(String)

Maps the token to its tokenized Id.
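The lookup methods are inverses of each other, which the following sketch tries to show as a round trip. It assumes that `TokenToId` returns a nullable integer when the token is not in the vocabulary and that the Boolean argument of `IdToToken` controls whether special tokens are skipped; neither detail is stated on this page. `RoundTrip` is a hypothetical helper.

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Hypothetical helper: maps a token to its Id and back again.
static void RoundTrip(Bpe bpe, string token)
{
    // Assumption: a missing token yields null rather than throwing.
    int? id = bpe.TokenToId(token);
    if (id is null)
    {
        Console.WriteLine($"'{token}' is not in the vocabulary.");
        return;
    }

    // Assumption: 'false' means special tokens are not skipped.
    var text = bpe.IdToToken(id.Value, false);
    Console.WriteLine($"{token} -> {id.Value} -> {text}");
    Console.WriteLine($"Vocabulary size: {bpe.GetVocabSize()}");
}

// Usage, given a Bpe instance built from existing vocabulary and merges files:
// RoundTrip(bpe, "hello");
```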
