EnglishRoberta Class

Definition

Represents the Byte Pair Encoding model.

public sealed class EnglishRoberta : Microsoft.ML.Tokenizers.Model
type EnglishRoberta = class
    inherit Model
Public NotInheritable Class EnglishRoberta
Inherits Model
Inheritance
Object → Model → EnglishRoberta

Constructors

EnglishRoberta(Stream, Stream, Stream)

Construct a tokenizer object to use with the English RoBERTa model.

EnglishRoberta(String, String, String)

Construct a tokenizer object to use with the English RoBERTa model.
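
The string-based constructor can be sketched as follows. This is a minimal example under stated assumptions, not the library's documented usage: the file names are hypothetical placeholders, and the argument order (vocabulary file, merges file, occurrence mapping file) is an assumption inferred from the description of Save, not something this listing confirms.

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Hypothetical file names; replace them with the paths to your own model files.
// Assumption: the three arguments are, in order, the vocabulary file,
// the merges file, and the occurrence mapping file.
var model = new EnglishRoberta("vocab.json", "merges.txt", "dict.txt");

// GetVocabSize() is listed under Methods; print whatever it returns.
Console.WriteLine($"Vocabulary size: {model.GetVocabSize()}");
```

The Stream-based overload presumably takes the same three inputs as already-opened streams, which can be convenient when the files are embedded resources or are downloaded at run time.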

Properties

PadIndex

Gets the index of the pad symbol inside the symbols list.

SymbolsCount

Gets the symbols list length.

Methods

AddMaskSymbol(String)

Add the mask symbol to the symbols list.

GetTrainer()

Gets a trainer object used to train the model and generate the vocabulary and merges data.

GetVocab()

Gets the dictionary mapping tokens to Ids.

GetVocabSize()

Gets the size of the dictionary that maps tokens to Ids.

IdsToOccurrenceRanks(IReadOnlyList<Int32>)

Convert a list of token Ids to highest occurrence rankings.

IdsToOccurrenceValues(IReadOnlyList<Int32>)

Convert a list of token Ids to highest occurrence values.

IdToString(Int32, Boolean)

Map the tokenized Id to the original string.

IdToToken(Int32, Boolean)

Map the tokenized Id to the token.

IsValidChar(Char)

OccurrenceRanksIds(IReadOnlyList<Int32>)

Convert a list of highest occurrence rankings to a list of token Ids.

Save(String, String)

Save the model data into the vocabulary, merges, and occurrence mapping files.

Tokenize(String)

Tokenize a sequence string to a list of tokens.

TokenToId(String)

Map the token to its tokenized Id.
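
The token and Id lookups listed above can be combined into a round trip. The sketch below makes several assumptions: the file names and the token "Hello" are hypothetical, the constructor argument order is inferred and not confirmed by this listing, and the meaning of the Boolean argument of IdToToken (assumed here to control whether special tokens are skipped) is not stated on this page.

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Hypothetical file names; the argument order is an assumption.
var model = new EnglishRoberta("vocab.json", "merges.txt", "dict.txt");

// Look up the Id of a token. "Hello" is only an illustration and
// may not exist in a given vocabulary.
var id = model.TokenToId("Hello");

// The pattern match guards against a missing (null) Id.
if (id is int value)
{
    // Map the Id back to its token. The Boolean is passed as false;
    // its exact meaning is an assumption (see the note above).
    Console.WriteLine($"Id {value} maps back to token: {model.IdToToken(value, false)}");
}
else
{
    Console.WriteLine("The token is not in the vocabulary.");
}
```

Pattern matching on the result keeps the example valid whether the lookup returns a plain or a nullable integer.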
