Tokenizer Class

Definition

A Tokenizer works as a pipeline. It processes some raw text as input and outputs a TokenizerResult object.

public class Tokenizer
type Tokenizer = class
Public Class Tokenizer
Inheritance
Object → Tokenizer
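
The pipeline idea described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: `ToyTokenizer` and `ToyResult` are hypothetical names invented for this example, and the stages (lowercasing as the normalizer, whitespace splitting as the pre-tokenizer, a dictionary lookup as the model) are simplifying assumptions. It only shows how raw text might flow through the stages into a result that holds tokens, Ids, and offsets, and how Ids might be decoded back to text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: build the toy pipeline, encode some text, then decode the ids.
var tokenizer = new ToyTokenizer(new[] { "hello", "world" });
ToyResult result = tokenizer.Encode("Hello  brave world");
Console.WriteLine(string.Join(",", result.Ids));   // prints "1,0,2"
Console.WriteLine(tokenizer.Decode(result.Ids));   // prints "hello [UNK] world"

// Hypothetical stand-in for a result object: tokens, their ids, and their offsets.
public sealed record ToyResult(
    IReadOnlyList<string> Tokens,
    IReadOnlyList<int> Ids,
    IReadOnlyList<(int Index, int Length)> Offsets);

// Hypothetical toy pipeline; NOT the Microsoft.ML.Tokenizers implementation.
public sealed class ToyTokenizer
{
    private const string Unknown = "[UNK]";
    private readonly Dictionary<string, int> _vocab;
    private readonly Dictionary<int, string> _reverse;

    public ToyTokenizer(IEnumerable<string> vocabulary)
    {
        // Id 0 is reserved for unknown words; other ids follow insertion order.
        _vocab = new Dictionary<string, int> { [Unknown] = 0 };
        foreach (string word in vocabulary)
        {
            if (!_vocab.ContainsKey(word)) _vocab[word] = _vocab.Count;
        }
        _reverse = _vocab.ToDictionary(kv => kv.Value, kv => kv.Key);
    }

    public ToyResult Encode(string text)
    {
        // Normalizer stage (assumption): lowercase the input.
        string normalized = text.ToLowerInvariant();

        var tokens = new List<string>();
        var ids = new List<int>();
        var offsets = new List<(int Index, int Length)>();

        // Pre-tokenizer stage (assumption): split on whitespace, tracking offsets.
        int i = 0;
        while (i < normalized.Length)
        {
            if (char.IsWhiteSpace(normalized[i])) { i++; continue; }
            int start = i;
            while (i < normalized.Length && !char.IsWhiteSpace(normalized[i])) i++;
            string word = normalized.Substring(start, i - start);

            // Model stage (assumption): whole-word vocabulary lookup.
            bool known = _vocab.TryGetValue(word, out int id);
            tokens.Add(known ? word : Unknown);
            ids.Add(known ? id : 0);
            offsets.Add((start, i - start));
        }
        return new ToyResult(tokens, ids, offsets);
    }

    // Decoder stage (assumption): map ids back to tokens and join with spaces.
    public string Decode(IEnumerable<int> ids) =>
        string.Join(" ", ids.Select(id => _reverse.TryGetValue(id, out var token) ? token : Unknown));
}
```

In this sketch the offsets refer to positions in the normalized text, which is why the unknown word "brave" is still reported at index 7 with length 5 even though its token is `[UNK]`.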

Constructors

Tokenizer(Model, PreTokenizer, Normalizer)

Creates a new Tokenizer object.

Properties

Decoder

Gets or sets the Decoder in use by the Tokenizer.

Model

Gets the Model in use by the Tokenizer.

Normalizer

Gets or sets the Normalizer in use by the Tokenizer.

PreTokenizer

Gets or sets the PreTokenizer used by the Tokenizer.

Methods

Decode(IEnumerable<Int32>, Boolean)

Decodes the given Ids back to a String.

Decode(Int32, Boolean)

Decodes the Id to the mapped token.

Encode(String)

Encodes the input text into an object that holds the list of tokens, the token Ids, and the token offset mapping.

IsValidChar(Char)

TrainFromFiles(Trainer, ReportProgress, String[])

Trains the tokenizer model using the input files.
