NGramTokenizer Class

public final class NGramTokenizer
extends LexicalTokenizer

Tokenizes the input into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.
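To make the effect of the n-gram sizes concrete, here is a minimal, self-contained sketch of n-gram generation in plain Java. It is not the Lucene implementation and does not call the SDK. The class name `NGramSketch`, the method `ngrams`, and the emission order (by start position, then by length) are assumptions made for illustration. It also works on UTF-16 `char` units rather than Unicode code points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the SDK or of Lucene.
final class NGramSketch {

    // Returns every substring of `text` whose length lies in [minGram, maxGram],
    // ordered by start position and then by length (an assumed ordering).
    static List<String> ngrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Uses the documented defaults: minGram = 1, maxGram = 2.
        System.out.println(ngrams("abc", 1, 2)); // prints [a, ab, b, bc, c]
    }
}
```

Widening the range quickly increases the number of tokens per input, which is worth keeping in mind when choosing minGram and maxGram for a large index.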

Constructor Summary

Constructor Description
NGramTokenizer(String name)

Creates an instance of NGramTokenizer class.

Method Summary

Modifier and Type Method and Description
static NGramTokenizer fromJson(JsonReader jsonReader)

Reads an instance of NGramTokenizer from the JsonReader.

Integer getMaxGram()

Get the maxGram property: The maximum n-gram length.

Integer getMinGram()

Get the minGram property: The minimum n-gram length.

String getOdataType()

Get the odataType property: A URI fragment specifying the type of tokenizer.

List<TokenCharacterKind> getTokenChars()

Get the tokenChars property: Character classes to keep in the tokens.

NGramTokenizer setMaxGram(Integer maxGram)

Set the maxGram property: The maximum n-gram length.

NGramTokenizer setMinGram(Integer minGram)

Set the minGram property: The minimum n-gram length.

NGramTokenizer setTokenChars(List<TokenCharacterKind> tokenChars)

Set the tokenChars property: Character classes to keep in the tokens.

NGramTokenizer setTokenChars(TokenCharacterKind[] tokenChars)

Set the tokenChars property: Character classes to keep in the tokens.

JsonWriter toJson(JsonWriter jsonWriter)

Methods inherited from LexicalTokenizer

Methods inherited from java.lang.Object

Constructor Details

NGramTokenizer

public NGramTokenizer(String name)

Creates an instance of NGramTokenizer class.

Parameters:

name - the name value to set.

Method Details

fromJson

public static NGramTokenizer fromJson(JsonReader jsonReader)

Reads an instance of NGramTokenizer from the JsonReader.

Parameters:

jsonReader - The JsonReader being read.

Returns:

An instance of NGramTokenizer if the JsonReader was pointing to an instance of it, or null if it was pointing to JSON null.

Throws:

IOException - If the deserialized JSON object was missing any required properties.

getMaxGram

public Integer getMaxGram()

Get the maxGram property: The maximum n-gram length. Default is 2. Maximum is 300.

Returns:

the maxGram value.

getMinGram

public Integer getMinGram()

Get the minGram property: The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.

Returns:

the minGram value.

getOdataType

public String getOdataType()

Get the odataType property: A URI fragment specifying the type of tokenizer.

Overrides:

LexicalTokenizer.getOdataType()

Returns:

the odataType value.

getTokenChars

public List<TokenCharacterKind> getTokenChars()

Get the tokenChars property: Character classes to keep in the tokens.

Returns:

the tokenChars value.

setMaxGram

public NGramTokenizer setMaxGram(Integer maxGram)

Set the maxGram property: The maximum n-gram length. Default is 2. Maximum is 300.

Parameters:

maxGram - the maxGram value to set.

Returns:

the NGramTokenizer object itself.

setMinGram

public NGramTokenizer setMinGram(Integer minGram)

Set the minGram property: The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.

Parameters:

minGram - the minGram value to set.

Returns:

the NGramTokenizer object itself.
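The constraints documented for minGram and maxGram (defaults of 1 and 2, a maximum of 300, and minGram less than maxGram) can be checked on the client before an index definition is sent. The sketch below is a hypothetical helper, not part of the SDK. The names `GramBoundsSketch` and `isValid` are invented here, and the lower bound of 1 is an assumption that this page does not state. The service remains the authority on what it accepts.

```java
// Hypothetical client-side check; not part of the SDK.
final class GramBoundsSketch {
    static final int DEFAULT_MIN_GRAM = 1;   // documented default for minGram
    static final int DEFAULT_MAX_GRAM = 2;   // documented default for maxGram
    static final int MAX_ALLOWED = 300;      // documented maximum for both values

    // A null argument stands for "not set", so the documented default is used.
    static boolean isValid(Integer minGram, Integer maxGram) {
        int min = (minGram == null) ? DEFAULT_MIN_GRAM : minGram;
        int max = (maxGram == null) ? DEFAULT_MAX_GRAM : maxGram;
        // Assumption: n-gram lengths start at 1.
        // min < max <= 300 also guarantees min <= 300.
        return min >= 1 && max <= MAX_ALLOWED && min < max;
    }

    public static void main(String[] args) {
        System.out.println(isValid(null, null)); // defaults 1 and 2
        System.out.println(isValid(3, 3));       // minGram equal to maxGram
    }
}
```

Note that setting only minGram to 2 or more fails this check, because the default maxGram of 2 would no longer be greater than minGram.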

setTokenChars

public NGramTokenizer setTokenChars(List<TokenCharacterKind> tokenChars)

Set the tokenChars property: Character classes to keep in the tokens.

Parameters:

tokenChars - the tokenChars value to set.

Returns:

the NGramTokenizer object itself.

setTokenChars

public NGramTokenizer setTokenChars(TokenCharacterKind[] tokenChars)

Set the tokenChars property: Character classes to keep in the tokens.

Parameters:

tokenChars - the tokenChars value to set.

Returns:

the NGramTokenizer object itself.
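As a rough illustration of what "character classes to keep" can mean, the sketch below splits input into runs of kept characters and treats every other character as a separator. This is plain Java with a local stand-in enum `Kind`; it does not use `TokenCharacterKind` or Lucene. The splitting behavior is an assumption about how tokenChars is applied, not something this page confirms. In that reading, n-grams would then be generated within each run.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; Kind is a local stand-in, not TokenCharacterKind.
final class TokenCharsSketch {
    enum Kind { LETTER, DIGIT, WHITESPACE, OTHER }

    // Maps a character to one of the simplified classes above.
    static Kind classify(char c) {
        if (Character.isLetter(c)) return Kind.LETTER;
        if (Character.isDigit(c)) return Kind.DIGIT;
        if (Character.isWhitespace(c)) return Kind.WHITESPACE;
        return Kind.OTHER;
    }

    // Collects maximal runs of characters whose class is in `keep`;
    // any character outside `keep` ends the current run and is dropped.
    static List<String> splitOnDropped(String text, Set<Kind> keep) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (keep.contains(classify(c))) {
                current.append(c);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Keeps letters and digits; the hyphen and the space act as separators.
        System.out.println(splitOnDropped("ab-12 cd", EnumSet.of(Kind.LETTER, Kind.DIGIT)));
        // prints [ab, 12, cd]
    }
}
```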

toJson

public JsonWriter toJson(JsonWriter jsonWriter)

Overrides:

LexicalTokenizer.toJson(JsonWriter jsonWriter)

Parameters:

jsonWriter - the JsonWriter that this object is written to.

Throws:

IOException
