SplitSkill interface

Package: @azure/search-documents

A skill to split a string into chunks of text.

Extends: BaseSearchIndexerSkill

Properties

azureOpenAITokenizerParameters	Only applies if unit is set to 'azureOpenAITokens'. If specified, the SplitSkill uses these parameters when performing tokenization. The parameters are a valid 'encoderModelName' and an optional 'allowedSpecialTokens' property.
defaultLanguageCode	A value indicating which language code to use. Default is 'en'.
maximumPagesToTake	Only applicable when textSplitMode is set to 'pages'. If specified, the SplitSkill will discontinue splitting after processing the first 'maximumPagesToTake' pages, in order to improve performance when only a few initial pages are needed from each document.
maxPageLength	The desired maximum page length. Default is 10000.
odatatype	Polymorphic discriminator, which specifies the different types this object can be.
pageOverlapLength	Only applicable when textSplitMode is set to 'pages'. If specified, the (n+1)th chunk will start with this number of characters or tokens from the end of the nth chunk.
textSplitMode	A value indicating which split mode to perform.
unit	Only applies if textSplitMode is set to 'pages'. There are two possible values, 'characters' and 'azureOpenAITokens'. The chosen value determines how the lengths (maxPageLength and pageOverlapLength) are measured. The default is 'characters', which means the lengths are measured in characters.

Inherited Properties

context	Represents the level at which operations take place, such as the document root or document content (for example, /document or /document/content). The default is /document.
description	The description of the skill which describes the inputs, outputs, and usage of the skill.
inputs	Inputs of the skills could be a column in the source data set, or the output of an upstream skill.
name	The name of the skill which uniquely identifies it within the skillset. A skill with no name defined will be given a default name of its 1-based index in the skills array, prefixed with the character '#'.
outputs	The output of a skill is either a field in a search index, or a value that can be consumed as an input by another skill.

Property Details

azureOpenAITokenizerParameters

Only applies if unit is set to 'azureOpenAITokens'. If specified, the SplitSkill uses these parameters when performing tokenization. The parameters are a valid 'encoderModelName' and an optional 'allowedSpecialTokens' property.

azureOpenAITokenizerParameters?: AzureOpenAITokenizerParameters

Property Value

AzureOpenAITokenizerParameters
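
As a rough illustration of how the token-based settings fit together, the sketch below declares local stand-in types (the names ending in `Sketch` are hypothetical, not SDK types) and a hypothetical helper, `tokenizerParametersApply`, that encodes the "only applies if unit is azureOpenAITokens" rule. The encoder name `cl100k_base` and the special tokens are assumptions chosen for illustration; check the service documentation for the values your service version accepts.

```typescript
// Local shapes that mirror the properties documented on this page, so the
// sketch compiles without the SDK. In a real project, use the SplitSkill and
// AzureOpenAITokenizerParameters types from "@azure/search-documents".
interface TokenizerParametersSketch {
  encoderModelName?: string;
  allowedSpecialTokens?: string[];
}

interface SplitSettingsSketch {
  textSplitMode?: "pages" | "sentences";
  unit?: string;
  maxPageLength?: number;
  pageOverlapLength?: number;
  azureOpenAITokenizerParameters?: TokenizerParametersSketch;
}

// Hypothetical helper: tokenizer parameters matter only when lengths are
// measured in tokens and the parameters were actually supplied.
function tokenizerParametersApply(settings: SplitSettingsSketch): boolean {
  return (
    settings.unit === "azureOpenAITokens" &&
    settings.azureOpenAITokenizerParameters !== undefined
  );
}

const tokenSettings: SplitSettingsSketch = {
  textSplitMode: "pages",
  unit: "azureOpenAITokens",
  maxPageLength: 512, // counted in tokens because of `unit`
  pageOverlapLength: 64, // also counted in tokens
  azureOpenAITokenizerParameters: {
    encoderModelName: "cl100k_base", // assumed encoder name
    allowedSpecialTokens: ["[START]", "[END]"], // illustrative values
  },
};

// Same settings measured in characters: the tokenizer parameters are unused.
const characterSettings: SplitSettingsSketch = {
  ...tokenSettings,
  unit: "characters",
};
```

With these values, `tokenizerParametersApply(tokenSettings)` is true and `tokenizerParametersApply(characterSettings)` is false.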

defaultLanguageCode

A value indicating which language code to use. Default is 'en'.

defaultLanguageCode?: "da" | "de" | "en" | "es" | "fi" | "fr" | "it" | "ko" | "pt" | "cs" | "nl" | "hu" | "ja" | "pl" | "ru" | "sv" | "tr" | "bs" | "et" | "he" | "hi" | "hr" | "id" | "lv" | "nb" | "sk" | "sl" | "zh" | "is" | "sr" | "ur" | "am" | "pt-br"

Property Value

"da" | "de" | "en" | "es" | "fi" | "fr" | "it" | "ko" | "pt" | "cs" | "nl" | "hu" | "ja" | "pl" | "ru" | "sv" | "tr" | "bs" | "et" | "he" | "hi" | "hr" | "id" | "lv" | "nb" | "sk" | "sl" | "zh" | "is" | "sr" | "ur" | "am" | "pt-br"

maximumPagesToTake

Only applicable when textSplitMode is set to 'pages'. If specified, the SplitSkill will discontinue splitting after processing the first 'maximumPagesToTake' pages, in order to improve performance when only a few initial pages are needed from each document.

maximumPagesToTake?: number

Property Value

number

maxPageLength

The desired maximum page length. Default is 10000.

maxPageLength?: number

Property Value

number

odatatype

Polymorphic discriminator, which specifies the different types this object can be.

odatatype: "#Microsoft.Skills.Text.SplitSkill"

Property Value

"#Microsoft.Skills.Text.SplitSkill"

pageOverlapLength

Only applicable when textSplitMode is set to 'pages'. If specified, the (n+1)th chunk will start with this number of characters or tokens from the end of the nth chunk.

pageOverlapLength?: number

Property Value

number
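
To make the overlap rule concrete, here is a minimal sketch of fixed-width paging with overlap. It is not the service's algorithm: the real skill may choose page breaks differently (for example, around sentence boundaries), while this hypothetical `splitIntoPages` function simply slices by UTF-16 code units. It only illustrates how maxPageLength, pageOverlapLength, and maximumPagesToTake relate to each other.

```typescript
// Illustrative only: slices `text` into pages of at most `maxPageLength`
// code units, where each page after the first starts with the last
// `pageOverlapLength` code units of the previous page.
function splitIntoPages(
  text: string,
  maxPageLength: number,
  pageOverlapLength = 0,
  maximumPagesToTake?: number,
): string[] {
  if (maxPageLength <= 0) {
    throw new RangeError("maxPageLength must be positive");
  }
  if (pageOverlapLength < 0 || pageOverlapLength >= maxPageLength) {
    throw new RangeError("pageOverlapLength must be in [0, maxPageLength)");
  }

  const pages: string[] = [];
  // Each new page advances by the page length minus the shared overlap.
  const step = maxPageLength - pageOverlapLength;

  for (let start = 0; start < text.length; start += step) {
    // Stop early once the requested number of pages has been produced.
    if (maximumPagesToTake !== undefined && pages.length >= maximumPagesToTake) {
      break;
    }
    pages.push(text.slice(start, start + maxPageLength));
    // The page just added reached the end of the text; nothing is left.
    if (start + maxPageLength >= text.length) {
      break;
    }
  }
  return pages;
}

const pages = splitIntoPages("abcdefghij", 4, 1);
// pages is ["abcd", "defg", "ghij"]: each page begins with the final
// character of the page before it.
const firstTwo = splitIntoPages("abcdefghij", 4, 1, 2);
// firstTwo is ["abcd", "defg"]
```

A larger overlap preserves more context across page boundaries at the cost of producing more pages for the same input.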

textSplitMode

A value indicating which split mode to perform.

textSplitMode?: "pages" | "sentences"

Property Value

"pages" | "sentences"

unit

Only applies if textSplitMode is set to 'pages'. There are two possible values, 'characters' and 'azureOpenAITokens'. The chosen value determines how the lengths (maxPageLength and pageOverlapLength) are measured. The default is 'characters', which means the lengths are measured in characters.

unit?: string

Property Value

string

Inherited Property Details

context

Represents the level at which operations take place, such as the document root or document content (for example, /document or /document/content). The default is /document.

context?: string

Property Value

string

Inherited From SearchIndexerSkill.context

description

The description of the skill which describes the inputs, outputs, and usage of the skill.

description?: string

Property Value

string

Inherited From SearchIndexerSkill.description

inputs

Inputs of the skills could be a column in the source data set, or the output of an upstream skill.

inputs: InputFieldMappingEntry[]

Property Value

InputFieldMappingEntry[]

Inherited From SearchIndexerSkill.inputs

name

The name of the skill which uniquely identifies it within the skillset. A skill with no name defined will be given a default name of its 1-based index in the skills array, prefixed with the character '#'.

name?: string

Property Value

string

Inherited From SearchIndexerSkill.name
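
The default-naming rule can be sketched in a few lines. The service assigns these names itself; the hypothetical `effectiveSkillNames` helper below only mirrors the rule as described above, using a local stand-in type rather than the SDK's skill types.

```typescript
// Stand-in for any skill definition; only the optional name matters here.
interface NamedSkillSketch {
  name?: string;
}

// Returns the name each skill would be known by: its own name if defined,
// otherwise '#' followed by its 1-based position in the skills array.
function effectiveSkillNames(skills: NamedSkillSketch[]): string[] {
  return skills.map((skill, index) => skill.name ?? `#${index + 1}`);
}

const names = effectiveSkillNames([{ name: "split" }, {}, { name: "embed" }]);
// names is ["split", "#2", "embed"]
```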

outputs

The output of a skill is either a field in a search index, or a value that can be consumed as an input by another skill.

outputs: OutputFieldMappingEntry[]

Property Value

OutputFieldMappingEntry[]

Inherited From SearchIndexerSkill.outputs
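
Putting the properties together, a SplitSkill definition might look like the sketch below. The `Sketch` interfaces are local stand-ins so the example compiles on its own; in a real project the `SplitSkill`, `InputFieldMappingEntry`, and `OutputFieldMappingEntry` types would come from `@azure/search-documents`. The skill name, the input name `text`, the output name `textItems`, the target name `pages`, and the numeric values are assumptions for illustration.

```typescript
// Local stand-ins that mirror the properties documented on this page.
interface InputSketch {
  name: string;
  source?: string;
}

interface OutputSketch {
  name: string;
  targetName?: string;
}

interface SplitSkillSketch {
  odatatype: "#Microsoft.Skills.Text.SplitSkill";
  name?: string;
  description?: string;
  context?: string;
  inputs: InputSketch[];
  outputs: OutputSketch[];
  defaultLanguageCode?: string;
  textSplitMode?: "pages" | "sentences";
  maxPageLength?: number;
  pageOverlapLength?: number;
  maximumPagesToTake?: number;
  unit?: string;
}

const splitSkill: SplitSkillSketch = {
  odatatype: "#Microsoft.Skills.Text.SplitSkill",
  name: "split-content", // hypothetical skill name
  description: "Splits document content into overlapping pages.",
  context: "/document", // the documented default
  inputs: [{ name: "text", source: "/document/content" }], // assumed input name
  outputs: [{ name: "textItems", targetName: "pages" }], // assumed output name
  defaultLanguageCode: "en",
  textSplitMode: "pages",
  maxPageLength: 2000, // measured in characters because of `unit`
  pageOverlapLength: 200,
  unit: "characters",
};
```

Because pageOverlapLength, maximumPagesToTake, and unit apply only in 'pages' mode, a definition using textSplitMode 'sentences' would normally omit them.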
