SplitSkill interface
A skill to split a string into chunks of text.
- Extends
Properties
| azureOpenAITokenizerParameters | Only applies if the unit is set to azureOpenAITokens. If specified, the SplitSkill uses these parameters when tokenizing. The parameters are a valid 'encoderModelName' and an optional 'allowedSpecialTokens' property. |
| defaultLanguageCode | A value indicating which language code to use. Default is en. |
| maximumPagesToTake | Only applicable when textSplitMode is set to 'pages'. If specified, the SplitSkill stops splitting after processing the first 'maximumPagesToTake' pages, which improves performance when only the first few pages of each document are needed. |
| maxPageLength | The desired maximum page length. Default is 10000. |
| odatatype | Polymorphic discriminator, which specifies the different types this object can be. |
| pageOverlapLength | Only applicable when textSplitMode is set to 'pages'. If specified, the (n+1)th chunk starts with this number of characters/tokens from the end of the nth chunk. |
| textSplitMode | A value indicating which split mode to perform. |
| unit | Only applies if textSplitMode is set to 'pages'. There are two possible values, 'characters' and 'azureOpenAITokens'. The chosen value determines how the lengths (maxPageLength and pageOverlapLength) are measured. The default is 'characters', which means lengths are measured in characters. |
Inherited Properties
| context | Represents the level at which operations take place, such as the document root or document content (for example, /document or /document/content). The default is /document. |
| description | The description of the skill which describes the inputs, outputs, and usage of the skill. |
| inputs | Inputs of the skills could be a column in the source data set, or the output of an upstream skill. |
| name | The name of the skill which uniquely identifies it within the skillset. A skill with no name defined will be given a default name of its 1-based index in the skills array, prefixed with the character '#'. |
| outputs | The output of a skill is either a field in a search index, or a value that can be consumed as an input by another skill. |
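The properties above can be combined into a single skill definition. The following is a minimal sketch, not the SDK's own sample: the interfaces are local stand-ins that mirror the shape documented on this page, and the skill name, the input name `text`, the output name `textItems`, the target name `pages`, and the numeric values are illustrative assumptions.

```typescript
// Local stand-in types that mirror the properties documented on this page.
// In application code these types would come from the SDK package instead.
interface InputEntrySketch {
  name: string;
  source?: string;
}

interface OutputEntrySketch {
  name: string;
  targetName?: string;
}

interface SplitSkillSketch {
  odatatype: "#Microsoft.Skills.Text.SplitSkill";
  name?: string;
  description?: string;
  context?: string;
  inputs: InputEntrySketch[];
  outputs: OutputEntrySketch[];
  textSplitMode?: "pages" | "sentences";
  defaultLanguageCode?: string;
  maxPageLength?: number;
  pageOverlapLength?: number;
  maximumPagesToTake?: number;
}

const splitSkill: SplitSkillSketch = {
  odatatype: "#Microsoft.Skills.Text.SplitSkill",
  name: "split-content", // hypothetical skill name
  description: "Splits document content into overlapping pages",
  context: "/document", // documented default
  textSplitMode: "pages",
  defaultLanguageCode: "en", // documented default
  maxPageLength: 2000, // illustrative; the documented default is 10000
  pageOverlapLength: 200, // illustrative
  maximumPagesToTake: 5, // illustrative
  inputs: [{ name: "text", source: "/document/content" }], // assumed input name
  outputs: [{ name: "textItems", targetName: "pages" }], // assumed output name
};

console.log(JSON.stringify(splitSkill, null, 2));
```

Only `odatatype`, `inputs`, and `outputs` are required by the interface; every other property in the sketch is optional and falls back to its documented default when omitted.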
Property Details
azureOpenAITokenizerParameters
Only applies if the unit is set to azureOpenAITokens. If specified, the SplitSkill uses these parameters when tokenizing. The parameters are a valid 'encoderModelName' and an optional 'allowedSpecialTokens' property.
azureOpenAITokenizerParameters?: AzureOpenAITokenizerParameters
Property Value
AzureOpenAITokenizerParameters
defaultLanguageCode
A value indicating which language code to use. Default is en.
defaultLanguageCode?: "da" | "de" | "en" | "es" | "fi" | "fr" | "it" | "ko" | "pt" | "cs" | "nl" | "hu" | "ja" | "pl" | "ru" | "sv" | "tr" | "bs" | "et" | "he" | "hi" | "hr" | "id" | "lv" | "nb" | "sk" | "sl" | "zh" | "is" | "sr" | "ur" | "am" | "pt-br"
Property Value
"da" | "de" | "en" | "es" | "fi" | "fr" | "it" | "ko" | "pt" | "cs" | "nl" | "hu" | "ja" | "pl" | "ru" | "sv" | "tr" | "bs" | "et" | "he" | "hi" | "hr" | "id" | "lv" | "nb" | "sk" | "sl" | "zh" | "is" | "sr" | "ur" | "am" | "pt-br"
maximumPagesToTake
Only applicable when textSplitMode is set to 'pages'. If specified, the SplitSkill stops splitting after processing the first 'maximumPagesToTake' pages, which improves performance when only the first few pages of each document are needed.
maximumPagesToTake?: number
Property Value
number
maxPageLength
The desired maximum page length. Default is 10000.
maxPageLength?: number
Property Value
number
odatatype
Polymorphic discriminator, which specifies the different types this object can be.
odatatype: "#Microsoft.Skills.Text.SplitSkill"
Property Value
"#Microsoft.Skills.Text.SplitSkill"
pageOverlapLength
Only applicable when textSplitMode is set to 'pages'. If specified, the (n+1)th chunk starts with this number of characters/tokens from the end of the nth chunk.
pageOverlapLength?: number
Property Value
number
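The interaction between maxPageLength, pageOverlapLength, and maximumPagesToTake can be illustrated with a small character-based sketch. This is an approximation for intuition only, not the service's algorithm (which may, for example, respect sentence boundaries or count tokens); the function name `splitIntoPages` and the `PageOptions` type are hypothetical.

```typescript
interface PageOptions {
  maxPageLength: number;
  pageOverlapLength?: number;
  maximumPagesToTake?: number;
}

// Naive fixed-width splitter: each page after the first begins with the last
// `pageOverlapLength` characters of the previous page.
function splitIntoPages(text: string, options: PageOptions): string[] {
  const {
    maxPageLength,
    pageOverlapLength = 0,
    maximumPagesToTake = Infinity,
  } = options;

  if (maxPageLength <= 0) {
    throw new RangeError("maxPageLength must be positive");
  }
  if (pageOverlapLength < 0 || pageOverlapLength >= maxPageLength) {
    throw new RangeError("pageOverlapLength must be in [0, maxPageLength)");
  }

  // Each new page starts this many characters after the previous one.
  const step = maxPageLength - pageOverlapLength;
  const pages: string[] = [];

  for (
    let start = 0;
    start < text.length && pages.length < maximumPagesToTake;
    start += step
  ) {
    pages.push(text.slice(start, start + maxPageLength));
    // Stop once the current page has reached the end of the text.
    if (start + maxPageLength >= text.length) break;
  }
  return pages;
}

console.log(splitIntoPages("abcdefghij", { maxPageLength: 4, pageOverlapLength: 1 }));
// [ 'abcd', 'defg', 'ghij' ]
```

In the output, the second page starts with "d" and the third with "g", the final character of the page before it. Passing `maximumPagesToTake: 2` to the same call would return only the first two pages.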
textSplitMode
A value indicating which split mode to perform.
textSplitMode?: "pages" | "sentences"
Property Value
"pages" | "sentences"
unit
Only applies if textSplitMode is set to 'pages'. There are two possible values, 'characters' and 'azureOpenAITokens'. The chosen value determines how the lengths (maxPageLength and pageOverlapLength) are measured. The default is 'characters', which means lengths are measured in characters.
unit?: string
Property Value
string
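The "only applies if" conditions on unit and azureOpenAITokenizerParameters can be checked before a skill is submitted. The helper below is a hypothetical sketch of such a check, written against a local stand-in type rather than the SDK's; the function name `findIgnoredOptions` and the encoder model name `cl100k_base` are assumptions used for illustration.

```typescript
interface SplitOptionsSketch {
  textSplitMode?: "pages" | "sentences";
  unit?: string;
  azureOpenAITokenizerParameters?: {
    encoderModelName?: string;
    allowedSpecialTokens?: string[];
  };
}

// Returns a note for each option that, per the descriptions on this page,
// would not apply given the other settings.
function findIgnoredOptions(options: SplitOptionsSketch): string[] {
  const notes: string[] = [];
  const unit = options.unit ?? "characters"; // documented default

  if (options.unit !== undefined && options.textSplitMode === "sentences") {
    notes.push("unit only applies when textSplitMode is 'pages'");
  }
  if (
    options.azureOpenAITokenizerParameters !== undefined &&
    unit !== "azureOpenAITokens"
  ) {
    notes.push(
      "azureOpenAITokenizerParameters only applies when unit is 'azureOpenAITokens'"
    );
  }
  return notes;
}

const tokenBased: SplitOptionsSketch = {
  textSplitMode: "pages",
  unit: "azureOpenAITokens",
  azureOpenAITokenizerParameters: { encoderModelName: "cl100k_base" }, // assumed model name
};

const mismatched: SplitOptionsSketch = {
  textSplitMode: "pages",
  unit: "characters",
  azureOpenAITokenizerParameters: { encoderModelName: "cl100k_base" },
};

console.log(findIgnoredOptions(tokenBased).length); // 0
console.log(findIgnoredOptions(mismatched).length); // 1
```

The check treats an omitted unit as 'characters', matching the documented default, and flags unit only when textSplitMode is explicitly 'sentences', since this page does not state a default split mode.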
Inherited Property Details
context
Represents the level at which operations take place, such as the document root or document content (for example, /document or /document/content). The default is /document.
context?: string
Property Value
string
Inherited From SearchIndexerSkill.context
description
The description of the skill which describes the inputs, outputs, and usage of the skill.
description?: string
Property Value
string
Inherited From SearchIndexerSkill.description
inputs
Inputs of the skills could be a column in the source data set, or the output of an upstream skill.
inputs: InputFieldMappingEntry[]
Property Value
InputFieldMappingEntry[]
Inherited From SearchIndexerSkill.inputs
name
The name of the skill which uniquely identifies it within the skillset. A skill with no name defined will be given a default name of its 1-based index in the skills array, prefixed with the character '#'.
name?: string
Property Value
string
Inherited From SearchIndexerSkill.name
outputs
The output of a skill is either a field in a search index, or a value that can be consumed as an input by another skill.
outputs: OutputFieldMappingEntry[]
Property Value
OutputFieldMappingEntry[]
Inherited From SearchIndexerSkill.outputs