Indexes - Analyze
Shows how an analyzer breaks text into tokens.
POST {endpoint}/indexes('{indexName}')/search.analyze?api-version=2024-07-01
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
endpoint | path | True | string | The endpoint URL of the search service. |
indexName | path | True | string | The name of the index for which to test an analyzer. |
api-version | query | True | string | Client API version. |
Request Header
Name | Required | Type | Description |
---|---|---|---|
x-ms-client-request-id | | string (uuid) | The tracking ID sent with the request to help with debugging. |
Request Body
Name | Required | Type | Description |
---|---|---|---|
text | True | string | The text to break into tokens. |
analyzer | | LexicalAnalyzerName | The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. |
charFilters | | CharFilterName[] | An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter. |
tokenFilters | | TokenFilterName[] | An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter. |
tokenizer | | LexicalTokenizerName | The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. |
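The rules in the table above (exactly one of analyzer or tokenizer, and filters only alongside a tokenizer) can be checked on the client before a request is sent. The following is a minimal sketch: `build_analyze_body` is a hypothetical helper, not part of any SDK, and the service remains the authority on what it accepts.

```python
import json
from typing import Optional, Sequence


def build_analyze_body(
    text: str,
    analyzer: Optional[str] = None,
    tokenizer: Optional[str] = None,
    char_filters: Optional[Sequence[str]] = None,
    token_filters: Optional[Sequence[str]] = None,
) -> dict:
    """Build an Analyze request body (hypothetical helper) that follows the documented rules."""
    # Exactly one of analyzer / tokenizer must be supplied.
    if (analyzer is None) == (tokenizer is None):
        raise ValueError("Specify exactly one of 'analyzer' or 'tokenizer'.")
    # charFilters and tokenFilters are only allowed together with a tokenizer.
    if analyzer is not None and (char_filters or token_filters):
        raise ValueError("'charFilters' and 'tokenFilters' require 'tokenizer'.")

    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    else:
        body["tokenizer"] = tokenizer
        if char_filters:
            body["charFilters"] = list(char_filters)
        if token_filters:
            body["tokenFilters"] = list(token_filters)
    return body


# Serialize an analyzer-based body for use as the POST payload.
print(json.dumps(build_analyze_body("Text to analyze", analyzer="standard.lucene")))
```

Failing early with a `ValueError` keeps an invalid combination from costing a round trip to the service.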
Responses
Name | Type | Description |
---|---|---|
200 OK | AnalyzeResult | |
Other Status Codes | ErrorResponse | Error response. |
Examples
SearchServiceIndexAnalyze
Sample request
POST https://myservice.search.windows.net/indexes('hotels')/search.analyze?api-version=2024-07-01
{
  "text": "Text to analyze",
  "analyzer": "standard.lucene"
}
Sample response
{
  "tokens": [
    {
      "token": "text",
      "startOffset": 0,
      "endOffset": 4,
      "position": 0
    },
    {
      "token": "to",
      "startOffset": 5,
      "endOffset": 7,
      "position": 1
    },
    {
      "token": "analyze",
      "startOffset": 8,
      "endOffset": 15,
      "position": 2
    }
  ]
}
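The sample request can be assembled with Python's standard library, as in the sketch below. It makes two assumptions that this page does not state: the service is authenticated with an `api-key` header, and the key value shown is a placeholder. `make_analyze_request` and `analyze` are hypothetical helper names.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"


def make_analyze_request(endpoint: str, index_name: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Analyze operation."""
    url = f"{endpoint.rstrip('/')}/indexes('{index_name}')/search.analyze?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: key-based authentication through an 'api-key' header.
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def analyze(req: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response (performs a network call)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


req = make_analyze_request(
    "https://myservice.search.windows.net",
    "hotels",
    {"text": "Text to analyze", "analyzer": "standard.lucene"},
    api_key="<your-api-key>",  # placeholder, not a real key
)
print(req.get_method(), req.full_url)
```

Calling `analyze(req)` against a real service would return a dictionary shaped like the sample response, with a `tokens` list.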
Definitions
Name | Description |
---|---|
AnalyzedTokenInfo | Information about a token returned by an analyzer. |
AnalyzeRequest | Specifies some text and analysis components used to break that text into tokens. |
AnalyzeResult | The result of testing an analyzer on text. |
CharFilterName | Defines the names of all character filters supported by the search engine. |
ErrorAdditionalInfo | The resource management error additional info. |
ErrorDetail | The error detail. |
ErrorResponse | Error response |
LexicalAnalyzerName | Defines the names of all text analyzers supported by the search engine. |
LexicalTokenizerName | Defines the names of all tokenizers supported by the search engine. |
TokenFilterName | Defines the names of all token filters supported by the search engine. |
AnalyzedTokenInfo
Information about a token returned by an analyzer.
Name | Type | Description |
---|---|---|
endOffset | integer (int32) | The index of the last character of the token in the input text. |
position | integer (int32) | The position of the token in the input text relative to other tokens. The first token in the input text has position 0, the next has position 1, and so on. Depending on the analyzer used, some tokens might have the same position, for example if they are synonyms of each other. |
startOffset | integer (int32) | The index of the first character of the token in the input text. |
token | string | The token returned by the analyzer. |
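The offsets make it possible to map each token back to the source text. In the sample response on this page, the token "text" has startOffset 0 and endOffset 4, which suggests that endOffset behaves as an exclusive bound. The sketch below relies on that reading, which is inferred from the sample rather than stated by the service. `token_spans` and `group_by_position` are hypothetical helper names.

```python
from collections import defaultdict


def token_spans(text, tokens):
    """Pair each token with the slice of the input it came from (endOffset treated as exclusive)."""
    return [(t["token"], text[t["startOffset"]:t["endOffset"]]) for t in tokens]


def group_by_position(tokens):
    """Collect tokens that share a position, as can happen with synonyms."""
    groups = defaultdict(list)
    for t in tokens:
        groups[t["position"]].append(t["token"])
    return dict(groups)


text = "Text to analyze"
tokens = [
    {"token": "text", "startOffset": 0, "endOffset": 4, "position": 0},
    {"token": "to", "startOffset": 5, "endOffset": 7, "position": 1},
    {"token": "analyze", "startOffset": 8, "endOffset": 15, "position": 2},
]

print(token_spans(text, tokens))
# [('text', 'Text'), ('to', 'to'), ('analyze', 'analyze')]
print(group_by_position(tokens))
# {0: ['text'], 1: ['to'], 2: ['analyze']}
```

Comparing the token with its source slice shows what the analyzer changed. Here the only change is that `standard.lucene` lowercased "Text".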
AnalyzeRequest
Specifies some text and analysis components used to break that text into tokens.
Name | Type | Description |
---|---|---|
analyzer | LexicalAnalyzerName | The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. |
charFilters | CharFilterName[] | An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter. |
text | string | The text to break into tokens. |
tokenFilters | TokenFilterName[] | An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter. |
tokenizer | LexicalTokenizerName | The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. |
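The sample request on this page uses an analyzer. A tokenizer-based AnalyzeRequest might look like the sketch below. The names `standard_v2` and `lowercase` are assumed to be valid LexicalTokenizerName and TokenFilterName values; this page does not list them, so verify them against the values your service supports. `html_strip` is the character filter listed under CharFilterName.

```python
import json

# A tokenizer-based request body: no "analyzer" key, since the two are mutually exclusive.
body = {
    "text": "<p>Text to analyze</p>",
    "tokenizer": "standard_v2",     # assumed tokenizer name, not listed on this page
    "charFilters": ["html_strip"],  # strips the HTML markup before tokenization
    "tokenFilters": ["lowercase"],  # assumed token filter name, not listed on this page
}

payload = json.dumps(body, indent=2)
print(payload)
```

Building the pipeline from a tokenizer plus filters is useful for testing a custom analyzer's components one at a time before defining the analyzer in the index.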
AnalyzeResult
The result of testing an analyzer on text.
Name | Type | Description |
---|---|---|
tokens | AnalyzedTokenInfo[] | The list of tokens returned by the analyzer specified in the request. |
CharFilterName
Defines the names of all character filters supported by the search engine.
Value | Description |
---|---|
html_strip | A character filter that attempts to strip out HTML constructs. See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.html |
ErrorAdditionalInfo
The resource management error additional info.
Name | Type | Description |
---|---|---|
info | object | The additional info. |
type | string | The additional info type. |
ErrorDetail
The error detail.
Name | Type | Description |
---|---|---|
additionalInfo | ErrorAdditionalInfo[] | The error additional info. |
code | string | The error code. |
details | ErrorDetail[] | The error details. |
message | string | The error message. |
target | string | The error target. |
ErrorResponse
Error response
Name | Type | Description |
---|---|---|
error | ErrorDetail | The error object. |
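An error response can be flattened into a readable message by walking the error object and its nested details. The sketch below assumes that `details` holds a list of objects with the same fields as ErrorDetail. The payload is invented for illustration: its codes and messages are not real service output, and `format_error` is a hypothetical helper.

```python
def format_error(response_body: dict) -> str:
    """Render an ErrorResponse as indented 'code: message' lines."""
    lines = []

    def walk(detail: dict, depth: int) -> None:
        target = detail.get("target")
        suffix = f" (target: {target})" if target else ""
        lines.append(f"{'  ' * depth}{detail.get('code', '')}: {detail.get('message', '')}{suffix}")
        # Assumption: 'details' is a list of nested ErrorDetail-shaped objects.
        for child in detail.get("details") or []:
            walk(child, depth + 1)

    walk(response_body.get("error", {}), 0)
    return "\n".join(lines)


# Invented payload for illustration only.
sample = {
    "error": {
        "code": "InvalidRequest",
        "message": "The request is invalid.",
        "details": [
            {"code": "InvalidAnalyzer", "message": "Unknown analyzer.", "target": "analyzer"}
        ],
    }
}
print(format_error(sample))
```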
LexicalAnalyzerName
Defines the names of all text analyzers supported by the search engine.
Value | Description |
---|---|
ar.microsoft | Microsoft analyzer for Arabic. |
ar.lucene | Lucene analyzer for Arabic. |
hy.lucene | Lucene analyzer for Armenian. |
bn.microsoft | Microsoft analyzer for Bangla. |
eu.lucene | Lucene analyzer for Basque. |
bg.microsoft | Microsoft analyzer for Bulgarian. |
bg.lucene | Lucene analyzer for Bulgarian. |
ca.microsoft | Microsoft analyzer for Catalan. |
ca.lucene | Lucene analyzer for Catalan. |
zh-Hans.microsoft | Microsoft analyzer for Chinese (Simplified). |
zh-Hans.lucene | Lucene analyzer for Chinese (Simplified). |
zh-Hant.microsoft | Microsoft analyzer for Chinese (Traditional). |
zh-Hant.lucene | Lucene analyzer for Chinese (Traditional). |
hr.microsoft | Microsoft analyzer for Croatian. |
cs.microsoft | Microsoft analyzer for Czech. |
cs.lucene | Lucene analyzer for Czech. |
da.microsoft | Microsoft analyzer for Danish. |
da.lucene | Lucene analyzer for Danish. |
nl.microsoft | Microsoft analyzer for Dutch. |
nl.lucene | Lucene analyzer for Dutch. |
en.microsoft | Microsoft analyzer for English. |
en.lucene | Lucene analyzer for English. |
et.microsoft | Microsoft analyzer for Estonian. |
fi.microsoft | Microsoft analyzer for Finnish. |
fi.lucene | Lucene analyzer for Finnish. |
fr.microsoft | Microsoft analyzer for French. |
fr.lucene | Lucene analyzer for French. |
gl.lucene | Lucene analyzer for Galician. |
de.microsoft | Microsoft analyzer for German. |
de.lucene | Lucene analyzer for German. |
el.microsoft | Microsoft analyzer for Greek. |
el.lucene | Lucene analyzer for Greek. |
gu.microsoft | Microsoft analyzer for Gujarati. |
he.microsoft | Microsoft analyzer for Hebrew. |
hi.microsoft | Microsoft analyzer for Hindi. |
hi.lucene | Lucene analyzer for Hindi. |
hu.microsoft | Microsoft analyzer for Hungarian. |
hu.lucene | Lucene analyzer for Hungarian. |
is.microsoft | Microsoft analyzer for Icelandic. |
id.microsoft | Microsoft analyzer for Indonesian (Bahasa). |
id.lucene | Lucene analyzer for Indonesian. |
ga.lucene | Lucene analyzer for Irish. |
it.microsoft | Microsoft analyzer for Italian. |
it.lucene | Lucene analyzer for Italian. |
ja.microsoft | Microsoft analyzer for Japanese. |
ja.lucene | Lucene analyzer for Japanese. |
kn.microsoft | Microsoft analyzer for Kannada. |
ko.microsoft | Microsoft analyzer for Korean. |
ko.lucene | Lucene analyzer for Korean. |
lv.microsoft | Microsoft analyzer for Latvian. |
lv.lucene | Lucene analyzer for Latvian. |
lt.microsoft | Microsoft analyzer for Lithuanian. |
ml.microsoft | Microsoft analyzer for Malayalam. |
ms.microsoft | Microsoft analyzer for Malay (Latin). |
mr.microsoft | Microsoft analyzer for Marathi. |
nb.microsoft | Microsoft analyzer for Norwegian (Bokmål). |
no.lucene | Lucene analyzer for Norwegian. |
fa.lucene | Lucene analyzer for Persian. |
pl.microsoft | Microsoft analyzer for Polish. |
pl.lucene | Lucene analyzer for Polish. |
pt-BR.microsoft | Microsoft analyzer for Portuguese (Brazil). |
pt-BR.lucene | Lucene analyzer for Portuguese (Brazil). |
pt-PT.microsoft | Microsoft analyzer for Portuguese (Portugal). |
pt-PT.lucene | Lucene analyzer for Portuguese (Portugal). |
pa.microsoft | Microsoft analyzer for Punjabi. |
ro.microsoft | Microsoft analyzer for Romanian. |
ro.lucene | Lucene analyzer for Romanian. |
ru.microsoft | Microsoft analyzer for Russian. |
ru.lucene | Lucene analyzer for Russian. |
sr-cyrillic.microsoft | Microsoft analyzer for Serbian (Cyrillic). |
sr-latin.microsoft | Microsoft analyzer for Serbian (Latin). |
sk.microsoft | Microsoft analyzer for Slovak. |
sl.microsoft | Microsoft analyzer for Slovenian. |
es.microsoft | Microsoft analyzer for Spanish. |
es.lucene | Lucene analyzer for Spanish. |
sv.microsoft | Microsoft analyzer for Swedish. |
sv.lucene | Lucene analyzer for Swedish. |
ta.microsoft | Microsoft analyzer for Tamil. |
te.microsoft | Microsoft analyzer for Telugu. |
th.microsoft | Microsoft analyzer for Thai. |
th.lucene | Lucene analyzer for Thai. |
tr.microsoft | Microsoft analyzer for Turkish. |
tr.lucene | Lucene analyzer for Turkish. |
uk.microsoft | Microsoft analyzer for Ukrainian. |
ur.microsoft | Microsoft analyzer for Urdu. |
vi.microsoft | Microsoft analyzer for Vietnamese. |
standard.lucene | Standard Lucene analyzer. |
standardasciifolding.lucene | Standard ASCII Folding Lucene analyzer. See https://learn.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#Analyzers |
keyword | Treats the entire content of a field as a single token. This is useful for data like zip codes, ids, and some product names. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html |
pattern | Flexibly separates text into terms via a regular expression pattern. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/PatternAnalyzer.html |
simple | Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html |
stop | Divides text at non-letters and applies the lowercase and stopword token filters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopAnalyzer.html |
whitespace | An analyzer that uses the whitespace tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceAnalyzer.html |
LexicalTokenizerName
Defines the names of all tokenizers supported by the search engine.
TokenFilterName
Defines the names of all token filters supported by the search engine.