Indexes - Analyze

Reference

Service: Search Service

API Version: 2024-07-01

Shows how an analyzer breaks text into tokens.

POST {endpoint}/indexes('{indexName}')/search.analyze?api-version=2024-07-01

URI Parameters

Name	In	Required	Type	Description
endpoint	path	True	string	The endpoint URL of the search service.
indexName	path	True	string	The name of the index for which to test an analyzer.
api-version	query	True	string	Client API version.
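
As a minimal sketch of how the three URI parameters fit into the request template, the Python helper below fills in the path and query string. The function name `build_analyze_url` is hypothetical and not part of any SDK; it only assembles a string and sends nothing.

```python
from urllib.parse import quote, urlencode


def build_analyze_url(endpoint: str, index_name: str,
                      api_version: str = "2024-07-01") -> str:
    """Fill in the URI template for the Analyze operation (illustrative helper)."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    # The index name sits inside single quotes within the path segment.
    path = "/indexes('" + quote(index_name, safe="") + "')/search.analyze"
    query = urlencode({"api-version": api_version})
    return base + path + "?" + query


url = build_analyze_url("https://myservice.search.windows.net", "hotels")
print(url)
```

With the sample values used on this page, the result matches the URL shown in the sample request.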

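Request Header

Name	Required	Type	Description
x-ms-client-request-id		string (uuid)	The tracking ID sent with the request to help with debugging.

Request Body

Name	Required	Type	Description
text	True	string	The text to break into tokens.
analyzer		LexicalAnalyzerName	The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive.
charFilters		CharFilterName[]	An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
tokenFilters		TokenFilterName[]	An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
tokenizer		LexicalTokenizerName	The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive.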
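
The constraints in the table above (exactly one of analyzer or tokenizer, and filters only alongside a tokenizer) can be checked on the client before a request is sent. The following Python sketch is one possible way to do that; `build_analyze_body` and its parameter names are hypothetical, and the service itself remains the authority on what it accepts.

```python
import json
from typing import Optional, Sequence


def build_analyze_body(text: str,
                       analyzer: Optional[str] = None,
                       tokenizer: Optional[str] = None,
                       char_filters: Optional[Sequence[str]] = None,
                       token_filters: Optional[Sequence[str]] = None) -> dict:
    """Assemble a request body, enforcing the rules stated in the table (illustrative)."""
    # analyzer and tokenizer are mutually exclusive, and one of them is required.
    if (analyzer is None) == (tokenizer is None):
        raise ValueError("Specify exactly one of analyzer or tokenizer.")
    # charFilters and tokenFilters can only be set together with tokenizer.
    if analyzer is not None and (char_filters or token_filters):
        raise ValueError("charFilters and tokenFilters require the tokenizer parameter.")

    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    else:
        body["tokenizer"] = tokenizer
        if char_filters:
            body["charFilters"] = list(char_filters)
        if token_filters:
            body["tokenFilters"] = list(token_filters)
    return body


# Analyzer form, as in the sample request on this page.
print(json.dumps(build_analyze_body("Text to analyze", analyzer="standard.lucene")))

# Tokenizer form with optional filters; the names come from the definitions tables.
print(json.dumps(build_analyze_body("<p>Text to analyze</p>",
                                    tokenizer="standard_v2",
                                    char_filters=["html_strip"],
                                    token_filters=["lowercase"])))
```

Raising early on an invalid combination keeps the error close to the calling code instead of surfacing as an error response from the service.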

Responses

Name	Type	Description
200 OK	AnalyzeResult
Other Status Codes	ErrorResponse	Error response.

Examples

SearchServiceIndexAnalyze

Sample request

HTTP

POST https://myservice.search.windows.net/indexes('hotels')/search.analyze?api-version=2024-07-01

{
  "text": "Text to analyze",
  "analyzer": "standard.lucene"
}

Sample response

Status code: 200

{
  "tokens": [
    {
      "token": "text",
      "startOffset": 0,
      "endOffset": 4,
      "position": 0
    },
    {
      "token": "to",
      "startOffset": 5,
      "endOffset": 7,
      "position": 1
    },
    {
      "token": "analyze",
      "startOffset": 8,
      "endOffset": 15,
      "position": 2
    }
  ]
}
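
To illustrate how the offsets in this sample relate to the request text, the Python sketch below parses the response and slices the original string with each token's startOffset and endOffset. It assumes the offsets behave like Python slice bounds (end exclusive); that holds for this sample but is an assumption, not something the reference states.

```python
import json

# The request text and response body are copied from the sample on this page.
text = "Text to analyze"
response_json = """
{
  "tokens": [
    {"token": "text", "startOffset": 0, "endOffset": 4, "position": 0},
    {"token": "to", "startOffset": 5, "endOffset": 7, "position": 1},
    {"token": "analyze", "startOffset": 8, "endOffset": 15, "position": 2}
  ]
}
"""

tokens = json.loads(response_json)["tokens"]

# Pair each emitted token with the slice of the input it was derived from.
spans = [
    (t["position"], t["token"], text[t["startOffset"]:t["endOffset"]])
    for t in tokens
]

for position, token, source in spans:
    print(position, token, repr(source))
```

The first token shows the effect of the analyzer: the emitted token is lowercase while the source slice keeps its original casing.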

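Definitions

Name	Description
AnalyzedTokenInfo	Information about a token returned by an analyzer.
AnalyzeRequest	Specifies some text and analysis components used to break that text into tokens.
AnalyzeResult	The result of testing an analyzer on text.
CharFilterName	Defines the names of all character filters supported by the search engine.
ErrorAdditionalInfo	The resource management error additional info.
ErrorDetail	The error detail.
ErrorResponse	Error response
LexicalAnalyzerName	Defines the names of all text analyzers supported by the search engine.
LexicalTokenizerName	Defines the names of all tokenizers supported by the search engine.
TokenFilterName	Defines the names of all token filters supported by the search engine.

AnalyzedTokenInfo

Information about a token returned by an analyzer.

Name	Type	Description
endOffset	integer	The index of the last character of the token in the input text. In the sample response above, the value is the offset just past the token's final character (for example, 4 for "text", which starts at offset 0).
position	integer	The position of the token in the input text relative to other tokens. The first token in the input text has position 0, the next has position 1, and so on. Depending on the analyzer used, some tokens might have the same position, for example if they are synonyms of each other.
startOffset	integer	The index of the first character of the token in the input text.
token	string	The token returned by the analyzer.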
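
Because several tokens may share a position, it can help to group a result by position when inspecting it. The Python sketch below shows one way to do so. The `AnalyzedTokenInfo` dataclass simply mirrors the fields in the table, and the token data is invented for illustration; it is not output from the service.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AnalyzedTokenInfo:
    """Local mirror of the fields described in the table above."""
    token: str
    startOffset: int
    endOffset: int
    position: int


def group_by_position(tokens: Iterable[AnalyzedTokenInfo]) -> Dict[int, List[str]]:
    """Collect token strings under the position they occupy."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for info in tokens:
        grouped[info.position].append(info.token)
    return dict(grouped)


# Hypothetical result in which a synonym shares position 0 with the original term.
tokens = [
    AnalyzedTokenInfo("quick", 0, 5, 0),
    AnalyzedTokenInfo("fast", 0, 5, 0),
    AnalyzedTokenInfo("car", 6, 9, 1),
]

print(group_by_position(tokens))
```

Any position that maps to more than one string marks tokens the analyzer placed at the same point in the text.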

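AnalyzeRequest

Specifies some text and analysis components used to break that text into tokens.

Name	Type	Description
analyzer	LexicalAnalyzerName	The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive.
charFilters	CharFilterName[]	An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
text	string	The text to break into tokens.
tokenFilters	TokenFilterName[]	An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
tokenizer	LexicalTokenizerName	The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive.

AnalyzeResult

The result of testing an analyzer on text.

Name	Type	Description
tokens	AnalyzedTokenInfo[]	The list of tokens returned by the analyzer specified in the request.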

CharFilterName

Defines the names of all character filters supported by the search engine.

Name	Type	Description
html_strip	string	A character filter that attempts to strip out HTML constructs. See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.html

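ErrorAdditionalInfo

The resource management error additional info.

Name	Type	Description
info	object	The additional info.
type	string	The additional info type.

ErrorDetail

The error detail.

Name	Type	Description
additionalInfo	ErrorAdditionalInfo[]	The error additional info.
code	string	The error code.
details	ErrorDetail[]	The error details.
message	string	The error message.
target	string	The error target.

ErrorResponse

Error response

Name	Type	Description
error	ErrorDetail	The error object.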
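
Since ErrorDetail can nest further ErrorDetail entries under details, a client may want to flatten an error response into readable lines. The Python sketch below walks that structure recursively. The payload, including its codes and messages, is made up for illustration and does not represent actual service output.

```python
import json
from typing import List


def flatten_error(detail: dict, depth: int = 0) -> List[str]:
    """Return one indented 'code: message' line per ErrorDetail, depth first."""
    indent = "  " * depth
    lines = [indent + detail.get("code", "") + ": " + detail.get("message", "")]
    # 'details' may be absent or null; treat both as an empty list.
    for child in detail.get("details") or []:
        lines.extend(flatten_error(child, depth + 1))
    return lines


# Hypothetical ErrorResponse body; the codes and messages are placeholders.
payload = json.loads("""
{
  "error": {
    "code": "ExampleOuterCode",
    "message": "The request is invalid.",
    "details": [
      {"code": "ExampleInnerCode", "message": "A parameter is missing."}
    ]
  }
}
""")

lines = flatten_error(payload["error"])
print("\n".join(lines))
```

The target and additionalInfo fields are left out here for brevity; they could be appended to each line in the same way.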

LexicalAnalyzerName

Defines the names of all text analyzers supported by the search engine.

Name	Type	Description
ar.lucene	string	Lucene analyzer for Arabic.
ar.microsoft	string	Microsoft analyzer for Arabic.
bg.lucene	string	Lucene analyzer for Bulgarian.
bg.microsoft	string	Microsoft analyzer for Bulgarian.
bn.microsoft	string	Microsoft analyzer for Bangla.
ca.lucene	string	Lucene analyzer for Catalan.
ca.microsoft	string	Microsoft analyzer for Catalan.
cs.lucene	string	Lucene analyzer for Czech.
cs.microsoft	string	Microsoft analyzer for Czech.
da.lucene	string	Lucene analyzer for Danish.
da.microsoft	string	Microsoft analyzer for Danish.
de.lucene	string	Lucene analyzer for German.
de.microsoft	string	Microsoft analyzer for German.
el.lucene	string	Lucene analyzer for Greek.
el.microsoft	string	Microsoft analyzer for Greek.
en.lucene	string	Lucene analyzer for English.
en.microsoft	string	Microsoft analyzer for English.
es.lucene	string	Lucene analyzer for Spanish.
es.microsoft	string	Microsoft analyzer for Spanish.
et.microsoft	string	Microsoft analyzer for Estonian.
eu.lucene	string	Lucene analyzer for Basque.
fa.lucene	string	Lucene analyzer for Persian.
fi.lucene	string	Lucene analyzer for Finnish.
fi.microsoft	string	Microsoft analyzer for Finnish.
fr.lucene	string	Lucene analyzer for French.
fr.microsoft	string	Microsoft analyzer for French.
ga.lucene	string	Lucene analyzer for Irish.
gl.lucene	string	Lucene analyzer for Galician.
gu.microsoft	string	Microsoft analyzer for Gujarati.
he.microsoft	string	Microsoft analyzer for Hebrew.
hi.lucene	string	Lucene analyzer for Hindi.
hi.microsoft	string	Microsoft analyzer for Hindi.
hr.microsoft	string	Microsoft analyzer for Croatian.
hu.lucene	string	Lucene analyzer for Hungarian.
hu.microsoft	string	Microsoft analyzer for Hungarian.
hy.lucene	string	Lucene analyzer for Armenian.
id.lucene	string	Lucene analyzer for Indonesian.
id.microsoft	string	Microsoft analyzer for Indonesian (Bahasa).
is.microsoft	string	Microsoft analyzer for Icelandic.
it.lucene	string	Lucene analyzer for Italian.
it.microsoft	string	Microsoft analyzer for Italian.
ja.lucene	string	Lucene analyzer for Japanese.
ja.microsoft	string	Microsoft analyzer for Japanese.
keyword	string	Treats the entire content of a field as a single token. This is useful for data such as zip codes, IDs, and some product names. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html
kn.microsoft	string	Microsoft analyzer for Kannada.
ko.lucene	string	Lucene analyzer for Korean.
ko.microsoft	string	Microsoft analyzer for Korean.
lt.microsoft	string	Microsoft analyzer for Lithuanian.
lv.lucene	string	Lucene analyzer for Latvian.
lv.microsoft	string	Microsoft analyzer for Latvian.
ml.microsoft	string	Microsoft analyzer for Malayalam.
mr.microsoft	string	Microsoft analyzer for Marathi.
ms.microsoft	string	Microsoft analyzer for Malay (Latin).
nb.microsoft	string	Microsoft analyzer for Norwegian (Bokmål).
nl.lucene	string	Lucene analyzer for Dutch.
nl.microsoft	string	Microsoft analyzer for Dutch.
no.lucene	string	Lucene analyzer for Norwegian.
pa.microsoft	string	Microsoft analyzer for Punjabi.
pattern	string	Flexibly separates text into terms via a regular expression pattern. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/PatternAnalyzer.html
pl.lucene	string	Lucene analyzer for Polish.
pl.microsoft	string	Microsoft analyzer for Polish.
pt-BR.lucene	string	Lucene analyzer for Portuguese (Brazil).
pt-BR.microsoft	string	Microsoft analyzer for Portuguese (Brazil).
pt-PT.lucene	string	Lucene analyzer for Portuguese (Portugal).
pt-PT.microsoft	string	Microsoft analyzer for Portuguese (Portugal).
ro.lucene	string	Lucene analyzer for Romanian.
ro.microsoft	string	Microsoft analyzer for Romanian.
ru.lucene	string	Lucene analyzer for Russian.
ru.microsoft	string	Microsoft analyzer for Russian.
simple	string	Divides text at non-letters and converts the resulting tokens to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
sk.microsoft	string	Microsoft analyzer for Slovak.
sl.microsoft	string	Microsoft analyzer for Slovenian.
sr-cyrillic.microsoft	string	Microsoft analyzer for Serbian (Cyrillic).
sr-latin.microsoft	string	Microsoft analyzer for Serbian (Latin).
standard.lucene	string	Standard Lucene analyzer.
standardasciifolding.lucene	string	Standard ASCII Folding Lucene analyzer. See https://learn.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#Analyzers
stop	string	Divides text at non-letters, then applies the lowercase and stopword token filters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopAnalyzer.html
sv.lucene	string	Lucene analyzer for Swedish.
sv.microsoft	string	Microsoft analyzer for Swedish.
ta.microsoft	string	Microsoft analyzer for Tamil.
te.microsoft	string	Microsoft analyzer for Telugu.
th.lucene	string	Lucene analyzer for Thai.
th.microsoft	string	Microsoft analyzer for Thai.
tr.lucene	string	Lucene analyzer for Turkish.
tr.microsoft	string	Microsoft analyzer for Turkish.
uk.microsoft	string	Microsoft analyzer for Ukrainian.
ur.microsoft	string	Microsoft analyzer for Urdu.
vi.microsoft	string	Microsoft analyzer for Vietnamese.
whitespace	string	An analyzer that uses the whitespace tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceAnalyzer.html
zh-Hans.lucene	string	Lucene analyzer for Chinese (Simplified).
zh-Hans.microsoft	string	Microsoft analyzer for Chinese (Simplified).
zh-Hant.lucene	string	Lucene analyzer for Chinese (Traditional).
zh-Hant.microsoft	string	Microsoft analyzer for Chinese (Traditional).

LexicalTokenizerName

Defines the names of all tokenizers supported by the search engine.

Name	Type	Description
classic	string	Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html
edgeNGram	string	Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html
keyword_v2	string	Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html
letter	string	Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
lowercase	string	Divides text at non-letters and converts the resulting tokens to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html
microsoft_language_stemming_tokenizer	string	Divides text using language-specific rules and reduces words to their base forms.
microsoft_language_tokenizer	string	Divides text using language-specific rules.
nGram	string	Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html
path_hierarchy_v2	string	Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html
pattern	string	Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html
standard_v2	string	Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter, and stop filter. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html
uax_url_email	string	Tokenizes URLs and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html
whitespace	string	Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html

TokenFilterName

Defines the names of all token filters supported by the search engine.

Name	Type	Description
apostrophe	string	Strips all characters after an apostrophe (including the apostrophe itself). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html
arabic_normalization	string	A token filter that applies the Arabic normalizer to normalize the orthography. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html
asciifolding	string	Converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters into their ASCII equivalents, if such equivalents exist. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
cjk_bigram	string	Forms bigrams of CJK terms that are generated from the standard tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html
cjk_width	string	Normalizes CJK width differences. Folds full-width ASCII variants into the equivalent basic Latin, and half-width Katakana variants into the equivalent Kana. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html
classic	string	Removes English possessives, and dots from acronyms. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html
common_grams	string	Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html
edgeNGram_v2	string	Generates n-grams of the given size(s) starting from the front or the back of an input token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
elision	string	Removes elisions. For example, "l'avion" (the plane) is converted to "avion" (plane). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html
german_normalization	string	Normalizes German characters according to the heuristics of the German2 snowball algorithm. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
hindi_normalization	string	Normalizes text in Hindi to remove some differences in spelling variations. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html
indic_normalization	string	Normalizes the Unicode representation of text in Indian languages. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html
keyword_repeat	string	Emits each incoming token twice, once as keyword and once as non-keyword. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html
kstem	string	A high-performance kstem filter for English. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html
length	string	Removes words that are too long or too short. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html
limit	string	Limits the number of tokens while indexing. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html
lowercase	string	Normalizes token text to lower case. See https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html
nGram_v2	string	Generates n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
persian_normalization	string	Applies normalization for Persian. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html
phonetic	string	Creates tokens for phonetic matches. See https://lucene.apache.org/core/4_10_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html
porter_stem	string	Uses the Porter stemming algorithm to transform the token stream. See http://tartarus.org/~martin/PorterStemmer
reverse	string	Reverses the token string. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html
scandinavian_folding	string	Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against use of double vowels aa, ae, ao, oe and oo, leaving just the first one. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html
scandinavian_normalization	string	Normalizes use of the interchangeable Scandinavian characters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html
shingle	string	Creates combinations of tokens as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html
snowball	string	A filter that stems words using a Snowball-generated stemmer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html
sorani_normalization	string	Normalizes the Unicode representation of Sorani text. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html
stemmer	string	Language-specific stemming filter. See https://learn.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#TokenFilters
stopwords	string	Removes stop words from a token stream. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html
trim	string	Trims leading and trailing whitespace from tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html
truncate	string	Truncates the terms to a specific length. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html
unique	string	Filters out tokens with the same text as the previous token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html
uppercase	string	Normalizes token text to upper case. See https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html
word_delimiter	string	Splits words into subwords and performs optional transformations on subword groups.
