Add spell check to queries in Azure AI Search
Important
Spell correction is in public preview under supplemental terms of use. It's available through the Azure portal, preview REST APIs, and beta versions of Azure SDK libraries.
You can improve recall by spell-correcting words in a query before they reach the search engine. The speller parameter is supported for all text (non-vector) query types.
Prerequisites
- A search service at the Basic tier or higher, in any region.
- An existing search index with content in a supported language.
- A query request that has speller=lexicon and queryLanguage set to a supported language. Spell check works on strings passed in the search parameter. It's not supported for filters, fuzzy search, wildcard search, regular expressions, or vector queries.
Use a search client that supports preview APIs on the query request. You can use a REST client or beta releases of the Azure SDKs.
Client library | Versions |
---|---|
REST API | Versions 2020-06-30-Preview and later. We recommend the latest preview API, 2024-05-01-preview. |
Azure SDK for .NET | version 11.5.0-beta.5 |
Azure SDK for Java | version 11.6.0-beta.5 |
Azure SDK for JavaScript | version 11.3.0-beta.8 |
Azure SDK for Python | version 11.4.0b3 |
Spell correction with simple search
The following example uses the built-in hotels-sample index to demonstrate spell correction on a simple text query. Without spell correction, the query returns zero results. With correction, the query returns one result for Johnson's family-oriented resort.
POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2024-05-01-preview
{
"search": "famly acitvites",
"speller": "lexicon",
"queryLanguage": "en-us",
"queryType": "simple",
"select": "HotelId,HotelName,Description,Category,Tags",
"count": true
}
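For readers calling the REST API from code, the request above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the service name, index name, and API key are placeholders, and the helper name build_spell_check_request is hypothetical. The function only builds the request, so nothing is sent until you call urlopen yourself.

```python
import json
import urllib.request


def build_spell_check_request(service_name, index_name, api_key, search_text,
                              query_language="en-us",
                              api_version="2024-05-01-preview"):
    """Build (but do not send) a simple-syntax query with spell correction on."""
    url = (f"https://{service_name}.search.windows.net/indexes/{index_name}"
           f"/docs/search?api-version={api_version}")
    body = {
        "search": search_text,
        "speller": "lexicon",             # turn on spell correction
        "queryLanguage": query_language,  # must be a supported language
        "queryType": "simple",
        "count": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values; replace them with your own service, index, and key.
request = build_spell_check_request(
    "my-service", "hotels-sample-index", "<api-key>", "famly acitvites")

# To run the query, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any call reaches the service.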
Spell correction with full Lucene
Spelling correction occurs on individual query terms that undergo text analysis, which is why you can use the speller parameter with some Lucene queries, but not others.
- Incompatible query forms that bypass text analysis include: wildcard, regex, fuzzy
- Compatible query forms include: fielded search, proximity, term boosting
This example uses fielded search over the Category field, with full Lucene syntax, and a misspelled query term. By including speller, the typo in "Suiite" is corrected and the query succeeds.
POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2024-05-01-preview
{
"search": "Category:(Resort and Spa) OR Category:Suiite",
"queryType": "full",
"speller": "lexicon",
"queryLanguage": "en-us",
"select": "Category",
"count": true
}
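Because wildcard, regex, and fuzzy forms bypass text analysis, a client might want to flag them before adding speller to a full Lucene query. The following is a rough heuristic sketch, not part of the service or any SDK: the function name and the regular expressions are assumptions made for illustration, and they don't account for escaped characters or every edge case of Lucene syntax.

```python
import re

# Heuristic patterns (assumptions, not an official grammar).
_REGEX_FORM = re.compile(r"/[^/]+/")   # /pattern/
_WILDCARD_FORM = re.compile(r"[*?]")   # hote* or te?t
_FUZZY_FORM = re.compile(r"\w~")       # term~ or term~1 (tilde right after a word)


def has_unanalyzed_forms(lucene_query: str) -> bool:
    """Return True if the query appears to use wildcard, regex, or fuzzy forms."""
    return any(
        pattern.search(lucene_query)
        for pattern in (_REGEX_FORM, _WILDCARD_FORM, _FUZZY_FORM)
    )


# Fielded search, proximity, and term boosting are not flagged.
print(has_unanalyzed_forms("Category:Suiite"))        # False
print(has_unanalyzed_forms('"hotel airport"~5'))      # False
print(has_unanalyzed_forms("resort^2"))               # False

# Wildcard, regex, and fuzzy forms are flagged.
print(has_unanalyzed_forms("hote*"))                  # True
print(has_unanalyzed_forms("/[mh]otel/"))             # True
print(has_unanalyzed_forms("hotell~1"))               # True
```

The fuzzy pattern requires a word character immediately before the tilde, which is how the sketch tells a fuzzy term (hotell~1) apart from a proximity search, where the tilde follows a closing quotation mark.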
Spell correction with semantic ranking
This query, with typos in every term except one, undergoes spelling corrections to return relevant results. To learn more, see Configure semantic ranking.
POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2024-05-01-preview
{
"search": "hisotoric hotell wiht great restrant nad wiifi",
"queryType": "semantic",
"speller": "lexicon",
"queryLanguage": "en-us",
"searchFields": "HotelName,Tags,Description",
"select": "HotelId,HotelName,Description,Category,Tags",
"count": true
}
Supported languages
Valid values for queryLanguage can be found in the following table, copied from the list of supported languages (REST API reference).
Language | queryLanguage |
---|---|
English [EN] | EN, EN-US (default) |
Spanish [ES] | ES, ES-ES (default) |
French [FR] | FR, FR-FR (default) |
German [DE] | DE, DE-DE (default) |
Dutch [NL] | NL, NL-BE, NL-NL (default) |
Note
Previously, while semantic ranking was in public preview, the queryLanguage parameter was also used for semantic ranking. Semantic ranking is now language-agnostic.
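A client can check the queryLanguage value against the supported languages table before sending a request. The sketch below hard-codes the values from that table. The helper name is hypothetical, and treating the codes as case-insensitive is an assumption based on the examples in this article, which pass "en-us" in lowercase while the table lists uppercase codes.

```python
# Values copied from the supported languages table, stored in lowercase.
SPELLER_QUERY_LANGUAGES = {
    "en", "en-us",
    "es", "es-es",
    "fr", "fr-fr",
    "de", "de-de",
    "nl", "nl-be", "nl-nl",
}


def normalize_query_language(value: str) -> str:
    """Return the lowercase code, or raise ValueError if it isn't in the table."""
    code = value.strip().lower()
    if code not in SPELLER_QUERY_LANGUAGES:
        raise ValueError(f"queryLanguage {value!r} is not supported by speller")
    return code


print(normalize_query_language("EN-US"))  # en-us
print(normalize_query_language("nl-be"))  # nl-be
```

Because the feature is in preview, the list of supported languages can change, so treat the hard-coded set as a snapshot of the table rather than a permanent list.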
Language analyzer considerations
Indexes that contain non-English content often use language analyzers on non-English fields to apply the linguistic rules of the native language.
When adding spell check to content that also undergoes language analysis, you can achieve better results using the same language for each indexing and query processing step. For example, if a field's content was indexed using the "fr.microsoft" language analyzer, then queries and spell check should all use a French lexicon or language library of some form.
To recap how language libraries are used in Azure AI Search:
- Language analyzers can be invoked during indexing and query execution, and are either Apache Lucene (for example, "de.lucene") or Microsoft (for example, "de.microsoft").
- Language lexicons invoked during spell check are specified using one of the language codes in the supported language table.
- In a query request, the value assigned to queryLanguage applies to speller.
Note
Language consistency across various property values is only a concern if you are using language analyzers. If you are using language-agnostic analyzers (such as keyword, simple, standard, stop, whitespace, or standardasciifolding.lucene), then the queryLanguage value can be whatever you want.
While content in a search index can be composed in multiple languages, the query input is most likely in one. The search engine doesn't check for compatibility of queryLanguage, language analyzer, and the language in which content is composed, so be sure to scope queries accordingly to avoid producing incorrect results.
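Since the search engine doesn't check this compatibility for you, a client-side check is one way to catch mismatches early. The following sketch compares the language prefix of an analyzer name (for example, "fr" in "fr.microsoft") with the language prefix of queryLanguage. The function name is hypothetical, the list of language-agnostic analyzers is taken from the note above, and the check assumes analyzer names follow the "language.provider" pattern.

```python
# Language-agnostic analyzers named in the note above.
LANGUAGE_AGNOSTIC_ANALYZERS = {
    "keyword", "simple", "standard", "stop", "whitespace",
    "standardasciifolding.lucene",
}


def languages_consistent(analyzer_name: str, query_language: str) -> bool:
    """Return True if the analyzer and queryLanguage share a language prefix."""
    name = analyzer_name.strip().lower()
    if name in LANGUAGE_AGNOSTIC_ANALYZERS:
        # Any queryLanguage value is acceptable for these analyzers.
        return True
    # "fr.microsoft" -> "fr"; "pt-br.lucene" -> "pt"
    analyzer_language = name.split(".")[0].split("-")[0]
    # "fr-fr" -> "fr"
    query_prefix = query_language.strip().lower().split("-")[0]
    return analyzer_language == query_prefix


print(languages_consistent("fr.microsoft", "fr-fr"))  # True
print(languages_consistent("de.lucene", "en-us"))     # False
print(languages_consistent("keyword", "nl"))          # True
```

A check like this only compares configuration values. It can't tell what language the indexed content or the user's query text is actually written in, so scoping queries to the right fields remains the caller's responsibility.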