Querying in Azure Cognitive Search

Azure Cognitive Search offers a rich query language that supports a broad range of scenarios, from free text search to highly specified query patterns. This article describes query requests and the kinds of queries you can create.

In Cognitive Search, a query is a full specification of a round-trip search operation, with parameters that both inform query execution and shape the response coming back. To illustrate, the following query example calls the Search Documents (REST API). It's a parameterized, free text query with a boolean operator, targeting the hotels-sample-index documents collection. It also selects which fields are returned in results.

POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2020-06-30
{
    "queryType": "simple",
    "searchMode": "all",
    "search": "restaurant +view",
    "searchFields": "HotelName, Description, Address/City, Address/StateProvince, Tags",
    "select": "HotelName, Description, Address/City, Address/StateProvince, Tags",
    "top": 10,
    "count": true,
    "orderby": "Rating desc"
}

Parameters used during query execution include:

  • queryType sets the parser: simple or full. The default simple query parser is optimal for full text search. The full Lucene query parser is for advanced query constructs like regular expressions, proximity search, fuzzy search, and wildcard search. This parameter can also be set to semantic to invoke semantic search, which applies advanced semantic modeling to the query response.

  • searchMode specifies whether matches are based on "all" criteria (favors precision) or "any" criteria (favors recall) in the expression. The default is "any".

  • search provides the match criteria, usually whole terms or phrases, with or without operators. Any field that is attributed as "searchable" in the index schema is a candidate for this parameter.

  • searchFields constrains query execution to specific searchable fields. During development, it's helpful to use the same field list for select and searchFields. Otherwise, a match might be based on field values that you can't see in the results, creating uncertainty as to why the document was returned.

Parameters used to shape the response:

  • select specifies which fields to return in the response. Only fields marked as "retrievable" in the index can be used in a select statement.

  • top returns the specified number of best-matching documents. In this example, only 10 hits are returned. You can use top and skip (not shown) to page the results.

  • count tells you how many documents in the entire index match the query, which can be more than the number returned in the response.

  • orderby is used if you want to sort results by a value, such as a rating or location. Otherwise, the default is to use the relevance score to rank results. A field must be attributed as "sortable" to be a candidate for this parameter.

The above list is representative but not exhaustive. For the full list of parameters on a query request, see Search Documents (REST API).
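
As a minimal sketch of how a client might assemble the request shown earlier, the following Python builds the URL, headers, and JSON body using only the standard library. The service name my-service, the placeholder API key, and the helper name build_search_request are assumptions for illustration; the request is constructed but not sent.

```python
import json
from urllib.request import Request

API_VERSION = "2020-06-30"

def build_search_request(service_name, index_name, api_key, body):
    """Return an unsent POST request for the Search Documents REST API."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    )
    headers = {"Content-Type": "application/json", "api-key": api_key}
    payload = json.dumps(body).encode("utf-8")
    return Request(url, data=payload, headers=headers, method="POST")

# Parameters that drive query execution, followed by parameters that shape the response.
query_body = {
    "queryType": "simple",
    "searchMode": "all",
    "search": "restaurant +view",
    "searchFields": "HotelName, Description, Address/City, Address/StateProvince, Tags",
    "select": "HotelName, Description, Address/City, Address/StateProvince, Tags",
    "top": 10,
    "count": True,
    "orderby": "Rating desc",
}

# "my-service" and the key are hypothetical placeholders.
request = build_search_request("my-service", "hotels-sample-index", "<query-api-key>", query_body)
```

To run the query, pass the request to urllib.request.urlopen and parse the JSON response. Matching documents come back in the value array, and the total appears as @odata.count when count is true.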

Types of queries

With a few notable exceptions, a query request iterates over inverted indexes that are structured for fast scans, where a match can be found in potentially any field, within any number of search documents. In Cognitive Search, the primary methodology for finding matches is either full text search or filters, but you can also implement other well-known search experiences like autocomplete, or geo-location search. The rest of this article summarizes queries in Cognitive Search and provides links to more information and examples.

Full text search

If your search app includes a search box that collects term inputs, then full text search is probably the query operation backing that experience. Full text search accepts terms or phrases passed in the search parameter and looks for matches in all "searchable" fields in your index. Optional boolean operators in the query string can specify inclusion or exclusion criteria. Both the simple parser and the full parser support full text search.

In Cognitive Search, full text search is built on the Apache Lucene query engine. Query strings in full text search undergo lexical analysis to make scans more efficient. Analysis includes lower-casing all terms, removing stop words like "the", and reducing terms to primitive root forms. The default analyzer is Standard Lucene.

When matching terms are found, the query engine reconstitutes a search document containing the match, using the document key or ID to assemble field values. It then ranks the documents in order of relevance and returns the top 50 (by default) in the response, or a different number if you specified top.

If you're implementing full text search, understanding how your content is tokenized will help you debug any query anomalies. Queries over hyphenated strings or special characters could necessitate using an analyzer other than the default standard Lucene analyzer to ensure the index contains the right tokens. You can override the default with language analyzers or specialized analyzers that modify lexical analysis. One example is the keyword analyzer, which treats the entire contents of a field as a single token. This is useful for data like zip codes, IDs, and some product names. For more information, see Partial term search and patterns with special characters.
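
To make the tokenization point concrete, here is a toy sketch in Python. It is not the Lucene analyzer: the stop-word list is a tiny illustrative sample, there is no stemming, and the function names are hypothetical. It only shows why a hyphenated value can end up as several tokens under a standard-style analyzer but stay whole under a keyword-style one.

```python
import re

# Tiny illustrative stop-word list; real analyzers use their own lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in"}

def standard_like_tokens(text):
    """Rough stand-in for a standard analyzer: lowercase, split on
    non-alphanumeric characters, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def keyword_like_tokens(text):
    """Rough stand-in for the keyword analyzer: the whole value is one token."""
    return [text]

product_id = "SKU-4521-B"  # hypothetical field value
split_tokens = standard_like_tokens(product_id)   # ['sku', '4521', 'b']
whole_token = keyword_like_tokens(product_id)     # ['SKU-4521-B']
```

With the split tokens in the index, a query for the full string SKU-4521-B has no single token to match exactly, which is the kind of anomaly a different analyzer is meant to avoid.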

Tip

If you anticipate heavy use of Boolean operators, which is more likely in indexes that contain large text blocks (a content field or long descriptions), be sure to test queries with the searchMode=Any|All parameter to evaluate the impact of that setting on boolean search.
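
The difference is easiest to see with the NOT operator. The following toy model uses plain Python sets, not the actual query engine, and assumes the simple-syntax behavior in which a negated term is ORed with the rest of the query under searchMode=any and ANDed under searchMode=all. The documents and the helper name matches are hypothetical.

```python
def matches(doc_tokens, include, exclude, search_mode):
    """Evaluate a query of the form `include -exclude` against one document."""
    has_term = include in doc_tokens
    lacks_term = exclude not in doc_tokens
    if search_mode == "any":
        return has_term or lacks_term    # favors recall
    if search_mode == "all":
        return has_term and lacks_term   # favors precision
    raise ValueError("search_mode must be 'any' or 'all'")

# Hypothetical documents, already reduced to token sets.
docs = {
    "1": {"wifi", "luxury"},
    "2": {"wifi", "budget"},
    "3": {"pool", "budget"},
    "4": {"pool", "luxury"},
}

# Query: wifi -luxury
any_hits = sorted(k for k, t in docs.items() if matches(t, "wifi", "luxury", "any"))
all_hits = sorted(k for k, t in docs.items() if matches(t, "wifi", "luxury", "all"))
```

In this model, any returns documents 1, 2, and 3, while all returns only document 2, which is why the same query string can produce very different result counts depending on the setting.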

Autocomplete and suggested queries

Autocomplete and suggested results are alternatives to search that fire successive query requests based on partial string inputs (after each character) in a search-as-you-type experience. You can use autocomplete and suggestions together or separately, but you cannot use them with search. Both completed terms and suggested queries are derived from index contents. The engine will never return a string or suggestion that doesn't exist in your index. For more information, see Autocomplete (REST API) and Suggestions (REST API).
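
As a sketch of the search-as-you-type pattern, the following Python builds one Autocomplete or Suggestions request per keystroke. It assumes the index defines a suggester; the suggester name sg, the service name my-service, and the helper typeahead_request are hypothetical. Requests are only constructed, not sent.

```python
API_VERSION = "2020-06-30"

def typeahead_request(service_name, index_name, operation, partial_text, suggester_name):
    """Return (url, body) for an Autocomplete or Suggestions REST call."""
    if operation not in ("autocomplete", "suggest"):
        raise ValueError("operation must be 'autocomplete' or 'suggest'")
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/{operation}?api-version={API_VERSION}"
    )
    body = {"search": partial_text, "suggesterName": suggester_name}
    return url, body

# Simulate a user typing "sea": one request per character entered.
typed = "sea"
requests_per_keystroke = [
    typeahead_request("my-service", "hotels-sample-index", "autocomplete", typed[:i], "sg")
    for i in range(1, len(typed) + 1)
]
```

In a real client, it is common to debounce keystrokes so that not every character triggers a round trip.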

Filter search

Filters are widely used in apps that are based on Cognitive Search. On application pages, filters are often visualized as facets in link navigation structures for user-directed filtering. Filters are also used internally to expose slices of indexed content. For example, you might initialize a search page using a filter on a product category, or a language if an index contains fields in both English and French.

You might also need filters to invoke a specialized query form, as described in the following table. You can use a filter with an unspecified search (search=*) or with a query string that includes terms, phrases, operators, and patterns.

| Filter scenario | Description |
|---|---|
| Range filters | In Azure Cognitive Search, range queries are built using the filter parameter. For more information and examples, see Range filter example. |
| Faceted navigation | In a faceted navigation tree, users can select facets. When backed by filters, search results narrow on each click. Each facet is backed by a filter that excludes documents that no longer match the criteria provided by the facet. |

Note

Text that's used in a filter expression is not analyzed during query processing. The text input is presumed to be a verbatim, case-sensitive character pattern that either succeeds or fails on the match. Filter expressions are constructed using OData syntax and passed in the filter parameter, and they can reference any field attributed as "filterable" in your index. For more information, see Filters in Azure Cognitive Search.
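
The following Python sketch composes a filter expression for a request body. It assumes that Rating and Address/City are attributed as "filterable" and that Tags is "facetable" in the index; the helper names are hypothetical. Because filter text is matched verbatim, string literals are quoted exactly as given, with embedded single quotes doubled as OData requires.

```python
def odata_string(value):
    """Quote a string literal for an OData filter, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def range_filter(field, low, high):
    """Build a half-open numeric range: low <= field < high."""
    return f"{field} ge {low} and {field} lt {high}"

filter_expression = (
    f"{range_filter('Rating', 4, 5)} and Address/City eq {odata_string('New York')}"
)

filter_body = {
    "search": "*",                 # unspecified search: the filter does all the narrowing
    "filter": filter_expression,
    "facets": ["Tags"],            # assumes Tags is facetable
    "select": "HotelName, Rating, Address/City",
    "count": True,
}
```

Because the comparison is case-sensitive, 'new york' would not match a stored value of New York.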

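Geospatial search

Geospatial search matches on a location's latitude and longitude coordinates for "find near me" or map-based search experiences. In Azure Cognitive Search, you can implement geospatial search by following these steps:

  • Define a filterable field of a geography type, such as Edm.GeographyPoint.
  • Verify that the incoming documents include the appropriate coordinates.
  • After indexing is complete, build a query that uses a filter and a geospatial function.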

For more information and an example, see Geospatial search example.
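
As a hedged sketch of a "find near me" query, the following Python builds an OData filter with the geo.distance function. It assumes the index has a filterable Edm.GeographyPoint field named Location; the field name, coordinates, and helper name are illustrative. Note that the POINT literal lists longitude first, then latitude, and that the distance is expressed in kilometers.

```python
def geo_distance_filter(field, longitude, latitude, max_km):
    """Filter for documents within max_km kilometers of a point."""
    point = f"geography'POINT({longitude} {latitude})'"
    return f"geo.distance({field}, {point}) le {max_km}"

nearby_filter = geo_distance_filter("Location", -122.131577, 47.678581, 5)

geo_body = {
    "search": "*",
    "filter": nearby_filter,
    "select": "HotelName, Address/City",
    "count": True,
}
```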

Document look-up

In contrast with the previously described query forms, this one retrieves a single search document by ID, with no corresponding index search or scan. Only the one document is requested and returned. When a user selects an item in search results, retrieving the document and populating a details page with fields is a typical response, and a document look-up is the operation that supports it.
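
A look-up is a GET request addressed to the document key rather than a search request. The following Python sketch builds such a URL; the service name my-service, the key value "1", and the helper lookup_url are assumptions for illustration.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2020-06-30"

def lookup_url(service_name, index_name, key, select=None):
    """Build the URL that retrieves a single document by its key."""
    params = {"api-version": API_VERSION}
    if select:
        params["$select"] = ",".join(select)  # optional: limit the returned fields
    query = urlencode(params, safe="$,")
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/{quote(key, safe='')}?{query}"
    )

details_url = lookup_url("my-service", "hotels-sample-index", "1", ["HotelName", "Description"])
```

A details page would issue this request when the user selects a result, then render the returned fields.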

Advanced search: fuzzy, wildcard, proximity, regex

An advanced query form depends on the Full Lucene parser and operators that trigger a specific query behavior.

| Query type | Usage | Examples and more information |
|---|---|---|
| Fielded search | search parameter, queryType=full | Build a composite query expression targeting a single field. See Fielded search example. |
| Fuzzy search | search parameter, queryType=full | Matches on terms having a similar construction or spelling. See Fuzzy search example. |
| Proximity search | search parameter, queryType=full | Finds terms that are near each other in a document. See Proximity search example. |
| Term boosting | search parameter, queryType=full | Ranks a document higher if it contains the boosted term, relative to others that don't. See Term boosting example. |
| Regular expression search | search parameter, queryType=full | Matches based on the contents of a regular expression. See Regular expression example. |
| Wildcard or prefix search | search parameter with * or ?, queryType=full | Matches based on a prefix followed by an asterisk (*) for multiple characters, or on a question mark (?) for a single character. See Wildcard search example. |
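
To illustrate the query forms above, the following Python sketch pairs each one with a sample query string in full Lucene syntax and wraps it in a request body. The search terms are made up, and the field names are taken from the earlier hotels-sample-index example; treat the strings as illustrations of the syntax rather than queries tuned for that index.

```python
# One illustrative query string per advanced query form.
FULL_LUCENE_EXAMPLES = {
    "fielded": "HotelName:(hotel NOT motel)",   # scope an expression to one field
    "fuzzy": "seatle~1",                        # terms within edit distance 1
    "proximity": '"hotel airport"~5',           # terms within 5 words of each other
    "boosting": "beach^2 view",                 # weight "beach" more than "view"
    "regex": "/[mh]otel/",                      # regular expression between slashes
    "wildcard": "beau*",                        # prefix match
}

def full_lucene_body(search_expression):
    """Wrap a query string in a request body that selects the full Lucene parser."""
    return {"queryType": "full", "search": search_expression, "count": True}

advanced_bodies = {name: full_lucene_body(q) for name, q in FULL_LUCENE_EXAMPLES.items()}
```

Each body can be posted to the same Search Documents endpoint as a simple query; only queryType and the search string change.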

Next steps

For a closer look at query implementation, review the examples for each syntax. If you are new to full text search, a closer look at what the query engine does might be an equally good choice.