Querying in Azure Cognitive Search
Azure Cognitive Search offers a rich query language to support a broad range of scenarios, from free text search, to highly-specified query patterns. This article describes query requests and the kinds of queries you can create.
In Cognitive Search, a query is a full specification of a round-trip search operation, with parameters that both inform query execution and shape the response coming back. To illustrate, the following query example calls the Search Documents (REST API). It's a parameterized, free text query with a boolean operator, targeting the hotels-sample-index documents collection. It also selects which fields are returned in results.
POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2020-06-30
{
    "queryType": "simple",
    "searchMode": "all",
    "search": "restaurant +view",
    "searchFields": "HotelName, Description, Address/City, Address/StateProvince, Tags",
    "select": "HotelName, Description, Address/City, Address/StateProvince, Tags",
    "top": 10,
    "count": true,
    "orderby": "Rating desc"
}
Parameters used during query execution include:
- queryType sets the parser: simple or full. The default simple query parser is optimal for full text search. The full Lucene query parser is for advanced query constructs like regular expressions, proximity search, fuzzy and wildcard search. This parameter can also be set to semantic for semantic search, which applies advanced semantic modeling to the query response.
- searchMode specifies whether matches are based on "all" criteria (favors precision) or "any" criteria (favors recall) in the expression. The default is "any".
- search provides the match criteria, usually whole terms or phrases, with or without operators. Any field that is attributed as "searchable" in the index schema is a candidate for this parameter.
- searchFields constrains query execution to specific searchable fields. During development, it's helpful to use the same field list for select and searchFields. Otherwise a match might be based on field values that you can't see in the results, creating uncertainty as to why the document was returned.
Parameters used to shape the response:
- select specifies which fields to return in the response. Only fields marked as "retrievable" in the index can be used in a select statement.
- top returns the specified number of best-matching documents. In this example, only 10 hits are returned. You can use top and skip (not shown) to page the results.
- count tells you how many documents in the entire index match overall, which can be more than the number returned.
- orderby is used if you want to sort results by a value, such as a rating or location. Otherwise, the default is to use the relevance score to rank results. A field must be attributed as "sortable" to be a candidate for this parameter.
The above list is representative but not exhaustive. For the full list of parameters on a query request, see Search Documents (REST API).
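As a rough illustration, the request above could be assembled from Python using only the standard library. This is a sketch, not an official client: the service name `my-service`, the key placeholder, and the helper names `build_search_request` and `page_params` are assumptions made for the example. The request is built but not sent.

```python
import json
import urllib.request

SERVICE = "my-service"          # placeholder for [service name]
INDEX = "hotels-sample-index"
API_VERSION = "2020-06-30"


def build_search_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Search Documents POST request."""
    url = (
        f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
        f"/docs/search?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def page_params(page: int, page_size: int = 10) -> dict:
    """Translate a zero-based page number into top/skip values."""
    return {"top": page_size, "skip": page * page_size}


fields = "HotelName, Description, Address/City, Address/StateProvince, Tags"
body = {
    "queryType": "simple",
    "searchMode": "all",
    "search": "restaurant +view",
    "searchFields": fields,   # fields the query is executed against
    "select": fields,         # fields returned in each result
    "count": True,
    "orderby": "Rating desc",
    **page_params(page=0),
}
request = build_search_request(body, api_key="<your-query-key>")
# Sending is left out on purpose; urllib.request.urlopen(request) would
# issue the call against a real service.
```

Calling `page_params(page=1)` yields the top and skip values for the second page of ten results, which is one simple way to page with the two parameters.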
Types of queries
With a few notable exceptions, a query request iterates over inverted indexes that are structured for fast scans, where a match can be found in potentially any field, within any number of search documents. In Cognitive Search, the primary methodology for finding matches is either full text search or filters, but you can also implement other well-known search experiences like autocomplete, or geo-location search. The rest of this article summarizes queries in Cognitive Search and provides links to more information and examples.
Full text search
If your search app includes a search box that collects term inputs, then full text search is probably the query operation backing that experience. Full text search accepts terms or phrases passed in a search parameter and matches them against all "searchable" fields in your index. Optional boolean operators in the query string can specify inclusion or exclusion criteria. Both the simple parser and full parser support full text search.
In Cognitive Search, full text search is built on the Apache Lucene query engine. Query strings in full text search undergo lexical analysis to make scans more efficient. Analysis includes lower-casing all terms, removing stop words like "the", and reducing terms to primitive root forms. The default analyzer is Standard Lucene.
When matching terms are found, the query engine reconstitutes a search document containing the match using the document key or ID to assemble field values, ranks the documents in order of relevance, and returns the top 50 (by default) in the response, or a different number if you specified top.
If you're implementing full text search, understanding how your content is tokenized will help you debug any query anomalies. Queries over hyphenated strings or special characters could necessitate using an analyzer other than the default standard Lucene analyzer to ensure the index contains the right tokens. You can override the default with language analyzers or specialized analyzers that modify lexical analysis. One example is the keyword analyzer, which treats the entire contents of a field as a single token. This is useful for data like zip codes, IDs, and some product names. For more information, see Partial term search and patterns with special characters.
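One way to see how a string is tokenized is to send it to the service's Analyze Text (REST API) with different analyzers and compare the tokens that come back. The sketch below only builds the two requests. The service name, key placeholder, sample text, and the helper name `build_analyze_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_analyze_request(
    text: str,
    analyzer: str,
    service: str = "my-service",            # placeholder service name
    index: str = "hotels-sample-index",
    api_key: str = "<your-api-key>",
) -> urllib.request.Request:
    """Build (but do not send) an Analyze Text request for one analyzer."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        "/analyze?api-version=2020-06-30"
    )
    body = {"text": text, "analyzer": analyzer}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Same input, two analyzers: the default standard Lucene analyzer and
# the keyword analyzer that keeps the whole value as a single token.
sample = "Wi-Fi 98052-1234"
to_compare = [
    build_analyze_request(sample, name)
    for name in ("standard.lucene", "keyword")
]
```

Comparing the token lists in the two responses shows whether hyphenated values are being split in a way that explains an unexpected match or miss.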
Tip
If you anticipate heavy use of Boolean operators, which is more likely in indexes that contain large text blocks (a content field or long descriptions), be sure to test queries with the searchMode=Any|All parameter to evaluate the impact of that setting on boolean search.
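A simple way to run that comparison is to issue the same query twice, changing only searchMode, and compare the counts. The helper name `search_mode_variants` and the sample query string are assumptions for this sketch, which only produces the two request bodies.

```python
def search_mode_variants(search_text: str) -> list:
    """Return two request bodies that differ only in searchMode."""
    base = {"queryType": "simple", "search": search_text, "count": True}
    return [dict(base, searchMode=mode) for mode in ("any", "all")]


# A query with a negated term is a case where the two modes tend to
# differ noticeably: "any" treats the terms as alternatives, while
# "all" requires every criterion to hold.
variants = search_mode_variants("wifi -luxury")
```

Posting each body to the Search Documents endpoint and comparing the returned counts shows how much the setting changes recall for your content.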
Autocomplete and suggested queries
Autocomplete or suggested results are alternatives to search that fire successive query requests based on partial string inputs (after each character) in a search-as-you-type experience. You can use autocomplete and suggestions together or separately, as described in this tutorial, but you cannot use them with search. Both completed terms and suggested queries are derived from index contents. The engine will never return a string or suggestion that is non-existent in your index. For more information, see Autocomplete (REST API) and Suggestions (REST API).
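As a sketch of what a search-as-you-type client might send on each keystroke, the following builds one Autocomplete body and one Suggestions body for the same partial input. The suggester name `sg`, the service name, and the helper names are assumptions. A suggester must already be defined in the index for either call to work.

```python
def typeahead_endpoint(operation: str, service: str = "my-service",
                       index: str = "hotels-sample-index") -> str:
    """Return the URL for the 'autocomplete' or 'suggest' operation."""
    if operation not in ("autocomplete", "suggest"):
        raise ValueError("operation must be 'autocomplete' or 'suggest'")
    return (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/{operation}?api-version=2020-06-30"
    )


def typeahead_bodies(partial: str, suggester_name: str = "sg") -> tuple:
    """Build request bodies for completed terms and suggested documents."""
    autocomplete = {
        "search": partial,
        "suggesterName": suggester_name,   # hypothetical suggester name
        "autocompleteMode": "oneTerm",
    }
    suggestions = {
        "search": partial,
        "suggesterName": suggester_name,
        "select": "HotelName",
        "top": 5,
    }
    return autocomplete, suggestions


autocomplete_body, suggestions_body = typeahead_bodies("sea")
```

The two bodies go to separate endpoints, which is why they are kept apart from the regular search request.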
Filter search
Filters are widely used in apps that are based on Cognitive Search. On application pages, filters are often visualized as facets in link navigation structures for user-directed filtering. Filters are also used internally to expose slices of indexed content. For example, you might initialize a search page using a filter on a product category, or a language if an index contains fields in both English and French.
You might also need filters to invoke a specialized query form, as described in the following table. You can use a filter with an unspecified search (search=*) or with a query string that includes terms, phrases, operators, and patterns.
Filter scenario | Description |
---|---|
Range filters | In Azure Cognitive Search, range queries are built using the filter parameter. For more information and examples, see Range filter example. |
Faceted navigation | In a faceted navigation tree, users can select facets. When backed by filters, search results narrow on each click. Each facet is backed by a filter that excludes documents that no longer match the criteria provided by the facet. |
Note
Text that's used in a filter expression is not analyzed during query processing. The text input is presumed to be a verbatim case-sensitive character pattern that either succeeds or fails on the match. Filter expressions are constructed using OData syntax and passed in a filter parameter, and they can reference any field marked as "filterable" in your index. For more information, see Filters in Azure Cognitive Search.
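Because filter text is matched verbatim, it helps to build filter strings carefully. The sketch below composes an equality filter and a range filter, doubling any single quote inside a string literal as OData requires. The helper names are hypothetical, and the fields `Address/City` and `Rating` are assumed to be filterable in the sample index.

```python
def odata_string(value: str) -> str:
    """Quote a string literal for OData, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(city: str, min_rating: int) -> str:
    """Combine an exact (case-sensitive) city match with a rating range."""
    return f"Address/City eq {odata_string(city)} and Rating ge {min_rating}"


body = {
    "search": "*",                       # unspecified search, filter only
    "filter": build_filter("Coeur d'Alene", 4),
    "select": "HotelName, Rating",
    "count": True,
}
```

Since the comparison is case-sensitive, `'seattle'` and `'Seattle'` are different values in a filter, even though full text search would treat them alike.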
Geospatial search
Geospatial search matches on a location's latitude and longitude coordinates for "find near me" or map-based search experience. In Azure Cognitive Search, you can implement geospatial search by following these steps:
- Define a filterable field of one of these types: Edm.GeographyPoint or Collection(Edm.GeographyPoint). Edm.GeographyPolygon is used for polygon values inside filter expressions rather than as a field type.
- Verify the incoming documents include the appropriate coordinates.
- After indexing is complete, build a query that uses a filter and a geo-spatial function.
For more information and an example, see Geospatial search example.
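The last step can be sketched as a filter that uses the geo.distance function, which compares a point field against a reference point and returns a distance in kilometers. The helper name and the field name `Location` are assumptions. Note that the point literal lists longitude before latitude.

```python
def geo_distance_filter(field: str, lon: float, lat: float, max_km: float) -> str:
    """Build a filter for documents within max_km of a reference point."""
    # The geography literal is written as POINT(longitude latitude).
    return f"geo.distance({field}, geography'POINT({lon} {lat})') le {max_km}"


body = {
    "search": "*",
    "filter": geo_distance_filter("Location", -122.131577, 47.678581, 10),
    "select": "HotelName, Address/City",
    "count": True,
}
```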
Document look-up
In contrast with the previously described query forms, this one retrieves a single search document by ID, with no corresponding index search or scan. Only the one document is requested and returned. When a user selects an item in search results, retrieving the document and populating a details page with fields is a typical response, and a document look-up is the operation that supports it.
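A look-up is a GET request against the documents collection with the key in the URL path. The sketch below builds such a URL and does not send it. The helper name `lookup_url` and the service name are assumptions. The optional field list maps to the `$select` query parameter.

```python
from urllib.parse import quote, urlencode


def lookup_url(key: str, select=None, service: str = "my-service",
               index: str = "hotels-sample-index") -> str:
    """Build the URL that retrieves a single document by its key."""
    params = {"api-version": "2020-06-30"}
    if select:
        params["$select"] = ",".join(select)
    # Percent-encode the key so characters such as '/' stay in one path segment.
    return (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/{quote(key, safe='')}?{urlencode(params)}"
    )


details_url = lookup_url("1", select=["HotelName", "Description"])
```

A details page would typically call this with the key of the result the user selected, then render the returned fields.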
Advanced search: fuzzy, wildcard, proximity, regex
An advanced query form depends on the Full Lucene parser and operators that trigger a specific query behavior.
Query type | Usage | Examples and more information |
---|---|---|
Fielded search | search parameter, queryType=full | Build a composite query expression targeting a single field. Fielded search example |
Fuzzy search | search parameter, queryType=full | Matches on terms having a similar construction or spelling. Fuzzy search example |
Proximity search | search parameter, queryType=full | Finds terms that are near each other in a document. Proximity search example |
Term boosting | search parameter, queryType=full | Ranks a document higher if it contains the boosted term, relative to others that don't. Term boosting example |
Regular expression search | search parameter, queryType=full | Matches based on the contents of a regular expression. Regular expression example |
Wildcard or prefix search | search parameter with * or ?, queryType=full | Matches based on a prefix followed by an asterisk (*) for multiple characters, or a question mark (?) for a single character. Wildcard search example |
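To make the table concrete, the sketch below pairs each query type with one illustrative Lucene query string and wraps it in a request body with queryType set to full. The sample strings, field names, and the helper name `full_lucene_body` are assumptions chosen for the hotels sample, not examples taken from the linked articles.

```python
# One illustrative query string per advanced query type.
LUCENE_EXAMPLES = {
    "fielded": "HotelName:(hotel NOT motel)",   # scope terms to one field
    "fuzzy": "seatle~1",                        # within one edit of the term
    "proximity": '"hotel airport"~5',           # terms within five words
    "boosting": "beach^2 view",                 # weight 'beach' more heavily
    "regex": "/[mh]otel/",                      # pattern between slashes
    "wildcard": "sea*",                         # prefix match
}


def full_lucene_body(search_text: str) -> dict:
    """Wrap a query string in a request body that selects the full parser."""
    return {"queryType": "full", "search": search_text, "count": True}


bodies = {name: full_lucene_body(q) for name, q in LUCENE_EXAMPLES.items()}
```

Each body can be posted to the Search Documents endpoint. Without queryType set to full, operators such as `~` after a term and `^` are not interpreted this way by the default simple parser.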
Next steps
For a closer look at query implementation, review the examples for each syntax. If you are new to full text search, a closer look at what the query engine does might be an equally good choice.