Configure semantic ranking and return captions in search results
Important
Semantic search is in public preview under supplemental terms of use. It's available through the Azure portal, preview REST APIs, and beta SDKs. This feature is billable. See Availability and pricing.
In this article, you'll learn how to invoke a semantic ranking algorithm over a result set, promoting the most semantically relevant results to the top of the stack. You can also get semantic captions, with highlights over the most relevant terms and phrases, and semantic answers.
There are two main activities to perform:
- Add a semantic configuration to an index
- Add parameters to a query request
Prerequisites
- A search service on Standard tier (S1, S2, S3) or Storage Optimized tier (L1, L2), in these regions: Australia East, East US, East US 2, North Central US, South Central US, West US, West US 2, North Europe, UK South, West Europe. If you have an existing S1 or greater service in one of these regions, you can enable semantic search without having to create a new service.
- Semantic search enabled on your search service.
- An existing search index with rich content in a supported query language. Semantic search works best on content that is informational or descriptive.
Review the Semantic search overview if you need an introduction to the feature.
Note
Captions and answers are extracted verbatim from text in the search document. The semantic subsystem determines what part of your content has the characteristics of a caption or answer, but it doesn't compose new sentences or phrases. For this reason, content that includes explanations or definitions works best for semantic search.
1 - Choose a client
You'll need a search client that supports preview APIs on the query request. Here are some options:
- Search explorer in the Azure portal, recommended for initial exploration.
- Postman app using the 2021-04-30-Preview REST APIs. See this Quickstart for help with setting up your requests.
- Azure.Search.Documents 11.4.0-beta.5 in the Azure SDK for .NET Preview.
- azure-search-documents 11.3.0b6 in the Azure SDK for Python.
2 - Create a semantic configuration
Important
A semantic configuration is required for the 2021-04-30-Preview REST APIs, Search explorer, and some versions of the beta SDKs. If you're using the 2020-06-30-Preview REST API, skip this step and use the "searchFields" approach for field prioritization instead.
A semantic configuration specifies how fields are used in semantic ranking. It gives the underlying models hints about which index fields are most important for semantic ranking, captions, highlights, and answers.
You'll add a semantic configuration to your index definition by using the REST APIs, the Azure portal, or the .NET SDK Preview. The steps in this section use the Azure portal.
You can add or update a semantic configuration at any time without rebuilding your index. An index can have more than one semantic configuration. When you issue a query, you specify which one to use (one per query).
Review the properties you'll need to specify. A semantic configuration has a name and at least one each of the following properties:
- Title field - A title field should be a concise description of the document, ideally a string that is under 25 words. This field could be the title of the document, name of the product, or item in your search index. If you don't have a title in your search index, leave this field blank.
- Content fields - Content fields should contain text in natural language form. Common examples of content are the body of a document, the description of a product, or other free-form text.
- Keyword fields - Keyword fields should be a list of keywords, such as the tags on a document, or a descriptive term, such as the category of an item.
You can only specify one title field but you can specify as many content and keyword fields as you like. For content and keyword fields, list the fields in priority order because lower priority fields may get truncated.
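To show how these properties fit together, the following Python sketch builds a semantic configuration as a plain dictionary. It assumes the 2021-04-30-Preview index schema shape. In that shape, configurations sit under a "semantic" property and use titleField, prioritizedContentFields, and prioritizedKeywordsFields. Check the REST reference for your API version before you rely on these names. The configuration name hotels-semantic-config is hypothetical.

```python
import json


def build_semantic_configuration(name, title_field, content_fields, keyword_fields):
    """Build one semantic configuration in the assumed 2021-04-30-Preview shape.

    title_field is a single field name, or None if the index has no title.
    content_fields and keyword_fields must already be in priority order,
    because lower-priority fields may be truncated.
    """
    prioritized = {
        "prioritizedContentFields": [{"fieldName": f} for f in content_fields],
        "prioritizedKeywordsFields": [{"fieldName": f} for f in keyword_fields],
    }
    if title_field is not None:
        # Only one title field is allowed, so it's a single object, not a list.
        prioritized["titleField"] = {"fieldName": title_field}
    return {"name": name, "prioritizedFields": prioritized}


# Hypothetical configuration name; field names come from hotels-sample-index.
config = build_semantic_configuration(
    "hotels-semantic-config",
    title_field="HotelName",
    content_fields=["Description"],
    keyword_fields=["Tags", "Category"],
)
# Configurations are listed under the index's "semantic" property.
index_fragment = {"semantic": {"configurations": [config]}}
print(json.dumps(index_fragment, indent=2))
```

The keyword list puts Tags before Category only as an example of priority order. Choose the order that reflects which fields matter most in your own index.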
For the above properties, determine which fields to assign.
A field must be of a supported data type and should contain strings. If you include an invalid field, there's no error, but that field won't be used in semantic ranking.
Data type | Example from hotels-sample-index |
---|---|
Edm.String | HotelName, Category, Description |
Edm.ComplexType | Address.StreetNumber, Address.City, Address.StateProvince, Address.PostalCode |
Collection(Edm.String) | Tags (a comma-delimited list of strings) |
Note
Subfields of Collection(Edm.ComplexType) fields aren't currently supported by semantic search and won't be used for semantic ranking, captions, or answers.
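Invalid fields are ignored rather than rejected, so it can help to check candidates before you assign them. The following sketch walks a field list in the REST index schema shape and returns the paths that match the supported data types. It skips subfields of Collection(Edm.ComplexType). The helper eligible_semantic_fields and the sample schema are hypothetical.

```python
# Supported string types; subfields of Edm.ComplexType are handled separately.
SUPPORTED_STRING_TYPES = {"Edm.String", "Collection(Edm.String)"}


def eligible_semantic_fields(fields, prefix=""):
    """Return dotted paths of fields that semantic search can use.

    fields follows the REST index schema: each entry has "name" and "type",
    and Edm.ComplexType entries carry their subfields under "fields".
    """
    paths = []
    for field in fields:
        path = prefix + field["name"]
        field_type = field["type"]
        if field_type in SUPPORTED_STRING_TYPES:
            paths.append(path)
        elif field_type == "Edm.ComplexType":
            # Subfields of a single complex field are addressed as Parent.Child.
            paths.extend(eligible_semantic_fields(field.get("fields", []), path + "."))
        # Anything else, including Collection(Edm.ComplexType) subfields, is
        # skipped: the service doesn't reject such fields, it just ignores them.
    return paths


# Simplified, hypothetical subset of a hotels-style index schema.
schema = [
    {"name": "HotelName", "type": "Edm.String"},
    {"name": "Rating", "type": "Edm.Double"},
    {"name": "Tags", "type": "Collection(Edm.String)"},
    {"name": "Address", "type": "Edm.ComplexType", "fields": [
        {"name": "City", "type": "Edm.String"},
        {"name": "StateProvince", "type": "Edm.String"},
    ]},
    {"name": "Rooms", "type": "Collection(Edm.ComplexType)", "fields": [
        {"name": "Description", "type": "Edm.String"},
    ]},
]
print(eligible_semantic_fields(schema))
```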
Sign in to Azure portal and navigate to a search service that has semantic search enabled.
Open an index.
Select Semantic Configurations and then select Add Semantic Configuration.
The New Semantic Configuration page opens with options for selecting a title field, content fields, and keyword fields. Make sure to list content fields and keyword fields in priority order.
Select OK to save the changes.
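Saving in the portal updates the index definition. If you use the REST APIs instead, the equivalent call is likely Create or Update Index (PUT /indexes/[index name]) with api-version=2021-04-30-Preview; treat that as an assumption. The following Python sketch uses only the standard library and prepares the request without sending it. The service name, admin key, and abbreviated field list are placeholders.

```python
import json
import urllib.request

API_VERSION = "2021-04-30-Preview"


def build_update_index_request(service_name, index_definition, admin_key):
    """Prepare, but don't send, a PUT that creates or updates an index.

    PUT replaces the whole definition, so index_definition must include the
    existing fields as well as the new "semantic" section.
    """
    url = (
        f"https://{service_name}.search.windows.net/indexes/"
        f"{index_definition['name']}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(index_definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


# Placeholder definition: in practice, start from the complete existing
# definition of your index and add the "semantic" section to it.
index_definition = {
    "name": "hotels-sample-index",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": True},
        {"name": "HotelName", "type": "Edm.String"},
        {"name": "Description", "type": "Edm.String"},
        {"name": "Category", "type": "Edm.String"},
    ],
    "semantic": {"configurations": [{
        "name": "hotels-semantic-config",
        "prioritizedFields": {
            "titleField": {"fieldName": "HotelName"},
            "prioritizedContentFields": [{"fieldName": "Description"}],
            "prioritizedKeywordsFields": [{"fieldName": "Category"}],
        },
    }]},
}
request = build_update_index_request("my-search-service", index_definition, "<admin-api-key>")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to the service.
```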
Tip
To see an example of creating a semantic configuration and using it to issue a semantic query, check out the semantic search Postman sample.
2b - Use searchFields for field prioritization
This step is only for solutions using the 2020-06-30-Preview REST API or a beta SDK that doesn't support semantic configurations. Instead of setting field prioritization in the index through a semantic configuration, you'll set the priority at query time, using the "searchFields" parameter of a query.
Using "searchFields" for field prioritization was an early implementation detail that won't be supported once semantic search exits public preview. We encourage you to use semantic configurations if your application requirements allow it.
```http
POST https://[service name].search.windows.net/indexes/[index name]/docs/search?api-version=2020-06-30-Preview
{
  "search": "Where was Alan Turing born?",
  "queryType": "semantic",
  "searchFields": "title,url,body",
  "queryLanguage": "en-us"
}
```
Field order is critical because the semantic ranker limits the amount of content it can process while still delivering a reasonable response time. Content from fields at the start of the list is more likely to be included, and content from the end could be truncated if the maximum limit is reached. For more information, see Pre-processing during semantic ranking.
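The following toy model makes the effect of field order concrete. It concatenates fields in priority order and stops at a fixed budget. This isn't how the semantic ranker counts or truncates content. The word budget is a hypothetical stand-in for the real limit, which this article doesn't specify. The model only shows why content at the end of the list is the first to be lost.

```python
def pack_fields(document, field_order, budget_words):
    """Toy model: concatenate fields in priority order until a word budget runs out.

    budget_words is a hypothetical stand-in for the ranker's real input limit.
    """
    kept = []
    for name in field_order:
        remaining = budget_words - len(kept)
        if remaining <= 0:
            break  # budget spent: the remaining fields are dropped entirely
        words = str(document.get(name, "")).split()
        kept.extend(words[:remaining])  # a long field can be cut off part-way
    return " ".join(kept)


doc = {
    "HotelName": "Oceanside Resort",
    "Category": "Luxury",
    "Description": "New Luxury Hotel. Be the first to stay. Bay views from every "
                   "room, location near the pier, rooftop pool, waterfront dining & more.",
}
print(pack_fields(doc, ["HotelName", "Category", "Description"], 8))
print(pack_fields(doc, ["Description", "HotelName"], 8))
```

With the title first, the hotel name survives. When Description comes first, it uses up the whole budget and the name is dropped.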
If you're specifying just one field, choose a descriptive field where the answer to semantic queries might be found, such as the main content of a document.
For two or more fields in searchFields:
- The first field should always be concise (such as a title or name), ideally a string that is under 25 words.
- If the index has a human-readable URL field, such as www.domain.com/name-of-the-document-and-other-details (rather than a machine-focused one, such as www.domain.com/?id=23463&param=eis), place it second in the list, or first if there's no concise title field.
- Follow the above fields with other descriptive fields where the answer to semantic queries may be found, such as the main content of a document.
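The following sketch turns the ordering guidance into a small helper that produces the searchFields value for a 2020-06-30-Preview request. It treats a URL as human-readable when the path contains hyphenated words and there's no query string. That test is an assumed heuristic, not a rule the service applies. The helper names are hypothetical.

```python
import re
from urllib.parse import urlparse


def looks_human_readable(url):
    """Heuristic, not a service rule: hyphenated words in the path and no query string."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    return not parsed.query and re.search(r"[A-Za-z]+-[A-Za-z]+", parsed.path) is not None


def order_search_fields(title_field, url_field, sample_url, descriptive_fields):
    """Order fields as recommended: concise title, readable URL, then descriptive text."""
    ordered = []
    if title_field:
        ordered.append(title_field)
    # With no title field, a readable URL field naturally ends up first.
    if url_field and looks_human_readable(sample_url):
        ordered.append(url_field)
    ordered.extend(descriptive_fields)
    return ordered


def build_legacy_semantic_query(text, search_fields, language="en-us"):
    """Request body for the 2020-06-30-Preview searchFields approach."""
    return {
        "search": text.strip(),
        "queryType": "semantic",
        "searchFields": ",".join(search_fields),
        "queryLanguage": language,
    }


fields = order_search_fields(
    "title", "url", "www.domain.com/name-of-the-document-and-other-details", ["body"])
print(build_legacy_semantic_query("Where was Alan Turing born?", fields))
```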
When setting "searchFields", choose only fields of the following supported data types:
Data type | Example from hotels-sample-index |
---|---|
Edm.String | HotelName, Category, Description |
Edm.ComplexType | Address.StreetNumber, Address.City, Address.StateProvince, Address.PostalCode |
Collection(Edm.String) | Tags (a comma-delimited list of strings) |
If you happen to include an invalid field, there's no error, but those fields won't be used in semantic ranking.
3 - Avoid features that bypass relevance scoring
Several query capabilities in Cognitive Search don't undergo relevance scoring, and some bypass the full text search engine altogether. If your query logic includes the following features, you won't get relevance scores or semantic ranking on your results:
Filters, fuzzy search queries, and regular expressions iterate over untokenized text, scanning for verbatim matches in the content. Search scores for all of the above query forms are a uniform 1.0, and won't provide meaningful input for semantic ranking.
Sorting (orderby clauses) on specific fields also overrides search scores and the semantic score. Because the semantic score determines the order of results, a semantic query that includes explicit sort logic returns an HTTP 400 error.
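A client can catch some of these cases before sending a request. The following hypothetical check flags two problems. The first is a semantic query with an orderby clause, which the service rejects with HTTP 400. The second is a query with no full text search string, which gives the semantic ranker nothing meaningful to rank. The check covers only these cases and doesn't replace the service's own validation.

```python
def semantic_query_problems(body):
    """Return reasons a semantic query won't get meaningful semantic ranking.

    A hypothetical client-side check based on the rules in this section; the
    service itself is the final authority.
    """
    if body.get("queryType") != "semantic":
        return []
    problems = []
    if body.get("orderby"):
        problems.append("orderby: explicit sorting returns HTTP 400 for semantic queries")
    search = (body.get("search") or "").strip()
    if search in ("", "*"):
        problems.append("search: no full text search string, so relevance scores aren't meaningful")
    return problems


print(semantic_query_problems({"queryType": "semantic", "search": "pool", "orderby": "Rating desc"}))
```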
4 - Set up the query
Your next step is adding parameters to the query request. To be successful, your query should be full text search (using the "search" parameter to pass in a string), and the index should contain text fields with rich semantic content.
Search explorer has been updated to include options for semantic queries. To configure semantic ranking in the portal, follow the steps below:
Open the Azure portal and navigate to a search service that has semantic search enabled.
Select Search explorer at the top of the overview page.
Choose an index that has content in a supported language.
In Search explorer, set query options that enable semantic queries, semantic configurations, and spell correction. You can also paste the required query parameters into the query string.
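Outside the portal, the same options go in the body of a query request. The following sketch builds a body for POST https://[service name].search.windows.net/indexes/[index name]/docs/search?api-version=2021-04-30-Preview. The parameter names and values (semanticConfiguration, captions, answers, speller) are assumed to match that preview API version, so verify them against the REST reference. The query text and configuration name are hypothetical.

```python
import json


def build_semantic_query(text, semantic_configuration, language="en-us",
                         select=None, answer_count=3, use_speller=True):
    """Body for POST .../docs/search?api-version=2021-04-30-Preview (assumed parameter names)."""
    body = {
        "search": text,  # semantic ranking needs a full text search string
        "queryType": "semantic",
        "semanticConfiguration": semantic_configuration,  # one configuration per query
        "queryLanguage": language,
        "captions": "extractive|highlight-true",
        "count": True,
    }
    if answer_count:
        body["answers"] = f"extractive|count-{answer_count}"
    if use_speller:
        body["speller"] = "lexicon"
    if select:
        body["select"] = ",".join(select)
    return body


query = build_semantic_query(
    "newer hotel near the water with a good restaurant",  # hypothetical query text
    "hotels-semantic-config",                             # hypothetical configuration name
    select=["HotelName", "Description", "Category"],
)
print(json.dumps(query, indent=2))
```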
5 - Evaluate the response
Only the top 50 matches from the initial results can be semantically ranked. As with all queries, a response is composed of all fields marked as retrievable, or just those fields listed in the select parameter. A response includes the original relevance score, and might also include a count, or batched results, depending on how you formulated the request.
In semantic search, the response has more elements: a new semantically ranked relevance score, an optional caption in plain text and with highlights, and an optional answer. If your results don't include these extra elements, then your query might be misconfigured. As a first step towards troubleshooting the problem, check the semantic configuration to ensure it's specified in both the index definition and query.
In a client app, you can structure the search page to include a caption as the description of the match, rather than the entire contents of a specific field. This approach is useful when individual fields are too dense for the search results page.
The following excerpt shows the top match from a semantic query against hotels-sample-index. Captions are returned because the "captions" parameter is set in the query, and they come in plain text and highlighted versions. The "@search.answers" array is empty because an answer couldn't be determined for this particular query and corpus.
```json
{
  "@odata.count": 35,
  "@search.answers": [],
  "value": [
    {
      "@search.score": 1.8810667,
      "@search.rerankerScore": 1.1446577133610845,
      "@search.captions": [
        {
          "text": "Oceanside Resort. Luxury. New Luxury Hotel. Be the first to stay. Bay views from every room, location near the pier, rooftop pool, waterfront dining & more.",
          "highlights": "<strong>Oceanside Resort.</strong> Luxury. New Luxury Hotel. Be the first to stay.<strong> Bay</strong> views from every room, location near the pier, rooftop pool, waterfront dining & more."
        }
      ],
      "HotelName": "Oceanside Resort",
      "Description": "New Luxury Hotel. Be the first to stay. Bay views from every room, location near the pier, rooftop pool, waterfront dining & more.",
      "Category": "Luxury"
    },
    ...
  ]
}
```
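In a client app, you might turn this response into rows for the search page. The following sketch uses the caption as the description when one is present and falls back to a field otherwise. It treats a missing @search.rerankerScore as a sign that the query is misconfigured. The helper summarize_results is hypothetical, and the response is a trimmed copy of the sample above.

```python
def summarize_results(response, name_field, fallback_field):
    """Return (name, rerankerScore, snippet) per result, preferring the caption.

    Raises if a result lacks @search.rerankerScore, which usually means the
    query wasn't semantically ranked (for example, a misconfigured query).
    """
    rows = []
    for doc in response.get("value", []):
        if "@search.rerankerScore" not in doc:
            raise ValueError("missing @search.rerankerScore: check the semantic "
                             "configuration in the index and the query")
        captions = doc.get("@search.captions") or []
        # Use the caption as the description when there is one; otherwise
        # fall back to a (possibly long) field from the document.
        snippet = captions[0]["text"] if captions else doc.get(fallback_field, "")
        rows.append((doc.get(name_field), doc["@search.rerankerScore"], snippet))
    return rows


# Trimmed copy of the sample response above.
response = {
    "@odata.count": 35,
    "@search.answers": [],
    "value": [{
        "@search.score": 1.8810667,
        "@search.rerankerScore": 1.1446577133610845,
        "@search.captions": [{
            "text": "Oceanside Resort. Luxury. New Luxury Hotel. Be the first to stay.",
            "highlights": "<strong>Oceanside Resort.</strong> Luxury.",
        }],
        "HotelName": "Oceanside Resort",
        "Description": "New Luxury Hotel. Be the first to stay.",
        "Category": "Luxury",
    }],
}
if not response["@search.answers"]:
    print("No semantic answer for this query.")
for name, score, snippet in summarize_results(response, "HotelName", "Description"):
    print(f"{score:.3f}  {name}: {snippet}")
```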
Next steps
Recall that semantic ranking and responses are built over an initial result set. Any logic that improves the quality of the initial results will carry forward to semantic search. As a next step, review the features that contribute to initial results, including analyzers that affect how strings are tokenized, scoring profiles that can tune results, and the default relevance algorithm.