Configure semantic ranker and return captions in search results
Semantic ranking iterates over an initial result set, applying an L2 ranking methodology that promotes the most semantically relevant results to the top of the stack. You can also get semantic captions, with highlights over the most relevant terms and phrases, and semantic answers.
This article explains how to configure a search index for semantic reranking.
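The following Python sketch reads a semantic query response and pulls out the reranked order, the captions, and the answers. It assumes the response shape we understand the 2024-07-01 REST API to return: @search.rerankerScore and @search.captions on each result, and @search.answers at the top level. The sample payload, the HotelId key field, and the summarize and top_answers helpers are hypothetical.

```python
import json
from typing import List, Tuple

# Hypothetical, trimmed response body. Results are listed in their initial
# (@search.score) order so the sort below visibly promotes the higher
# @search.rerankerScore; a live semantic response is already reranked.
SAMPLE_RESPONSE = """
{
  "@search.answers": [
    {"key": "2", "text": "Quiet rooms near the lake.",
     "highlights": "Quiet rooms near the <em>lake</em>.", "score": 0.9}
  ],
  "value": [
    {"HotelId": "1", "@search.score": 3.1, "@search.rerankerScore": 1.7,
     "@search.captions": [{"text": "Downtown hotel.", "highlights": null}]},
    {"HotelId": "2", "@search.score": 2.4, "@search.rerankerScore": 3.2,
     "@search.captions": [{"text": "Quiet rooms near the lake.",
                           "highlights": "Quiet rooms near the <em>lake</em>."}]}
  ]
}
"""


def summarize(body: str, key_field: str = "HotelId") -> List[Tuple[str, float, str]]:
    """Return (key, rerankerScore, caption) per result, highest rerankerScore first."""
    rows = []
    for doc in json.loads(body).get("value", []):
        captions = doc.get("@search.captions") or []
        caption = ""
        if captions:
            # Prefer the highlighted caption; fall back to plain text when it's null.
            caption = captions[0].get("highlights") or captions[0].get("text", "")
        rows.append((doc[key_field], doc.get("@search.rerankerScore", 0.0), caption))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def top_answers(body: str) -> List[str]:
    """Return the plain text of any semantic answers in the response."""
    return [answer.get("text", "") for answer in json.loads(body).get("@search.answers") or []]


for key, score, caption in summarize(SAMPLE_RESPONSE):
    print(key, score, caption)
print(top_answers(SAMPLE_RESPONSE))
```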
Prerequisites
A search service on a Basic tier or higher, subject to region availability.
Semantic ranker enabled on your search service.
An existing search index with rich text content. Semantic ranking applies to string (nonvector) fields and works best on content that is informational or descriptive.
Choose a client
You can use any of the following tools and SDKs to add a semantic configuration:
- Azure portal, using the index designer
- Visual Studio Code with the REST client
- Azure SDK for .NET
- Azure SDK for Python
- Azure SDK for Java
- Azure SDK for JavaScript
Add a semantic configuration
A semantic configuration is a section in your index that establishes field inputs for semantic ranking. You can add or update a semantic configuration at any time, no rebuild necessary. If you create multiple configurations, you can specify a default. At query time, specify a semantic configuration on a query request, or leave it blank to use the default.
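The rule for choosing between a named configuration and the default can be modeled in a few lines. This Python sketch only illustrates the behavior described above and is not service code. The defaultConfiguration and configurations property names reflect our reading of the 2024-07-01 REST API. The configuration names and the resolve_configuration helper are hypothetical.

```python
from typing import Optional

# Hypothetical "semantic" section of an index definition.
INDEX_SEMANTIC = {
    "defaultConfiguration": "config-a",
    "configurations": [
        {"name": "config-a", "prioritizedFields": {"prioritizedContentFields": [{"fieldName": "Description"}]}},
        {"name": "config-b", "prioritizedFields": {"prioritizedContentFields": [{"fieldName": "Summary"}]}},
    ],
}


def resolve_configuration(semantic: dict, requested: Optional[str] = None) -> dict:
    """A configuration named on the query wins; otherwise the default applies."""
    name = requested or semantic.get("defaultConfiguration")
    if name is None:
        # This sketch simply refuses; it does not model what the service does here.
        raise ValueError("no configuration requested and no default configured")
    for config in semantic.get("configurations", []):
        if config["name"] == name:
            return config
    raise KeyError(f"unknown semantic configuration: {name}")


print(resolve_configuration(INDEX_SEMANTIC)["name"])              # config-a
print(resolve_configuration(INDEX_SEMANTIC, "config-b")["name"])  # config-b
```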
A semantic configuration has a name and the following properties:
Property | Characteristics |
---|---|
Title field | A short string, ideally under 25 words. This field could be the title of a document, name of a product, or a unique identifier. If you don't have a suitable field, leave it blank. |
Content fields | Longer chunks of text in natural language form, subject to maximum token input limits on the machine learning models. Common examples include the body of a document, description of a product, or other free-form text. |
Keyword fields | A list of keywords, such as the tags on a document, or a descriptive term, such as the category of an item. |
You can only specify one title field, but you can have as many content and keyword fields as you like. For content and keyword fields, list the fields in priority order because lower priority fields might get truncated.
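The following Python sketch builds one semantic configuration as a REST-style JSON object and keeps the priority order you pass in. It assumes the 2024-07-01 property names titleField, prioritizedContentFields, and prioritizedKeywordsFields. The helper and the HotelName, Description, Description_fr, Category, and Tags field names are hypothetical.

```python
import json
from typing import Optional, Sequence


def build_semantic_configuration(
    name: str,
    title_field: Optional[str],
    content_fields: Sequence[str],
    keyword_fields: Sequence[str] = (),
) -> dict:
    """Build one entry for the index's semantic "configurations" array.

    List order is preserved, so pass content and keyword fields highest
    priority first; lower-priority fields are the ones that may be truncated.
    """
    prioritized: dict = {
        "prioritizedContentFields": [{"fieldName": f} for f in content_fields],
        "prioritizedKeywordsFields": [{"fieldName": f} for f in keyword_fields],
    }
    # Only one title field is allowed; omit the property when there isn't one.
    if title_field:
        prioritized["titleField"] = {"fieldName": title_field}
    return {"name": name, "prioritizedFields": prioritized}


config = build_semantic_configuration(
    "my-semantic-config", "HotelName", ["Description", "Description_fr"], ["Category", "Tags"]
)
print(json.dumps(config, indent=2))
```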
Across all semantic configuration properties, the fields you assign must be:
- Attributed as searchable and retrievable
- Strings of type Edm.String or Collection(Edm.String), or string subfields of Edm.ComplexType
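To check which fields in an existing index definition qualify, you can filter its fields collection. This Python sketch applies the rules above to a hypothetical list of fields. It assumes subfield paths are written with a slash (Address/City). It also treats a missing searchable or retrievable attribute as false, which might be stricter than the service's defaults.

```python
from typing import Iterator, List

STRING_TYPES = {"Edm.String", "Collection(Edm.String)"}


def eligible_semantic_fields(fields: List[dict], prefix: str = "") -> Iterator[str]:
    """Yield the paths of fields that satisfy the requirements listed above.

    Missing attributes are treated as False; set searchable and retrievable
    explicitly in the definition you pass in to get an accurate answer.
    """
    for field in fields:
        path = prefix + field["name"]
        if field["type"] == "Edm.ComplexType":
            # Descend into complex fields; only their string subfields qualify.
            yield from eligible_semantic_fields(field.get("fields", []), path + "/")
        elif (
            field["type"] in STRING_TYPES
            and field.get("searchable", False)
            and field.get("retrievable", False)
        ):
            yield path


# Hypothetical fields from an index definition.
FIELDS = [
    {"name": "HotelId", "type": "Edm.String", "key": True, "searchable": False, "retrievable": True},
    {"name": "HotelName", "type": "Edm.String", "searchable": True, "retrievable": True},
    {"name": "Tags", "type": "Collection(Edm.String)", "searchable": True, "retrievable": True},
    {"name": "Rating", "type": "Edm.Double", "retrievable": True},
    {"name": "Address", "type": "Edm.ComplexType", "fields": [
        {"name": "City", "type": "Edm.String", "searchable": True, "retrievable": True},
        {"name": "PostalCode", "type": "Edm.String", "searchable": True, "retrievable": False},
    ]},
]

print(list(eligible_semantic_fields(FIELDS)))
# ['HotelName', 'Tags', 'Address/City']
```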
To add a semantic configuration in the Azure portal:
1. Sign in to the Azure portal and navigate to a search service that has semantic ranking enabled.
2. From Indexes on the left navigation pane, open an index.
3. Select Semantic Configurations, and then select Add Semantic Configuration.
4. The New Semantic Configuration page opens with options for selecting a title field, content fields, and keyword fields. Only searchable and retrievable string fields are eligible. Make sure to list content fields and keyword fields in priority order.
5. Select OK to save the changes.
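The same change can also be made outside the portal. The following Python sketch uses only the standard library to prepare, but not send, a Create or Update Index request (PUT /indexes/{name}) that carries a semantic section. The endpoint, index name, key placeholder, and trimmed field list are hypothetical. Because PUT replaces the whole definition, in practice you would start from the full current index definition and add the semantic section to it.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own endpoint and admin key.
SERVICE_ENDPOINT = "https://my-search-service.search.windows.net"
API_VERSION = "2024-07-01"


def build_index_update(index_definition: dict, admin_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a Create or Update Index request."""
    url = f"{SERVICE_ENDPOINT}/indexes/{index_definition['name']}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(index_definition).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )


# Trimmed, hypothetical definition; a real PUT needs every existing field.
index_definition = {
    "name": "hotels-sample-index",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": True},
        {"name": "HotelName", "type": "Edm.String", "searchable": True, "retrievable": True},
        {"name": "Description", "type": "Edm.String", "searchable": True, "retrievable": True},
    ],
    "semantic": {
        "defaultConfiguration": "my-semantic-config",
        "configurations": [
            {
                "name": "my-semantic-config",
                "prioritizedFields": {
                    "titleField": {"fieldName": "HotelName"},
                    "prioritizedContentFields": [{"fieldName": "Description"}],
                },
            }
        ],
    },
}

request = build_index_update(index_definition, admin_key="<your-admin-key>")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```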
Migrate from preview versions
If your semantic ranking code is using preview APIs, this section explains how to migrate to stable versions. To verify general availability, check the following change logs:
- 2024-07-01 (REST)
- Azure SDK for .NET (11.5) change log
- Azure SDK for Python (11.4) change log
- Azure SDK for Java (11.6) change log
- Azure SDK for JavaScript (12.0) change log
Behavior changes:
As of July 14, 2023, semantic ranker is language agnostic. It can rerank results composed of multilingual content, with no bias towards a specific language. In preview versions, semantic ranking would deprioritize results differing from the language specified by the field analyzer.
In 2021-04-30-Preview and all later versions, for the REST API and all SDK packages targeting the same version:
- semanticConfiguration (in an index definition) defines which search fields are used in semantic ranking. Previously, in the 2020-06-30-Preview REST API, searchFields (in a query request) was used for field specification and prioritization. This approach only worked in 2020-06-30-Preview and is obsolete in all other versions.
Step 1: Remove queryLanguage
The semantic ranking engine is now language agnostic. If queryLanguage is specified in your query logic, it's no longer used for semantic ranking, but it still applies to spell correction.
- Keep queryLanguage if you're using speller and the language value is supported by speller. Spell check has limited availability across languages.
- Otherwise, delete queryLanguage.
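The rule above can be applied mechanically to stored query bodies. In this Python sketch, the set of speller-supported languages is a parameter you supply from the speller documentation; the {"en-us"} value is only a placeholder, and drop_query_language is a hypothetical helper.

```python
from typing import AbstractSet

# Placeholder only: fill this from the speller's documented language list.
SUPPORTED = {"en-us"}


def drop_query_language(query: dict, speller_languages: AbstractSet[str]) -> dict:
    """Return a copy of query with queryLanguage removed unless speller still needs it.

    Following the guidance above, queryLanguage is kept only when speller is
    turned on and the language is in the (lowercase) set you pass in.
    """
    migrated = dict(query)
    language = migrated.get("queryLanguage")
    uses_speller = migrated.get("speller", "none") != "none"
    if language is not None and not (uses_speller and language.lower() in speller_languages):
        del migrated["queryLanguage"]
    return migrated


print(drop_query_language({"search": "hotel", "queryType": "semantic", "queryLanguage": "en-us"}, SUPPORTED))
print(drop_query_language(
    {"search": "hotl", "queryType": "semantic", "queryLanguage": "en-us", "speller": "lexicon"}, SUPPORTED
))
```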
Step 2: Replace searchFields with semanticConfiguration
If your code calls the 2020-06-30-Preview REST API or beta SDK packages targeting that REST API version, you might be using searchFields in a query request to specify semantic fields and priorities. In initial beta versions, searchFields had a dual purpose: it constrained the initial query to the fields listed in searchFields, and it also set field priority if semantic ranking was used. In later versions, searchFields retains its original purpose but is no longer used for semantic ranking.
- Keep searchFields in query requests if you're using it to limit full text search to the list of named fields.
- Add a semanticConfiguration to an index schema to specify field prioritization, following the instructions in this article.
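One way to carry an old searchFields list forward is to turn it into a semantic configuration and point the query at that configuration. This Python sketch assumes every listed field becomes a content field in its original order; you might prefer to move a short field, such as a document title, into titleField. The helper, the field names, and the configuration name are hypothetical.

```python
from typing import Dict, Tuple


def migrate_search_fields(
    old_query: Dict, config_name: str, keep_search_fields: bool
) -> Tuple[Dict, Dict]:
    """Split a 2020-06-30-Preview style query into (new query, semantic configuration)."""
    fields = [name.strip() for name in old_query.get("searchFields", "").split(",") if name.strip()]
    # Assumption: listed fields map to content fields, highest priority first.
    config = {
        "name": config_name,
        "prioritizedFields": {
            "prioritizedContentFields": [{"fieldName": name} for name in fields],
        },
    }
    new_query = {key: value for key, value in old_query.items() if key != "searchFields"}
    if keep_search_fields and fields:
        # Still restricts full text search; no longer affects semantic ranking.
        new_query["searchFields"] = ",".join(fields)
    new_query["queryType"] = "semantic"
    new_query["semanticConfiguration"] = config_name
    return new_query, config


old = {"search": "quiet hotel near lake", "queryType": "semantic", "searchFields": "HotelName, Description"}
new_query, semantic_config = migrate_search_fields(old, "my-semantic-config", keep_search_fields=False)
print(new_query)
print(semantic_config)
```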
Next steps
Test your semantic configuration by running a semantic query.
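A starting point for that test is a query body like the following. This Python sketch assumes the 2024-07-01 query parameters queryType, semanticConfiguration, captions, and answers as we understand them; the helper and sample values are hypothetical. You would send the body with POST to /indexes/{index}/docs/search?api-version=2024-07-01.

```python
import json
from typing import Optional


def build_semantic_query(
    search: str,
    configuration: Optional[str] = None,
    captions: bool = True,
    answers: int = 0,
    top: int = 10,
) -> dict:
    """Build a request body for a semantic query."""
    body = {"search": search, "queryType": "semantic", "top": top}
    if configuration:
        # Omit to fall back to the index's default semantic configuration.
        body["semanticConfiguration"] = configuration
    if captions:
        # Extractive captions with highlights over the most relevant phrases.
        body["captions"] = "extractive|highlight-true"
    if answers > 0:
        body["answers"] = f"extractive|count-{answers}"
    return body


print(json.dumps(build_semantic_query("quiet hotel near a lake", "my-semantic-config", answers=3), indent=2))
```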