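Build a console application by using the Azure.Search.Documents client library to add semantic ranking to an existing search index.
Alternatively, you can download the source code to start with a finished project.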
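Set up your environment
Start Visual Studio and create a new project for a console app.
In Tools > NuGet Package Manager, select Manage NuGet Packages for Solution....
Select Browse.
Search for the Azure.Search.Documents package and select the latest stable version.
Select Install to add the assembly to your project and solution.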
Create a search client
In Program.cs, add the following using directives.
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using Azure.Search.Documents.Models;
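Create two clients: SearchIndexClient creates the index, and SearchClient loads and queries an existing index.
Both clients need the service endpoint and an admin API key to authenticate with create/delete rights. However, the code builds the URI for you, so specify only the search service name for the serviceName variable. Don't include https:// or .search.windows.net.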
static void Main(string[] args)
{
    string serviceName = "<YOUR-SEARCH-SERVICE-NAME>";
    string apiKey = "<YOUR-SEARCH-ADMIN-API-KEY>";
    string indexName = "hotels-quickstart";

    // Create a SearchIndexClient to send create/delete index commands
    Uri serviceEndpoint = new Uri($"https://{serviceName}.search.windows.net/");
    AzureKeyCredential credential = new AzureKeyCredential(apiKey);
    SearchIndexClient adminClient = new SearchIndexClient(serviceEndpoint, credential);

    // Create a SearchClient to load and query documents
    SearchClient srchclient = new SearchClient(serviceEndpoint, indexName, credential);
    . . .
}
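The endpoint rule described above (pass only the service name, without the scheme or the domain suffix) can be illustrated with a small Python sketch. The helper name `build_search_endpoint` and its validation rules are assumptions made for illustration; they aren't part of any Azure SDK.

```python
def build_search_endpoint(service_name: str) -> str:
    """Return the full endpoint URI for a bare search service name.

    Hypothetical helper that mirrors the string interpolation in the C# sample.
    """
    name = service_name.strip()
    if not name:
        raise ValueError("The service name must not be empty.")
    # Reject values that already contain the scheme or the domain suffix,
    # because this function adds both itself.
    if "://" in name or ".search.windows.net" in name:
        raise ValueError("Pass only the service name, for example 'my-service'.")
    return f"https://{name}.search.windows.net/"


if __name__ == "__main__":
    print(build_search_endpoint("my-service"))
```

Rejecting a full URL early gives a clearer error than a failed connection to a doubled-up host name.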
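Create an index
Create or update an index schema to include a SemanticConfiguration. If you're updating an existing index, this modification doesn't require reindexing because the structure of your documents is unchanged.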
// Create hotels-quickstart index
private static void CreateIndex(string indexName, SearchIndexClient adminClient)
{
    FieldBuilder fieldBuilder = new FieldBuilder();
    var searchFields = fieldBuilder.Build(typeof(Hotel));
    var definition = new SearchIndex(indexName, searchFields);
    var suggester = new SearchSuggester("sg", new[] { "HotelName", "Category", "Address/City", "Address/StateProvince" });
    definition.Suggesters.Add(suggester);

    // Tell the semantic ranker which fields hold the title, content, and keywords
    definition.SemanticSearch = new SemanticSearch
    {
        Configurations =
        {
            new SemanticConfiguration("semantic-config", new()
            {
                TitleField = new SemanticField("HotelName"),
                ContentFields =
                {
                    new SemanticField("Description"),
                },
                KeywordsFields =
                {
                    new SemanticField("Tags"),
                    new SemanticField("Category")
                }
            })
        }
    };

    adminClient.CreateOrUpdateIndex(definition);
}
The following code creates the index on your search service:
// Create index
Console.WriteLine("{0}", "Creating index...\n");
CreateIndex(indexName, adminClient);
SearchClient ingesterClient = adminClient.GetSearchClient(indexName);
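To make the shape of the semantic configuration easier to see, here's a sketch that builds a comparable structure as a plain Python dictionary. The property names (`prioritizedFields`, `titleField`, `prioritizedContentFields`, `prioritizedKeywordsFields`) reflect one reading of the REST index definition and should be treated as assumptions to verify against the REST API reference for your API version. The helper `semantic_section` is hypothetical.

```python
import json


def semantic_section(config_name, title, content, keywords):
    """Build a dict that resembles the semantic part of an index definition.

    Assumption: the property names below match the REST API version you target.
    """
    return {
        "configurations": [
            {
                "name": config_name,
                "prioritizedFields": {
                    "titleField": {"fieldName": title},
                    "prioritizedContentFields": [{"fieldName": f} for f in content],
                    "prioritizedKeywordsFields": [{"fieldName": f} for f in keywords],
                },
            }
        ]
    }


# Same field assignments as the C# CreateIndex method above.
section = semantic_section(
    "semantic-config",
    title="HotelName",
    content=["Description"],
    keywords=["Tags", "Category"],
)
print(json.dumps(section, indent=2))
```

The order of fields inside each list expresses priority, so list the most informative field first.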
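Load documents
Azure AI Search searches over content stored in the service. The code for uploading documents is identical to the C# quickstart for full text search, so it isn't duplicated here. You should have four hotels with names, addresses, and descriptions. Your solution should have types for Hotel and Address.
Search an index
Here's a query that invokes the semantic ranker, with search options for specifying parameters: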
Console.WriteLine("Example of a semantic query.");
options = new SearchOptions()
{
    QueryType = Azure.Search.Documents.Models.SearchQueryType.Semantic,
    SemanticSearch = new()
    {
        SemanticConfigurationName = "semantic-config",
        QueryCaption = new(QueryCaptionType.Extractive)
    }
};
options.Select.Add("HotelName");
options.Select.Add("Category");
options.Select.Add("Description");
// response = srchclient.Search<Hotel>("*", options);
response = srchclient.Search<Hotel>("restaurant on site", options);
WriteDocuments(response);
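For comparison, here are the results from a query that uses the default BM25 ranking, which is based on term frequency and proximity. Given the query "restaurant on site", the BM25 ranking algorithm returns matches in the order shown in the screenshot, where the match on "site" is considered more relevant because the term is rare in the dataset.
In contrast, when semantic ranking is applied to the same query ("restaurant on site"), the results are reranked based on semantic relevance to the query. This time, the top result is the hotel with the restaurant, which aligns better with user expectations.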
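Run the program
Press F5 to rebuild the app and run the program in its entirety.
Output includes messages from Console.WriteLine, with the addition of query information and results.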
Use a Jupyter notebook and the azure-search-documents library in the Azure SDK for Python to learn about semantic ranking.
Alternatively, you can download and run a finished notebook.
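Set up your environment
Use Visual Studio Code with the Python extension (or an equivalent IDE), with Python 3.10 or later.
We recommend a virtual environment for this quickstart:
Start Visual Studio Code.
Create a new .ipynb file.
Open the Command Palette by using Ctrl+Shift+P.
Search for Python: Create Environment.
Select Venv.
Select a Python interpreter. Choose 3.10 or later.
Setup can take a minute. If you run into problems, see Python environments in VS Code.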
Install packages and set variables
Install packages, including azure-search-documents.
! pip install azure-search-documents==11.6.0b1 --quiet
! pip install azure-identity --quiet
! pip install python-dotenv --quiet
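Provide your endpoint and API key, and create the credential object that the clients in later cells use:
from azure.core.credentials import AzureKeyCredential
search_endpoint: str = "PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE"
search_api_key: str = "PUT-YOUR-SEARCH-SERVICE-ADMIN-API-KEY-HERE"
index_name: str = "hotels-quickstart"
# The later cells pass this credential to SearchIndexClient and SearchClient
credential = AzureKeyCredential(search_api_key)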
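Hard-coding an admin key in a notebook makes it easy to leak. As an alternative, the following sketch reads the same three values from environment variables by using only the standard library. The variable names `AZURE_SEARCH_ENDPOINT`, `AZURE_SEARCH_API_KEY`, and `AZURE_SEARCH_INDEX`, and the helper `load_settings`, are hypothetical; the python-dotenv package installed earlier could populate such variables from a .env file, but that step isn't shown here.

```python
import os
from typing import Mapping, NamedTuple


class SearchSettings(NamedTuple):
    endpoint: str
    api_key: str
    index_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    """Read connection settings from a mapping (defaults to the process environment)."""
    required = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return SearchSettings(
        endpoint=env["AZURE_SEARCH_ENDPOINT"],
        api_key=env["AZURE_SEARCH_API_KEY"],
        # Fall back to the index name used throughout this quickstart.
        index_name=env.get("AZURE_SEARCH_INDEX", "hotels-quickstart"),
    )


# Example with an explicit mapping, so the sketch runs without real secrets.
demo = load_settings({
    "AZURE_SEARCH_ENDPOINT": "https://my-service.search.windows.net",
    "AZURE_SEARCH_API_KEY": "placeholder-key",
})
print(demo.endpoint, demo.index_name)
```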
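Create an index
Create or update an index schema to include a SemanticConfiguration. If you're updating an existing index, this modification doesn't require reindexing because the structure of your documents is unchanged.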
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents import SearchClient
from azure.search.documents.indexes.models import (
ComplexField,
SimpleField,
SearchFieldDataType,
SearchableField,
SearchIndex,
SemanticConfiguration,
SemanticField,
SemanticPrioritizedFields,
SemanticSearch
)
# Create a search schema
index_client = SearchIndexClient(
endpoint=search_endpoint, credential=credential)
fields = [
SimpleField(name="HotelId", type=SearchFieldDataType.String, key=True),
SearchableField(name="HotelName", type=SearchFieldDataType.String, sortable=True),
SearchableField(name="Description", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
SearchableField(name="Category", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
SearchableField(name="Tags", collection=True, type=SearchFieldDataType.String, facetable=True, filterable=True),
SimpleField(name="ParkingIncluded", type=SearchFieldDataType.Boolean, facetable=True, filterable=True, sortable=True),
SimpleField(name="LastRenovationDate", type=SearchFieldDataType.DateTimeOffset, facetable=True, filterable=True, sortable=True),
SimpleField(name="Rating", type=SearchFieldDataType.Double, facetable=True, filterable=True, sortable=True),
ComplexField(name="Address", fields=[
SearchableField(name="StreetAddress", type=SearchFieldDataType.String),
SearchableField(name="City", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
SearchableField(name="StateProvince", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
SearchableField(name="PostalCode", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
SearchableField(name="Country", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
])
]
semantic_config = SemanticConfiguration(
name="semantic-config",
prioritized_fields=SemanticPrioritizedFields(
title_field=SemanticField(field_name="HotelName"),
keywords_fields=[SemanticField(field_name="Category")],
content_fields=[SemanticField(field_name="Description")]
)
)
# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])
scoring_profiles = []
suggester = [{'name': 'sg', 'source_fields': ['Tags', 'Address/City', 'Address/Country']}]
# Create the search index with the semantic settings
index = SearchIndex(name=index_name, fields=fields, suggesters=suggester, scoring_profiles=scoring_profiles, semantic_search=semantic_search)
result = index_client.create_or_update_index(index)
print(f' {result.name} created')
Create a documents payload
You can push JSON documents to a search index. Documents must match the index schema.
documents = [
{
"@search.action": "upload",
"HotelId": "1",
"HotelName": "Stay-Kay City Hotel",
"Description": "This classic hotel is fully-refurbished and ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Times Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
"Category": "Boutique",
"Tags": [ "view", "air conditioning", "concierge" ],
"ParkingIncluded": "false",
"LastRenovationDate": "2022-01-18T00:00:00Z",
"Rating": 3.60,
"Address": {
"StreetAddress": "677 5th Ave",
"City": "New York",
"StateProvince": "NY",
"PostalCode": "10022",
"Country": "USA"
}
},
{
"@search.action": "upload",
"HotelId": "2",
"HotelName": "Old Century Hotel",
"Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts. The hotel also regularly hosts events like wine tastings, beer dinners, and live music.",
"Category": "Boutique",
"Tags": [ "pool", "free wifi", "concierge" ],
"ParkingIncluded": "false",
"LastRenovationDate": "2019-02-18T00:00:00Z",
"Rating": 3.60,
"Address": {
"StreetAddress": "140 University Town Center Dr",
"City": "Sarasota",
"StateProvince": "FL",
"PostalCode": "34243",
"Country": "USA"
}
},
{
"@search.action": "upload",
"HotelId": "3",
"HotelName": "Gastronomic Landscape Hotel",
"Description": "The Gastronomic Hotel stands out for its culinary excellence under the management of William Dough, who advises on and oversees all of the Hotel’s restaurant services.",
"Category": "Suite",
"Tags": [ "restaurant", "bar", "continental breakfast" ],
"ParkingIncluded": "true",
"LastRenovationDate": "2015-09-20T00:00:00Z",
"Rating": 4.80,
"Address": {
"StreetAddress": "3393 Peachtree Rd",
"City": "Atlanta",
"StateProvince": "GA",
"PostalCode": "30326",
"Country": "USA"
}
},
{
"@search.action": "upload",
"HotelId": "4",
"HotelName": "Sublime Palace Hotel",
"Description": "Sublime Palace Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Cliff is part of a lovingly restored 19th century resort, updated for every modern convenience.",
"Category": "Boutique",
"Tags": [ "concierge", "view", "air conditioning" ],
"ParkingIncluded": "true",
"LastRenovationDate": "2020-02-06T00:00:00Z",
"Rating": 4.60,
"Address": {
"StreetAddress": "7400 San Pedro Ave",
"City": "San Antonio",
"StateProvince": "TX",
"PostalCode": "78216",
"Country": "USA"
}
}
]
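Because documents must match the index schema, a quick local check before uploading can catch misspelled field names. The following is a minimal sketch under stated assumptions: the field sets are copied by hand from the schema defined earlier, the helper `find_schema_problems` is hypothetical, and only field names are checked (not data types).

```python
TOP_LEVEL_FIELDS = {
    "HotelId", "HotelName", "Description", "Category", "Tags",
    "ParkingIncluded", "LastRenovationDate", "Rating", "Address",
}
ADDRESS_FIELDS = {"StreetAddress", "City", "StateProvince", "PostalCode", "Country"}


def find_schema_problems(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = []
    if not doc.get("HotelId"):
        problems.append("missing key field 'HotelId'")
    # Keys that start with '@' (such as '@search.action') are directives, not fields.
    fields = {key for key in doc if not key.startswith("@")}
    for name in sorted(fields - TOP_LEVEL_FIELDS):
        problems.append(f"unknown field '{name}'")
    address = doc.get("Address") or {}
    for name in sorted(set(address) - ADDRESS_FIELDS):
        problems.append(f"unknown field 'Address/{name}'")
    return problems


# Two small sample documents: one well-formed, one with deliberate mistakes.
good = {"@search.action": "upload", "HotelId": "1",
        "HotelName": "Stay-Kay City Hotel", "Address": {"City": "New York"}}
bad = {"HotelName": "No Key Hotel", "Stars": 5, "Address": {"Zip": "10022"}}

print(find_schema_problems(good))
print(find_schema_problems(bad))
```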
Upload documents to the index
search_client = SearchClient(endpoint=search_endpoint,
index_name=index_name,
credential=credential)
try:
    result = search_client.upload_documents(documents=documents)
    print("Upload of new document succeeded: {}".format(result[0].succeeded))
except Exception as ex:
    # Not every exception type has a .message attribute, so print the exception itself
    print(ex)
index_client = SearchIndexClient(
endpoint=search_endpoint, credential=credential)
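The upload code above reports only the first result. Each item in the returned list describes one document, so a short helper can summarize the whole batch. This sketch assumes each result exposes `key`, `succeeded`, and `error_message` attributes; it uses stand-in objects so that it runs without a search service, and the helper name `summarize_upload` is hypothetical.

```python
from types import SimpleNamespace


def summarize_upload(results):
    """Split per-document results into succeeded keys and failure messages."""
    succeeded = [r.key for r in results if r.succeeded]
    failed = {r.key: r.error_message for r in results if not r.succeeded}
    return succeeded, failed


# Stand-in objects with invented values, not output from a real service.
fake_results = [
    SimpleNamespace(key="1", succeeded=True, error_message=None),
    SimpleNamespace(key="2", succeeded=False, error_message="hypothetical failure"),
]
ok, failed = summarize_upload(fake_results)
print(f"Succeeded: {ok}")
print(f"Failed: {failed}")
```

In the notebook, you would pass the list returned by `upload_documents` instead of `fake_results`.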
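Run your first query
Start with an empty query as a verification step, proving that the index is operational. You should get an unordered list of hotel names and descriptions, with a count of 4 indicating that there are four documents in the index.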
# Run an empty query (returns selected fields, all documents)
results = search_client.search(query_type='simple',
search_text="*" ,
select='HotelName,Description',
include_total_count=True)
print ('Total Documents Matching Query:', results.get_count())
for result in results:
    print(result["@search.score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")
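Run a text query
For comparison purposes, run a text query with BM25 relevance scoring. Full text search is invoked when you provide a query string. The response consists of ranked results, where higher scores go to documents that have more instances of matching terms, or more important terms.
In this query for "restaurant on site", Sublime Palace Hotel comes out on top because its description includes "site". Terms that occur infrequently raise the search score of the document.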
# Run a text query (returns a BM25-scored result set)
results = search_client.search(query_type='simple',
search_text="restaurant on site" ,
select='HotelName,HotelId,Description',
include_total_count=True)
for result in results:
    print(result["@search.score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")
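To see why an infrequent term can dominate a BM25 score, here's a self-contained sketch of the textbook BM25 formula applied to a tiny invented corpus. It isn't the service's implementation: the tokenizer is a plain whitespace split, the parameter values `K1` and `B` are commonly cited defaults treated here as assumptions, and the three sample strings are hypothetical.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # commonly cited BM25 defaults; assumptions in this sketch


def bm25_scores(query: str, docs: list[str]) -> list[float]:
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms (low df) get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(tokens) / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores


# Invented mini-corpus: "on" is common, "site" appears in one document only.
corpus = [
    "hotel on the main street",
    "pool on the roof",
    "walking distance to every historic site",
]
scores = bm25_scores("restaurant on site", corpus)
for text, score in zip(corpus, scores):
    print(f"{score:.3f}  {text}")
```

In this toy corpus, the document that contains the rare term "site" outscores the two that match only the common term "on", even though none of them mentions a restaurant. That's the same effect the text query above demonstrates.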
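Run a semantic query
Now add semantic ranking. New parameters include query_type and semantic_configuration_name.
This is the same query, but notice that the semantic ranker correctly identifies Gastronomic Landscape Hotel as a more relevant result given the initial query. This query also returns captions generated by the models. The inputs are too minimal in this sample to produce interesting captions, but the example succeeds in demonstrating the syntax.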
# Runs a semantic query (runs a BM25-ranked query and promotes the most relevant matches to the top)
results = search_client.search(query_type='semantic', semantic_configuration_name='semantic-config',
search_text="restaurant on site",
select='HotelName,Description,Category', query_caption='extractive')
for result in results:
    print(result["@search.reranker_score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")
    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")
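Each semantic result carries both a BM25 score (`@search.score`) and a reranker score (`@search.reranker_score`). The following sketch shows how the two can produce different orderings of the same hits. The numeric scores are invented placeholders, not output from the service, and the helper `order_by` is hypothetical. A missing reranker score is represented as `None` in this sketch.

```python
# Invented placeholder scores; real values come from the search service.
hits = [
    {"HotelName": "Sublime Palace Hotel",
     "@search.score": 1.2, "@search.reranker_score": 1.1},
    {"HotelName": "Gastronomic Landscape Hotel",
     "@search.score": 0.6, "@search.reranker_score": 2.4},
    {"HotelName": "Old Century Hotel",
     "@search.score": 0.3, "@search.reranker_score": None},
]


def order_by(hits, score_key):
    """Sort hits by the given score, highest first; missing scores sort last."""
    return sorted(
        hits,
        key=lambda h: (h.get(score_key) is not None, h.get(score_key) or 0.0),
        reverse=True,
    )


bm25_order = [h["HotelName"] for h in order_by(hits, "@search.score")]
semantic_order = [h["HotelName"] for h in order_by(hits, "@search.reranker_score")]
print("BM25 order:    ", bm25_order)
print("Semantic order:", semantic_order)
```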
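Return semantic answers
This final query returns semantic answers.
The semantic ranker can generate answers to a query string that has the characteristics of a question. The generated answer is extracted verbatim from your content. To get a semantic answer, the question and answer must be closely aligned, and the model must find content that clearly answers the question. If potential answers fail to meet a confidence threshold, the model doesn't return an answer. For demonstration purposes, the question in this example is designed to get a response so that you can see the syntax.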
# Run a semantic query that returns semantic answers
results = search_client.search(query_type='semantic', semantic_configuration_name='semantic-config',
search_text="what hotel is in a historic building",
select='HotelName,Description,Category', query_caption='extractive', query_answer="extractive",)
# get_answers() can return None when no answer meets the threshold, so fall back to an empty list
semantic_answers = results.get_answers() or []
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")
for result in results:
    print(result["@search.reranker_score"])
    print(result["HotelName"])
    print(f"Description: {result['Description']}")
    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")
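If you want to be stricter than the service about which answers to display, you can filter them on the client by score. This is a minimal sketch with stand-in answer objects and invented scores. The cutoff value `0.7` and the helper names `confident_answers` and `answer_text` are assumptions chosen for illustration, not documented defaults of the library.

```python
from types import SimpleNamespace


def confident_answers(answers, min_score=0.7):
    """Keep answers at or above min_score; tolerate a missing answer list."""
    return [a for a in (answers or []) if a.score >= min_score]


def answer_text(answer):
    """Prefer highlighted text when present, else fall back to plain text."""
    return answer.highlights or answer.text


# Stand-in answers with invented scores, not output from a real service.
fake_answers = [
    SimpleNamespace(text="plain A", highlights="<em>highlighted</em> A", score=0.92),
    SimpleNamespace(text="plain B", highlights=None, score=0.41),
]
for answer in confident_answers(fake_answers):
    print(answer_text(answer), answer.score)
print(confident_answers(None))
```

Passing `None` returns an empty list, which keeps the display loop simple when the query produces no answer at all.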