Get started with App Content Search

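Use App Content Search to create a semantic index of your in-app content. This lets users find information based on meaning rather than on keywords alone. You can also use the index to give AI assistants domain-specific knowledge, which produces more personalized and contextually relevant results.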

Specifically, you will learn how to use the AppContentIndexer API to:

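  • Create or open a content index in your app
  • Add text strings to the index and then run a query
  • Manage the complexity of long text strings
  • Index image data and then search for relevant images
  • Enable RAG (Retrieval-Augmented Generation) scenarios
  • Use AppContentIndexer on a background thread
  • Close AppContentIndexer when it is no longer in use to release resources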

Prerequisites

For the Windows AI API hardware requirements and instructions on configuring your device to build apps with the Windows AI APIs, see Get started building an app with Windows AI APIs.

Package identity requirements

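Apps that use AppContentIndexer must have package identity, which is available only to packaged apps (including apps packaged with external location). To enable semantic indexing and text recognition (OCR), the app must also declare the systemaimodels capability.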

Create or open a content index in your app

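To create a semantic index of your app's content, you first need a searchable structure that your app can use to store and retrieve content efficiently. This index acts as a local semantic and lexical search engine for your app's content.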

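To use the AppContentIndexer API, first call GetOrCreateIndex with an index name. If an index with that name already exists for the current app identity and user, that index is opened. Otherwise, a new index is created.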

public void SimpleGetOrCreateIndexSample()
{
    GetOrCreateIndexResult result = AppContentIndexer.GetOrCreateIndex("myindex");
    if (!result.Succeeded)
    {
        throw new InvalidOperationException($"Failed to open index. Status = '{result.Status}', Error = '{result.ExtendedError}'");
    }
    // If result.Succeeded is true, result.Status will either be CreatedNew or OpenedExisting
    if (result.Status == GetOrCreateIndexStatus.CreatedNew)
    {
        Console.WriteLine("Created a new index");
    }
    else if (result.Status == GetOrCreateIndexStatus.OpenedExisting)
    {
        Console.WriteLine("Opened an existing index");
    }
    using AppContentIndexer indexer = result.Indexer;
    // Use indexer...
}

This example shows how to handle the error case when an index fails to open. For simplicity, other examples in this document may omit error handling.

Add text strings to the index and then run a query

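This example shows how to add some text strings to your app's index and then query that index for relevant information. In this and the following examples, GetIndexerForApp() is not shown. It stands for app code that returns the app's AppContentIndexer instance, for example one opened with GetOrCreateIndex as shown above.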

    // This is some text data that we want to add to the index:
    Dictionary<string, string> simpleTextData = new Dictionary<string, string>
    {
        {"item1", "Here is some information about Cats: Cats are cute and fluffy. Young cats are very playful." },
        {"item2", "Dogs are loyal and affectionate animals known for their companionship, intelligence, and diverse breeds." },
        {"item3", "Fish are aquatic creatures that breathe through gills and come in a vast variety of shapes, sizes, and colors." },
        {"item4", "Broccoli is a nutritious green vegetable rich in vitamins, fiber, and antioxidants." },
        {"item5", "Computers are powerful electronic devices that process information, perform calculations, and enable communication worldwide." },
        {"item6", "Music is a universal language that expresses emotions, tells stories, and connects people through rhythm and melody." },
    };

    public void SimpleTextIndexingSample()
    {
        AppContentIndexer indexer = GetIndexerForApp();
        // Add some text data to the index:
        foreach (var item in simpleTextData)
        {
            IndexableAppContent textContent = AppManagedIndexableAppContent.CreateFromString(item.Key, item.Value);
            indexer.AddOrUpdate(textContent);
        }
    }

    public void SimpleTextQueryingSample()
    {
        AppContentIndexer indexer = GetIndexerForApp();
        // We search the index using a semantic query:
        AppIndexTextQuery queryCursor = indexer.CreateTextQuery("Facts about kittens.");
        IReadOnlyList<TextQueryMatch> textMatches = queryCursor.GetNextMatches(5);
        // Nothing in the index exactly matches what we queried but item1 is similar to the query so we expect
        // that to be the first match.
        foreach (var match in textMatches)
        {
            Console.WriteLine(match.ContentId);
            if (match.ContentKind == QueryMatchContentKind.AppManagedText)
            {
                AppManagedTextQueryMatch textResult = (AppManagedTextQueryMatch)match;
                // Only part of the original string may match the query. So we can use TextOffset and TextLength to extract the match.
                // In this example, we might imagine that the substring "Cats are cute and fluffy" from "item1" is the top match for the query.
                string matchingData = simpleTextData[match.ContentId];
                string matchingString = matchingData.Substring(textResult.TextOffset, textResult.TextLength);
                Console.WriteLine(matchingString);
            }
        }
    }

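A QueryMatch contains only the ContentId and the TextOffset/TextLength of the match. It does not contain the matching text itself, so as the app developer you are responsible for looking up the original text. Query results are sorted by relevance, with the first result being the most relevant. Indexing happens asynchronously, so a query might run against only part of the data. You can check the indexing status.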
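
Because a match carries only an ID, an offset, and a length, the app needs its own lookup from content ID to the original text. The C# sketch below is not part of the AppContentIndexer API. ResolveMatch and DrainCursor are hypothetical helpers, written as top-level local functions so they run without the Windows App SDK. ResolveMatch returns null when the offsets no longer fit the text. DrainCursor stands in for calling GetNextMatches repeatedly. It assumes that successive calls return successive pages and that a null or empty page means there are no more results. Neither point is stated in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: resolve a match (contentId + offset + length) against the app's
// own copy of the content. Returns null if the id is unknown or the offsets no longer
// fit the text (for example, because the text changed after it was indexed).
static string? ResolveMatch(IReadOnlyDictionary<string, string> source, string contentId, int offset, int length)
{
    if (!source.TryGetValue(contentId, out var text))
        return null;
    if (offset < 0 || length < 0 || offset > text.Length - length)
        return null;
    return text.Substring(offset, length);
}

// Hypothetical helper: collect up to maxResults matches in pages. getNextPage stands in
// for a call such as queryCursor.GetNextMatches(count). Assumption: a null or empty page
// means the cursor is exhausted.
static List<T> DrainCursor<T>(Func<int, IReadOnlyList<T>?> getNextPage, int pageSize, int maxResults)
{
    var all = new List<T>();
    while (all.Count < maxResults)
    {
        IReadOnlyList<T>? page = getNextPage(Math.Min(pageSize, maxResults - all.Count));
        if (page == null || page.Count == 0)
            break;
        all.AddRange(page);
    }
    return all;
}

var data = new Dictionary<string, string> { ["item1"] = "Cats are cute and fluffy." };
Console.WriteLine(ResolveMatch(data, "item1", 0, 4)); // Cats
```

Returning null instead of throwing lets the caller skip a stale match and keep the remaining results.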

Manage the complexity of long text strings

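This example shows that app developers don't need to split text content into smaller pieces for the model to process. AppContentIndexer handles that complexity for you.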

    Dictionary<string, string> textFiles = new Dictionary<string, string>
    {
        {"file1", "File1.txt" },
        {"file2", "File2.txt" },
        {"file3", "File3.txt" },
    };
    public void TextIndexingSample2()
    {
        AppContentIndexer indexer = GetIndexerForApp();
        var folderPath = Windows.ApplicationModel.Package.Current.InstalledLocation.Path;
        // Add some text data to the index:
        foreach (var item in textFiles)
        {
            string contentId = item.Key;
            string filename = item.Value;
            // Note that the text here can be arbitrarily large. The AppContentIndexer will take care of chunking the text
            // in a way that works effectively with the underlying model. We do not require the app author to break the text
            // down into small pieces.
            string text = File.ReadAllText(Path.Combine(folderPath, filename));
            IndexableAppContent textContent = AppManagedIndexableAppContent.CreateFromString(contentId, text);
            indexer.AddOrUpdate(textContent);
        }
    }

    public void TextIndexingSample2_RunQuery()
    {
        AppContentIndexer indexer = GetIndexerForApp();
        var folderPath = Windows.ApplicationModel.Package.Current.InstalledLocation.Path;
        // Search the index
        AppIndexTextQuery query = indexer.CreateTextQuery("Facts about kittens.");
        IReadOnlyList<TextQueryMatch> textMatches = query.GetNextMatches(5);
        if (textMatches != null) 
        {
            foreach (var match in textMatches)
            {
                Console.WriteLine(match.ContentId);
                if (match is AppManagedTextQueryMatch textResult)
                {
                    // We load the content of the file that contains the match:
                    string matchingFilename = textFiles[match.ContentId];
                    string fileContent = File.ReadAllText(Path.Combine(folderPath, matchingFilename));
    
                    // Find the substring within the loaded text that contains the match:
                    string matchingString = fileContent.Substring(textResult.TextOffset, textResult.TextLength);
                    Console.WriteLine(matchingString);
                }
            }
        }
    }

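In this example the text comes from files, but only the text content is indexed, not the files themselves. AppContentIndexer has no knowledge of the original files and does not watch them for changes. If a file's content changes, the app must update the index itself.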
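
One way to keep the index in sync is for the app to record a fingerprint of the text it last indexed for each content ID. It then calls AddOrUpdate again only for items whose text has changed. This is a sketch under assumptions. It assumes the app stores these fingerprints itself. HashText and FindItemsToReindex are hypothetical helpers, and SHA-256 is just a convenient fingerprint choice, not something AppContentIndexer requires.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: SHA-256 of the UTF-8 text, used as a "has this changed?" fingerprint.
static string HashText(string text) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

// Hypothetical helper: compare the app's current content with the fingerprints the app
// recorded when it last indexed each item. Returns new or changed items, each paired with
// the fingerprint to record once AddOrUpdate has been called for it.
static List<(string Id, string Hash)> FindItemsToReindex(
    IReadOnlyDictionary<string, string> currentContent,
    IReadOnlyDictionary<string, string> indexedHashes)
{
    var changed = new List<(string Id, string Hash)>();
    foreach (var (id, text) in currentContent)
    {
        string hash = HashText(text);
        if (!indexedHashes.TryGetValue(id, out var previous) || previous != hash)
            changed.Add((id, hash));
    }
    return changed;
}

// In the app (not runnable without the Windows App SDK), the result could drive re-indexing:
// foreach (var (id, hash) in FindItemsToReindex(current, storedHashes))
// {
//     indexer.AddOrUpdate(AppManagedIndexableAppContent.CreateFromString(id, current[id]));
//     storedHashes[id] = hash;
// }
```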

Index image data and then search for relevant images

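This example shows how to index image data as SoftwareBitmaps and then search for relevant images with a text query. Helpers.GetSoftwareBitmapFromFile is an app-defined helper that is not shown here. As the code comments note, the image data can come from any source, not only from files.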

    // We load the image data from a set of known files and send that image data to the indexer.
    // The image data does not need to come from files on disk, it can come from anywhere.
    Dictionary<string, string> imageFilesToIndex = new Dictionary<string, string>
        {
            {"item1", "Cat.jpg" },
            {"item2", "Dog.jpg" },
            {"item3", "Fish.jpg" },
            {"item4", "Broccoli.jpg" },
            {"item5", "Computer.jpg" },
            {"item6", "Music.jpg" },
        };
    public void SimpleImageIndexingSample()
    {
        AppContentIndexer indexer = GetIndexerForApp();

        // Add some image data to the index.
        foreach (var item in imageFilesToIndex)
        {
            var file = item.Value;
            var softwareBitmap = Helpers.GetSoftwareBitmapFromFile(file);
            IndexableAppContent imageContent = AppManagedIndexableAppContent.CreateFromBitmap(item.Key, softwareBitmap);
            indexer.AddOrUpdate(imageContent);
        }
    }
    public void SimpleImageIndexingSample_RunQuery()
    {
        AppContentIndexer indexer = GetIndexerForApp();
        // We query the index for some data to match our text query.
        AppIndexImageQuery query = indexer.CreateImageQuery("cute pictures of kittens");
        IReadOnlyList<ImageQueryMatch> imageMatches = query.GetNextMatches(5);
        // One of the images that we indexed was a photo of a cat. We expect it to be the first match for the query.
        foreach (var match in imageMatches)
        {
            Console.WriteLine(match.ContentId);
            if (match.ContentKind == QueryMatchContentKind.AppManagedImage)
            {
                AppManagedImageQueryMatch imageResult = (AppManagedImageQueryMatch)match;
                var matchingFileName = imageFilesToIndex[match.ContentId];

                // It might be that the match is at a particular region in the image. The result includes
                // the subregion of the image that includes the match.

                Console.WriteLine($"Matching file: '{matchingFileName}' at location {imageResult.Subregion}");
            }
        }
    }

Enable RAG (Retrieval-Augmented Generation) scenarios

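RAG (Retrieval-Augmented Generation) combines a user's query with additional relevant data to improve the response a language model generates. The user's query is the input to a semantic search, which finds relevant information in the index. The data returned by that search is then added to the prompt sent to the language model, so that the model can generate a more accurate, context-aware response.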

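This example shows how to use the AppContentIndexer API with an LLM to add contextual data to your app user's query. The example is generic and does not specify an LLM. The index query itself uses only local data stored in the index and makes no external calls to the internet. Helpers.GetUserPrompt() and Helpers.GetResponseFromChatAgent() are not real functions. They are placeholders for illustration only.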

To enable a RAG scenario with the AppContentIndexer API, you can follow this example:

    public void SimpleRAGScenario()
    {
        AppContentIndexer indexer = GetIndexerForApp();
        // These are some text files that had previously been added to the index.
        // The key is the contentId of the item.
        Dictionary<string, string> data = new Dictionary<string, string>
        {
            {"file1", "File1.txt" },
            {"file2", "File2.txt" },
            {"file3", "File3.txt" },
        };
        string userPrompt = Helpers.GetUserPrompt();
        // We execute a query against the index using the user's prompt string as the query text.
        AppIndexTextQuery query = indexer.CreateTextQuery(userPrompt);
        IReadOnlyList<TextQueryMatch> textMatches = query.GetNextMatches(5);
        StringBuilder promptStringBuilder = new StringBuilder();
        promptStringBuilder.AppendLine("Please refer to the following pieces of information when responding to the user's prompt:");
        // For each of the matches found, we include the relevant snippets of the text files in the augmented query that we send to the language model
        foreach (var match in textMatches)
        {
            if (match is AppManagedTextQueryMatch textResult)
            {
                // We load the content of the file that contains the match:
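                string matchingFilename = data[match.ContentId];
                // Resolve the file name against the package install folder, as the earlier file-based example does:
                string folderPath = Windows.ApplicationModel.Package.Current.InstalledLocation.Path;
                string fileContent = File.ReadAllText(Path.Combine(folderPath, matchingFilename));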
                // Find the substring within the loaded text that contains the match:
                string matchingString = fileContent.Substring(textResult.TextOffset, textResult.TextLength);
                promptStringBuilder.AppendLine(matchingString);
                promptStringBuilder.AppendLine();
            }
        }
        promptStringBuilder.AppendLine("Please provide a response to the following user prompt:");
        promptStringBuilder.AppendLine(userPrompt);
        var response = Helpers.GetResponseFromChatAgent(promptStringBuilder.ToString());
        Console.WriteLine(response);
    }
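
The prompt assembly in the RAG example can be factored into a pure function that is easier to test. In this sketch, BuildRagPrompt is a hypothetical helper. It takes snippets that have already been resolved from the matches, most relevant first (for example, with Substring as above). It stops adding context once a character budget is reached. maxContextChars is an assumption that stands in for whatever limit your language model imposes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: build an augmented prompt from snippets ordered by relevance.
// maxContextChars is an assumed budget that keeps the prompt within the model's limits.
static string BuildRagPrompt(string userPrompt, IEnumerable<string> snippets, int maxContextChars)
{
    var sb = new StringBuilder();
    sb.AppendLine("Please refer to the following pieces of information when responding to the user's prompt:");
    int used = 0;
    foreach (var snippet in snippets)
    {
        if (string.IsNullOrWhiteSpace(snippet))
            continue;
        // Snippets arrive most relevant first, so once the budget is exceeded, drop the rest.
        if (used + snippet.Length > maxContextChars)
            break;
        sb.AppendLine(snippet);
        sb.AppendLine();
        used += snippet.Length;
    }
    sb.AppendLine("Please provide a response to the following user prompt:");
    sb.AppendLine(userPrompt);
    return sb.ToString();
}

string prompt = BuildRagPrompt("Tell me about kittens.",
    new[] { "Cats are cute and fluffy.", "Young cats are very playful." }, 200);
Console.WriteLine(prompt);
```

The loop stops at the first snippet that would exceed the budget, rather than skipping it and trying shorter, less relevant ones. This keeps the context strictly in relevance order.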

Use AppContentIndexer on a background thread

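An AppContentIndexer instance is not tied to a specific thread. It is an agile object that can be used across threads. Some methods of AppContentIndexer and its related types can take considerable processing time, so avoid calling AppContentIndexer APIs directly from your application's UI thread. Use a background thread instead.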
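
Here is a minimal sketch of moving indexer work off the UI thread with Task.Run. SlowQuery is a hypothetical stand-in for a long-running call such as CreateTextQuery followed by GetNextMatches. Because AppContentIndexer is agile, the real object can be used from the thread-pool thread in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a slow, synchronous indexer call, for example
// indexer.CreateTextQuery(text).GetNextMatches(5). Thread.Sleep simulates the work.
static IReadOnlyList<string> SlowQuery(string text)
{
    Thread.Sleep(200);
    return new[] { $"match for '{text}'" };
}

// Run the blocking call on a thread-pool thread. Awaiting the task from a UI event
// handler keeps the UI thread responsive; in a WinUI app the code after the await
// resumes on the UI thread by default, where it is safe to update controls.
static Task<IReadOnlyList<string>> QueryInBackgroundAsync(string text) =>
    Task.Run(() => SlowQuery(text));

IReadOnlyList<string> results = await QueryInBackgroundAsync("Facts about kittens.");
Console.WriteLine(results[0]); // match for 'Facts about kittens.'
```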

Close AppContentIndexer when it is no longer in use to release resources

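AppContentIndexer implements the IClosable interface so that its lifetime can be ended explicitly. The application should close the indexer when it no longer needs it, so that AppContentIndexer can release its underlying resources.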

    public void IndexerDisposeSample()
    {
        var indexer = AppContentIndexer.GetOrCreateIndex("myindex").Indexer;
        // use indexer
        indexer.Dispose();
        // after this point, it would be an error to try to use indexer since it is now Closed.
    }

In C# code, the IClosable interface is projected as IDisposable, so C# code can apply the using pattern to AppContentIndexer instances.

    public void IndexerUsingSample()
    {
        using var indexer = AppContentIndexer.GetOrCreateIndex("myindex").Indexer;
        // use indexer
        // indexer.Dispose() is called automatically when indexer goes out of scope.
    }

If your app opens the same index more than once, you must call Close (Dispose in C#) on each instance.

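Opening and closing an index is expensive, so minimize how often your application does it. For example, an application can keep a single AppContentIndexer instance and use it for the whole lifetime of the application, instead of opening and closing the index for every operation.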
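
Here is a minimal sketch of keeping one shared instance for the app's lifetime with Lazy<T>. OpenIndex and CloseShared are hypothetical helpers. MemoryStream is only a placeholder for an IDisposable such as AppContentIndexer. In a real app, the factory would call AppContentIndexer.GetOrCreateIndex and check Succeeded before returning the indexer.

```csharp
using System;
using System.IO;
using System.Threading;

int openCount = 0;

// Hypothetical stand-in for opening the index. MemoryStream is only a placeholder
// for any IDisposable; AppContentIndexer is IDisposable in C#.
MemoryStream OpenIndex()
{
    Interlocked.Increment(ref openCount);
    return new MemoryStream();
}

// Hypothetical helper: dispose the shared instance once at shutdown, only if it was opened.
static void CloseShared<T>(Lazy<T> shared)
{
    if (shared.IsValueCreated && shared.Value is IDisposable disposable)
        disposable.Dispose();
}

// One instance for the whole app lifetime. ExecutionAndPublication runs the factory
// at most once, even when several background threads request the index concurrently.
var sharedIndexer = new Lazy<MemoryStream>(OpenIndex, LazyThreadSafetyMode.ExecutionAndPublication);

var first = sharedIndexer.Value;
var second = sharedIndexer.Value;
Console.WriteLine(ReferenceEquals(first, second)); // True
Console.WriteLine(openCount);                       // 1
CloseShared(sharedIndexer);
```

With LazyThreadSafetyMode.ExecutionAndPublication, an exception thrown by the factory is cached, and later calls rethrow it without retrying. If your app needs to retry a failed open, use a different caching pattern.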