I have been looking into trying to optimize text searches in a container that will have potentially millions of items. For this purpose using the Cosmos Emulator, I added 200,000 items to my ForumSearch container in which all items have the same partition key. They have the following format:
{
"forumId": "ABC123",
"partitionKey": "1",
"ForumText": "This is a random Text"
}
I then invoked a text search using no more than 5 characters in my search:
using (FeedIterator<ForumSearch> queryResultSetIterator =
forumSearchContainer.GetItemLinqQueryable<ForumSearch>().Where(x => x.ForumText.ToString().Contains("This ")).ToFeedIterator<ForumSearch>())
{
while (queryResultSetIterator.HasMoreResults)
{
// Searching using 5 string characters costs 126 RU's with 200,000 items
FeedResponse<ForumSearch> currentResultSet = await queryResultSetIterator.ReadNextAsync();
}
}
With 200,000 items in the container, the above query costs 126 RU's. I then resorted to reading all the Items from the DB into a C# list item. Assume that the forumSearches variable below is populated with all 200,000 ForumSearch items:
List<ForumSearch> forumSearches = CoreDataService.Instance.GetForumSearches();
ForumSearch forumSearch = forumSearches.Where(x => x.ForumText.ToString().Contains("This")).FirstOrDefault();
The above takes a split second to execute and i'm guessing that overall it consumes a lot less resources as we are now talking about searching through an in-memory list, a form of caching if you will. Is the above the standard approach to dealing with text searches for containers with many items or is there a way to optimize the DB search even further?