Cosmos DB; optimizing text searches in large DB

Bubba Jones 211 Reputation points
2022-07-19T11:45:09.837+00:00

I have been looking into trying to optimize text searches in a container that will have potentially millions of items. For this purpose using the Cosmos Emulator, I added 200,000 items to my ForumSearch container in which all items have the same partition key. They have the following format:

{  
    "forumId": "ABC123",  
    "partitionKey": "1",  
    "ForumText": "This is a random Text"  
}  

I then invoked a text search using no more than 5 characters in my search:

using (FeedIterator<ForumSearch> queryResultSetIterator =  
    forumSearchContainer.GetItemLinqQueryable<ForumSearch>().Where(x => x.ForumText.ToString().Contains("This ")).ToFeedIterator<ForumSearch>())  
{  
    while (queryResultSetIterator.HasMoreResults)  
    {  
        // Searching using 5 string characters costs 126 RU's with 200,000 items  
        FeedResponse<ForumSearch> currentResultSet = await queryResultSetIterator.ReadNextAsync();     
    }  
}  

With 200,000 items in the container, the above query costs 126 RU's. I then resorted to reading all the Items from the DB into a C# list item. Assume that the forumSearches variable below is populated with all 200,000 ForumSearch items:

List<ForumSearch> forumSearches = CoreDataService.Instance.GetForumSearches();  
  
ForumSearch forumSearch = forumSearches.Where(x => x.ForumText.ToString().Contains("This")).FirstOrDefault();  

The above takes a split second to execute and i'm guessing that overall it consumes a lot less resources as we are now talking about searching through an in-memory list, a form of caching if you will. Is the above the standard approach to dealing with text searches for containers with many items or is there a way to optimize the DB search even further?

Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,543 questions
0 comments No comments
{count} votes

Accepted answer
  1. Hasan Savran 331 Reputation points MVP
    2022-07-19T14:54:28.523+00:00

    It sounds like you have/will have a lot of data. This means you will have many physical partitions.
    Also, It sounds like you are not able to use partition key in your CONTAINS query.
    You will not see significant RU reduce since your query needs to run in every physical partition.
    I agree with you, You should find a different way to query the data if you don't have the partition key available.
    Check out the Azure Cognitive Search (Azure Search). It works well with Cosmos DB and it lets you run full text search queries.

    Following links should be helpful.
    https://learn.microsoft.com/en-us/azure/search/search-howto-index-cosmosdb
    https://learn.microsoft.com/en-us/learn/modules/search-azure-cosmos-db-sql-api-data-azure-cognitive-search/


1 additional answer

Sort by: Most helpful
  1. Hasan Savran 331 Reputation points MVP
    2022-07-19T13:04:12.94+00:00

    First, I recommend you to write regular queries rather than linq so you will have full control for optimization.
    I guess Linq query gives you a query similar to this:
    SELECT * FROM c WHERE CONTAINS(c.ForumText, 'This')

    To optimize this, try to add ORDER BY ForumText at the end.

    SELECT * FORM c WHERE CONTAINS(c.ForumText,'This') ORDER BY c.ForumText

    Please let us know the RU after ORDER BY

    Thank you