What is the best storage type for ongoing indexing?

Peter Nicolai Skovgaard 0 Reputation points
2023-03-13T16:19:08.03+00:00

 

I need to store a set of documents (anywhere from a few hundred to around 10,000) in Azure. A search indexer then cracks these and updates an index. By documents I just mean (complex) JSON objects.

 

Every hour a service will run and update any objects that may have changed, and then the indexer will run again. Azure of course has a few storage options that could support this, such as Blob Storage or Cosmos DB. After a lot of research I still have not been able to work out which storage type is best for this scenario. My criteria are as follows:

  • The storage type needs to support complex JSON (i.e. with nested properties).
  • Each document will have a (relatively) "large" description field, which is what my index search ultimately uses; the rest of the JSON is just metadata.
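
To make the "large description field" concern concrete, here is a minimal Python sketch that measures a document's serialized size against a per-item limit. The document shape, the field names, and the 2 MB figure are all assumptions for illustration; the actual limit depends on the store and API you pick, so check the current service limits before relying on it.

```python
import json

# Assumed per-item limit in bytes (illustrative only); verify against the
# current service limits of whichever store you choose.
ASSUMED_ITEM_LIMIT_BYTES = 2 * 1024 * 1024


def serialized_size_bytes(doc: dict) -> int:
    """Return the size of the document as compact UTF-8 encoded JSON."""
    text = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def fits_limit(doc: dict, limit_bytes: int = ASSUMED_ITEM_LIMIT_BYTES) -> bool:
    """True if the serialized document is within the given size limit."""
    return serialized_size_bytes(doc) <= limit_bytes


# Hypothetical document: nested metadata plus one large description field.
sample_doc = {
    "id": "doc-001",
    "metadata": {
        "category": "manual",
        "tags": ["a", "b"],
        "owner": {"name": "x", "team": "y"},
    },
    "description": "lorem ipsum " * 1000,
}

if __name__ == "__main__":
    print(serialized_size_bytes(sample_doc), fits_limit(sample_doc))
```

Running this over a representative sample of real documents would show whether the description field is anywhere near a per-item limit, or whether size is not a deciding factor at all.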

 

From what I gather, the "easiest" solution would be to use something like Cosmos DB with the MongoDB API, as it natively supports complex JSON. I am a bit worried, though, that my description field will be too large for it. That leads me to Blob Storage, but I'm not sure how well that will work with the hourly syncing. I reckon Blob Storage is intended more for read-only uploads than for "dynamic" data such as this.
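
One way the hourly sync could stay cheap with any of these stores is to re-upload only documents whose content actually changed. The sketch below does this with a content hash; `sync_changed`, `known_hashes`, and the `upload` callback are hypothetical names, and where the hashes are persisted between runs is left open.

```python
import hashlib
import json
from typing import Callable, Dict, List


def content_hash(doc: dict) -> str:
    """Hash a canonical (key-sorted) JSON form so key order does not matter."""
    canonical = json.dumps(
        doc, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync_changed(
    docs: Dict[str, dict],
    known_hashes: Dict[str, str],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Upload only new or changed documents and return their ids.

    `known_hashes` maps document id to the hash from the previous run and is
    updated in place. `upload` is a hypothetical callback that writes the
    bytes to the chosen store under the given name.
    """
    uploaded = []
    for doc_id, doc in docs.items():
        digest = content_hash(doc)
        if known_hashes.get(doc_id) == digest:
            continue  # unchanged since the last run, skip the write
        payload = json.dumps(doc, ensure_ascii=False).encode("utf-8")
        upload(f"{doc_id}.json", payload)
        known_hashes[doc_id] = digest
        uploaded.append(doc_id)
    return uploaded
```

With Blob Storage, `upload` could wrap `BlobClient.upload_blob(data, overwrite=True)` from the `azure-storage-blob` package. Skipping unchanged documents leaves their last-modified timestamps alone, which should help an indexer that relies on those timestamps to reprocess only what changed.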

 

Hopefully someone in here has some idea about what I could do or where I could go to read more about such a relatively specific scenario!

 

Finally, I should add that the reason I don't just push the data directly into a search index is that I need to run some enrichments in an indexer.


1 answer

  1. Bruce (SqlWork.com) 65,131 Reputation points
    2023-03-13T20:14:17.23+00:00

    The number of documents is small, but you don't specify their typical size. The data storage requirements are simple; it's the indexing requirements that need to be defined.

    Also, what will you use to support the index once it is built? How big is the indexed field? How will you search it? Are there joins, or just simple lookups?

    Without more requirements it's hard to suggest anything specific. You could use Azure SQL, Azure Cosmos DB, Azure Blob Storage with index tags, etc.

