Azure Cognitive Search multiple blob datasource virtual folders

Alex Gagnon 87 Reputation points
2022-11-25T17:16:28.69+00:00

Is there a way to include multiple blob virtual folders in a pipeline? For example, in Datasources, we can only provide a single virtual folder, and in Indexers we can further filter by file extension. However, what I'd like to do is include two (or more) virtual folders, rather than have to create two datasources.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

Accepted answer
  1. ajkuma 22,086 Reputation points Microsoft Employee
    2022-11-29T19:22:19.643+00:00

    GagnonAlexandreNC-2861, I have been discussing this internally with our product team.

    Currently, this is not supported out of the box. To orchestrate something similar, consider the following approach:

    If this fits your requirements and you are comfortable modifying the code to point to specific folders, you can keep an Azure Table of all document metadata (including location) and use a custom skill from a single indexer.

    Kindly take a look at the sample on how to move blob metadata from Blob Storage to Azure Table (in batch for initial ingestion and event-based to keep it up to date when new blobs are created or deleted), and how to use the Document Extraction skill to crack documents with a custom generated SAS Token.
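As a rough illustration of the custom-skill step described above (not the linked sample's actual code), the sketch below shows a handler that accepts the custom Web API skill request envelope and returns, for each record, a file reference that the Document Extraction skill can consume through its `file_data` input. The input field name `blobUrl`, the function name `handle_skill_request`, and the injected `make_sas_token` callable are all hypothetical; a real implementation would generate the SAS token with the storage SDK.

```python
from typing import Any, Callable, Dict, List


def handle_skill_request(
    body: Dict[str, Any],
    make_sas_token: Callable[[str], str],  # hypothetical: returns a SAS token for a blob URL
) -> Dict[str, List[Dict[str, Any]]]:
    """Map each incoming record's blob URL to a file reference with a SAS token."""
    results: List[Dict[str, Any]] = []
    for record in body.get("values", []):
        record_id = record.get("recordId")
        # "blobUrl" is an assumed input name, e.g. the location stored in the Azure Table row.
        blob_url = record.get("data", {}).get("blobUrl")
        if not blob_url:
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": "blobUrl is missing"}],
                "warnings": [],
            })
            continue
        results.append({
            "recordId": record_id,
            "data": {
                # File reference passed on to the Document Extraction skill.
                "file_data": {
                    "$type": "file",
                    "url": blob_url,
                    "sasToken": make_sas_token(blob_url),
                }
            },
            "errors": [],
            "warnings": [],
        })
    return {"values": results}
```

Passing the token generator in as a parameter keeps the handler independent of any particular storage library, which also makes it easy to exercise with a stub.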

    Sample from one of our PG team members: https://github.com/ruoccofabrizio/azure-cognitive-search-multiple-containers-indexer

    Your feedback is noted and has been relayed internally.

    If you wish, you may also share your feedback on UserVoice. All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure. Additionally, users with a similar request can up-vote your post and add their comments.

    We much appreciate your feedback. Thanks for your patience!


1 additional answer

  1. ajkuma 22,086 Reputation points Microsoft Employee
    2022-11-28T13:42:11.35+00:00

    GagnonAlexandreNC-2861, apologies for the delay over the weekend.

    Based on my understanding of your scenario, I see that you asked a very similar question in your other thread.

    Indexing multiple containers from a single built-in indexer is not supported. You would either have to create multiple indexers, one per container, all pointing to a single index, or use the Push API to build your own indexer to achieve this functionality.
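To make the Push API option more concrete, here is a minimal sketch that builds the JSON body for a document upload request (the payload that would be POSTed to the index's `docs/index` endpoint). It selects blobs from several virtual folders and derives each document key from the blob path. The function names, the `path` and `content` field names, and the in-memory `blobs` dictionary are assumptions for illustration; a real indexer would list and download blobs with the storage SDK and match the fields of your own index.

```python
import base64
from typing import Dict, Iterable, List


def make_key(blob_path: str) -> str:
    """Encode a blob path as URL-safe Base64 so it only uses characters valid in a document key."""
    return base64.urlsafe_b64encode(blob_path.encode("utf-8")).decode("ascii")


def build_push_batch(blobs: Dict[str, str], folders: Iterable[str]) -> Dict[str, List[dict]]:
    """Build a Push API batch for every blob that lives under one of the given virtual folders."""
    # Normalize "invoices" and "invoices/" to the same prefix.
    prefixes = tuple(folder.rstrip("/") + "/" for folder in folders)
    actions = [
        {
            "@search.action": "mergeOrUpload",  # insert new documents, update existing ones
            "id": make_key(path),               # assumes the index key field is named "id"
            "path": path,                       # hypothetical index field
            "content": text,                    # hypothetical index field
        }
        for path, text in sorted(blobs.items())
        if path.startswith(prefixes)
    ]
    return {"value": actions}
```

For example, `build_push_batch(blobs, ["invoices", "reports"])` keeps only blobs whose names start with `invoices/` or `reports/`, which is the multi-folder filter a single built-in data source cannot express.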

    Indexers operate at the container level, so there is no way to define an indexer that loops through all blob containers. Generally, you can create an indexer per storage container. Depending on how many containers you have, this can become difficult, though (services have limits on the number of indexers you can have).

    Assuming you have a manageable number of containers, it's recommended that you create your first indexer in the portal and then use a tool like Postman or our VS Code extension to quickly create the additional data sources and indexers required. You can have multiple indexers all point to the same index.
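Instead of clicking through each one, the repeated data source and indexer definitions can also be generated with a small script. The sketch below builds one blob data source (scoped to a virtual folder through the container `query` property) and one indexer per folder, with every indexer targeting the same index. The helper names, the `-ds` / `-indexer` naming scheme, and the default extension list are assumptions; the resulting dictionaries are the JSON bodies you would send when creating data sources and indexers through the REST API.

```python
import re
from typing import Dict, Iterable, List, Tuple


def slug(folder: str) -> str:
    """Turn a folder path into a lowercase, dash-separated resource name fragment."""
    return re.sub(r"[^a-z0-9]+", "-", folder.lower()).strip("-")


def build_definitions(
    container: str,
    folders: Iterable[str],
    index_name: str,
    connection_string: str,
    extensions: str = ".pdf,.docx",  # assumed default; adjust to your files
) -> List[Tuple[Dict, Dict]]:
    """Return a (data source, indexer) definition pair for each virtual folder."""
    definitions = []
    for folder in folders:
        name = slug(folder)
        data_source = {
            "name": f"{name}-ds",
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            # "query" restricts the data source to a single virtual folder.
            "container": {"name": container, "query": folder},
        }
        indexer = {
            "name": f"{name}-indexer",
            "dataSourceName": data_source["name"],
            "targetIndexName": index_name,  # every indexer writes to the same index
            "parameters": {
                "configuration": {"indexedFileNameExtensions": extensions}
            },
        }
        definitions.append((data_source, indexer))
    return definitions
```

Keep the service's indexer and data source limits in mind when generating many pairs this way.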

    Additionally, see:
    Run indexers in parallel

    Tutorial: Index from multiple data sources using the .NET SDK, for step-by-step instructions
    - As long as both of the data sources have an ID in common, all you really need to do is create an indexer for each data source and point them to the same index.