Is there a limit on the number of files when indexing Azure Data Lake Storage with an Azure Cognitive Search indexer?

test29998411 281 Reputation points
2022-03-19T03:07:33.823+00:00

My team is trying to index millions of PDF files stored in Azure Data Lake Storage containers, using Form Recognizer and Azure Cognitive Search indexers.

https://learn.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-form

Are there any limitations of Azure Cognitive Search when indexing a large number of files?
I am concerned about this.

https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

Specifically, is there a limit on the number of files in Azure Data Lake Storage that an Azure Cognitive Search indexer can index?

If there is a limit, we are considering splitting the files to be indexed by folder.
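For illustration only, splitting by folder could look roughly like the sketch below, which builds one data source definition per folder. It assumes the `adlsgen2` data source type and the `container.query` field for selecting a virtual folder; the function name, the `pdf-source` prefix, the container name, and the folder names are hypothetical, and the connection string is a placeholder.

```python
def folder_data_sources(container, folders, connection_string, prefix="pdf-source"):
    """Build one data source definition (as a dict) per folder.

    Assumption: each definition would be sent to the search service
    separately and paired with its own indexer. All names are hypothetical.
    """
    sources = []
    for index, folder in enumerate(folders):
        sources.append({
            "name": f"{prefix}-{index}",
            "type": "adlsgen2",
            "credentials": {"connectionString": connection_string},
            # "query" restricts the data source to one virtual folder.
            "container": {"name": container, "query": folder},
        })
    return sources


if __name__ == "__main__":
    for source in folder_data_sources("pdfs", ["2021", "2022"], "<connection-string>"):
        print(source["name"], source["container"]["query"])
```

Each definition would still need its own indexer, so this only shows how the folder split could be expressed, not whether it is needed.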

Also, if indexing fails for only some files, do we need to re-run the indexer and re-index all files?

Azure AI Search

Accepted answer
  1. Grmacjon-MSFT 18,451 Reputation points
    2022-03-23T23:31:14.223+00:00

    Hi @test29998411 ,
    Thanks for your question.

    The doc you shared states:

    "As of October 2018, there are no longer any document count limits for any new service created at any billable tier (Basic, S1, S2, S3, S3 HD) in any region. Older services created prior to October 2018 may still be subject to document count limits.

    To determine whether your service has document limits, use the GET Service Statistics REST API. Document limits are reflected in the response, with null indicating no limits."

    What tier are you currently using?

    Best,
    Grace
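The Service Statistics check quoted in the answer could be sketched as follows. This is a minimal sketch, not an official client: it assumes the `servicestats` endpoint with an `api-key` header, API version `2020-06-30`, and a response that reports the document quota under `counters.documentCount.quota`. The service name and key are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumption: any supported API version should work


def fetch_service_stats(service_name, admin_key, api_version=API_VERSION):
    """Call GET Service Statistics and return the parsed JSON response."""
    url = (
        f"https://{service_name}.search.windows.net/servicestats"
        f"?api-version={api_version}"
    )
    request = urllib.request.Request(url, headers={"api-key": admin_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def document_quota(stats):
    """Return the document count quota, or None when no limit is reported."""
    # Assumed response shape: {"counters": {"documentCount": {"usage": ..., "quota": ...}}}
    return stats["counters"]["documentCount"]["quota"]


if __name__ == "__main__":
    stats = fetch_service_stats("<service-name>", "<admin-api-key>")
    quota = document_quota(stats)
    print("No document limit" if quota is None else f"Document limit: {quota}")
```

A `None` result corresponds to the `null` value that the quoted doc describes as indicating no limit.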

