Azure AI Search indexer shows the correct total file count, but the index contains very little of the documents' content

Mansi Yadav 60 Reputation points
2024-09-05T18:05:02.0133333+00:00

We recently switched to the ACS Basic tier, where each index has a maximum storage of 15 GB. In my ADLS container I have an Excel file in a folder named content. I'm using a skillset with the Split skill to chunk the file. After I run the indexer, it finishes quickly, within 5-8 seconds, and shows the correct count of files indexed. The index should contain around 4,000 documents (chunks), but it shows only 34. The total size of this index is 200 KB, and the maximum size is 15 GB. Earlier we were on the Standard tier and did not face this problem. I don't think it is due to the tier, since the available space is 15 GB and the file I'm trying to index is only 980 KB.

Please help to resolve this issue quickly.

Azure AI Search

1 answer

  1. SnehaAgrawal-MSFT 22,691 Reputation points Moderator
    2024-09-11T14:46:47.08+00:00

    @Mansi Yadav Thanks for asking the question! An indexer might show a different document count than the data source, the index itself, or the count in your code. Here are some possible reasons why this behavior can occur:

    • The index can lag in showing the real document count, especially in the portal.
    • The indexer has a Deleted Document Policy. The deleted documents get counted by the indexer if the documents are indexed before they get deleted.
    • The ID column in the data source isn't unique. This applies to data sources that have the concept of columns, such as Azure Cosmos DB.
    • The data source definition has a different query than the one you're using to estimate the number of records. For example, in your database you might be querying the total record count, while the data source definition query selects just a subset of records to index.
    • The counts are being checked at different intervals for each component of the pipeline: data source, indexer and index.
    • The data source has a file that's mapped to many documents. This condition can occur when indexing blobs and "parsingMode" is set to jsonArray or jsonLines.
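
To see which of the reasons above applies, it can help to put the indexer's last run next to the index's own document count. The following is a minimal Python sketch, not an official tool. It assumes the REST endpoints `GET /indexers/{name}/status` and `GET /indexes/{name}/docs/$count`, the API version `2023-11-01`, and the `lastResult` fields `itemsProcessed`, `itemsFailed`, `errors` and `warnings`. The helper names (`build_urls`, `fetch_json`, `summarize`) and the service, indexer and index names are placeholders made up for illustration.

```python
import json
import urllib.request

# Assumption: a stable REST API version; change it to the one your service uses.
API_VERSION = "2023-11-01"


def build_urls(service: str, indexer: str, index: str) -> dict:
    """Build the two REST URLs to compare (names are placeholders)."""
    base = f"https://{service}.search.windows.net"
    return {
        "indexer_status": f"{base}/indexers/{indexer}/status?api-version={API_VERSION}",
        "index_count": f"{base}/indexes/{index}/docs/$count?api-version={API_VERSION}",
    }


def fetch_json(url: str, api_key: str):
    """GET a URL with an admin or query key and parse the body as JSON.

    The $count endpoint returns a bare integer, which json.loads also accepts.
    """
    req = urllib.request.Request(url, headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8-sig"))


def summarize(status: dict, index_count: int) -> dict:
    """Put the indexer's last run next to the index document count."""
    last = status.get("lastResult") or {}
    return {
        # Source items (for blobs: files) the indexer processed in its last run.
        "items_processed": last.get("itemsProcessed", 0),
        "items_failed": last.get("itemsFailed", 0),
        # Assumed field names for per-item errors and warnings.
        "errors": [e.get("errorMessage") for e in last.get("errors", [])],
        "warnings": [w.get("message") for w in last.get("warnings", [])],
        # Search documents currently in the index.
        "index_documents": index_count,
    }


def check(service: str, indexer: str, index: str, api_key: str) -> dict:
    """Fetch both values from the service and return the summary."""
    urls = build_urls(service, indexer, index)
    status = fetch_json(urls["indexer_status"], api_key)
    count = fetch_json(urls["index_count"], api_key)
    return summarize(status, int(count))
```

Calling `check("my-service", "my-indexer", "my-index", "<api-key>")` returns a small dictionary. If `items_processed` matches the file count but `index_documents` is far below the expected number of chunks, the `warnings` and `errors` lists are a reasonable first place to look, for example for content that was truncated or skipped during the run.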

    Hope this helps. Let us know.
