This happens because the Azure AI Search indexer caps how many characters it extracts from each document, and the cap depends on the pricing tier: 32,000 characters on the Free tier, 64,000 on Basic, and between 4 million and 16 million on the Standard tiers. If a PDF contains more text than the cap allows, the excess is truncated and never indexed. The indexer can therefore finish without raising an error while part of the document's content is missing from the index.
To address this, split any document with a large amount of text into several smaller documents. Each piece then stays within the character limit of your pricing tier, so all of its text is indexed.
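As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes you have already extracted the PDF text into a string by some other means, and it only shows how that text could be cut into pieces that each stay under a chosen character limit. The function names and the `id`, `parent_id`, and `content` field names are hypothetical and not part of any Azure SDK; adapt them to your own index schema.

```python
from typing import Dict, List


def split_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Breaks at the last space inside each window when one exists,
    so words are not cut in half unless a single word is too long.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for a space to break on; fall back to a hard cut if none.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks


def make_documents(parent_id: str, text: str, max_chars: int) -> List[Dict[str, str]]:
    """Build one small document per chunk (field names are hypothetical)."""
    return [
        {"id": f"{parent_id}-{i}", "parent_id": parent_id, "content": chunk}
        for i, chunk in enumerate(split_text(text, max_chars))
    ]


if __name__ == "__main__":
    # 32,000 is the Free tier limit quoted above; use your own tier's value.
    sample = "word " * 20000
    docs = make_documents("report", sample, 32000)
    print(len(docs), "documents, longest has",
          max(len(d["content"]) for d in docs), "characters")
```

Keeping a `parent_id` on every piece is one way to trace search hits back to the original PDF. If you would rather split the PDF files themselves (for example by page range) before uploading them, the same idea applies: each resulting file should contain less text than your tier's limit.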
For further details, refer to the "Service limits in Azure AI Search" documentation.