When you index data in Azure Cognitive Search, the service creates an inverted index that maps terms to the documents that contain them. This index is what lets the service quickly find documents that match a search query. Its size depends on how much text is indexed and how many unique terms that text contains.
A field that contains a large amount of data can therefore increase the index size significantly: the inverted index has to record every unique term that appears in the field, together with the list of documents that contain each term. A very large field contributes many unique terms and long posting lists, both of which add to storage.
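If it helps to picture why, here is a toy sketch in Python (not how the service is actually implemented, and the documents are made up) showing that every unique term gets its own posting list of matching documents, so index size grows with both the vocabulary and the number of documents per term:

```python
from collections import defaultdict

# Two tiny made-up documents, keyed by document id.
docs = {
    1: "azure cognitive search builds an inverted index over the text",
    2: "the inverted index maps each term to the documents containing it",
}

# Build the inverted index: term -> set of document ids containing that term.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

# Every unique term adds an entry, and every matching document lengthens
# that term's posting list, so a large text field inflates both.
print(len(inverted_index))              # number of unique terms
print(sorted(inverted_index["index"]))  # documents containing "index"
```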
In your scenario, if one of the retrieval fields holds 95% of the data, the inverted index built over that field will be far larger than the indexes for the other fields, and that single field will account for most of the storage overhead you are seeing.
To reduce the storage overhead, consider reducing the amount of data stored in the large field. For example, you could keep the full content in a separate storage account and store only a reference to it in the search index (see the sketch below). Alternatively, you could split the large field into several smaller fields and retrieve only the fields a particular query actually needs.
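As a concrete sketch of the first option, the following uses the azure-search-documents Python SDK (the endpoint, key, index name, and field names are placeholders, not something from your setup) to define an index that keeps only a short searchable summary plus a reference URL, while the full document body stays in a separate storage account such as Blob Storage:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchFieldDataType,
    SearchableField,
    SimpleField,
)

# Placeholder endpoint and admin key for your search service.
client = SearchIndexClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-api-key>"),
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    # Only a short, searchable excerpt of the document is indexed here...
    SearchableField(name="summary", type=SearchFieldDataType.String),
    # ...while the full body lives in external storage; the index keeps
    # just a reference (for example, a blob URL) to it.
    SimpleField(name="contentUrl", type=SearchFieldDataType.String),
]

client.create_or_update_index(SearchIndex(name="docs-index", fields=fields))
```

If you go the field-splitting route instead, you can also pass the select parameter on queries (the $select option in the REST API) so that only the smaller fields you actually need are returned in results.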
I hope this helps! Let me know if you have any other questions.