@Bao, Jeremy (Cognizant) Welcome to the Microsoft Q&A forum, and thank you for posting your query here!
Why is chunking important?
The models used to generate embedding vectors have maximum limits on the text fragments provided as input. For example, the maximum length of input text for the Azure OpenAI embedding models is 8,191 tokens. Given that each token is around 4 characters of text for common OpenAI models, this maximum limit is equivalent to around 6,000 words of text. If you're using these models to generate embeddings, it's critical that the input text stays under the limit. Partitioning your content into chunks ensures that your data can be processed by the large language models (LLMs) used for indexing and queries.
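As a rough illustration of the limit described above, here is a minimal Python sketch that estimates whether a piece of text fits under the 8,191-token limit. It assumes the "about 4 characters per token" rule of thumb rather than a real tokenizer, and the function names are hypothetical. For exact counts you would use the model's actual tokenizer instead.

```python
MAX_EMBEDDING_TOKENS = 8191  # limit quoted above for Azure OpenAI embedding models
CHARS_PER_TOKEN = 4          # rule-of-thumb assumption, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count using the ~4 characters per token heuristic."""
    # Ceiling division so a partial token still counts as one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_embedding_limit(text: str, limit: int = MAX_EMBEDDING_TOKENS) -> bool:
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(text) <= limit
```

If `fits_embedding_limit` returns False for a document, that is a signal the document should be split into chunks before embedding.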
Content overlap considerations
When you chunk data, overlapping a small amount of text between chunks can help preserve context. We recommend starting with an overlap of approximately 10%. For example, given a fixed chunk size of 256 tokens, you would begin testing with an overlap of 25 tokens. The actual amount of overlap varies depending on the type of data and the specific use case, but we have found that 10-15% works for many scenarios.
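The overlap arithmetic above can be sketched in Python as follows. This is only an illustration under stated assumptions: `tokens` is any list that stands in for tokenizer output (for example, words from a whitespace split), and `chunk_with_overlap` is a hypothetical helper, not part of any Azure SDK.

```python
def chunk_with_overlap(tokens, chunk_size=256, overlap_ratio=0.10):
    """Split a token list into fixed-size chunks that share an overlap."""
    # 10% of 256 is 25.6, which is rounded down to 25 tokens of overlap.
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, each chunk starts 231 tokens after the previous one, so the last 25 tokens of one chunk are repeated as the first 25 tokens of the next. You can then adjust `overlap_ratio` (for example, up to 0.15) and compare retrieval quality on your own data.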
Sentence chunking with "10% overlap"
With this approach, you create an overlap between consecutive chunks according to a fixed ratio. For example, a 10% overlap on a maximum of 10 tokens is one token. See the details here.
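To make the idea concrete, here is a minimal Python sketch of sentence-based chunking with a ratio-based overlap. It is not the implementation from the linked sample: it assumes whitespace-separated words as a stand-in for tokens, uses a simple regular expression to find sentence boundaries, and `sentence_chunks` is a hypothetical name.

```python
import re


def sentence_chunks(text, max_tokens=10, overlap_ratio=0.10):
    """Pack whole sentences into chunks, carrying a small overlap forward."""
    # With max_tokens=10 and a 10% ratio, the overlap is one token.
    overlap = int(max_tokens * overlap_ratio)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks = []
    current = []  # words in the chunk currently being built
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            # Start the next chunk with the tail of the previous one.
            current = current[-overlap:] if overlap else []
        # Sentences are never split, so a long sentence can exceed the budget.
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, with three four-word sentences and `max_tokens=10`, the first two sentences form one chunk, and the second chunk begins with the last word of the first chunk followed by the third sentence.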
If you want to try the chunking and vector embedding generation sample, refer to this.
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.