Hi @Anonymous thank you for the question.
Yes, when you use the “Add Your Data” feature in Azure OpenAI Service, it does create an index and an indexer. However, chunking the data is not handled automatically by the indexer.
Chunking is important because the models used to generate embedding vectors have maximum limits on the text fragments provided as input. If your source documents exceed the maximum input size the model accepts, you will need to insert a chunking step into your workflow.
Here’s a simple process to add your data using Azure OpenAI Studio:
- Navigate to Azure OpenAI Studio and sign in with credentials that have access to your Azure OpenAI resource.
- During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.
- Select the Chat playground tile.
- On the Assistant setup tile, select Add your data (preview) > + Add a data source.
- In the pane that appears, select Upload files under Select data source.
- Azure OpenAI needs both a storage resource and a search resource to access and index your data.
For documents and datasets with long text, we recommend using a data preparation script. If you have large documents, you must insert a chunking step into the indexing and query workflows that breaks up large text. Libraries that provide chunking include LangChain and Semantic Kernel.
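To make the chunking step concrete, here is a minimal plain-Python sketch of fixed-size chunking with overlap, which is the basic idea behind splitters such as LangChain's text splitters or Semantic Kernel's text chunkers. The function name and the `chunk_size`/`overlap` values are illustrative choices, not prescribed by Azure OpenAI:

```python
# Minimal fixed-size chunking sketch (no external libraries).
# chunk_size and overlap are illustrative; tune them to your
# embedding model's input limit and retrieval quality needs.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    step = chunk_size - overlap  # advance by less than a full chunk to overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

document = "word " * 500  # stand-in for a long source document
chunks = chunk_text(document, chunk_size=400, overlap=50)
print(len(chunks), max(len(c) for c in chunks))
```

The overlap keeps a sentence that straddles a chunk boundary from losing its context entirely; production splitters additionally try to break on sentence or paragraph boundaries rather than at a fixed character offset.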
Even though this process does not automatically chunk your data into smaller pieces, you can attach a custom skill to a skillset to bring chunking into the indexing pipeline.
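A custom skill is just a web API that Azure AI Search calls during indexing, using a fixed request/response shape (a `values` array of records, each with a `recordId` that must be echoed back). The sketch below shows that contract with a naive chunker as the payload; in production the handler would be hosted behind an HTTP endpoint such as an Azure Function, and the output field name `chunks` is an illustrative choice, not part of the contract:

```python
# Sketch of the request/response contract for a custom Web API skill
# that performs chunking. Shown as a plain function so the payload
# shape is easy to see; a real skill would serve this over HTTP.

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size splitter standing in for a real chunking library."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def handle_skill_request(payload: dict) -> dict:
    """Apply chunking to each record in a custom-skill request body."""
    results = []
    for record in payload.get("values", []):
        text = record.get("data", {}).get("text", "")
        results.append({
            "recordId": record["recordId"],  # must be echoed back unchanged
            "data": {"chunks": chunk(text)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}

request = {"values": [{"recordId": "1", "data": {"text": "abc " * 300}}]}
response = handle_skill_request(request)
print(len(response["values"][0]["data"]["chunks"]))
```

Wired into a skillset, the indexer sends each document's text through this endpoint and can write the returned chunks into the index as separate searchable entries.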
Hope that helps.
Best,
Grace