About data sources created with Azure OpenAI Service "Add Your Data"

T. TABATA 70 Reputation points
2023-10-06T02:41:41.88+00:00

When you point "Add Your Data" at data in Blob Storage, it automatically creates an index whose contents are divided into chunks, and it creates an indexer at the same time.

Given that, when I upload a new file that needs chunking to Blob Storage, will the indexer chunk it automatically and reflect it in the index?


1 answer

Sort by: Most helpful
  1. Grmacjon-MSFT 19,151 Reputation points Moderator
    2023-10-10T05:11:36.41+00:00

    Hi @Anonymous, thank you for the question.

    Yes, when you use the “Add Your Data” feature in Azure OpenAI Service, it does create an index and an indexer. However, the process of chunking the data is not automatically handled by the indexer.

    Chunking is important because the models used to generate embedding vectors impose maximum limits on the size of the text fragments they accept as input. If your source documents exceed the model's maximum input size, you will need to insert a chunking step into your workflow.
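    As a rough illustration, you can estimate whether a document will exceed the model's input size before indexing. The 8191-token limit below is the documented input limit of the text-embedding-ada-002 model, and the 4-characters-per-token ratio is only a common rule of thumb; adjust both for the model you actually deploy.

    ```python
    def estimate_tokens(text: str) -> int:
        """Very rough token estimate: ~4 characters per token for English text.
        Real pipelines should use the model's actual tokenizer (e.g. tiktoken)."""
        return len(text) // 4

    def needs_chunking(text: str, max_input_tokens: int = 8191) -> bool:
        """Return True if the text likely exceeds the embedding model's input limit.
        8191 is the limit for text-embedding-ada-002; change it for other models."""
        return estimate_tokens(text) > max_input_tokens
    ```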

    Here’s a simple process to add your data using Azure OpenAI Studio:

    1. Navigate to Azure OpenAI Studio and sign in with credentials that have access to your Azure OpenAI resource.
    2. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.
    3. Select the Chat playground tile.
    4. On the Assistant setup tile, select Add your data (preview) > + Add a data source.
    5. In the pane that appears, select Upload files under Select data source.
    6. Azure OpenAI needs both a storage resource and a search resource to access and index your data.

    For documents and datasets with long text, it's recommended to use a data preparation script. If you have large documents, you must insert a chunking step into your indexing and query workflows to break up the large text. Libraries that provide chunking include LangChain and Semantic Kernel.
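    To make the idea concrete, here is a minimal sketch of the kind of fixed-size, overlapping split these libraries perform. Real splitters (for example LangChain's RecursiveCharacterTextSplitter) are smarter: they prefer paragraph and sentence boundaries and can count tokens rather than characters.

    ```python
    def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
        """Split text into fixed-size chunks that overlap, so content spanning
        a chunk boundary appears in both neighboring chunks."""
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")
        chunks = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            # Advance by chunk_size minus overlap so consecutive chunks share text.
            start += chunk_size - overlap
        return chunks
    ```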

    Even though this process does not automatically chunk your data into smaller pieces, you can attach a custom skill to a skillset to bring chunking into the indexing pipeline:

    https://github.com/Azure-Samples/azure-search-power-skills/blob/main/Vector/EmbeddingGenerator/README.md

    https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-ingestion-python-sample.ipynb
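    For orientation, a custom Web API skill is an HTTP endpoint that receives and returns JSON in the enrichment skill contract: a `values` array of records, each with a `recordId` and a `data` payload. The handler logic might look like the sketch below; the field names `text` and `chunks`, and the naive fixed-size split, are illustrative assumptions, since your skillset's input and output field mappings determine the real names.

    ```python
    def run_chunking_skill(request_body: dict, chunk_size: int = 1000) -> dict:
        """Process a custom Web API skill request: split each record's text into
        fixed-size chunks and return them in the response shape the search
        service expects ({"values": [...]})."""
        results = []
        for record in request_body.get("values", []):
            text = record["data"].get("text", "")
            chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
            results.append({
                "recordId": record["recordId"],  # must echo the incoming recordId
                "data": {"chunks": chunks},
                "errors": [],
                "warnings": [],
            })
        return {"values": results}
    ```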

    Hope that helps.

    Best,

    Grace

