AzureOpen AI With Multiple CSV

Question

AzureOpen AI With Multiple CSV

Ali Rizvi 0

Hi Everyone,

I’m currently working on a setup using Azure Blob Storage where I upload multiple CSV files. These files are being indexed through Azure Cognitive Search (Standard tier, not Basic).

After indexing, I connect the search index as a data source to an Azure AI / GPT model, but the results are not accurate and don’t return the expected information.

I would really appreciate any guidance on the best practices for:

Structuring CSV files for better indexing

Configuring Cognitive Search indexers and skillsets

Improving search quality or vector retrieval

Ensuring accurate responses when the index is connected to Azure OpenAI / AI Studio

If anyone has experience or recommendations for improving accuracy in this setup, please let me know.

Thank you!

0 comments

1 answer

Your answer

Answer 1

Hello Ali Rizvi,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you would like to work efficiently using AzureOpen AI With Multiple CSV.

With the steps below you will be able achieve your aim:

Put into consideration your architecture: Pipeline: Blob Storage > Cognitive Search (Hybrid Index) > Azure OpenAI (RTR Pattern)
Structuring CSV Files is very important:
- Ensure consistent schema across all CSVs.
- Use descriptive headers and normalize data types.
- Add a unique id column for each row.
Use this link - https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage for more details.
Your Cognitive Search Index Design should be: Create fields:
```
   `content` (searchable, retrievable)

   `embedding` (vector field for embeddings)
```
Enable Semantic Search and Hybrid Search. Check this on https://learn.microsoft.com/en-us/azure/search/semantic-search-overview
Embedding Generation:
- Use Azure OpenAI text-embedding-ada-002 or text-embedding-3-large.
- Chunk text into 500–1000 tokens before embedding.
- Store embeddings in Cognitive Search vector fields. For more details - https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings
On RTR Pattern Implementation navigate:
- Query Cognitive Search > Retrieve top N results > Pass as context to GPT. Use system prompt: “Answer based only on the provided context. If unsure, say ‘Not found.’” Keep temperature=0 for factual accuracy. use this link https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-overview for more details.
For the skillsets:
- Add Text Analytics for entity/key phrase extraction.
- Use Custom Skills for domain-specific enrichment.
https://learn.microsoft.com/en-us/azure/search/cognitive-search-skillset>

Finally, you need to put into consideration:

Cost by optimize embedding generation by batching.
Latency by using cache frequent queries.
Context overflow by limit to top 3–5 chunks.

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.

Share via

AzureOpen AI With Multiple CSV

1 answer

Your answer