Improving Japanese Language RAG Accuracy in Azure AI Search

菅原 拓哉 0 Reputation points

I have a question about the accuracy of RAG (Retrieval-Augmented Generation), particularly for Japanese-language searches. After the significant update on 23/11/16, Azure AI Search (formerly Azure Cognitive Search) added preview chunking and vectorization features. I am using these features to create an index, but I am not achieving the desired accuracy. Current situation:

  • The original data is stored in Blob Storage, and I checked "Enable hierarchical namespace" to activate semantics.
  • I am creating the index with the "Import and vectorize data" wizard as outlined.
  • I have enabled the Semantic Ranker and directly edited the JSON to change the analyzer to "ja.lucene".
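
A minimal sketch of what field definitions along these lines might look like, written as Python dictionaries that serialize to the index JSON. The field names `chunk` and `vector`, the profile name `my-vector-profile`, and the dimension 1536 (the output size of text-embedding-ada-002) are assumptions for illustration, not values taken from the actual index. The sketch also shows that the analyzer is a property of the searchable text field only; the vector field carries no analyzer.

```python
import json


def text_field(name: str, analyzer: str) -> dict:
    """Searchable text field; the analyzer is used for full-text queries on it."""
    return {
        "name": name,
        "type": "Edm.String",
        "searchable": True,
        "analyzer": analyzer,
    }


def vector_field(name: str, dimensions: int, profile: str) -> dict:
    """Vector field; it has no analyzer, so "ja.lucene" does not apply here."""
    return {
        "name": name,
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "dimensions": dimensions,
        "vectorSearchProfile": profile,
    }


# Hypothetical names and sizes, chosen only for this example.
fields = [
    text_field("chunk", "ja.lucene"),
    vector_field("vector", 1536, "my-vector-profile"),
]

print(json.dumps(fields, indent=2, ensure_ascii=False))
```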

Semantic search seems to retrieve the necessary information when I check it in Search Explorer, but vector search returns irrelevant results, which I think may be degrading the quality of the answers from Azure OpenAI (AOAI). If there are any settings that would let vector search retrieve the correct information, I would like to know about them.
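
To make the comparison concrete, here is a hedged sketch of the two kinds of request body being compared: a pure vector query, and a hybrid query that adds the text search and the semantic ranker. It assumes the 2023-11-01 REST API request shape; the field name `vector`, the semantic configuration name `my-semantic-config`, and the zero-filled placeholder embedding are hypothetical. The sketch only builds the JSON bodies and sends nothing.

```python
import json


def vector_only_query(vector, k=5, vector_field="vector"):
    """Request body for a pure vector query (no text, no semantic ranking)."""
    return {
        "vectorQueries": [
            {"kind": "vector", "vector": list(vector), "fields": vector_field, "k": k}
        ],
        "top": k,
    }


def hybrid_semantic_query(text, vector, semantic_config, k=5, vector_field="vector"):
    """Same vector query plus full-text search and semantic reranking."""
    body = vector_only_query(vector, k, vector_field)
    body.update(
        {
            "search": text,
            "queryType": "semantic",
            "semanticConfiguration": semantic_config,
        }
    )
    return body


# Placeholder embedding; a real query would use the embedding of the question,
# produced by the same model that vectorized the chunks.
query_vector = [0.0] * 1536

vector_body = vector_only_query(query_vector)
hybrid_body = hybrid_semantic_query("日本語の検索精度", query_vector, "my-semantic-config")

# Print only the top-level keys to keep the output short.
print(sorted(vector_body))
print(sorted(hybrid_body))
print(json.dumps(hybrid_body["search"], ensure_ascii=False))
```

Running both bodies against the same index in Search Explorer or through the REST API would show whether the irrelevant results come from the vector query alone or persist in the hybrid case.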

In short, I cannot get appropriate output from either the RAG responses or the original GPT-4 responses.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.