Azure OpenAI Chat RAG with Own Tabular Data not working at all

rstudent07, 2024-05-21 14:04 UTC

I'm building an Azure AI Search + Azure OpenAI chatbot to "talk" with our own data. I work in sports, and we need a chatbot that can retrieve actual data from our Excel files and perform basic summarizing, sorting, and filtering. Our data is mainly historical data on athletes. Here is a sample of what we have:

Player       Scoring  Games-In  Red-Flags  Net-Worth   League
John Doe     87       150       2          12,500,000  Alpha League
Jane Smith   92       200       1          25,000,000  Beta League
Max Powers   75       180       3          10,000,000  Gamma League
Lucy Heart   68       210       4          15,000,000  Delta League
Jack Hunter  80       170       2          8,000,000   Epsilon League
Emma Stone   95       220       1          30,000,000  Zeta League
Liam Knight  78       160       2          11,500,000  Eta League
Ava Strong   85       190       3          14,000,000  Theta League
Noah Swift   90       230       1          20,000,000  Iota League
Mia Quick    83       175       2          9,000,000   Kappa League

I have successfully created an index with the semantic ranker in Azure AI Search, splitting our data into chunks so the search has smaller, more manageable pieces to work with. We have also included a file with the definition of each feature. For instance, 'Net-Worth' is the athlete's value in US dollars for business and financial purposes.
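One way to use the definitions file is to merge each definition into the chunk text itself, so that a chunk retrieved on its own still says what its columns mean. This is a minimal sketch, assuming rows are already loaded as Python dicts; `DEFINITIONS` and `row_to_chunk` are hypothetical names, and only the Net-Worth wording comes from this post.

```python
from typing import Dict

# Hypothetical feature definitions; only the Net-Worth wording comes from the post.
DEFINITIONS: Dict[str, str] = {
    "Net-Worth": "value in US dollars of the athlete for business and financial purposes",
}


def row_to_chunk(row: Dict[str, object]) -> str:
    """Render one athlete row as a self-describing text chunk, one 'column: value' per line."""
    lines = []
    for column, value in row.items():
        definition = DEFINITIONS.get(column)
        if definition:
            # Inline the definition so the chunk does not depend on the separate definitions file.
            lines.append(f"{column}: {value} ({definition})")
        else:
            lines.append(f"{column}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"Player": "Jane Smith", "Scoring": 92, "Games-In": 200,
              "Red-Flags": 1, "Net-Worth": "25,000,000", "League": "Beta League"}
    print(row_to_chunk(sample))
```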

But when I ask my chatbot basic questions like:

"What is the scoring of Jane Smith?" instead of retrieving 92, it says, "Sorry, I don't have sufficient data to answer that question."
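A question like this is an exact lookup of one column in one row. If such questions can be answered from the table as structured data, instead of relying on semantic search to surface the one chunk that contains Jane Smith, the answer is deterministic. A minimal sketch under that assumption, using Python's standard `csv` module on a few sample rows; `lookup` is a hypothetical helper, not part of Azure AI Search or Azure OpenAI.

```python
import csv
import io
from typing import Dict, List, Optional

# A few rows from the sample table as CSV text (a stand-in for the real Excel data).
SAMPLE_CSV = """Player,Scoring,Games-In,Red-Flags,Net-Worth,League
John Doe,87,150,2,"12,500,000",Alpha League
Jane Smith,92,200,1,"25,000,000",Beta League
Max Powers,75,180,3,"10,000,000",Gamma League
"""


def lookup(rows: List[Dict[str, str]], player: str, column: str) -> Optional[str]:
    """Return the value of `column` for `player`, or None if the player is not found."""
    for row in rows:
        if row["Player"] == player:
            return row[column]
    return None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(lookup(rows, "Jane Smith", "Scoring"))  # prints 92
```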

When I ask it, "Who are the top 3 athletes with the highest net worth?" instead of listing the players sorted by net worth, it just replies, "Sorry, I don't have sufficient data to answer that question."
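Unlike the single lookup, this question depends on every row: a retriever that hands the model only the top-k most similar chunks cannot guarantee that the three richest athletes are among them. Once the rows are available as data, the answer is a plain numeric sort. A sketch over the ten sample rows; `parse_usd` and `top_n_by_net_worth` are hypothetical helpers.

```python
from typing import List, Tuple

# The sample table from the post as (player, net worth in USD) pairs.
NET_WORTH: List[Tuple[str, str]] = [
    ("John Doe", "12,500,000"),
    ("Jane Smith", "25,000,000"),
    ("Max Powers", "10,000,000"),
    ("Lucy Heart", "15,000,000"),
    ("Jack Hunter", "8,000,000"),
    ("Emma Stone", "30,000,000"),
    ("Liam Knight", "11,500,000"),
    ("Ava Strong", "14,000,000"),
    ("Noah Swift", "20,000,000"),
    ("Mia Quick", "9,000,000"),
]


def parse_usd(text: str) -> int:
    """Convert a comma-grouped amount such as '12,500,000' to an int."""
    return int(text.replace(",", ""))


def top_n_by_net_worth(pairs: List[Tuple[str, str]], n: int = 3) -> List[Tuple[str, str]]:
    """Return the n players with the highest net worth, largest first."""
    return sorted(pairs, key=lambda pair: parse_usd(pair[1]), reverse=True)[:n]


for player, worth in top_n_by_net_worth(NET_WORTH):
    print(player, worth)
```

The parsing step matters: compared as strings, "8,000,000" would sort above "30,000,000".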

Keep in mind that my data is extensive: almost 15,000 entries, which I have divided into very small chunks while keeping the headers in each chunk so that context is not lost.
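Header-preserving chunking of this kind can be sketched as follows, assuming each row is a list of strings already extracted from Excel. Python's `csv` writer quotes values such as `12,500,000` that contain commas, so the numbers are not split into extra columns. `chunk_rows` and `rows_per_chunk` are hypothetical names.

```python
import csv
import io
from typing import List


def chunk_rows(header: List[str], rows: List[List[str]], rows_per_chunk: int) -> List[str]:
    """Split rows into CSV-formatted chunks, repeating the header at the top of every chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)  # header repeated so each chunk is self-describing
        writer.writerows(rows[start:start + rows_per_chunk])
        chunks.append(buffer.getvalue())
    return chunks


HEADER = ["Player", "Scoring", "Games-In", "Red-Flags", "Net-Worth", "League"]
ROWS = [
    ["John Doe", "87", "150", "2", "12,500,000", "Alpha League"],
    ["Jane Smith", "92", "200", "1", "25,000,000", "Beta League"],
    ["Max Powers", "75", "180", "3", "10,000,000", "Gamma League"],
]

for chunk in chunk_rows(HEADER, ROWS, rows_per_chunk=2):
    print(chunk)
```

With roughly 15,000 rows, `rows_per_chunk=10` would produce about 1,500 chunks, and any single query only sees the few chunks the retriever returns for it.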

For those of you who have built chatbots over tabular data with a lot of quantitative and qualitative information: what approach do you suggest for solving this kind of issue?

I'm currently splitting the Excel files into smaller Excel files, which are in XML format. Would it be better to split them into JSON, CSV, or .txt chunks? I would greatly appreciate any help from those of you who have worked with this type of data. Thank you.
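One common option for the format question is one JSON object per row (JSON Lines), which keeps every value paired with its column name no matter how the file is split, and stores numbers as numbers. The sketch uses only the standard `json` module; `rows_to_jsonl` and `NUMERIC_COLUMNS` are hypothetical names, and whether this retrieves better than CSV or plain text in a given Azure AI Search index is an assumption to test, not something the example shows.

```python
import json
from typing import List

HEADER = ["Player", "Scoring", "Games-In", "Red-Flags", "Net-Worth", "League"]
ROWS = [
    ["John Doe", "87", "150", "2", "12,500,000", "Alpha League"],
    ["Jane Smith", "92", "200", "1", "25,000,000", "Beta League"],
]

# Hypothetical choice of which columns hold numbers.
NUMERIC_COLUMNS = {"Scoring", "Games-In", "Red-Flags", "Net-Worth"}


def rows_to_jsonl(header: List[str], rows: List[List[str]]) -> str:
    """Serialize rows as JSON Lines: one JSON object per row, keyed by column name."""
    lines = []
    for row in rows:
        record = dict(zip(header, row))
        for column in NUMERIC_COLUMNS:
            # Strip thousands separators so values can be filtered or sorted numerically.
            record[column] = int(record[column].replace(",", ""))
        lines.append(json.dumps(record))
    return "\n".join(lines)


print(rows_to_jsonl(HEADER, ROWS))
```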

Azure AI Search
Azure OpenAI Service