Hello Harinath,
This is a very interesting topic and a common issue after POCs.
Latency can come from a combination of several things: the use of an LLM vs. an SLM, the complexity of the reference data, the source of the data used to ground the RAG answers, and many others.
Knowing the source of the data, where it resides, and how complex the retrieval and evaluation of that data is can potentially explain why a response takes 16 seconds.
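As a first step, it is worth timing each stage of the pipeline to see where those 16 seconds actually go. Below is a minimal sketch with stand-in functions; the stage names and sleep durations are placeholders, so swap in your real retrieval and generation calls:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run one pipeline stage and print how long it took."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Stand-ins for the real pipeline stages -- replace with your own calls.
def retrieve(question):
    time.sleep(0.5)  # simulates a search/vector query
    return ["doc snippet 1", "doc snippet 2"]

def generate(question, docs):
    time.sleep(1.0)  # simulates the LLM completion call
    return "answer grounded in: " + ", ".join(docs)

question = "Why is my RAG response slow?"
docs = timed("retrieval", retrieve, question)
answer = timed("generation", generate, question, docs)
print(answer)
```

Once you know whether retrieval or generation dominates, you know whether to focus on the data side or the model side.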
Also, pre-processing the data (e.g., using Azure AI Search) to index and vectorize larger data sets increases the overall cost, but improves both response time and accuracy.
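For illustration, here is a rough sketch of querying a pre-built Azure AI Search vector index at answer time, so the expensive indexing and vectorization work is paid once up front rather than per request. The endpoint, key, index name, and field names (contentVector, title, content) are assumptions; adjust them to your own index schema:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Assumed endpoint, key, and index name -- replace with your own.
client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="docs-index",
    credential=AzureKeyCredential("<your-key>"),
)

# `query_embedding` would come from your embedding model;
# "contentVector" is an assumed vector field name in the index.
query_embedding = [0.0] * 1536  # placeholder vector
vector_query = VectorizedQuery(
    vector=query_embedding,
    k_nearest_neighbors=5,
    fields="contentVector",
)

results = client.search(
    search_text=None,             # pure vector query; add text for hybrid search
    vector_queries=[vector_query],
    select=["title", "content"],  # assumed field names
)
for doc in results:
    print(doc["title"])
```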
Are you using a static data source located in storage, or are you using web scraping or a list of external data sources? If so, what is the size of this data, what is its format, and any other insights you can share? What is the base data that you need the LLM to analyze?
If you could provide a generic, non-sensitive diagram showing the high-level architecture design, it could help point to the areas where a time-consuming process occurs.