How to create a RAG pipeline using Llama 2 deployed to an Azure ML serverless endpoint (connected via JSON payload) and Azure AI Search

Vishnu k 0 Reputation points
2024-03-08T10:29:13.13+00:00

I'm using a Llama 2 model deployed in an Azure ML workspace on a serverless endpoint, connected via JSON payload. I have faced some challenges during the project because there is very limited documentation on this topic.

**1. Can we create an LLM chain in LangChain using an LLM (Llama 2) that connects via JSON payload?**

**2. How can we create a complete RAG cycle using Llama 2 and Azure AI Search?**

I created separate pipelines for retrieving the documents and then fed the relevant chunks into the prompt to get the answer to my question. The accuracy was low: the model included other details that were also present in the chunk, and although I tried different prompt templates and temperature values, the response quality remained the same. Is there any Python framework that can help us build a complete RAG pipeline, similar to how we can pass the extra body parameter in the OpenAI framework? LangChain also does not support creating an LLM RAG chain when we use a JSON payload.
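One way to work around the lack of a built-in integration is to wrap the serverless endpoint call in a plain Python function. The sketch below is a minimal, assumed example: the payload schema (`input_data`/`input_string`/`parameters`) and response shape vary by deployment, so check the "Consume" tab of your endpoint for the exact format.

```python
# Minimal sketch of calling a Llama 2 serverless endpoint with a JSON payload.
# The payload schema and response shape below are assumptions -- check the
# "Consume" tab of your Azure ML deployment for the exact request format.
import json
import urllib.request


def build_payload(prompt: str, temperature: float = 0.2,
                  max_new_tokens: int = 256) -> dict:
    """Assemble the JSON body the serverless endpoint expects (assumed schema)."""
    return {
        "input_data": {
            "input_string": [prompt],
            "parameters": {
                "temperature": temperature,
                "max_new_tokens": max_new_tokens,
            },
        }
    }


def call_endpoint(endpoint_url: str, api_key: str, prompt: str) -> str:
    """POST the payload to the Azure ML serverless endpoint and return the text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    # The response key varies by deployment; adjust this line to match yours.
    return str(result)
```

A function like `call_endpoint` can then be dropped into a custom LangChain `LLM` subclass (overriding `_call`) so the rest of a chain works unchanged.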

Azure Machine Learning
Azure AI Search

2 answers

Sort by: Most helpful
  1. Ramr-msft 17,826 Reputation points
    2024-03-10T01:39:28.65+00:00

    Thanks for the question. Creating an LLM chain in LangChain using Llama 2 with a JSON payload:

    • Llama 2 is now available in the model catalog within Azure Machine Learning. This model catalog, currently in public preview, serves as a hub for foundation models, allowing users to easily discover, customize, and operationalize large foundation models at scale.
    • To deploy Llama 2 models with pay-as-you-go billing, you can use Azure Machine Learning studio. Certain models from the Llama 2 family are available in Azure Marketplace for deployment as a service with pay-as-you-go billing: Meta Llama-2-7B (preview), Meta Llama 2 7B-Chat (preview), Meta Llama-2-13B (preview), Meta Llama 2 13B-Chat (preview), Meta Llama-2-70B (preview), and Meta Llama 2 70B-Chat (preview).
    • If you need to deploy a different model, consider deploying it to real-time endpoints instead. Ensure that you have an Azure subscription with a valid payment method and an Azure Machine Learning workspace with a compute instance. Note that the pay-as-you-go model deployment offering is available only in workspaces created in the East US 2 and West US 3 regions.
    • Unfortunately, LangChain doesn't directly support creating an LLM RAG chain with a JSON payload. Azure AI Search can be leveraged for efficient document retrieval and ranking: index your documents, create search queries, and rank results by relevance, then combine this with Llama 2 for answer generation.
    • Remember that RAG pipelines require iterative tuning and experimentation to achieve optimal results.
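On the accuracy issue mentioned in the question (the model echoing unrelated details from the chunk), a tightly grounded prompt often helps more than temperature changes. The helper below is an illustrative sketch, not part of any library: it assembles retrieved chunks into a prompt that instructs the model to answer only from the supplied context.

```python
# Illustrative grounded-prompt builder; all names here are assumptions.
# Constraining the model to the supplied context (and allowing an explicit
# "I don't know") tends to reduce off-topic details leaking into answers.
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Number each retrieved chunk so the model can reference its source.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string can be sent as the `input_string` of the JSON payload to the serverless endpoint.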

  2. Suwarna S Kale 3,391 Reputation points
    2024-03-10T01:55:57.92+00:00

    Thanks for posting your question in the Microsoft Q&A forum.

    Creating a RAG (Retrieval-Augmented Generation) pipeline using Llama 2, an Azure ML serverless endpoint connected via a JSON payload, and Azure AI Search involves several steps.

    The success of your RAG pipeline depends on the quality of your Llama 2 model, the relevance of your Azure AI Search index, and the integration between the two. Continuously monitor and improve the pipeline based on user feedback and performance metrics.
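The overall flow can be sketched as a small retrieve-then-generate loop. In this hypothetical sketch, `search` and `generate` are stand-ins you would supply: `search` could wrap an Azure AI Search query (e.g. via the `azure-search-documents` package's `SearchClient.search`), and `generate` could wrap the serverless-endpoint call; neither name is a real API.

```python
# Illustrative end-to-end RAG loop: retrieve, assemble a prompt, generate.
# `search` and `generate` are injected stand-ins for your Azure AI Search
# query and serverless-endpoint call -- both are assumptions, not a real API.
from typing import Callable


def rag_answer(question: str,
               search: Callable[[str], list[str]],
               generate: Callable[[str], str],
               top_k: int = 3) -> str:
    chunks = search(question)[:top_k]            # 1. retrieve relevant chunks
    context = "\n".join(chunks)                  # 2. assemble the context
    prompt = (f"Use only this context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")  # 3. build the grounded prompt
    return generate(prompt)                      # 4. generate the answer
```

Injecting the two callables keeps the pipeline testable with fakes and makes it easy to swap the retriever or the model without touching the glue code.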

    Does the response help answer your question? Please remember to "Accept Answer" if any answer/reply helped so that others in the community facing similar issues can easily find the solution. Thanks 😊

