Private data chat on Langchain

Question

Hi Team,

I am trying to create a simple Q&A chat application on private data using LangChain with Azure OpenAI Service, but I have been unable to get it working. Can you please provide a code snippet for this? It is difficult to find any resources on it.

Thanks in advance.

Regards,

Janarthanan S

Accepted Answer

Hi @Janarthanan S ,

Glad to know that your issue has been resolved. And thanks for sharing your feedback.

To reiterate the resolution here, let me jot down the gist of my first comment above.

I understand that you are trying to create a Q&A chat application on private data using LangChain with Azure OpenAI service. I will be happy to assist you with this.

Please refer to the code sample at the following URL:

https://github.com/microsoft/azure-openai-in-a-day-workshop/blob/main/qna-chat-with-langchain/qna-chat-with-langchain.ipynb

Please 'Accept as answer' and 'Upvote' if this helped, so that it can help others in the community looking for help on similar topics.

Answer

Hi Janarthanan S,

Thanks for the question. I would recommend following the steps below to create a simple Q&A chat application on private data using LangChain with Azure OpenAI Service.

Before getting started with LangChain and Azure OpenAI Service, make sure you have the following prerequisites:

• An active Azure subscription

• Access granted to Azure OpenAI in the desired Azure subscription (you can request access at https://aka.ms/oai/access)

• An Azure OpenAI resource with a model deployed.

• Python 3.7 or higher

• LangChain library installed (you can do so via pip install langchain)

Steps to create:

First, create a .env file and add your Azure OpenAI Service details:

OPENAI_API_KEY=xxxxxx
OPENAI_API_BASE=https://xxxxxxxx.openai.azure.com/
OPENAI_API_VERSION=2023-05-15
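
Once these variables are loaded into the environment (for example by load_dotenv(), as in the code further down), it can help to confirm that none of them is missing or empty before calling the service. The following is a minimal sketch using only the standard library. The helper name missing_settings is hypothetical and is not part of LangChain or the openai package.

```python
import os

# Setting names taken from the .env file above.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_VERSION")


def missing_settings(env=None):
    """Return the required setting names that are unset or empty in env."""
    # Hypothetical helper for illustration; defaults to the process environment.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All Azure OpenAI settings are present.")
```

Passing a plain dictionary instead of os.environ makes the check easy to try without touching real credentials.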

Next, make sure that you have gpt-35-turbo and text-embedding-ada-002 deployed, and that each deployment uses the same name as the model itself.

Let’s install the latest versions of openai and langchain via pip:

pip install openai --upgrade
pip install langchain --upgrade

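We’re using openai==0.27.8 and langchain==0.0.240. The code below also relies on the python-dotenv, tiktoken, and faiss-cpu packages, which can be installed with pip in the same way.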

Now, let’s initialize our Azure OpenAI Service connection and create the LangChain objects:

import os
import openai
from dotenv import load_dotenv
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
 
# Load environment variables (set OPENAI_API_KEY, OPENAI_API_BASE, and OPENAI_API_VERSION in .env)
load_dotenv()
 
# Configure OpenAI API
openai.api_type = "azure"
openai.api_base = os.getenv('OPENAI_API_BASE')
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_version = os.getenv('OPENAI_API_VERSION')
 
# Initialize gpt-35-turbo and our embedding model
llm = AzureChatOpenAI(deployment_name="gpt-35-turbo")
embeddings = OpenAIEmbeddings(deployment_id="text-embedding-ada-002", chunk_size=1)

Next, we can load a set of text files, split them into chunks, and embed them. LangChain supports many different document loaders (https://python.langchain.com/docs/modules/data_connection/document_loaders.html), which makes it easy to adapt to other data sources and file formats. You can use your own sample data in place of the files in data/qna/.

from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import TextLoader
from langchain.text_splitter import TokenTextSplitter
loader = DirectoryLoader('data/qna/', glob="*.txt", loader_cls=TextLoader, loader_kwargs={'autodetect_encoding': True})
documents = loader.load()
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
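
To make the chunk_size and chunk_overlap parameters more concrete, here is a rough sketch of fixed-size windowing with overlap. It is an illustration only, not LangChain's implementation: whitespace-separated words stand in for tokens (TokenTextSplitter uses a real tokenizer), and the function name chunk_tokens is hypothetical.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: words stand in for tokens here, purely for readability.
words = "Azure OpenAI Service provides REST API access to OpenAI models".split()
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(" ".join(chunk))
```

With chunk_overlap=0, as in the code above, the windows do not share any tokens. A small overlap can help when an answer straddles a chunk boundary, at the cost of embedding some text twice.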

Next, let’s ingest the documents into FAISS so we can efficiently query our embeddings:

from langchain.vectorstores import FAISS
db = FAISS.from_documents(documents=docs, embedding=embeddings)
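
Conceptually, the vector store keeps one embedding vector per chunk and, at query time, returns the chunks whose vectors are closest to the embedded question. The sketch below shows that idea as a brute-force search in plain Python. It is not how FAISS works internally (FAISS uses optimized indexes, and its distance metric may differ). The cosine similarity metric, the toy 3-dimensional vectors, and the names cosine_similarity and top_k are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vector."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy vectors (made up); real embeddings have many more dimensions.
indexed = [
    ("chunk about pricing", [0.9, 0.1, 0.0]),
    ("chunk about regions", [0.1, 0.9, 0.1]),
    ("chunk about quotas", [0.0, 0.2, 0.9]),
]
print(top_k([0.2, 0.8, 0.0], indexed, k=1))
```

The retriever created later with db.as_retriever() plays the role of top_k here: it takes the question, embeds it, and hands the best-matching chunks to the LLM as context.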

Lastly, we can create our document question-answering chat chain. Here we specify the condense-question prompt, which uses the chat history to rewrite a follow-up question from the user into a standalone question:

from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
# Adapt if needed
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:""")
qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                           retriever=db.as_retriever(),
                                           condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                                           return_source_documents=True,
                                           verbose=False)
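
To see what the condense step sends to the model, the sketch below fills the same template text by hand with plain string formatting. It only illustrates the idea. The Human:/Assistant: labels used to render the history, the function name render_condense_prompt, and the sample answer text are assumptions, and the chain's own formatting may differ.

```python
# Same template text as CONDENSE_QUESTION_PROMPT above, as a plain string.
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


def render_condense_prompt(chat_history, question):
    """Fill the template from a list of (question, answer) tuples."""
    # Assumption: each turn is rendered with Human:/Assistant: labels.
    history_text = "\n".join(
        f"Human: {q}\nAssistant: {a}" for q, a in chat_history
    )
    return CONDENSE_TEMPLATE.format(chat_history=history_text, question=question)


# Placeholder answer text, not real model output.
history = [("what is Azure OpenAI Service?", "(previous answer goes here)")]
print(render_condense_prompt(history, "Which regions does the service support?"))
```

The model's reply to this prompt is the standalone question, which is then used to retrieve the relevant chunks.
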

Let’s ask a question:

chat_history = []
query = "what is Azure OpenAI Service?"
result = qa({"question": query, "chat_history": chat_history})
print("Question:", query)
print("Answer:", result["answer"])

From here, we can also ask follow-up questions:

chat_history = [(query, result["answer"])]
query = "Which regions does the service support?"
result = qa({"question": query, "chat_history": chat_history})
print("Question:", query)
print("Answer:", result["answer"])
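
For longer conversations, the history bookkeeping can be wrapped in a small helper. This is a minimal sketch: the helper name ask is hypothetical, and echo_chain is a stand-in so the example runs without any Azure resources. In the real application you would pass the qa object created above, which is called with the same dictionary shape.

```python
def ask(chain, question, chat_history):
    """Call the chain, record the turn in chat_history, and return the answer."""
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]


def echo_chain(inputs):
    # Stand-in for the real chain: reports how many earlier turns it received.
    turns = len(inputs["chat_history"])
    return {"answer": f"(stub) {inputs['question']} [history: {turns}]"}


history = []
print(ask(echo_chain, "what is Azure OpenAI Service?", history))
print(ask(echo_chain, "Which regions does the service support?", history))
```

Unlike the snippet above, which passes only the most recent turn, this helper accumulates every turn, so later follow-up questions can refer back to anything said earlier in the conversation.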

Here we go! With the code above, we have created a simple Q&A chat application on private data using LangChain with Azure OpenAI Service.

Please try out these steps with your data and check whether they work for you. I hope this answer helps you reach a solution. Please comment below if you need any further assistance. Happy to help!

Regards,

Chakravarthi Rangarajan Bhargavi

Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.
