Hi @AMROUN Lysa, thanks for the question.
Azure OpenAI Service does not currently provide a built-in way to download or access the source file from which the Retrieval-Augmented Generation (RAG) model retrieved the answer. However, you can customize the RAG model to include the source file information in the output, which you can then use to retrieve the file if needed.
Here's how you can approach this:
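- When you initialize the RAG model using openai.Deployment.retrieve_and_read, you can specify a custom output format using the response_format parameter. This parameter accepts a Python formatting string that can include placeholders for different components of the model's output. Treat these names as illustrative: I could not confirm them against the published openai Python package, so map them to whatever your own pipeline exposes for returning the retrieved context.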
For example, you can include the placeholder {context} to get the context (source text) that the model used to generate the answer. You can then parse this context to extract the file information.
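# Illustrative sketch only: these names are unverified against the published openai package; adapt them to your pipeline.
import openai
from openai.retrievers import RetrieverRegistry
retriever = RetrieverRegistry.get("azureml.openai")
deployment = openai.Deployment.retrieve_and_read(retriever=retriever, response_format="{result}\n\nContext: {context}")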
- After getting the model's output, you can parse the {context} part to extract the file information. The exact parsing logic will depend on how you structured your source files and the metadata you included.
For example, if your source files have a consistent naming convention or include file paths in the metadata, you can use string manipulation or regular expressions to extract the file information from the context.
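As a rough illustration of the parsing step, here is a minimal sketch. It assumes each retrieved chunk in the context carries a marker of the form `[source: <path>]`; that marker format, the `SOURCE_PATTERN` regex, and the `extract_sources` helper are all hypothetical, so adjust them to however your own indexing step records file names.

```python
import re
from typing import List

# Hypothetical marker format: "[source: docs/handbook.pdf]".
# Change this pattern to match the metadata your own index emits.
SOURCE_PATTERN = re.compile(r"\[source:\s*(?P<path>[^\]]+?)\s*\]")


def extract_sources(context: str) -> List[str]:
    """Return the distinct source paths found in the context, in first-seen order."""
    sources: List[str] = []
    for match in SOURCE_PATTERN.finditer(context):
        path = match.group("path")
        if path not in sources:
            sources.append(path)
    return sources


if __name__ == "__main__":
    sample = (
        "[source: docs/a.pdf] Leave policy is 25 days.\n"
        "[source: docs/b.md] Requests go through the portal.\n"
        "[source: docs/a.pdf] Carry-over is capped at 5 days."
    )
    print(extract_sources(sample))
```

Deduplicating while keeping the original order means the first file listed is the first chunk the answer drew on, which is usually what you want to show to the user.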
- Once you have the file information (e.g., file path, name, or URL), you can use the appropriate Python libraries or Azure SDK to access or download the file.
If the files are stored in Azure Blob Storage or an Azure file share, you can use the azure-storage-blob or azure-storage-file-share Python libraries to download the files. (The older azure-storage-file package has been superseded by azure-storage-file-share.)
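For the Blob Storage case, a download could look roughly like the sketch below. It assumes the file information you extracted is a full blob URL and that you have a credential (for example a SAS token or an account key) that is allowed to read it. The helper names `blob_file_name` and `download_blob` are my own; `BlobClient.from_blob_url` and `download_blob().readinto()` come from the azure-storage-blob package.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def blob_file_name(blob_url: str) -> str:
    """Return the last path segment of a blob URL, ignoring any query string."""
    return PurePosixPath(unquote(urlparse(blob_url).path)).name


def download_blob(blob_url: str, dest_dir: str, credential=None) -> Path:
    """Download one blob into dest_dir and return the local path."""
    # Imported here so blob_file_name works even without the SDK installed.
    from azure.storage.blob import BlobClient

    client = BlobClient.from_blob_url(blob_url, credential=credential)
    target = Path(dest_dir) / blob_file_name(blob_url)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as handle:
        # Stream the blob content straight into the local file.
        client.download_blob().readinto(handle)
    return target
```

If the URL already contains a SAS token in its query string, you can leave `credential` as `None`.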
If the files are stored in a local file system or network share, you can use the built-in os and shutil Python modules to access or copy the files.
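For the local or network-share case, a minimal sketch using only the standard library might look like this. The `copy_source_file` helper is a hypothetical name, and it assumes the path you extracted from the context is readable from the machine running the script.

```python
import shutil
from pathlib import Path


def copy_source_file(source_path, dest_dir) -> Path:
    """Copy one source document into dest_dir and return the new path."""
    src = Path(source_path)
    if not src.is_file():
        raise FileNotFoundError(f"Source file not found: {src}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps and other metadata where possible.
    return Path(shutil.copy2(src, dest / src.name))
```

Checking `is_file()` first gives a clearer error when the path recorded in your metadata no longer matches where the document lives.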
It's important to note that this approach requires you to modify the RAG model's output format and implement custom parsing logic to extract the file information. Additionally, you'll need to ensure that your source files or their metadata include the necessary information to identify and locate the files.
Best,
Grace