Using RAG (Retrieval-Augmented Generation) in an Azure AI Pipeline over a Private Network

ilyes sais 0 reputation points
2025-05-25T18:14:28.0666667+00:00

I want to deploy a RAG flow in Azure AI Pipeline entirely privately, without exposing the OpenAI endpoint to the public Internet. My plan is to:

  1. Create an Azure Cognitive Search index behind a Private Endpoint in a VNet.

  2. Deploy my OpenAI model (GPT-3.5/GPT-4) also behind a Private Endpoint in the same VNet.

  3. Configure the RAG pipeline so it retrieves documents from the private index and calls the model via the private endpoint.

from azure.ai.openai import OpenAIClient
from azure.identity import DefaultAzureCredential

endpoint = "https://<my-private-openai>.openai.azure.com/"
client = OpenAIClient(endpoint=endpoint, credential=DefaultAzureCredential())

response = client.get_models()  # results in timeout or 403
print(response)

What am I missing to run a fully private RAG pipeline?

Is there a special setting in the pipeline definition to enforce use of the Private Endpoint?

Are there any known limitations with RAG over a private network in Azure AI Pipeline?

Azure

1 answer

  1. Ravada Shivaprasad 630 reputation points Microsoft External Staff Moderator
    2025-05-27T00:32:09.2966667+00:00

    Hi ilyes sais

    To run a fully private Retrieval-Augmented Generation (RAG) pipeline in Azure AI Pipeline, with both Azure Cognitive Search and Azure OpenAI behind private endpoints, verify the following. First, confirm that your Azure OpenAI resource has a Private Endpoint and that the Network Security Group (NSG) rules allow traffic to it within the Virtual Network (VNet). Second, confirm that your Azure Cognitive Search service is reachable through its Private Endpoint. Third, confirm that DNS is set up with Azure Private DNS so that both service hostnames resolve to their private endpoint addresses from inside the VNet.
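
The DNS part of that checklist can be checked from a machine inside the VNet. The sketch below is a minimal, standard-library-only check, not an official diagnostic: the helper names are hypothetical, and it assumes that a correctly configured private endpoint makes the service hostname resolve only to private IP addresses.

```python
import ipaddress
import socket
from typing import List


def is_private_address(ip: str) -> bool:
    """Return True if the address lies in a private (non-public) range."""
    return ipaddress.ip_address(ip).is_private


def resolve_addresses(hostname: str, port: int = 443) -> List[str]:
    """Resolve a hostname and return its distinct IP addresses, sorted."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_private_resolution(hostname: str) -> bool:
    """Return True only if the name resolves and every address is private."""
    try:
        addresses = resolve_addresses(hostname)
    except (socket.gaierror, UnicodeError):
        # The name did not resolve at all from this machine.
        return False
    return bool(addresses) and all(is_private_address(a) for a in addresses)
```

Run from the pipeline's compute, `check_private_resolution("<my-private-openai>.openai.azure.com")` returning False would suggest the Private DNS zone is not linked to the VNet, or that the name still resolves to the public address.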

    A timeout or 403 error when calling client.get_models() typically comes from missing role assignments or a network misconfiguration. As a rule of thumb, a timeout suggests the request never reached the service (routing or DNS), while a 403 suggests it arrived but was rejected by an authorization or network access rule. Ensure that the Managed Identity associated with your Azure AI Pipeline has permission to access Azure OpenAI; you may need to explicitly grant it the Cognitive Services OpenAI User role. Also verify that the compute running the pipeline is allowed outbound access to the Azure OpenAI private endpoint, as some configurations block outbound traffic by default.
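
To separate the two failure modes, a small probe can help. The sketch below is an illustration under assumptions rather than a confirmed fix: it uses the `openai` package's `AzureOpenAI` client with a Microsoft Entra ID token provider from `azure-identity` instead of the `OpenAIClient` import shown in the question. The `explain_failure`, `make_client`, and `probe` helpers are hypothetical names, and the API version is a placeholder to adjust for your resource.

```python
from typing import Optional

# Assumption: an api-version your resource supports; adjust as needed.
API_VERSION = "2024-02-01"
# Microsoft Entra ID scope used when requesting tokens for Azure AI services.
TOKEN_SCOPE = "https://cognitiveservices.azure.com/.default"


def explain_failure(status_code: Optional[int], timed_out: bool = False) -> str:
    """Map a failure to the most likely area to investigate (heuristic only)."""
    if timed_out:
        return "no response: check DNS resolution, routing, and NSG rules"
    if status_code in (401, 403):
        return "rejected: check role assignments and network access rules"
    if status_code is not None and status_code >= 400:
        return f"HTTP {status_code}: inspect the response body for details"
    return "request succeeded"


def make_client(endpoint: str):
    """Build an AzureOpenAI client that authenticates with Entra ID tokens."""
    # Imported lazily so explain_failure works without the SDKs installed.
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    from openai import AzureOpenAI

    token_provider = get_bearer_token_provider(DefaultAzureCredential(), TOKEN_SCOPE)
    return AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version=API_VERSION,
        timeout=10.0,  # fail fast instead of hanging on a blocked route
    )


def probe(endpoint: str) -> str:
    """List models once and translate the outcome into a hint."""
    from openai import APIConnectionError, APIStatusError, APITimeoutError

    client = make_client(endpoint)
    try:
        client.models.list()
    except APITimeoutError:
        return explain_failure(None, timed_out=True)
    except APIConnectionError:
        # Connection could not be established at all; treat like a timeout.
        return explain_failure(None, timed_out=True)
    except APIStatusError as err:
        return explain_failure(err.status_code)
    return explain_failure(None)
```

Calling `probe("https://<my-private-openai>.openai.azure.com/")` from the pipeline's compute gives a first hint about whether to look at networking or at role assignments.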

    To enforce the use of Private Endpoints in your pipeline, configure the workspace networking settings to restrict outbound traffic to approved destinations only. In Azure Machine Learning, you can do this by enabling the workspace managed virtual network and defining user-defined outbound rules that allow communication with Azure OpenAI and Azure Cognitive Search. Additionally, make sure your pipeline components explicitly reference the endpoints of those private resources.
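
As a rough sketch of what such outbound rules could look like with the `azure-ai-ml` Python SDK: the `OutboundRuleSpec`, `build_rule_specs`, and `to_managed_network` names are hypothetical, the rule names are placeholders, and the SDK import paths and parameters reflect my understanding of recent `azure-ai-ml` releases, so verify them against your installed version before relying on this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OutboundRuleSpec:
    """Plain description of one private endpoint outbound rule."""
    name: str
    service_resource_id: str
    subresource_target: str


def build_rule_specs(subscription_id: str, resource_group: str,
                     openai_account: str, search_service: str) -> List[OutboundRuleSpec]:
    """Describe outbound rules for the Azure OpenAI and search resources."""
    base = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers"
    return [
        OutboundRuleSpec(
            name="allow-openai",  # hypothetical rule name
            service_resource_id=f"{base}/Microsoft.CognitiveServices/accounts/{openai_account}",
            subresource_target="account",
        ),
        OutboundRuleSpec(
            name="allow-search",  # hypothetical rule name
            service_resource_id=f"{base}/Microsoft.Search/searchServices/{search_service}",
            subresource_target="searchService",
        ),
    ]


def to_managed_network(specs: List[OutboundRuleSpec]):
    """Convert the specs into an azure-ai-ml ManagedNetwork definition."""
    # Assumption: these import paths match your installed azure-ai-ml version.
    from azure.ai.ml.constants._workspace import IsolationMode
    from azure.ai.ml.entities import ManagedNetwork, PrivateEndpointDestination

    rules = [
        PrivateEndpointDestination(
            name=spec.name,
            service_resource_id=spec.service_resource_id,
            subresource_target=spec.subresource_target,
            spark_enabled=False,
        )
        for spec in specs
    ]
    return ManagedNetwork(
        isolation_mode=IsolationMode.ALLOW_ONLY_APPROVED_OUTBOUND,
        outbound_rules=rules,
    )
```

The resulting object would then be assigned to the workspace's `managed_network` setting and applied through an `MLClient` workspace update. Keeping the specs as plain data makes them easy to review before anything is sent to Azure.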

    There are known limitations when running RAG over a private network in Azure AI Pipeline. The main challenge is ensuring that every dependency, including vector indexing and model inference, operates inside the private network without public internet access. Some Azure services may still require outbound connectivity for metadata retrieval or logging, and your firewall rules must account for that. Private endpoints can also add slight latency compared to public endpoints because of the extra network routing.

    For a detailed guide on securing RAG workflows with private networking, refer to Microsoft's documentation.
    Thanks

