Baseline Azure AI Foundry chat reference architecture in an Azure landing zone


This article is part of a series that builds on the Baseline AI Foundry chat reference architecture. Review the baseline architecture so that you can identify necessary adjustments before you deploy it in an Azure application landing zone subscription.

This article describes a generative AI workload architecture that deploys the baseline chat application but uses resources that are outside the workload team's scope. Platform teams centrally manage the resources, and multiple workload teams use them. Shared resources include networking resources for cross-premises connections, identity access management systems, and policies. This guidance helps organizations that use Azure landing zones maintain consistent governance and cost efficiency.

Azure AI Foundry uses accounts and projects to organize AI development and deployment. For example, a landing zone implementation might use an account as a centralized resource at a business group level and projects as a delegated resource for each workload in that business group. Because of resource organization factors and cost allocation limitations, we don't recommend this topology, and this article doesn't provide guidance about it. Instead, this architecture treats the workload as the owner of the Azure AI Foundry instance, which is the recommended approach.

As a workload owner, you delegate shared resource management to platform teams so that you can focus on workload development efforts. This article presents the workload team's perspective and specifies recommendations for the platform team.

Important

What are Azure landing zones?

Azure landing zones divide your organization's cloud footprint into two key areas:

  • An application landing zone is an Azure subscription where a workload runs. An application landing zone connects to your organization's shared platform resources. That connection provides the landing zone with access to the infrastructure that supports the workload, such as networking, identity access management, policies, and monitoring.

  • A platform landing zone is a collection of various subscriptions that multiple platform teams can manage. Each subscription has a specific function. For example, a connectivity subscription provides centralized Domain Name System (DNS) resolution, cross-premises connectivity, and network virtual appliances (NVAs) for platform teams.

To help you implement this architecture, understand Azure landing zones, their design principles, and their design areas.

Article layout

Architecture: Architecture diagram, Workload resources, Federated resources

Design decisions: Subscription setup, Networking, Data scientist access, Monitor resources, Organizational governance, Change management

Azure Well-Architected Framework approach: Reliability, Security, Cost Optimization, Operational Excellence, Performance Efficiency

Tip

The Azure AI Foundry Agent Service chat baseline reference implementation demonstrates the best practices described in this article. Review and try these deployment resources before you choose and implement your design decisions.

Architecture

Architecture diagram of the workload, including select platform subscription resources.

Download a Visio file of this architecture.

Components

All Azure landing zone architectures separate ownership between the platform team and the workload team, which is referred to as subscription democratization. Application architects, data scientists, and DevOps teams must clearly understand this division to determine what falls under their direct influence or control and what doesn't.

Like most application landing zone implementations, the workload team primarily manages the configuration, deployment, and oversight of workload components, including the AI services in this architecture.

Workload team-owned resources

The following resources remain mostly unchanged from the baseline architecture.

  • Azure AI Foundry accounts and projects enable the workload team to host generative AI models as a service, implement content safety, and establish workload-specific connections to knowledge sources and tools.

    If your organization's AI Center of Excellence restricts access to AI model deployments, the workload team might not host models in projects and accounts. Instead, they might need to use centralized AI resources. In this scenario, all model consumption usually flows through a gateway that your AI platform team provides.

    This article assumes that generative AI models in this scenario are workload-owned resources. If they're not, the model host, or a gateway to the models, becomes a workload dependency. The platform team must maintain reliable network connectivity to the APIs.

    Foundry Agent Service treats model dependencies in a specific way, so challenges can occur when you consume centrally hosted models. You might need to use an alternative orchestrator.

  • Foundry Agent Service provides the orchestration layer for chat interactions. It hosts and manages the chat agent that processes user requests.

    Use the standard agent setup in this architecture. Connect your agent to a dedicated subnet in your spoke virtual network, and route egress traffic through your connectivity subscription.

    The workload team supplies dedicated Azure resources for agent state, chat history, and file storage. These resources are Azure Cosmos DB for NoSQL, Azure Storage, and Azure AI Search. Your Foundry Agent Service instance manages these resources and their data exclusively. Other application components in your workload or other workloads in your organization shouldn't use them.

  • Azure App Service hosts the web application that contains the chat user interface (UI). App Service runs three instances, each in a different Azure availability zone.

    An Azure Storage account hosts the web application's code as a ZIP file, which mounts within App Service.

  • AI Search retrieves relevant indexed data for application user queries. AI Search serves as the workload knowledge store for the Retrieval Augmented Generation pattern. This pattern extracts an appropriate query from a prompt, queries AI Search, and uses the results as grounding data for a generative AI foundation model (see the retrieval sketch after this component list).

  • Azure Application Gateway serves as the reverse proxy to route user requests to the chat UI hosted in App Service. The selected SKU also hosts an Azure web application firewall to protect the front-end application from potentially malicious traffic.

    Azure Key Vault stores the application gateway's Transport Layer Security (TLS) certificate.

  • Azure Monitor, Azure Monitor Logs, and Application Insights collect, store, and visualize observability data.

  • Azure Policy applies workload-specific policies to help govern, secure, and apply controls at scale.

The workload team also maintains the following resources:

  • Spoke virtual network subnets and the network security groups (NSGs) on those subnets maintain segmentation and control traffic flow.

  • Private endpoints secure connectivity to platform as a service (PaaS) solutions.
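To make the retrieval step of the Retrieval Augmented Generation pattern, described in the AI Search component above, more concrete, the following minimal sketch queries an AI Search index and assembles the results into grounding text. It assumes Python with the azure-search-documents and azure-identity packages; the endpoint, index name, and field names (title, content) are hypothetical placeholders for your own index schema.

```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

# Hypothetical endpoint and index; substitute your own AI Search resource.
search_client = SearchClient(
    endpoint="https://contoso-search.search.windows.net",
    index_name="product-docs",
    credential=DefaultAzureCredential(),
)

def retrieve_grounding(query: str, top: int = 3) -> str:
    """Query the index and concatenate the hits into grounding text for the model."""
    results = search_client.search(search_text=query, top=top)
    # 'title' and 'content' are assumed field names in the index schema.
    return "\n\n".join(f"{doc['title']}: {doc['content']}" for doc in results)

grounding = retrieve_grounding("What is the return policy for opened items?")
print(grounding)  # Pass this text to the agent or model as grounding context.
```

In this architecture, the Foundry Agent Service connection to AI Search performs this retrieval for the agent; the sketch only illustrates the pattern's mechanics.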

Platform team-owned resources

The platform team owns and maintains the following centralized resources. This architecture assumes that these resources are pre-provisioned and treats them as dependencies.

  • Azure Firewall in the hub network routes, inspects, and restricts egress traffic that originates from the workload, including agent traffic. Workload egress traffic goes to the internet, cross-premises destinations, or to other application landing zones.

    Change from the baseline: In the baseline architecture, the workload team owns this component. In this architecture, the platform team manages it under the connectivity subscription.

  • Azure Bastion in the hub network provides secure operational access to workload components and allows access to Azure AI Foundry components.

    Change from the baseline: In the baseline architecture, the workload team owns this component.

  • The spoke virtual network is where the workload is deployed.

    Change from the baseline: In the baseline architecture, the workload team owns this network.

  • User-defined routes (UDRs) enforce tunneling to the hub network.

    Change from the baseline: In the baseline architecture, the workload team owns these routes.

  • Azure Policy-based governance constraints and DeployIfNotExists (DINE) policies apply to the workload subscription. You can apply these policies at the platform team-owned management group level or to the workload's subscription directly.

    Change from the baseline: These policies are new in this architecture. The platform team applies policies that constrain your workload. Some policies might duplicate existing workload constraints or introduce new constraints.

  • Azure private DNS zones host A records for private endpoints. For more information, see Azure Private Link and DNS integration at scale.

    Change from the baseline: In the baseline architecture, the workload team owns these zones. In this architecture, the platform team manages this component under the connectivity subscription.

  • DNS resolution service supports spoke virtual networks and cross-premises workstations. This service typically uses Azure Firewall as a DNS proxy or Azure DNS Private Resolver. In this architecture, the service resolves private endpoint DNS records for all DNS requests from the spoke. Because of the DNS resolution characteristics of Foundry Agent Service, DNS Private Resolver with linked rulesets is the recommended way for the platform team to meet this architecture's resolution requirements.

  • Azure DDoS Protection helps protect public IP addresses from distributed attacks.

    Change from the baseline: In the baseline architecture, the workload team purchases DDoS Protection.

Important

Azure landing zones provide some of the preceding resources as part of the platform landing zone subscriptions. Your workload subscription provides other resources. Many of these resources reside in the connectivity subscription, which also includes Azure ExpressRoute, Azure VPN Gateway, and DNS Private Resolver. These resources provide cross-premises access and name resolution. The management of these resources falls outside the scope of this article.

Subscription setup

The workload team must inform the platform team of specific landing zone requirements to implement this architecture, and the platform team must communicate its requirements to the workload team.

For example, the workload team must provide detailed information about the required networking space. The platform team uses this information to allocate the necessary resources. The workload team defines the requirements, and the platform team assigns the appropriate IP addresses within the virtual network.

The platform team assigns a management group based on the workload's business criticality and technical needs. For instance, if the workload is exposed to the internet, as in this architecture, the platform team selects an appropriate management group. To establish governance, the platform team also configures and implements management groups. The workload team must design and operate the workload within the constraints of this governance. For more information about typical management group distinctions, see Tailor the Azure landing zone architecture.

The platform team sets up the subscription for this architecture. The following sections provide guidance about the initial subscription setup.

Workload requirements and fulfillment

The workload team and platform team must collaborate on details like management group assignment, Azure Policy governance, and networking setup. Prepare a checklist of requirements to initiate discussion and negotiation with the platform team. The following checklist serves as an example.

Each of the following items pairs a design consideration with the corresponding workload requirement for this architecture.

  • The number of spoke virtual networks and their size: The platform team creates and configures the virtual network, then peers it to the regional hub to designate it as a spoke. They also need to ensure that the network can accommodate future workload growth. To carry out these tasks effectively, they must know the number of spokes required.

    Workload requirement: Deploy all resources in a single, dedicated spoke virtual network. Request a contiguous /22 address space to support full-scale operations and scenarios like side-by-side deployments. The following factors determine most IP address needs:

    - Application Gateway requirements for the subnet size (fixed size).

    - Private endpoints with single IP addresses for PaaS services (fixed size).

    - The subnet size for build agents (fixed size).

    - Foundry Agent Service requires a subnet within a /24 prefix.

  • Virtual network address prefixes: Typically, the platform team assigns IP addresses based on existing conventions, avoidance of overlap with peered networks, and availability within the IP address management (IPAM) system.

    Workload requirement: The agent integration subnet must use an address prefix that starts with 172. or 192., such as 192.168.45.0/24. A runtime restriction in the Foundry Agent Service capability host enforces this requirement. Foundry Agent Service doesn't support subnets in the 10.x.x.x address space. Ask your platform team to provide a spoke that has a valid address prefix for your agent subnet. (See the address validation sketch after this checklist.)

  • Deployment region: The platform team needs to deploy a hub in the same region as the workload resources.

    Workload requirement: Communicate the selected region for the workload and the regions for underlying compute resources. Ensure that the regions support availability zones. Azure OpenAI in Foundry Models has limited regional availability.

  • Type, volume, and pattern of traffic: The platform team needs to determine the ingress and egress requirements of your workload's shared resources.

    Workload requirement: Provide information about the following factors:

    - How users should consume this workload.

    - How this workload consumes its surrounding resources.

    - The configured transport protocol.

    - The traffic pattern and the expected peak and off-peak hours. Communicate when you expect a high number of concurrent connections to the internet (chatty) and when you expect the workload to generate minimal network traffic (background noise).

  • Firewall configuration: The platform team needs to set rules to allow legitimate egress traffic.

    Workload requirement: Share details about outbound traffic from the spoke network, including agent traffic. Build agent and jump box machines need regular OS patching. Agents might need to interact with internet grounding sources, tools, or other agents hosted outside the workload.

  • Ingress traffic from specialized roles: The platform team needs to provide the specified roles with network access to the workload and implement proper segmentation.

    Workload requirement: Work with the platform team to determine the best way to allow authorized access for the following roles:

    - Data scientists and developers that access the Azure AI Foundry portal from their workstations on corporate network connections

    - Operators that access the compute layer through a workload-managed jump box

  • Public internet access to the workload: The platform team uses this information for risk assessment, which drives several decisions:

    - The placement of the workload in a management group with appropriate guardrails

    - Distributed denial-of-service (DDoS) protection for the public IP address reported by the workload team

    - TLS certificate procurement and management

    Workload requirement: Inform the platform team about the ingress traffic profile:

    - Internet-sourced traffic that targets the public IP address on Application Gateway

    - Fully qualified domain names (FQDNs) associated with the public IP address for TLS certificate procurement

  • Private endpoint usage: The platform team needs to set up Azure private DNS zones for the private endpoints and ensure that the firewall in the hub network performs DNS resolution correctly.

    Workload requirement: Inform the platform team about all resources that use private endpoints, including the following resources:

    - AI Search

    - Azure Cosmos DB for NoSQL

    - Key Vault

    - Azure AI Foundry

    - Storage accounts

    Understand how the hub handles DNS resolution, and define the workload team's responsibilities for the management of private DNS zone records and DNS Private Resolver ruleset linking.

  • Centralized AI resources: The platform team must understand the expected usage of models and hosting platforms. They use this information to establish networking to centralized AI resources within your organization. Each organization defines its own AI adoption and governance plans, and the workload team must operate within those constraints.

    Workload requirement: Inform the platform team about AI and machine learning resources that you plan to use. This architecture uses Azure AI Foundry, Foundry Agent Service, and generative foundation models hosted in Azure AI Foundry. Clearly understand which centralized AI services you must use and how those dependencies affect your workload.
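As a quick sanity check before you submit the vending request, you can validate a proposed agent integration subnet prefix against the constraints in the checklist. The following Python sketch uses only the standard library; the /24 minimum size and the allowed ranges reflect the checklist's wording and are assumptions to adjust if your platform team's requirements differ.

```python
import ipaddress

# Ranges that the agent integration subnet can use, per the checklist above.
# 10.0.0.0/8 is intentionally absent.
ALLOWED_RANGES = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def validate_agent_subnet(prefix: str) -> None:
    """Fail fast before requesting the spoke from the platform team."""
    subnet = ipaddress.ip_network(prefix, strict=True)  # rejects host bits, such as 192.168.45.1/24
    if subnet.prefixlen > 24:
        raise ValueError(f"{prefix}: the agent integration subnet must be /24 or larger")
    if not any(subnet.subnet_of(allowed) for allowed in ALLOWED_RANGES):
        raise ValueError(f"{prefix}: must fall within 172.16.0.0/12 or 192.168.0.0/16")

validate_agent_subnet("192.168.45.0/24")   # passes
# validate_agent_subnet("10.1.4.0/24")     # raises: the 10.x.x.x space isn't supported
```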

Important

The platform team should follow a subscription vending process that uses a structured set of questions to collect information from the workload team. These questions might vary across organizations, but the goal is to gather the necessary input to implement subscriptions effectively. For more information, see Subscription vending.

Compute

The orchestration layer and chat UI hosting remain the same as the baseline architecture.

Networking

In the baseline architecture, the workload is provisioned in a single virtual network.

Change from the baseline: This architecture divides the workload over two virtual networks. One network hosts workload components. The other network manages internet and hybrid connectivity. The platform team determines how the workload's virtual network integrates with the organization's larger network architecture, which typically follows a hub-spoke topology.

Architecture diagram that focuses mostly on network ingress flows.

Download a Visio file of this architecture.

  • Hub virtual network: This virtual network serves as a regional hub that contains centralized, and often shared, services that communicate with workload resources in the same region. The hub resides in the connectivity subscription. The platform team owns the resources in this network.

  • Spoke virtual network: In this architecture, the single virtual network from the baseline architecture essentially becomes the spoke virtual network. The platform team peers this spoke network to the hub network. They own and manage the spoke network, including its peering and DNS configuration. The workload team owns the resources in this network, including its subnets. This network contains many of the workload resources.

Because of this division of management and ownership, the workload team must clearly communicate the workload's requirements to the platform team.

Important

For the platform team: Don't directly peer the spoke network to another spoke network, unless the workload specifically requires it. This practice protects the segmentation goals of the workload. Your team must facilitate all transitive virtual network connections. However, if application landing zone teams directly connect their networks by using self-managed private endpoints, your team doesn't manage those connections.

Understand which workload resources external teams manage. For example, understand the network connectivity between the chat agents and a grounding context vector database that another team manages.

Virtual network subnets

In the spoke virtual network, you create and allocate the subnets based on the workload requirements. To provide segmentation, apply controls that restrict traffic into and out of the subnets. This architecture doesn't add subnets beyond the subnets in the baseline architecture. However, the network architecture no longer requires the AzureBastionSubnet or AzureFirewallSubnet subnets because the platform team likely hosts this capability in their subscriptions.

You still have to implement local network controls when you deploy your workload in an Azure landing zone. Your organization might impose further network restrictions to safeguard against data exfiltration and ensure visibility for the central security operations center and the IT network team.

Ingress traffic

The ingress traffic flow remains the same as the baseline architecture.

You manage resources related to public internet ingress into the workload. For example, in this architecture, Application Gateway and its public IP address reside in the spoke network rather than the hub network. Some organizations place ingress-facing resources in a connectivity subscription by using a centralized perimeter network (also known as DMZ, demilitarized zone, and screened subnet) implementation. Integration with that specific topology falls outside the scope of this article.

Alternate approach to inspecting incoming traffic

This architecture doesn't use Azure Firewall to inspect incoming traffic, but sometimes organizational governance requires it. In those cases, the platform team supports the implementation to provide workload teams with an extra layer of intrusion detection and prevention. This layer helps block unwanted inbound traffic. To support this topology, this architecture requires more UDR configurations. For more information, see Zero Trust network for web applications with Azure Firewall and Application Gateway.

DNS configuration

In the baseline architecture, all components use Azure DNS directly for DNS resolution.

Change from the baseline: In this architecture, typically one or more DNS servers in the hub perform DNS resolution. When the virtual network is created, the DNS properties on the virtual network should already be set accordingly. The workload team doesn't need to understand the implementation details of the DNS service.

This architecture configures the workload components with DNS in the following ways.

  • Application Gateway: Inherited from virtual network
  • App Service (chat UI): Inherited from virtual network
  • AI Search: Can't be overridden; uses Azure DNS
  • Azure AI Foundry: Can't be overridden; uses Azure DNS
  • Foundry Agent Service: Can't be overridden; uses Azure DNS
  • Azure Cosmos DB: Can't be overridden; uses Azure DNS
  • Jump box: Inherited from virtual network
  • Build agents: Inherited from virtual network

This architecture doesn't configure DNS settings for components that don't initiate outbound communication. These components don't require DNS resolution.

Many components in this architecture rely on DNS records hosted in the hub's DNS servers to resolve this workload's private endpoints. For more information, see Azure private DNS zones.

When hub-based DNS resolution isn't possible, those components face the following limitations:

  • The platform team can't log DNS requests, which might violate an organizational security team requirement.

  • Resolving to Private Link-exposed services in your landing zone or other application landing zones might be impossible.

We recommend that you familiarize yourself with how the platform team manages DNS. For more information, see Private Link and DNS integration at scale. When you add component features that directly depend on Azure DNS, you might introduce complexities in the platform-provided DNS topology. You can redesign components or negotiate exceptions to minimize complexity.

Egress traffic

In the baseline architecture, all egress traffic routes to the internet through Azure Firewall.

Change from the baseline: In this architecture, the platform provides a UDR that points to an Azure Firewall instance that it hosts. Apply this UDR to the same subnets in the baseline architecture.

All traffic that leaves the spoke virtual network, including traffic from the agent integration subnet, reroutes through the peered hub network via an egress firewall.
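The platform team typically supplies this route, but it helps to know its shape when you review the spoke configuration or reproduce it in a test environment. The following sketch uses the Azure SDK for Python (azure-mgmt-network); the resource group, route table name, region, and firewall private IP address are hypothetical placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Hypothetical subscription and names; the next hop is the hub firewall's private IP.
client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

route_table = client.route_tables.begin_create_or_update(
    "rg-chat-spoke",
    "udr-default-to-hub-firewall",
    {
        "location": "eastus2",
        "routes": [
            {
                "name": "default-to-firewall",
                "address_prefix": "0.0.0.0/0",        # all internet-bound traffic
                "next_hop_type": "VirtualAppliance",  # hub Azure Firewall
                "next_hop_ip_address": "10.0.1.4",
            }
        ],
    },
).result()
print(f"Route table {route_table.name}: {route_table.provisioning_state}")
```

The route only takes effect on subnets that the table is associated with, such as the agent integration, App Service integration, jump box, and build agent subnets.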

Architecture diagram that focuses mostly on network egress flows.

Download a Visio file of this architecture.

East-west client communication to the private endpoints for Key Vault, Azure AI Foundry, and other services remains the same as the baseline architecture. The preceding diagram doesn't include that path.

Route internet traffic to the firewall

All subnets in the spoke network include a route that directs all internet-bound traffic, or 0.0.0.0/0 traffic, to the hub's Azure Firewall instance.

The following mechanisms force internet traffic through the hub:

  • Application Gateway: None. Internet-bound traffic that originates from this service can't be forced through the platform team's firewall.
  • App Service (chat UI): Regional virtual network integration and the vnetRouteAllEnabled setting are enabled.
  • AI Search: None. Traffic that originates from this service can't be forced through a firewall. This architecture doesn't use skills.
  • Foundry Agent Service: A UDR applied to the snet-agentsEgress subnet.
  • Jump boxes: A UDR applied to the snet-jumpbox subnet.
  • Build agents: A UDR applied to the snet-agents subnet.

This architecture doesn't configure force tunneling for components that don't initiate outbound communication.

For components or features that can't route egress traffic through the hub, your workload team must align with organizational requirements. To meet those requirements, use compensating controls, redesign the workload to exclude unsupported features, or request formal exceptions. You're responsible for mitigating data exfiltration and abuse.

Apply the platform-provided internet route to all subnets, even if you don't expect the subnet to have outgoing traffic. This approach ensures that unexpected deployments in that subnet go through routine egress filtering. For subnets that contain private endpoints, enable network policies to support full routing and NSG control.

This route configuration ensures that all outbound connections from App Service, Azure AI Foundry and its projects, and any other services that originate from the workload's virtual network are inspected and controlled.
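To confirm that forced tunneling is actually in effect, you can run a quick probe from the jump box or a build agent: if egress flows through the hub, the observed public IP matches the firewall's (or its NAT's) public IP rather than a direct Azure outbound IP. This sketch uses only the Python standard library; the expected IP address and the echo-service URL are assumptions, and the firewall must allow the probe's destination FQDN for the request to succeed.

```python
import urllib.request

# Hypothetical value: the public IP that the hub firewall uses for SNAT.
EXPECTED_EGRESS_IP = "203.0.113.10"

# Any plain-text "what is my IP" echo service works; this URL is an assumption.
with urllib.request.urlopen("https://api.ipify.org", timeout=10) as response:
    observed_ip = response.read().decode().strip()

if observed_ip == EXPECTED_EGRESS_IP:
    print(f"Egress goes through the hub firewall ({observed_ip}).")
else:
    print(f"Unexpected egress IP {observed_ip}; check the UDR and firewall rules.")
```

A blocked request is itself a useful signal: it shows that egress filtering applies to the subnet you're testing from.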

Azure private DNS zones

Workloads that use private endpoints for east-west traffic require DNS zone records in the configured DNS provider. To support Private Link, this architecture relies on many DNS zone records for services such as Key Vault, Azure AI Foundry, and Azure Storage.

Change from the baseline: In the baseline architecture, the workload team directly manages the private DNS zones. In this architecture, the platform team typically maintains private DNS zones. The workload team must clearly understand the platform team's requirements and expectations for the management of the private DNS zone records. The platform team can use other technology instead of private DNS zone records.

In this architecture, the platform team must set up DNS for the following Azure AI Foundry FQDN API endpoints:

  • privatelink.services.ai.azure.com
  • privatelink.openai.azure.com
  • privatelink.cognitiveservices.azure.com

The platform team must also set up DNS for the following FQDNs, which are Foundry Agent Service dependencies:

  • privatelink.search.windows.net
  • privatelink.blob.core.windows.net
  • privatelink.documents.azure.com

Important

DNS resolution must function correctly from within the spoke virtual network before you deploy the capability host for Foundry Agent Service and during operation of the service. The Foundry Agent Service capability doesn't use your spoke virtual network's DNS configuration. Therefore, we recommend that your platform team configure a DNS Private Resolver ruleset for the workload's private DNS zones and link it to your application landing zone spoke.

Before you deploy Azure AI Foundry and its agent capability, you must wait until the Foundry Agent Service dependencies are fully resolvable to their private endpoints from within the spoke network. This requirement is especially important if DINE policies handle updates to DNS private zones. If you attempt to deploy the Foundry Agent Service capability before the private DNS records are resolvable from within your subnet, the deployment fails.
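Before you enable the agent capability, you can verify resolution from a machine inside the spoke, such as the jump box or a build agent. The following standard-library Python sketch checks that each dependency resolves to a private address; the FQDNs listed are hypothetical examples standing in for your own resource endpoints.

```python
import ipaddress
import socket

# Hypothetical endpoint names; substitute the FQDNs of your own Azure AI Foundry
# account, AI Search service, Storage account, and Azure Cosmos DB account.
ENDPOINTS = [
    "contoso-foundry.cognitiveservices.azure.com",
    "contoso-search.search.windows.net",
    "contosoagentstore.blob.core.windows.net",
    "contoso-agent-db.documents.azure.com",
]

def resolves_privately(fqdn: str) -> bool:
    """Return True when the name resolves only to private (RFC 1918) addresses."""
    try:
        infos = socket.getaddrinfo(fqdn, 443, family=socket.AF_INET)
    except socket.gaierror:
        return False
    return all(ipaddress.ip_address(info[4][0]).is_private for info in infos)

for fqdn in ENDPOINTS:
    status = "private" if resolves_privately(fqdn) else "NOT private or unresolvable"
    print(f"{fqdn}: {status}")
```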

The platform team must also host the private DNS zones for other workload dependencies in this architecture:

  • privatelink.vaultcore.azure.net
  • privatelink.azurewebsites.net

Data scientist and agent developer access

Like the baseline architecture, this architecture disables public ingress access to the Azure AI Foundry portal and other browser-based experiences. The baseline architecture deploys a jump box to provide a browser with a source IP address from the virtual network that various workload roles use.

When your workload connects to an Azure landing zone, your team gains more access options. Work with the platform team to see if you can get private access to various browser-based Azure AI Foundry portals without managing and governing a virtual machine (VM). This access might be possible through transitive routing from an existing ExpressRoute or VPN Gateway connection.

Native workstation-based access requires cross-premises routing and DNS resolution, which the platform team can help provide. Include this requirement in your subscription vending request.

Providing native workstation-based access to these portals improves productivity and simplifies maintenance compared to managing VM jump boxes.

The role of the jump box

The jump box in this architecture provides value for operational support, not for runtime purposes or AI and machine learning development. The jump box can troubleshoot DNS and network routing problems because it provides internal network access to otherwise externally inaccessible components.

In the baseline architecture, Azure Bastion accesses the jump box, which you manage.

In this architecture, Azure Bastion is deployed in the connectivity subscription as a shared regional resource that the platform team manages. Your team doesn't deploy it.

The VM that serves as the jump box must comply with organizational requirements for VMs. These requirements might include items such as corporate identities in Microsoft Entra ID, specific base images, and patching regimes.

Monitor resources

The Azure landing zone platform provides shared observability resources as part of the management subscription. However, we recommend that you provision your own monitoring resources to facilitate ownership responsibilities of the workload. This approach aligns with the baseline architecture.

You provision the following monitoring resources:

  • Application Insights serves as the application performance management (APM) service for your team. You configure this service in the chat UI, Foundry Agent Service, and models.

  • The Azure Monitor Logs workspace serves as the unified sink for all logs and metrics from workload-owned Azure resources.

Similar to the baseline architecture, all resources must send Azure diagnostics logs to the Azure Monitor Logs workspace that your team provisions. This configuration is part of the infrastructure as code (IaC) deployment of the resources. You might also need to send logs to a central Azure Monitor Logs workspace. In Azure landing zones, that workspace typically resides in the management subscription.
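As one way to express that diagnostic configuration outside your IaC templates, the following sketch uses the Azure SDK for Python (azure-mgmt-monitor) to point a resource's logs and metrics at the workload's Azure Monitor Logs workspace. The resource ID, workspace ID, and log category group are hypothetical placeholders, and the available categories vary by resource type and API version.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Hypothetical resource IDs; use the IDs of your App Service app and workspace.
APP_SERVICE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/rg-chat-workload"
    "/providers/Microsoft.Web/sites/app-chat-ui"
)
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/rg-chat-workload"
    "/providers/Microsoft.OperationalInsights/workspaces/log-chat-workload"
)

client.diagnostic_settings.create_or_update(
    resource_uri=APP_SERVICE_ID,
    name="send-to-workload-workspace",
    parameters={
        "workspace_id": WORKSPACE_ID,
        # 'allLogs' requires an API version that supports category groups;
        # otherwise, list individual categories such as 'AppServiceHTTPLogs'.
        "logs": [{"category_group": "allLogs", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    },
)
```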

The platform team might have more processes that affect resources in the application landing zone. For example, they might use DINE policies to configure diagnostics and send logs to centralized management subscriptions. They might also apply Azure Monitor baseline alerts through policy. Ensure that your implementation doesn't block these extra logging and alerting flows.

Azure Policy

The baseline architecture recommends general policies to help govern the workload. When you deploy this architecture into an application landing zone, you don't need to add or remove extra policies. To help enforce governance and enhance the security of this workload, continue to apply policies to your subscription, resource groups, or resources.

Expect the application landing zone subscription to have existing policies, even before you deploy the workload. Some policies help organizational governance by auditing or blocking specific configurations in deployments.

The following example policies might lead to workload deployment complexities:

  • Policy: Secrets [in Key Vault] should have the specified maximum validity period.

    Complication: Azure AI Foundry can store secrets related to knowledge and tool connections in a connected Key Vault. Those secrets don't have an expiry date set by the service. The baseline architecture doesn't use these types of connections, but you can extend the architecture to support them. (See the audit sketch after this list.)

  • Policy: AI Search services should use customer-managed keys to encrypt data at rest.

    Complication: This architecture doesn't require customer-managed keys. But you can extend the architecture to support them.

  • Policy: AI Foundry models should not be preview.

    Complication: You might be developing with a preview model that you anticipate will be generally available by the time you enable the agent capability in your production workload.
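For the Key Vault secret-expiry policy in the preceding list, a quick audit can surface noncompliant secrets before the policy flags them, which helps you decide whether to set expiry dates or negotiate an exemption. This sketch assumes the azure-keyvault-secrets and azure-identity packages and a hypothetical vault name.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault; substitute the Key Vault connected to Azure AI Foundry.
client = SecretClient(
    vault_url="https://kv-chat-workload.vault.azure.net",
    credential=DefaultAzureCredential(),
)

# List secrets that have no expiry date set; these are the ones the policy flags.
for secret in client.list_properties_of_secrets():
    if secret.expires_on is None:
        print(f"No expiry set: {secret.name}")
```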

Platform teams might apply DINE policies to handle automated deployments into an application landing zone subscription. Preemptively incorporate and test the platform-initiated restrictions and changes into your IaC templates. If the platform team uses Azure policies that conflict with the requirements of the application, you can negotiate a resolution.

Manage changes over time

In this architecture, platform-provided services and operations serve as external dependencies. The platform team continues to apply changes, onboard landing zones, and apply cost controls. The platform team might not prioritize individual workloads. Changes to those dependencies, such as firewall modifications, can affect multiple workloads.

You must communicate with platform teams in an efficient and timely manner to manage all external dependencies. It's important to test changes beforehand so that they don't negatively affect workloads.

Platform changes that affect the workload

In this architecture, the platform team manages the following resources. Changes to these resources can potentially affect the workload's reliability, security, operations, and performance targets. Evaluate these changes before the platform team implements them to determine how they affect the workload.

  • Azure policies: Changes to Azure policies can affect workload resources and their dependencies. These changes might include direct policy updates or movement of the landing zone into a new management group hierarchy. These changes might go unnoticed until a new deployment occurs, so you must thoroughly test them.

    Example: A policy starts to require customer-managed key encryption for new Azure Cosmos DB instances, but your architecture uses Microsoft-managed key encryption.

  • Firewall rules: Modifications to firewall rules can affect the workload's virtual network or rules that apply broadly across all traffic. These modifications can result in blocked traffic and even silent process failures. These potential problems apply to both the egress firewall and Azure Virtual Network Manager-applied NSG rules.

    Example: Blocked traffic to Bing APIs leads to failed agent tool invocations for internet grounding data.

  • Routing in the hub network: Changes in the transitive nature of routing in the hub can potentially affect the workload functionality if a workload depends on routing to other virtual networks.

    Example: A routing change prevents Azure AI Foundry agents from accessing a vector store that's operated by another team or prevents data science teams from accessing the AI Foundry portal from their workstations.

  • Azure Bastion host: Changes to the Azure Bastion host availability or configuration.

    Example: A configuration change prevents access to jump boxes and build agent VMs.

Workload changes that affect the platform

The following examples describe workload changes that you should communicate to the platform team. The platform team must validate the platform service's reliability, security, operations, and performance targets against your new changes before they go into effect.

  • Network egress: Monitor any significant increase in network egress to prevent the workload from becoming a noisy neighbor on network devices. This problem can potentially affect the performance or reliability targets of other workloads. This architecture is mostly self-contained and is unlikely to experience a significant change in outbound traffic patterns.

  • Public access: Changes to public access for workload components might require extra testing. The platform team might relocate the workload to a different management group.

    Example: In this architecture, if you remove the public IP address from Application Gateway and make this application internal only, the workload's exposure to the internet changes. Another example is exposing the browser-based AI portals to the internet, which we don't recommend.

  • Business criticality rating: Changes to the workload's service-level agreements (SLAs) might require a new collaboration approach between the platform and workload teams.

    Example: Your workload might transition from low to high business criticality because of increased adoption and success.

Enterprise architecture team

Some organizations have an enterprise architecture team that might influence the design of your workload and its architecture. That team understands the enterprise's AI adoption strategy and the principles and recommendations in the AI workloads on Azure design guidance. Work with this team to ensure that this chat workload meets scenario-specific objectives and aligns with organizational strategy and recommendations.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.

Reliability

Reliability helps ensure that your application can meet the commitments that you make to your customers. For more information, see Design review checklist for Reliability.

This architecture maintains the reliability guarantees in the baseline architecture. It doesn't introduce new reliability considerations for the core workload components.

Critical dependencies

Treat all functionality that the workload performs in the platform and application landing zone as dependencies. Maintain incident response plans that include contact methods and escalation paths for each dependency. Include these dependencies in the workload's failure mode analysis (FMA).

Consider the following workload dependencies and example scenarios that might occur:

  • Egress firewall: The centralized egress firewall undergoes changes unrelated to the workload. Multiple workloads share the firewall.

  • DNS resolution: This architecture uses DNS provided by the platform team for most resources, combined with Azure DNS and linked DNS Private Resolver rulesets for Foundry Agent Service. As a result, the workload depends on timely updates to DNS records for private endpoints and availability of DNS services.

  • DINE policies: DINE policies for Azure private DNS zones, or any other platform-provided dependency, operate on a best-effort basis and don't include an SLA. For example, a delay in DNS configuration for this architecture's private endpoints can prevent the chat UI from becoming traffic-ready or block agents from completing Foundry Agent Service queries.

  • Management group policies: Consistent policies among environments support reliability. Ensure that preproduction environments match production environments to provide accurate testing and prevent environment-specific deviations that can block a deployment or scale. For more information, see Manage application development environments in Azure landing zones.

Many of these considerations might exist without Azure landing zones. You need to work with platform teams collaboratively to address these problems and ensure that you meet all requirements. For more information, see Identify dependencies.

Security

Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.

Ingress traffic control

To isolate your workload from other workload spokes within your organization, apply NSGs on your subnets and use the nontransitive nature of the regional hub. Construct comprehensive NSGs that only permit the inbound network requirements of your application and its infrastructure. We recommend that you don't solely rely on the nontransitive nature of the hub network for security.

The platform team implements Azure policies for security. For example, a policy might require Application Gateway to have a web application firewall set to deny mode, or might restrict the number of public IP addresses available to your subscription. In addition to those policies, you should deploy more workload-centric policies that reinforce the ingress security posture.

The following examples show ingress controls in this architecture.

  • Source: Internet. Purpose: Application traffic flows. Workload control: Funnel all workload requests through an NSG, a web application firewall, and routing rules before allowing public traffic to transition to private traffic for the chat UI. Platform control: None.

  • Source: Internet. Purpose: Azure AI Foundry portal access and data plane REST API access. Workload control: Deny all access through service-level configuration. Platform control: None.

  • Source: Internet. Purpose: Data plane access to all services except Application Gateway. Workload control: Deny all access through NSG rules and service-level configuration. Platform control: None.

  • Source: Azure Bastion. Purpose: Jump box and build agent access. Workload control: Apply an NSG on the jump box subnet that blocks all traffic to remote access ports, unless the source is the platform's designated Azure Bastion subnet. Platform control: None.

  • Source: Cross-premises. Purpose: Azure AI Foundry portal access and data plane REST API access. Workload control: Deny all access. If you don't use the jump box, allow access only from workstations in authorized subnets for data scientists. Platform control: Enforce nontransitive routing or use Azure Firewall in an Azure Virtual WAN secured hub.

  • Source: Other spokes. Purpose: None. Workload control: Blocked via NSG rules. Platform control: Enforce nontransitive routing or use Azure Firewall in a Virtual WAN secured hub.

Egress traffic control

Apply NSG rules that express the required outbound connectivity requirements of your solution and deny everything else. Don't rely only on the hub network controls. As a workload operator, you must stop undesired egress traffic as close to the source as possible.
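For example, the following sketch adds an outbound allow rule for the chat UI's path to the Azure AI Foundry private endpoints and a low-priority deny-all rule. It uses the Azure SDK for Python (azure-mgmt-network); the subscription, resource group, NSG name, and subnet prefixes are hypothetical and should mirror your captured requirements.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Hypothetical names; substitute your subscription, resource group, and NSG.
client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Allow the App Service integration subnet to reach the Azure AI Foundry
# private endpoint subnet on TCP/443, then deny all other outbound traffic.
rules = {
    "allow-chatui-to-foundry-pe": {
        "priority": 200, "direction": "Outbound", "access": "Allow",
        "protocol": "Tcp", "source_address_prefix": "10.1.2.0/27",
        "source_port_range": "*", "destination_address_prefix": "10.1.3.0/27",
        "destination_port_range": "443",
    },
    "deny-all-outbound": {
        "priority": 4096, "direction": "Outbound", "access": "Deny",
        "protocol": "*", "source_address_prefix": "*", "source_port_range": "*",
        "destination_address_prefix": "*", "destination_port_range": "*",
    },
}
for name, rule in rules.items():
    client.security_rules.begin_create_or_update(
        "rg-chat-workload", "nsg-appservice-integration", name, rule
    ).result()
```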

You own your workload's subnets within the virtual network, but the platform team likely created firewall rules to specifically represent your captured requirements as part of your subscription vending process. Ensure that changes in subnets and resource placement over the lifetime of your architecture remain compatible with your original request. Work with your network team to ensure continuity of least-access egress control.

The following examples show egress controls in this architecture.

  • Endpoint: Public internet sources. Purpose: Your agent might require internet search to ground an Azure OpenAI in Foundry Models request. Workload control: Apply NSGs on the agent integration subnet. Platform control: Apply firewall application rules for external knowledge stores and tools.

  • Endpoint: Azure AI Foundry data plane. Purpose: The chat UI interacts with the chat agent. Workload control: Allow TCP/443 from the App Service integration subnet to the Azure AI Foundry private endpoint subnet. Platform control: None.

  • Endpoint: Azure Cosmos DB. Purpose: Access the memory database from Foundry Agent Service. Workload control: Allow TCP on every port to the Azure Cosmos DB private endpoint subnet. Platform control: None.

For traffic that leaves the workload's virtual network, this architecture applies controls at the workload level via NSGs and at the platform level via a hub network firewall. The NSGs provide initial, broad network traffic rules. In the platform's hub, the firewall applies more specific rules for added security.

This architecture doesn't require east-west traffic between internal components, such as Foundry Agent Service and its dependent AI Search instance, to route through the hub network.

DDoS Protection

Determine who should apply the DDoS Protection plan that covers your solution's public IP addresses. The platform team might use IP address protection plans, or they might use Azure Policy to enforce virtual network protection plans. This architecture requires DDoS Protection coverage because it has a public IP address for ingress from the internet. For more information, see Recommendations for networking and connectivity.

Identity and access management

Unless the platform team's governance automation requires extra controls, this architecture doesn't introduce new authorization requirements because of the platform team's involvement. Azure role-based access control (RBAC) should continue to fulfill the principle of least privilege, which grants limited access only to individuals who need it and only when needed. For more information, see Recommendations for identity and access management.

Certificates and encryption

Your team typically procures the TLS certificate for the public IP address on Application Gateway. Work with the platform team to understand how the certificate procurement and management processes should align with organizational expectations.

All data storage services in this architecture support Microsoft-managed or customer-managed encryption keys. Use customer-managed encryption keys if your workload design or organization requires more control. Azure landing zones themselves don't mandate a specific method.

Cost Optimization

Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.

All cost optimization strategies in the baseline architecture apply to the workload resources in this architecture.

This architecture greatly benefits from Azure landing zone platform resources. For example, resources such as Azure Firewall and DDoS Protection transition from workload to platform resources. Even if you use those resources through a chargeback model, the added security and cross-premises connectivity are more cost-effective than self-managing those resources. Take advantage of other centralized offerings from your platform team to extend those benefits to your workload without compromising its service-level objective, recovery time objective, or recovery point objective.

Important

Don't try to optimize costs by consolidating Azure AI Foundry dependencies as platform resources. These services must remain workload resources.

Operational Excellence

Operational Excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Design review checklist for Operational Excellence.

You remain responsible for all operational excellence considerations from the baseline architecture. These responsibilities include monitoring, GenAIOps, quality assurance, and safe deployment practices.

Correlate data from multiple sinks

The workload's Azure Monitor Logs workspace stores the workload's logs and metrics and its infrastructure components. However, a central Azure Monitor Logs workspace often stores logs and metrics from centralized services, such as Azure Firewall, DNS Private Resolver, and Azure Bastion. Correlating data from multiple sinks can be a complex task.

Correlated data helps support incident response. The triage runbook for this architecture should address this situation and include organizational contact information if the problem extends beyond workload resources. Workload administrators might require assistance from platform administrators to correlate log entries from enterprise networking, security, or other platform services.
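If the platform team grants read access to its central workspace, you can correlate the two sinks programmatically. The following sketch uses the azure-monitor-query package; the workspace IDs are placeholders, and the table and column names depend on which diagnostic categories each team enables.

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Hypothetical workspace IDs: the workload's workspace and the platform team's
# central workspace (read access granted by the platform team).
WORKLOAD_WS = "<workload-workspace-id>"
PLATFORM_WS = "<platform-workspace-id>"

# Pull recent chat UI failures from the workload workspace and firewall denies
# from the platform workspace for the same window, then compare by time.
app_failures = client.query_workspace(
    WORKLOAD_WS,
    "AppServiceHTTPLogs | where ScStatus >= 500 | project TimeGenerated, CsUriStem, ScStatus",
    timespan=timedelta(hours=1),
)
fw_denies = client.query_workspace(
    PLATFORM_WS,
    'AZFWApplicationRule | where Action == "Deny" | project TimeGenerated, SourceIp, Fqdn',
    timespan=timedelta(hours=1),
)
for table in app_failures.tables + fw_denies.tables:
    for row in table.rows:
        print(list(row))
```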

Important

For the platform team: When possible, grant RBAC permissions to query and read log sinks for relevant platform resources. Enable firewall logs for network and application rule evaluations and DNS proxy. The application teams can use this information to troubleshoot tasks. For more information, see Recommendations for monitoring and threat detection.

Build agents

Many services in this architecture use private endpoints. Similar to the baseline architecture, this design might require build agents. Your team deploys the build agents safely and reliably. The platform team isn't involved in this process.

Make sure that the build agent management complies with organizational standards. These standards might include the use of platform-approved operating system images, patching schedules, compliance reporting, and user authentication methods.

Performance Efficiency

Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.

The performance efficiency considerations in the baseline architecture also apply to this architecture. Your team retains control over the resources in the application flows, not the platform team. Scale the chat UI host, language models, and other components according to the workload and cost constraints. Depending on the final implementation of your architecture, consider the following factors when you measure your performance against performance targets:

  • Egress and cross-premises latency
  • SKU limitations from cost containment governance

Deploy this scenario

Deploy a landing zone implementation of this reference architecture.

Contributors

Microsoft maintains this article. The following contributors wrote this article.

Principal authors:

  • Bilal Amjad | Microsoft Cloud Solution Architect
  • Freddy Ayala | Microsoft Cloud Solution Architect
  • Chad Kittel | Principal Software Engineer - Azure Patterns & Practices


Next step

Learn how to collaborate on technical details with the platform team.