Prepare AKS cluster on Azure Local for Agentic Retrieval in Foundry Local

For your Agentic Retrieval deployment, prepare an AKS cluster on Azure Local by creating the cluster, configuring node pools, and installing GPU drivers as needed. This article is part of the deployment prerequisites checklist.

Important

Agentic Retrieval in Foundry Local is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Prepare your AKS cluster

Create an Azure Kubernetes Service (AKS) cluster on the Azure Local instance with a node pool that meets the minimum requirements.

Create an AKS Arc cluster

Create an AKS Arc cluster by using one of the following methods:

Install supported GPU drivers (optional)

AKS Arc supports only NVIDIA A2 and A16 GPUs. The following steps apply only to these two GPUs.

If you have GPUs available in your Azure Local instance that you want to use for Agentic Retrieval, make sure that the necessary GPU drivers are installed and available in the AKS Arc cluster nodes.

To check if the right drivers are already installed and the GPUs are available to the AKS Arc cluster, run the following command:

(Get-MocNode -location MocLocation).properties.statuses.Info

If the output lists all the GPUs available on the Azure Local cluster, you can move to the next step. Otherwise, complete the following steps on any of the Azure Local cluster nodes to enable GPUs.

Use the script in the Enabling GPU on AKS on Azure ARC - sample to enable GPUs for use by Agentic Retrieval.

Alternatively, follow the instructions in Use GPUs for compute-intensive workloads and ensure that you meet the minimum VM hardware requirements for the GPU mode. If you follow these instructions, run the following commands on each Hyper-V host in the Azure Local cluster:

# Restart the wssdagent service, then wait 60 seconds before querying node status.
Restart-Service wssdagent -Force -Verbose
Start-Sleep -Seconds 60
(Get-MocNode -location MocLocation).properties.statuses.Info

Make sure that the output of the command lists all the available GPUs across all nodes.

Configure machine to manage Azure Arc-enabled Kubernetes clusters (optional)

If you want to manage the Kubernetes clusters from a machine outside the Azure Local instance, set up a driver machine (local management host) with the following tools:

  • Azure CLI
  • Azure CLI extensions aksarc and k8s-extension
  • kubectl
  • Helm

This driver machine must be able to connect to the Kubernetes cluster on the network.

To set up a Windows machine to manage your Kubernetes clusters, see the Script to configure machine to manage Azure Arc-enabled Kubernetes cluster.
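
If you set up the driver machine manually, the following is a minimal sketch of installing the Azure CLI extensions and checking that the tools are available. It assumes that Azure CLI, kubectl, and Helm are already installed and on the PATH, and that the Kubernetes extension in the list above corresponds to the Azure CLI extension named k8s-extension.

```powershell
# Install the Azure CLI extensions, or upgrade them if they're already present.
az extension add --name aksarc --upgrade
az extension add --name k8s-extension --upgrade

# Confirm that each tool responds. Any error here means the tool is missing from the PATH.
az version
kubectl version --client
helm version
```

The referenced script remains the documented way to configure a Windows machine. These commands cover only the extension and verification steps.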

Create node pools for AKS Arc cluster

To create a node pool for AKS Arc, complete the following steps from the driver machine.

  1. Sign in to Azure by using Azure CLI: az login.

  2. Create the node pool.

    If GPUs are available:

    • You must create a node pool of at least three CPU virtual machines (VMs), with a minimum size of Standard_D8s_v3. Run the following commands:
    	$cpuPoolName = "<CPU Pool Name>"
    	$gpuPoolName = "<GPU Pool Name>"
    	$gpuVmSku = "Standard_NC8_A2" # Can also use Standard_NC8_A16
    	$cpuVmSku = "Standard_D8s_v3"
    	$rg = "<Resource Group name>"
    	$k8scluster = "<AKS Arc Cluster>"
    	$cpuNodeCount = 3
    	$gpuNodeCount = 2

    	az aksarc nodepool add --name $cpuPoolName --cluster-name $k8scluster -g $rg --node-count $cpuNodeCount --node-vm-size $cpuVmSku
    • You must create a node pool of at least two GPU VMs, with a minimum size of Standard_NC8_A2 or Standard_NC8_A16. Run the following command:
    	az aksarc nodepool add --name $gpuPoolName --cluster-name $k8scluster -g $rg --node-count $gpuNodeCount --node-vm-size $gpuVmSku

    The two GPU VMs are used for the Knowledge Layer's text embedding model (BGE-M3) and image embedding model (CLIP ViT-L/14). Docling (document parser) runs on CPU. The language model (LLM) runs externally via your BYOM endpoint and doesn't consume cluster GPUs.

If only CPUs are available, you must create a node pool of at least six CPU VMs, with a minimum size of Standard_D8s_v3. Run the following command:

$cpuPoolName = "<CPU Pool Name>"
$cpuVmSku = "Standard_D8s_v3"
$rg = "<Resource Group name>"
$cpuNodeCount = 6
$k8scluster = "<AKS Arc Cluster>"
az aksarc nodepool add --name $cpuPoolName --cluster-name $k8scluster -g $rg --node-count $cpuNodeCount --node-vm-size $cpuVmSku

CPU-only mode applies to the Knowledge Layer only. In CPU-only mode, embedding quality might be reduced and image search isn't available. Docling (document parser) runs on CPU by default and doesn't require GPU. CPU-only mode isn't applicable to agentic deployments, which have no GPU requirements regardless.
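
After you add the node pools, you might want to confirm that they exist and that their nodes joined the cluster. The following sketch assumes that the $k8scluster and $rg variables from the previous commands are still set in the session, and that your kubectl context already points to the AKS Arc cluster.

```powershell
# List the node pools on the cluster, with their VM sizes and node counts.
az aksarc nodepool list --cluster-name $k8scluster -g $rg -o table

# List the Kubernetes nodes. Each VM in the node pools should appear with a Ready status.
kubectl get nodes -o wide
```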

Node pool requirements by deployment mode

Deployment mode    | GPU node pool                          | CPU node pool
combined (default) | 2 GPU VMs required                     | 3+ CPU VMs required
agentic            | Not required (no embedding or parsing) | 3+ CPU VMs required
knowledge          | 2 GPU VMs required                     | 3+ CPU VMs required

If you deploy in agentic mode, you can skip the GPU node pool creation step.
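
If you script the node pool creation, the requirements table can be encoded as a small lookup. The following is an illustrative sketch only: Get-NodePoolRequirement is a hypothetical helper, not part of Azure CLI or any Azure module, and its property names are assumptions. The values are copied from the table above.

```powershell
# Hypothetical helper: returns the node pool requirements for a deployment mode.
function Get-NodePoolRequirement {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('combined', 'agentic', 'knowledge')]
        [string]$DeploymentMode
    )

    # Values taken from the "Node pool requirements by deployment mode" table.
    $requirements = @{
        combined  = @{ GpuNodeCount = 2; MinCpuNodeCount = 3 }
        agentic   = @{ GpuNodeCount = 0; MinCpuNodeCount = 3 }
        knowledge = @{ GpuNodeCount = 2; MinCpuNodeCount = 3 }
    }

    return $requirements[$DeploymentMode]
}

# Example: decide whether the GPU node pool step can be skipped.
$requirement = Get-NodePoolRequirement -DeploymentMode 'agentic'
if ($requirement.GpuNodeCount -eq 0) {
    Write-Output "No GPU node pool is needed for this deployment mode."
}
```

A script could use the returned counts to set $cpuNodeCount and $gpuNodeCount before it calls az aksarc nodepool add.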

For more information, see Create node pools for a cluster in Azure Kubernetes Service (AKS).

Next step