Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
This article provides the configuration parameter reference, environment variables, and troubleshooting guidance for deploying Agentic Retrieval.
Important
Agentic Retrieval in Foundry Local is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
Configuration parameters
The following configuration parameters are used when you install the Agentic Retrieval extension:
| Parameter | Required | Description |
|---|---|---|
byom.enabled |
Yes | Always true. BYOM is the only language model path. |
byom.apiEndpoint |
Yes | Full endpoint URL. For Foundry Local: https://gpt-oss-20b.foundry-local-operator.svc.cluster.local:5000/v1/chat/completions. For Microsoft Foundry: https://<resource>.cognitiveservices.azure.com/openai/deployments/<model>/chat/completions?api-version=<version>. |
byom.apiModel |
Conditional | Not required for Foundry Local endpoints. Model name to send in requests (for example, gpt-oss-20b). |
byom.maxTokensInK |
Yes | Maximum tokens in thousands (for example, 16). |
foundryClientId |
Conditional | Required only when using a Foundry Local model source with useFoundryLocal=true. Not required for non-Foundry Local endpoints. |
auth.tenantId |
Yes | Microsoft Entra ID tenant ID. |
auth.clientId |
Yes | Agents and Tools app registration client ID. |
isManagedIdentityRequired |
Yes | Always true. Enables managed identity token acquisition. |
layerSelection |
Yes | combined, agentic, or knowledge. |
ingress.domainname |
Yes | Full DNS name for external access (for example, mycluster.eastus.cloudapp.azure.com). |
gpu_enabled |
No | Set to true for GPU clusters. Enables GPU-accelerated embedding models. |
min_gpu_nodes |
No | Minimum GPU nodes required. Default: 2. |
AgentOperationTimeoutInMinutes |
No | Timeout for agent operations. Default: 30. |
model |
No | Always byom. No other option. |
llm.dapr.accessControl.defaultAction |
No | Dapr access control. Set to allow. |
embeddingmodel.image.gpu.repository |
No | GPU embedding model image repository. |
embeddingmodel.image.gpu.tag |
No | GPU embedding model image tag. |
The BYOM API key is not passed as a configuration parameter. It's stored as a Kubernetes secret (byom-api-key) in the arc-rag namespace before extension installation.
The Azure CLI accepts both --config and --configuration-settings for Arc extension parameters. Both syntaxes are equivalent.
Environment variables
Helm templates populate the following environment variables for all inferencing pods:
| Variable | Source |
|---|---|
BYOM_ENABLED |
Always true |
BYOM_ENDPOINT |
byom.apiEndpoint |
BYOM_MODEL |
byom.apiModel |
BYOM_API_KEY |
byom.apiKey |
FOUNDRY_CLIENT_ID |
foundryClientId (when configured) |
Troubleshoot Foundry Local integration
Use the following commands to diagnose Foundry Local issues:
| Command | Purpose |
|---|---|
kubectl describe mdep <name> |
Check ModelDeployment status and events. |
kubectl logs -f deployment/inference-operator -n foundry-local-operator |
Check operator logs. |
kubectl get pods -l app.kubernetes.io/managed-by=inference-operator |
Check inference pod status. |
kubectl describe pod <pod_name> |
Get pod details and events. |
kubectl get deploy,svc,ing -l foundry.azure.com/deployment=<name> |
List all resources created by a deployment. |
kubectl get configmap foundry-local-catalog -n foundry-local-operator -o yaml |
Check the model catalog ConfigMap. |
Common integration issues
| Symptom | Cause | Resolution |
|---|---|---|
| Connection refused or timeout | Foundry Local not running or network policy blocking egress. | Verify Foundry pods are running. Ensure egress from arc-rag namespace to Foundry ingress is allowed. |
SSL: CERTIFICATE_VERIFY_FAILED |
foundryClientId not set in extension configuration. |
Set foundryClientId — this gates the CA bundle mount. Without it, the Foundry self-signed certificate isn't trusted. |
400 Bad Request: plain HTTP sent to HTTPS port |
Using http:// instead of https:// in byom.apiEndpoint. |
Change the endpoint to https://. Foundry Local enables TLS by default. |
400 Invalid JSON in request body |
Using onnx-genai runtime with agentic or combined mode. |
Switch to vllm runtime. The onnx-genai runtime doesn't support tools or tool_choice parameters. |
401 Token validation failed |
RBAC roles not assigned for the managed identity. | Assign Reader + Cognitive Services OpenAI User + FoundryInferenceAccess app role. See Configure Foundry Local inference authentication. |
401 Entra ID authentication is not enabled |
Sending managed identity token to Foundry with entraAuth.enabled=false. |
Either enable Microsoft Entra authentication on Foundry, or clear FOUNDRY_CLIENT_ID so Agents and Tools uses API key authentication. |
401 Invalid API key |
API key rotated after model redeployment. | Re-read the key from the gpt-oss-20b-api-keys secret and update byom-api-key in the arc-rag namespace. |
404 Not Found from Foundry |
Model not deployed. | Run kubectl get mdep -n foundry-local-operator and verify the model name matches byom.apiModel. |
| LLM calls fail but embedding and ingestion work | Expected behavior. Embedding models are local; only LLM inference uses Foundry. | Check Foundry connectivity and model deployment status. |
| Managed identity token acquisition fails | Microsoft Entra ID unreachable or msi-adapter not running. | Check msi-adapter sidecar logs. The request falls back to API key authentication only. |
| Pods pending (insufficient resources) | Cluster too small for combined mode (60+ pods). | Combined mode requires at least 3x Standard_D8s_v3 (24 vCPU, 96 GB RAM) worker nodes + 1 GPU node. Scale node pools with az aksarc nodepool scale. |
| Extension install: stale nginx webhook | Previous install left ValidatingWebhookConfiguration. |
Run kubectl delete validatingwebhookconfiguration ingress-nginx-admission before reinstalling. |
Foundry Local operator parameters
You can set these optional parameters during Foundry inference operator installation:
| Parameter | Description |
|---|---|
entraAuth.enabled |
When enabled, Microsoft Entra Auth SDK and msi-adapter sidecars are injected into inference pods for JWT validation and ARM RBAC authorization. When disabled, tenantId and clientId are optional. Default: true. |
watch.namespaces |
Configure if the operator should manage resources across multiple namespaces. Default: foundry-local-operator. Pass as: --config watch.namespaces[0]="<namespace_1>" --config watch.namespaces[1]="<namespace_2>". |
Foundry Local key management
You can retrieve and rotate API keys by using the Foundry Local inference service:
| Endpoint | Description |
|---|---|
GET /namespaces/<namespace>/deployments/<name>/keys |
Retrieve both primary and secondary keys. |
POST /namespaces/<namespace>/deployments/<name>/keys/{primary\|secondary}/rotate |
Rotate a specific key. |