Deployment parameter reference for Agentic Retrieval in Foundry Local

This article provides the configuration parameter reference, environment variables, and troubleshooting guidance for deploying Agentic Retrieval.

Important

Agentic Retrieval in Foundry Local is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Configuration parameters

The following configuration parameters are used when you install the Agentic Retrieval extension:

Parameter	Required	Description
`byom.enabled`	Yes	Always `true`. BYOM is the only language model path.
`byom.apiEndpoint`	Yes	Full endpoint URL. For Foundry Local: `https://gpt-oss-20b.foundry-local-operator.svc.cluster.local:5000/v1/chat/completions`. For Microsoft Foundry: `https://<resource>.cognitiveservices.azure.com/openai/deployments/<model>/chat/completions?api-version=<version>`.
`byom.apiModel`	Conditional	Model name to send in requests (for example, `gpt-oss-20b`). Not required for Foundry Local endpoints.
`byom.maxTokensInK`	Yes	Maximum tokens in thousands (for example, `16`).
`foundryClientId`	Conditional	Required only when using a Foundry Local model source with `useFoundryLocal=true`. Not required for non-Foundry Local endpoints.
`auth.tenantId`	Yes	Microsoft Entra ID tenant ID.
`auth.clientId`	Yes	Agents and Tools app registration client ID.
`isManagedIdentityRequired`	Yes	Always `true`. Enables managed identity token acquisition.
`layerSelection`	Yes	`combined`, `agentic`, or `knowledge`.
`ingress.domainname`	Yes	Full DNS name for external access (for example, `mycluster.eastus.cloudapp.azure.com`).
`gpu_enabled`	No	Set to `true` for GPU clusters. Enables GPU-accelerated embedding models.
`min_gpu_nodes`	No	Minimum GPU nodes required. Default: `2`.
`AgentOperationTimeoutInMinutes`	No	Timeout for agent operations, in minutes. Default: `30`.
`model`	No	Always `byom`. No other option.
`llm.dapr.accessControl.defaultAction`	No	Dapr access control. Set to `allow`.
`embeddingmodel.image.gpu.repository`	No	GPU embedding model image repository.
`embeddingmodel.image.gpu.tag`	No	GPU embedding model image tag.

The BYOM API key isn't passed as a configuration parameter. It's stored as a Kubernetes secret (`byom-api-key`) in the `arc-rag` namespace before extension installation.
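As a minimal sketch of that step, the following shell function wraps `kubectl create secret generic` for the `byom-api-key` secret. The function name `create_byom_secret` and the data key name `api-key` are assumptions made for illustration. This article doesn't state which key name the extension reads, so confirm it before you rely on this sketch. The sketch also assumes the `arc-rag` namespace already exists.

```shell
# Sketch: store the BYOM API key as the byom-api-key secret in arc-rag.
# Assumption: the data key name "api-key" is illustrative only; the
# article doesn't specify the key name the extension expects.
create_byom_secret() {
  local api_key="${1:-}"
  if [ -z "$api_key" ]; then
    echo "usage: create_byom_secret <api_key>" >&2
    return 1
  fi
  kubectl create secret generic byom-api-key \
    --namespace arc-rag \
    --from-literal=api-key="$api_key"
}
```

Call the function with the key as its only argument, for example `create_byom_secret "$MY_API_KEY"`, where `MY_API_KEY` is a placeholder variable that holds your key.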

The Azure CLI accepts both `--config` and `--configuration-settings` for Arc extension parameters. The two flags are equivalent.
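To illustrate how the parameters in the table map onto command-line flags, the following sketch turns `key=value` settings into repeated `--config` flags. The helper name `build_config_args` is hypothetical, and `<tenant_id>` and `<client_id>` are placeholders. The remaining arguments of the install command (cluster name, resource group, extension type) aren't covered in this article, so they're left out.

```shell
# Sketch: print one "--config key=value" flag per setting.
# build_config_args is a hypothetical helper name used for illustration.
build_config_args() {
  local setting
  for setting in "$@"; do
    printf '%s %s\n' '--config' "$setting"
  done
}

# Sample values taken from the parameter table; replace the placeholders.
build_config_args \
  "byom.enabled=true" \
  "byom.apiEndpoint=https://gpt-oss-20b.foundry-local-operator.svc.cluster.local:5000/v1/chat/completions" \
  "byom.maxTokensInK=16" \
  "auth.tenantId=<tenant_id>" \
  "auth.clientId=<client_id>" \
  "isManagedIdentityRequired=true" \
  "layerSelection=combined" \
  "ingress.domainname=mycluster.eastus.cloudapp.azure.com"
```

Each printed line is one flag and value pair. Because the two flags are equivalent, replacing `--config` with `--configuration-settings` in the helper gives the same result.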

Environment variables

Helm templates populate the following environment variables for all inferencing pods:

Variable	Source
`BYOM_ENABLED`	Always `true`
`BYOM_ENDPOINT`	`byom.apiEndpoint`
`BYOM_MODEL`	`byom.apiModel`
`BYOM_API_KEY`	`byom.apiKey`
`FOUNDRY_CLIENT_ID`	`foundryClientId` (when configured)
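One way to confirm that these variables reached a pod is to run a small check from a shell inside the container. The following sketch reports whether each variable is set without printing its value, so the API key isn't exposed. The function name `check_byom_env` is hypothetical. Treating only `BYOM_ENABLED` and `BYOM_ENDPOINT` as mandatory is an assumption based on the Required column of the configuration parameter table.

```shell
# Sketch: report which BYOM variables are set in the current environment.
# Values are never printed, so the API key isn't leaked to the terminal.
# Assumption: only BYOM_ENABLED and BYOM_ENDPOINT are treated as mandatory.
check_byom_env() {
  local status=0 name value
  for name in BYOM_ENABLED BYOM_ENDPOINT BYOM_MODEL BYOM_API_KEY FOUNDRY_CLIENT_ID; do
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name: set"
    else
      echo "$name: unset"
      case "$name" in
        BYOM_ENABLED|BYOM_ENDPOINT) status=1 ;;
      esac
    fi
  done
  return "$status"
}
```

The function returns a nonzero status when a mandatory variable is missing, so you can use it in a script condition such as `check_byom_env || echo "BYOM configuration incomplete"`.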

Troubleshoot Foundry Local integration

Use the following commands to diagnose Foundry Local issues:

Command	Purpose
`kubectl describe mdep <name>`	Check ModelDeployment status and events.
`kubectl logs -f deployment/inference-operator -n foundry-local-operator`	Check operator logs.
`kubectl get pods -l app.kubernetes.io/managed-by=inference-operator`	Check inference pod status.
`kubectl describe pod <pod_name>`	Get pod details and events.
`kubectl get deploy,svc,ing -l foundry.azure.com/deployment=<name>`	List all resources created by a deployment.
`kubectl get configmap foundry-local-catalog -n foundry-local-operator -o yaml`	Check the model catalog ConfigMap.
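The commands in the table can be combined into a single pass. The following sketch runs several of them in sequence for one deployment. The function name `foundry_diagnose` is hypothetical, and the default namespace `foundry-local-operator` for the deployment is an assumption, so pass a different namespace as the second argument if yours differs. The sketch uses `--tail=50` instead of `-f` so that the log command returns rather than streaming.

```shell
# Sketch: run the main Foundry Local diagnostics for one ModelDeployment.
# Assumption: the deployment lives in foundry-local-operator unless a
# namespace is passed as the second argument.
foundry_diagnose() {
  local name="${1:-}"
  local ns="${2:-foundry-local-operator}"
  if [ -z "$name" ]; then
    echo "usage: foundry_diagnose <deployment_name> [namespace]" >&2
    return 1
  fi
  echo "== ModelDeployment status and events =="
  kubectl describe mdep "$name" -n "$ns"
  echo "== Inference pods =="
  kubectl get pods -n "$ns" -l app.kubernetes.io/managed-by=inference-operator
  echo "== Resources created by the deployment =="
  kubectl get deploy,svc,ing -n "$ns" -l "foundry.azure.com/deployment=$name"
  echo "== Recent operator logs =="
  kubectl logs deployment/inference-operator -n foundry-local-operator --tail=50
}
```

For example, `foundry_diagnose gpt-oss-20b` collects the status, pods, related resources, and recent operator logs for a deployment named `gpt-oss-20b`.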

Common integration issues

Symptom	Cause	Resolution
Connection refused or timeout	Foundry Local not running or network policy blocking egress.	Verify Foundry pods are running. Ensure egress from `arc-rag` namespace to Foundry ingress is allowed.
`SSL: CERTIFICATE_VERIFY_FAILED`	`foundryClientId` not set in extension configuration.	Set `foundryClientId`. This setting gates the CA bundle mount; without it, the Foundry self-signed certificate isn't trusted.
`400 Bad Request: plain HTTP sent to HTTPS port`	Using `http://` instead of `https://` in `byom.apiEndpoint`.	Change the endpoint to `https://`. Foundry Local enables TLS by default.
`400 Invalid JSON in request body`	Using `onnx-genai` runtime with agentic or combined mode.	Switch to `vllm` runtime. The `onnx-genai` runtime doesn't support `tools` or `tool_choice` parameters.
`401 Token validation failed`	RBAC roles not assigned for the managed identity.	Assign the `Reader` and `Cognitive Services OpenAI User` roles and the `FoundryInferenceAccess` app role. See Configure Foundry Local inference authentication.
`401 Entra ID authentication is not enabled`	Sending managed identity token to Foundry with `entraAuth.enabled=false`.	Either enable Microsoft Entra authentication on Foundry, or clear `FOUNDRY_CLIENT_ID` so Agents and Tools uses API key authentication.
`401 Invalid API key`	API key rotated after model redeployment.	Re-read the key from the `gpt-oss-20b-api-keys` secret and update `byom-api-key` in the `arc-rag` namespace.
`404 Not Found` from Foundry	Model not deployed.	Run `kubectl get mdep -n foundry-local-operator` and verify the model name matches `byom.apiModel`.
LLM calls fail but embedding and ingestion work	Expected behavior. Embedding models are local; only LLM inference uses Foundry.	Check Foundry connectivity and model deployment status.
Managed identity token acquisition fails	Microsoft Entra ID unreachable or msi-adapter not running.	Check msi-adapter sidecar logs. The request falls back to API key authentication only.
Pods pending (insufficient resources)	Cluster too small for combined mode (60+ pods).	Combined mode requires at least three Standard_D8s_v3 worker nodes (24 vCPU and 96 GB RAM in total) plus one GPU node. Scale node pools with `az aksarc nodepool scale`.
Extension install: stale nginx webhook	Previous install left `ValidatingWebhookConfiguration`.	Run `kubectl delete validatingwebhookconfiguration ingress-nginx-admission` before reinstalling.
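Some of these issues can be caught before installation. As one example, the following sketch checks that an endpoint value uses `https://`, which avoids the `400 Bad Request: plain HTTP sent to HTTPS port` error. The function name `check_endpoint_scheme` is hypothetical, and the check covers only the URL scheme, not connectivity or certificates.

```shell
# Sketch: verify that a byom.apiEndpoint value uses HTTPS.
# check_endpoint_scheme is a hypothetical helper; it inspects only the
# URL scheme and doesn't contact the endpoint.
check_endpoint_scheme() {
  case "${1:-}" in
    https://*)
      echo "ok"
      ;;
    http://*)
      echo "error: endpoint uses http://; change it to https://"
      return 1
      ;;
    *)
      echo "error: endpoint must start with https://"
      return 1
      ;;
  esac
}
```

For example, `check_endpoint_scheme "https://gpt-oss-20b.foundry-local-operator.svc.cluster.local:5000/v1/chat/completions"` prints `ok`, and the same URL with `http://` returns a nonzero status.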

Foundry Local operator parameters

You can set these optional parameters during Foundry inference operator installation:

Parameter	Description
`entraAuth.enabled`	When enabled, Microsoft Entra Auth SDK and msi-adapter sidecars are injected into inference pods for JWT validation and ARM RBAC authorization. When disabled, `tenantId` and `clientId` are optional. Default: `true`.
`watch.namespaces`	Namespaces that the operator watches. Set this parameter if the operator should manage resources across multiple namespaces. Default: `foundry-local-operator`. Pass as: `--config watch.namespaces[0]="<namespace_1>" --config watch.namespaces[1]="<namespace_2>"`.
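Because `watch.namespaces` is passed as an indexed list, writing the flags by hand gets error-prone as the list grows. The following sketch generates the indexed flags from a list of namespace names. The function name `watch_namespace_args` and the namespace `team-a` in the example are hypothetical.

```shell
# Sketch: print one indexed watch.namespaces flag per namespace.
# watch_namespace_args is a hypothetical helper name.
watch_namespace_args() {
  local i=0 ns
  for ns in "$@"; do
    printf '%s watch.namespaces[%d]=%s\n' '--config' "$i" "$ns"
    i=$((i + 1))
  done
}
```

For example, `watch_namespace_args foundry-local-operator team-a` prints `--config watch.namespaces[0]=foundry-local-operator` followed by `--config watch.namespaces[1]=team-a`. When you type these values directly in a shell, quote them so that the square brackets aren't treated as a glob pattern.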

Foundry Local key management

You can retrieve and rotate API keys by using the Foundry Local inference service:

Endpoint	Description
`GET /namespaces/<namespace>/deployments/<name>/keys`	Retrieve both primary and secondary keys.
`POST /namespaces/<namespace>/deployments/<name>/keys/{primary|secondary}/rotate`	Rotate a specific key.
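The following sketch builds the request paths for these two endpoints. The helper names `keys_path` and `rotate_path` are hypothetical. The base URL of the inference service and the authentication that it requires aren't described in this article, so the sketch stops at the path.

```shell
# Sketch: build request paths for the key management endpoints.
# keys_path and rotate_path are hypothetical helper names. The service
# base URL and any auth headers aren't covered here.
keys_path() {
  printf '/namespaces/%s/deployments/%s/keys\n' "$1" "$2"
}

rotate_path() {
  case "${3:-}" in
    primary|secondary)
      printf '/namespaces/%s/deployments/%s/keys/%s/rotate\n' "$1" "$2" "$3"
      ;;
    *)
      echo "error: key must be primary or secondary" >&2
      return 1
      ;;
  esac
}
```

A `GET` request to the output of `keys_path <namespace> <name>` retrieves both keys, and a `POST` request to the output of `rotate_path <namespace> <name> primary` rotates the primary key. Prefix each path with the base URL of your inference service.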

Last updated on 2026-06-02