This article is a reference for the ModelDeployment CRD spec and status fields, the Model CRD spec and status fields, and the inference operator configuration settings.
Important
- Foundry Local is available in preview. Preview releases provide early access to features that are in active deployment.
- Features, approaches, and processes can change or have limited capabilities before general availability (GA).
ModelDeployment spec fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `displayName` | string | No | — | Human-readable deployment name. |
| `model` | object | Yes | — | Model reference. Set one of: ref, catalog, or custom. |
| `model.ref` | string | Conditional | — | Name of an existing Model CR to reference. |
| `model.catalog.name` | string | Conditional | — | Catalog model name. |
| `model.catalog.version` | string | No | `latest` | Catalog model version. |
| `model.custom` | object | Conditional | — | Inline custom model definition. |
| `workloadType` | string | Yes | — | generative or predictive. |
| `compute` | string | Yes | — | cpu or gpu. |
| `replicas` | integer | No | 1 | Number of pod replicas (1–100). |
| `port` | integer | No | 8080 | Container port (1024–65535). |
| `resources.requests.cpu` | string | No | `100m` | CPU request. |
| `resources.requests.memory` | string | No | `256Mi` | Memory request. |
| `resources.limits.cpu` | string | No | `1000m` | CPU limit. |
| `resources.limits.memory` | string | No | `1Gi` | Memory limit. |
| `resources.limits.gpu` | integer | No | — | Number of GPUs (0–8). |
| `runtime` | string | No | `onnx-genai` | Inference runtime: onnx-genai or vllm. vLLM requires compute: gpu. |
| `vllm` | object | No | — | vLLM-specific configuration. Only used when runtime: vllm. |
| `vllm.preferences` | object | No | — | vLLM engine argument overrides (open schema). See vLLM planner documentation. |
| `vllm.modelCacheStorageGi` | integer | No | 100 | Size of the model cache PVC in GiB (minimum 1). |
| `nodeSelector` | object | No | — | Node selector labels for pod scheduling. |
| `skipGpuResource` | boolean | No | `false` | Skip the nvidia.com/gpu limit. Requires nodeSelector when set to true. |
| `tolerations` | array | No | — | Pod tolerations. |
| `env` | array | No | — | Environment variables for the model container. |
| `endpoint.enabled` | boolean | No | `false` | Create an Ingress resource for this deployment. |
| `endpoint.host` | string | Conditional | — | Ingress hostname. Required when endpoint.enabled: true. |
| `endpoint.path` | string | No | Derived from deployment name | URL path for ingress routing. |
| `endpoint.pathType` | string | No | `ImplementationSpecific` | Ingress path matching type. |
| `endpoint.ingressClassName` | string | No | `nginx` | IngressClass name. |
| `endpoint.annotations` | object | No | — | Custom Ingress annotations. |
| `endpoint.tls.enabled` | boolean | No | `false` | Enable TLS on the Ingress resource. |
| `endpoint.tls.secretName` | string | Conditional | — | Name of the TLS secret. Required when endpoint.tls.enabled: true. |
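
As a rough illustration of how the required fields combine, the fragment below sketches a CPU deployment of a catalog model. It uses only field names from the table above. The model name `example-model` and the display name are placeholders rather than real catalog entries, and the replica count and resource values are arbitrary.

```yaml
# Sketch of a ModelDeployment spec for a catalog model on CPU.
# "example-model" is a hypothetical catalog name; substitute a real one.
spec:
  displayName: Example chat deployment
  model:
    catalog:
      name: example-model   # placeholder, not a real catalog entry
      version: latest       # default when omitted
  workloadType: generative
  compute: cpu
  replicas: 2               # allowed range: 1-100
  port: 8080                # default
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
```

Only one of `model.ref`, `model.catalog`, or `model.custom` should be set in a given deployment.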
External endpoint examples
Minimal ingress configuration:

    spec:
      endpoint:
        enabled: true
        ingressClassName: nginx
        tls:
          enabled: false

The operator automatically derives the path as /{deployment-name}(/|$)(.*) with URL rewriting.
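
To make the derived path concrete, here is a sketch of the kind of Ingress that could result for a deployment named `my-model`. This is not captured operator output. The deployment name, host, Service name, and port are hypothetical, and the sketch assumes the operator relies on the standard ingress-nginx `use-regex` and `rewrite-target` annotations.

```yaml
# Illustrative only: an Ingress the operator might generate for a
# deployment named "my-model" (hypothetical). Service name, port, host,
# and the exact annotations are assumptions, not confirmed operator output.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-model
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: models.example.com          # placeholder host
      http:
        paths:
          - path: /my-model(/|$)(.*)    # {deployment-name} replaced
            pathType: ImplementationSpecific
            backend:
              service:
                name: my-model          # assumed Service name
                port:
                  number: 8080          # default container port
```

Under this pattern, a request to `/my-model/some/path` captures `some/path` in the second group, so the backend would receive `/some/path`. A request to `/my-model` alone would be rewritten to `/`.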
With custom annotations:

    spec:
      endpoint:
        enabled: true
        ingressClassName: nginx
        annotations:
          nginx.ingress.kubernetes.io/proxy-body-size: "50m"
          nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
          nginx.ingress.kubernetes.io/proxy-send-timeout: "600"

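
The spec table also lists `endpoint.host` and `endpoint.tls.secretName` as conditionally required. A minimal sketch of an endpoint with an explicit host and TLS might look like the following. The hostname and secret name are placeholders, and the secret is presumably a standard Kubernetes TLS secret that already exists.

```yaml
# Sketch: Ingress with an explicit host and TLS enabled.
# Host and secret name are placeholders.
spec:
  endpoint:
    enabled: true
    host: models.example.com          # required when endpoint.enabled is true
    ingressClassName: nginx
    tls:
      enabled: true
      secretName: models-example-tls  # required when endpoint.tls.enabled is true
```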
GPU with skip GPU resource:

    spec:
      compute: gpu
      skipGpuResource: true
      nodeSelector:
        kubernetes.io/gpu-partition: "1g.5gb"

Note
To use skipGpuResource: true, set nodeSelector.
GPU with node selector and tolerations:

    spec:
      compute: gpu
      nodeSelector:
        accelerator: nvidia-a100
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule

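
The vLLM fields from the spec table can be combined in the same way. The fragment below is a sketch only: it omits the required `model` and `workloadType` fields for brevity, and the GPU count and cache size are arbitrary example values.

```yaml
# Sketch: vLLM runtime on GPU with a larger model cache.
spec:
  compute: gpu                # vLLM requires compute: gpu
  runtime: vllm
  resources:
    limits:
      gpu: 1                  # allowed range: 0-8
  vllm:
    modelCacheStorageGi: 200  # PVC size in GiB; default is 100
```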
ModelDeployment status fields
| Field | Type | Description |
|---|---|---|
| `state` | string | Deployment state: Pending, Creating, Running, Updating, Error, or Terminating. |
| `message` | string | Human-readable status message. |
| `replicas.desired` | integer | Desired number of replicas. |
| `replicas.ready` | integer | Number of ready replicas. |
| `replicas.available` | integer | Number of available replicas. |
| `readyReplicas` | integer | Deprecated. Use replicas.ready instead. |
| `deploymentReady` | boolean | true when all replicas are ready. |
| `serviceReady` | boolean | true when the Service resource is created. |
| `internalEndpoint` | string | Internal cluster endpoint URL. |
| `endpointReady` | boolean | true when the Ingress is ready (if enabled). |
| `externalEndpoint` | string | External URL (populated if Ingress is enabled). |
| `resolvedModel.name` | string | Name of the resolved Model CR. |
| `resolvedModel.variant` | string | Selected variant ID. |
| `resolvedModel.image` | string | Container image used for this deployment. |
| `authentication.keysSecretName` | string | Name of the secret containing API keys. |
| `conditions` | array | Detailed status conditions. |
| `lastUpdated` | datetime | Timestamp of last status update. |
Model CRD spec fields
The Model CRD is for bring-your-own (BYO) custom models only. Catalog models are resolved from the catalog ConfigMap and don't use this CRD. To deploy a catalog model, set model.catalog in the ModelDeployment spec instead.
| Field | Type | Required | Description |
|---|---|---|---|
| `displayName` | string | No | Human-readable model name. |
| `description` | string | No | Model description. |
| `publisher` | string | No | Model publisher. |
| `license` | string | No | License identifier. |
| `licenseUrl` | string | No | URL to the full license text. |
| `source` | object | Yes | Model source configuration. |
| `source.type` | string | Yes | catalog or custom. |
| `source.catalog.alias` | string | Conditional | Catalog model alias. Required when source.type: catalog. |
| `source.catalog.modelId` | string | Conditional | Full catalog model ID. |
| `source.custom.registry` | string | Conditional | OCI registry URL. Required when source.type: custom. |
| `source.custom.repository` | string | Conditional | Repository path in the registry. |
| `source.custom.tag` | string | Conditional | Image tag. |
| `source.custom.credentials.secretRef.name` | string | Conditional | Name of the Kubernetes secret with registry credentials. |
| `variants` | array | No | Hardware-specific variant overrides. |
| `requirements` | object | No | Resource requirements. |
| `capabilities` | object | No | Model capabilities. |
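
For orientation, the fragment below sketches a Model spec for a custom model pulled from an OCI registry. It uses only the `source.custom` fields listed above. The registry, repository, tag, and secret name are all placeholders.

```yaml
# Sketch of a Model spec for a custom (BYO) model in an OCI registry.
# Registry, repository, tag, and secret name are placeholders.
spec:
  displayName: Example custom model
  description: Illustrative custom model definition.
  source:
    type: custom
    custom:
      registry: myregistry.azurecr.io
      repository: models/example-model
      tag: "1.0.0"
      credentials:
        secretRef:
          name: example-registry-credentials
```

A ModelDeployment would then point at this resource by setting `model.ref` to the name of the Model CR.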
Model CRD status fields
| Field | Type | Description |
|---|---|---|
| `phase` | string | Model phase: Pending, Available, or Error. |
| `message` | string | Human-readable status message. |
| `catalogSync.lastSynced` | datetime | Timestamp of the last catalog sync. |
| `catalogSync.syncStatus` | string | Syncing, Synced, or Error. |
| `conditions` | array | Detailed status conditions. |
| `lastUpdated` | datetime | Timestamp of last status update. |
Inference operator configuration
The inference operator reads its configuration from a ConfigMap mounted at /etc/inference-operator/config.yaml.
Configuration file example

    # Container registry for inference images
    registry: "myregistry.azurecr.io"

    # Container images for different workload types
    images:
      generative_cpu:
        repository: generative-cpu
        tag: "latest"
      generative_gpu:
        repository: generative-gpu
        tag: "latest"
      predictive_cpu_oras:
        repository: predictive-cpu-byo
        tag: "latest"
      predictive_gpu_oras:
        repository: predictive-gpu-byo
        tag: "latest"
      vllm_gpu:
        repository: vllm-server
        tag: "latest"

    # Ingress defaults
    ingress:
      pathTemplate: "/{name}(/|$)(.*)"
      rewritePathTemplate: "/$2"
      pathType: "ImplementationSpecific"
      ingressClassName: "nginx"

    # Catalog settings
    catalog:
      configmapName: "foundry-local-catalog"
      configmapNamespace: "foundry-local-operator"

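
Because the operator reads this file from a mounted ConfigMap, the configuration would typically be delivered as a `config.yaml` key inside one. The sketch below shows that wrapping under stated assumptions: the ConfigMap name is hypothetical, the namespace is assumed to match the catalog default, and only the `config.yaml` key is taken from the mount path given earlier. Check your installation for the actual ConfigMap name.

```yaml
# Sketch: wrapping the operator configuration in a ConfigMap.
# The ConfigMap name and namespace are assumptions; only the
# config.yaml key mirrors the documented mount path.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-operator-config     # hypothetical name
  namespace: foundry-local-operator   # assumed; same as the catalog default
data:
  config.yaml: |
    registry: "myregistry.azurecr.io"
    ingress:
      ingressClassName: "nginx"
    catalog:
      configmapName: "foundry-local-catalog"
      configmapNamespace: "foundry-local-operator"
```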
Configuration fields
| Field | Type | Default | Description |
|---|---|---|---|
| `registry` | string | `""` | Container registry prefix for inference images. |
| `images.<type>.repository` | string | Varies by workload type | Image repository path. Types: generative_cpu, generative_gpu, generative_cpu_oras, generative_gpu_oras, predictive_cpu_oras, predictive_gpu_oras, vllm_gpu. |
| `images.<type>.tag` | string | `latest` | Image tag. |
| `ingress.pathTemplate` | string | `/{name}(/\|$)(.*)` | Ingress path template. {name} is replaced with the deployment name. |
| `ingress.rewritePathTemplate` | string | `/$2` | Rewrite target path for NGINX. |
| `ingress.pathType` | string | `ImplementationSpecific` | Ingress path matching type. |
| `ingress.ingressClassName` | string | `nginx` | Default IngressClass name. |
| `catalog.configmapName` | string | `foundry-local-catalog` | Name of the catalog ConfigMap. |
| `catalog.configmapNamespace` | string | `foundry-local-operator` | Namespace of the catalog ConfigMap. |
| `catalog.lazyRegistrationEnabled` | boolean | `true` | Automatically create Model CRs from catalog on first deployment. |
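
As a sketch of how these fields might be overridden, the fragment below pins image tags instead of using `latest` and turns off lazy registration. The tag values are placeholders, and the example assumes that any field left out keeps the default listed in the table.

```yaml
# Sketch: pin image tags and disable lazy Model registration.
# Tag values are placeholders; omitted fields are assumed to keep defaults.
registry: "myregistry.azurecr.io"
images:
  generative_cpu:
    repository: generative-cpu
    tag: "1.2.3"          # placeholder; replaces the default "latest"
  vllm_gpu:
    repository: vllm-server
    tag: "1.2.3"          # placeholder
catalog:
  lazyRegistrationEnabled: false
```

With `catalog.lazyRegistrationEnabled` set to `false`, the operator would no longer create Model CRs from the catalog on first deployment.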