ModelDeployment and operator configuration reference for Foundry Local

This article is a reference for the ModelDeployment CRD spec and status fields, the Model CRD spec and status fields, and the inference operator configuration settings.

Important

  • Foundry Local is available in preview. Preview releases provide early access to features that are in active deployment.
  • Features, approaches, and processes can change or have limited capabilities before general availability (GA).

ModelDeployment spec fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| displayName | string | No | | Human-readable deployment name. |
| model | object | Yes | | Model reference. Set one of: ref, catalog, or custom. |
| model.ref | string | Conditional | | Name of an existing Model CR to reference. |
| model.catalog.name | string | Conditional | | Catalog model name. |
| model.catalog.version | string | No | latest | Catalog model version. |
| model.custom | object | Conditional | | Inline custom model definition. |
| workloadType | string | Yes | | generative or predictive. |
| compute | string | Yes | | cpu or gpu. |
| replicas | integer | No | 1 | Number of pod replicas (1–100). |
| port | integer | No | 8080 | Container port (1024–65535). |
| resources.requests.cpu | string | No | 100m | CPU request. |
| resources.requests.memory | string | No | 256Mi | Memory request. |
| resources.limits.cpu | string | No | 1000m | CPU limit. |
| resources.limits.memory | string | No | 1Gi | Memory limit. |
| resources.limits.gpu | integer | No | | Number of GPUs (0–8). |
| runtime | string | No | onnx-genai | Inference runtime: onnx-genai or vllm. vLLM requires compute: gpu. |
| vllm | object | No | - | vLLM-specific configuration. Only used when runtime: vllm. |
| vllm.preferences | object | No | - | vLLM engine argument overrides (open schema). See vLLM planner documentation. |
| vllm.modelCacheStorageGi | integer | No | 100 | Size of the model cache PVC in GiB (minimum 1). |
| nodeSelector | object | No | | Node selector labels for pod scheduling. |
| skipGpuResource | boolean | No | false | Skip the nvidia.com/gpu limit. Requires nodeSelector when set to true. |
| tolerations | array | No | | Pod tolerations. |
| env | array | No | | Environment variables for the model container. |
| endpoint.enabled | boolean | No | false | Create an Ingress resource for this deployment. |
| endpoint.host | string | Conditional | | Ingress hostname. Required when endpoint.enabled: true. |
| endpoint.path | string | No | Derived from deployment name | URL path for ingress routing. |
| endpoint.pathType | string | No | ImplementationSpecific | Ingress path matching type. |
| endpoint.ingressClassName | string | No | nginx | IngressClass name. |
| endpoint.annotations | object | No | | Custom Ingress annotations. |
| endpoint.tls.enabled | boolean | No | false | Enable TLS on the Ingress resource. |
| endpoint.tls.secretName | string | Conditional | | Name of the TLS secret. Required when endpoint.tls.enabled: true. |
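
As a rough illustration of how the dotted field names above map onto YAML nesting, the following sketch deploys a catalog model on CPU and spells out the documented defaults. The display name and model name are placeholders, not real catalog entries, and the nesting is assumed to follow the dotted paths in the table.

```yaml
spec:
  displayName: "Example chat deployment"   # placeholder
  model:
    catalog:
      name: example-model    # placeholder; substitute a name from your catalog
      version: latest        # documented default
  workloadType: generative
  compute: cpu
  replicas: 1                # documented default; allowed range 1-100
  port: 8080                 # documented default; allowed range 1024-65535
  resources:
    requests:
      cpu: 100m              # documented default
      memory: 256Mi          # documented default
    limits:
      cpu: 1000m             # documented default
      memory: 1Gi            # documented default
```

Fields that are left out fall back to the defaults listed in the table, so only model, workloadType, and compute are strictly required.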

External endpoint examples

Minimal ingress configuration:

spec:
  endpoint:
    enabled: true
    ingressClassName: nginx
    tls:
      enabled: false

The operator automatically derives the path as /{deployment-name}(/|$)(.*) with URL rewriting.
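
To make the derived path concrete, the sketch below shows roughly what an NGINX Ingress for a deployment named my-model could look like, using the default pathTemplate and rewritePathTemplate values from the operator configuration described later in this article. The deployment name, hostname, and backend Service name are hypothetical, the use-regex annotation is an assumption based on common ingress-nginx rewrite setups, and the exact resource the operator generates may differ.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-model                        # hypothetical deployment name
  annotations:
    # Assumed: regex paths in ingress-nginx usually need this annotation
    nginx.ingress.kubernetes.io/use-regex: "true"
    # Forward only the second capture group to the backend
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: models.example.com          # hypothetical hostname
      http:
        paths:
          - path: /my-model(/|$)(.*)    # pathTemplate with {name} replaced
            pathType: ImplementationSpecific
            backend:
              service:
                name: my-model          # hypothetical Service name
                port:
                  number: 8080
```

With this pattern, a request to /my-model/v1/models captures v1/models in the second group, so the backend receives /v1/models.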

With custom annotations:

spec:
  endpoint:
    enabled: true
    ingressClassName: nginx
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
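
With a hostname and TLS, as a sketch: the spec table lists endpoint.host as required when endpoint.enabled is true, and endpoint.tls.secretName as required when TLS is enabled. The hostname and secret name below are placeholders, and the secret is assumed to be a standard kubernetes.io/tls secret in the deployment's namespace.

```yaml
spec:
  endpoint:
    enabled: true
    host: models.example.com            # placeholder hostname
    ingressClassName: nginx
    tls:
      enabled: true
      secretName: models-example-tls    # placeholder TLS secret name
```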

GPU with skip GPU resource:

spec:
  compute: gpu
  skipGpuResource: true
  nodeSelector:
    kubernetes.io/gpu-partition: "1g.5gb"

Note

To use skipGpuResource: true, set nodeSelector.

GPU with node selector and tolerations:

spec:
  compute: gpu
  nodeSelector:
    accelerator: nvidia-a100
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
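
GPU with the vLLM runtime, as a sketch: the spec table states that runtime: vllm requires compute: gpu and that the vllm block is only used with that runtime. The GPU count below is illustrative, pairing vLLM with workloadType: generative is an assumption, and vllm.preferences is omitted because its keys are an open schema that this article does not describe.

```yaml
spec:
  workloadType: generative      # assumed workload type for vLLM
  compute: gpu                  # required when runtime is vllm
  runtime: vllm
  resources:
    limits:
      gpu: 1                    # illustrative; allowed range 0-8
  vllm:
    modelCacheStorageGi: 100    # documented default, size in GiB
```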

ModelDeployment status fields

| Field | Type | Description |
| --- | --- | --- |
| state | string | Deployment state: Pending, Creating, Running, Updating, Error, or Terminating. |
| message | string | Human-readable status message. |
| replicas.desired | integer | Desired number of replicas. |
| replicas.ready | integer | Number of ready replicas. |
| replicas.available | integer | Number of available replicas. |
| readyReplicas | integer | Deprecated. Use replicas.ready instead. |
| deploymentReady | boolean | true when all replicas are ready. |
| serviceReady | boolean | true when the Service resource is created. |
| internalEndpoint | string | Internal cluster endpoint URL. |
| endpointReady | boolean | true when the Ingress is ready (if enabled). |
| externalEndpoint | string | External URL (populated if Ingress is enabled). |
| resolvedModel.name | string | Name of the resolved Model CR. |
| resolvedModel.variant | string | Selected variant ID. |
| resolvedModel.image | string | Container image used for this deployment. |
| authentication.keysSecretName | string | Name of the secret containing API keys. |
| conditions | array | Detailed status conditions. |
| lastUpdated | datetime | Timestamp of the last status update. |
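
For orientation, a status block for a healthy two-replica deployment with Ingress enabled might look roughly like the following. Every value is illustrative rather than captured from a real cluster; in particular, the message text, endpoint URLs, image reference, and secret name are made up, and conditions is left out because its entry format is not described here.

```yaml
status:
  state: Running
  message: "Deployment is ready"        # illustrative text
  replicas:
    desired: 2
    ready: 2
    available: 2
  deploymentReady: true
  serviceReady: true
  internalEndpoint: "http://my-model.default.svc.cluster.local:8080"   # illustrative
  endpointReady: true
  externalEndpoint: "http://models.example.com/my-model"               # illustrative
  resolvedModel:
    name: example-model                 # illustrative
    image: "myregistry.azurecr.io/generative-cpu:latest"               # illustrative
  authentication:
    keysSecretName: my-model-keys       # illustrative
  lastUpdated: "2025-01-01T00:00:00Z"   # illustrative
```

A client that waits for readiness would check state and deploymentReady, and read replicas.ready rather than the deprecated readyReplicas field.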

Model CRD spec fields

The Model CRD is for BYO (custom) models only. Catalog models are resolved from the catalog ConfigMap and do not use this CRD. To deploy a catalog model, use model.catalog in the ModelDeployment spec instead.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| displayName | string | No | Human-readable model name. |
| description | string | No | Model description. |
| publisher | string | No | Model publisher. |
| license | string | No | License identifier. |
| licenseUrl | string | No | URL to the full license text. |
| source | object | Yes | Model source configuration. |
| source.type | string | Yes | catalog or custom. |
| source.catalog.alias | string | Conditional | Catalog model alias. Required when source.type: catalog. |
| source.catalog.modelId | string | Conditional | Full catalog model ID. |
| source.custom.registry | string | Conditional | OCI registry URL. Required when source.type: custom. |
| source.custom.repository | string | Conditional | Repository path in the registry. |
| source.custom.tag | string | Conditional | Image tag. |
| source.custom.credentials.secretRef.name | string | Conditional | Name of the Kubernetes secret with registry credentials. |
| variants | array | No | Hardware-specific variant overrides. |
| requirements | object | No | Resource requirements. |
| capabilities | object | No | Model capabilities. |
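
A custom (BYO) model pulled from an OCI registry could be declared along these lines. The registry, repository, tag, and secret name are placeholders, and the nesting is assumed to mirror the dotted field paths in the table.

```yaml
spec:
  displayName: "My custom model"          # placeholder
  description: "Example BYO model"        # placeholder
  source:
    type: custom
    custom:
      registry: myregistry.azurecr.io     # placeholder registry
      repository: models/my-custom-model  # placeholder repository path
      tag: "1.0.0"                        # placeholder tag
      credentials:
        secretRef:
          name: my-registry-credentials   # placeholder secret name
```

A ModelDeployment would then point at this Model CR by name through model.ref:

```yaml
spec:
  model:
    ref: my-custom-model   # name of the Model CR (hypothetical)
```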

Model CRD status fields

| Field | Type | Description |
| --- | --- | --- |
| phase | string | Model phase: Pending, Available, or Error. |
| message | string | Human-readable status message. |
| catalogSync.lastSynced | datetime | Timestamp of the last catalog sync. |
| catalogSync.syncStatus | string | Syncing, Synced, or Error. |
| conditions | array | Detailed status conditions. |
| lastUpdated | datetime | Timestamp of the last status update. |

Inference operator configuration

The inference operator reads its configuration from a ConfigMap mounted at /etc/inference-operator/config.yaml.

Configuration file example

# Container registry for inference images
registry: "myregistry.azurecr.io"

# Container images for different workload types
images:
  generative_cpu:
    repository: generative-cpu
    tag: "latest"
  generative_gpu:
    repository: generative-gpu
    tag: "latest"
  predictive_cpu_oras:
    repository: predictive-cpu-byo
    tag: "latest"
  predictive_gpu_oras:
    repository: predictive-gpu-byo
    tag: "latest"
  vllm_gpu:
    repository: vllm-server
    tag: "latest"


# Ingress defaults
ingress:
  pathTemplate: "/{name}(/|$)(.*)"
  rewritePathTemplate: "/$2"
  pathType: "ImplementationSpecific"
  ingressClassName: "nginx"

# Catalog settings
catalog:
  configmapName: "foundry-local-catalog"
  configmapNamespace: "foundry-local-operator"
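
Because the operator reads this file from a mounted ConfigMap, the settings would typically live under a config.yaml key. The sketch below assumes the key name matches the mounted file name; the ConfigMap name and namespace are hypothetical, so check your installation for the actual values.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-operator-config      # hypothetical name
  namespace: foundry-local-operator    # assumed; matches the catalog namespace default
data:
  # Key name assumed to match the file mounted at /etc/inference-operator/config.yaml
  config.yaml: |
    registry: "myregistry.azurecr.io"
    images:
      generative_cpu:
        repository: generative-cpu
        tag: "latest"
    ingress:
      ingressClassName: "nginx"
    catalog:
      configmapName: "foundry-local-catalog"
      configmapNamespace: "foundry-local-operator"
```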

Configuration fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| registry | string | "" | Container registry prefix for inference images. |
| images.<type>.repository | string | Varies by workload type | Image repository path. Types: generative_cpu, generative_gpu, generative_cpu_oras, generative_gpu_oras, predictive_cpu_oras, predictive_gpu_oras, vllm_gpu. |
| images.<type>.tag | string | latest | Image tag. |
| ingress.pathTemplate | string | /{name}(/\|$)(.*) | Ingress path template. {name} is replaced with the deployment name. |
| ingress.rewritePathTemplate | string | /$2 | Rewrite target path for NGINX. |
| ingress.pathType | string | ImplementationSpecific | Ingress path matching type. |
| ingress.ingressClassName | string | nginx | Default IngressClass name. |
| catalog.configmapName | string | foundry-local-catalog | Name of the catalog ConfigMap. |
| catalog.configmapNamespace | string | foundry-local-operator | Namespace of the catalog ConfigMap. |
| catalog.lazyRegistrationEnabled | boolean | true | Automatically create Model CRs from the catalog on first deployment. |
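
As an example of overriding defaults, the fragment below pins an image tag instead of latest and turns off lazy registration. The tag value is a placeholder. It is an assumption, not something stated above, that the operator combines registry, repository, and tag into a reference of the form registry/repository:tag.

```yaml
registry: "myregistry.azurecr.io"
images:
  vllm_gpu:
    repository: vllm-server
    tag: "1.2.3"    # placeholder; presumably resolves to myregistry.azurecr.io/vllm-server:1.2.3
catalog:
  lazyRegistrationEnabled: false   # do not create Model CRs automatically from the catalog
```

Pinning tags keeps pods from picking up a different image when latest moves, at the cost of having to update the configuration for each upgrade.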