Open Service Mesh (OSM) AKS add-on Troubleshooting Guides

When you deploy the OSM AKS add-on, you could possibly experience problems associated with configuration of the service mesh. The following guide will assist you on how to troubleshoot errors and resolve common problems.

Verifying and Troubleshooting OSM components

Check OSM Controller Deployment, Pod, and Service

kubectl get deployment,pod,service -n kube-system --selector app=osm-controller

A healthy OSM Controller would look like this:

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/osm-controller   2/2     2            2           3m4s

NAME                                  READY   STATUS    RESTARTS   AGE
pod/osm-controller-65bd8c445c-zszp4   1/1     Running   0          2m
pod/osm-controller-65bd8c445c-xqhmk   1/1     Running   0          16s

NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
service/osm-controller   ClusterIP   10.96.185.178   <none>        15128/TCP,9092/TCP,9091/TCP   3m4s
service/osm-validator    ClusterIP   10.96.11.78     <none>        9093/TCP                      3m4s

Note

For the osm-controller services the CLUSTER-IP would be different. The service NAME and PORT(S) must be the same as the example above.

Check OSM Injector Deployment, Pod, and Service

kubectl get deployment,pod,service -n kube-system --selector app=osm-injector

A healthy OSM Injector would look like this:

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/osm-injector   2/2     2            2           4m37s

NAME                                READY   STATUS    RESTARTS   AGE
pod/osm-injector-5c49bd8d7c-b6cx6   1/1     Running   0          4m21s
pod/osm-injector-5c49bd8d7c-dx587   1/1     Running   0          4m37s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/osm-injector   ClusterIP   10.96.236.108   <none>        9090/TCP   4m37s

Check OSM Bootstrap Deployment, Pod, and Service

kubectl get deployment,pod,service -n kube-system --selector app=osm-bootstrap

A healthy OSM Bootstrap would look like this:

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/osm-bootstrap   1/1     1            1           5m25s

NAME                                 READY   STATUS    RESTARTS   AGE
pod/osm-bootstrap-594ffc6cb7-jc7bs   1/1     Running   0          5m25s

NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/osm-bootstrap   ClusterIP   10.96.250.208   <none>        9443/TCP,9095/TCP   5m25s

Check Validating and Mutating webhooks

kubectl get ValidatingWebhookConfiguration --selector app=osm-controller

A healthy OSM Validating Webhook would look like this:

NAME              WEBHOOKS   AGE
aks-osm-validator-mesh-osm   1      81m
kubectl get MutatingWebhookConfiguration --selector app=osm-injector

A healthy OSM Mutating Webhook would look like this:

NAME              WEBHOOKS   AGE
aks-osm-webhook-osm   1      102m

Check for the service and the CA bundle of the Validating webhook

kubectl get ValidatingWebhookConfiguration aks-osm-validator-mesh-osm -o json | jq '.webhooks[0].clientConfig.service'

A well configured Validating Webhook Configuration would look exactly like this:

{
  "name": "osm-config-validator",
  "namespace": "kube-system",
  "path": "/validate-webhook",
  "port": 9093
}

Check for the service and the CA bundle of the Mutating webhook

kubectl get MutatingWebhookConfiguration aks-osm-webhook-osm -o json | jq '.webhooks[0].clientConfig.service'

A well configured Mutating Webhook Configuration would look exactly like this:

{
  "name": "osm-injector",
  "namespace": "kube-system",
  "path": "/mutate-pod-creation",
  "port": 9090
}

Check the osm-mesh-config resource

Check for the existence:

kubectl get meshconfig osm-mesh-config -n kube-system

Check the content of the OSM MeshConfig

kubectl get meshconfig osm-mesh-config -n kube-system -o yaml
apiVersion: config.openservicemesh.io/v1alpha1
kind: MeshConfig
metadata:
  creationTimestamp: "0000-00-00A00:00:00A"
  generation: 1
  name: osm-mesh-config
  namespace: kube-system
  resourceVersion: "2494"
  uid: 6c4d67f3-c241-4aeb-bf4f-b029b08faa31
spec:
  certificate:
    serviceCertValidityDuration: 24h
  featureFlags:
    enableEgressPolicy: true
    enableMulticlusterMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      address: jaeger.kube-system.svc.cluster.local
      enable: false
      endpoint: /api/v2/spans
      port: 9411
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    envoyImage: mcr.microsoft.com/oss/envoyproxy/envoy:v1.18.3
    initContainerImage: mcr.microsoft.com/oss/openservicemesh/init:v0.9.1
    logLevel: error
    maxDataPlaneConnections: 0
    resources: {}
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    useHTTPSIngress: false

osm-mesh-config resource values:

Key Type Default Value Kubectl Patch Command Examples
spec.traffic.enableEgress bool true kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enableEgress":true}}}' --type=merge
spec.traffic.enablePermissiveTrafficPolicyMode bool true kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}' --type=merge
spec.traffic.useHTTPSIngress bool false kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"useHTTPSIngress":true}}}' --type=merge
spec.traffic.outboundPortExclusionList array [] kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundPortExclusionList":[6379,8080]}}}' --type=merge
spec.traffic.outboundIPRangeExclusionList array [] kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundIPRangeExclusionList":["10.0.0.0/32","1.1.1.1/24"]}}}' --type=merge
spec.traffic.inboundPortExclusionList array [] kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"inboundPortExclusionList":[6379,8080]}}}' --type=merge
spec.certificate.serviceCertValidityDuration string "24h" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"certificate":{"serviceCertValidityDuration":"24h"}}}' --type=merge
spec.observability.enableDebugServer bool true kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"enableDebugServer":true}}}' --type=merge
spec.observability.tracing.enable bool false kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"enable":true}}}}' --type=merge
spec.observability.tracing.address string "jaeger.kube-system.svc.cluster.local" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"address": "jaeger.kube-system.svc.cluster.local"}}}}' --type=merge
spec.observability.tracing.endpoint string "/api/v2/spans" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"endpoint":"/api/v2/spans"}}}}' --type=merge' --type=merge
spec.observability.tracing.port int 9411 kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"port":9411}}}}' --type=merge
spec.observability.tracing.osmLogLevel string "info" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"osmLogLevel": "info"}}}}' --type=merge
spec.sidecar.enablePrivilegedInitContainer bool false kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"enablePrivilegedInitContainer":true}}}' --type=merge
spec.sidecar.logLevel string "error" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"logLevel":"error"}}}' --type=merge
spec.sidecar.maxDataPlaneConnections int 0 kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"maxDataPlaneConnections":"error"}}}' --type=merge
spec.sidecar.envoyImage string "mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"envoyImage":"mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1"}}}' --type=merge
spec.sidecar.initContainerImage string "mcr.microsoft.com/oss/openservicemesh/init:v0.11.1" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"initContainerImage":"mcr.microsoft.com/oss/openservicemesh/init:v0.11.1"}}}' --type=merge
spec.sidecar.configResyncInterval string "0s" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"configResyncInterval":"30s"}}}' --type=merge
spec.featureFlags.enableWASMStats bool "true" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableWASMStats":"true"}}}' --type=merge
spec.featureFlags.enableEgressPolicy bool "true" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEgressPolicy":"true"}}}' --type=merge
spec.featureFlags.enableMulticlusterMode bool "false" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableMulticlusterMode":"false"}}}' --type=merge
spec.featureFlags.enableSnapshotCacheMode bool "false" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableSnapshotCacheMode":"false"}}}' --type=merge
spec.featureFlags.enableAsyncProxyServiceMapping bool "false" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableAsyncProxyServiceMapping":"false"}}}' --type=merge
spec.featureFlags.enableIngressBackendPolicy bool "true" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableIngressBackendPolicy":"true"}}}' --type=merge
spec.featureFlags.enableEnvoyActiveHealthChecks bool "false" kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEnvoyActiveHealthChecks":"false"}}}' --type=merge

Check Namespaces

Note

The kube-system namespace will never participate in a service mesh and will never be labeled and/or annotated with the key/values below.

We use the osm namespace add command to join namespaces to a given service mesh. When a k8s namespace is part of the mesh (or for it to be part of the mesh) the following must be true:

View the annotations with

kubectl get namespace bookbuyer -o json | jq '.metadata.annotations'

The following annotation must be present:

{
  "openservicemesh.io/sidecar-injection": "enabled"
}

View the labels with

kubectl get namespace bookbuyer -o json | jq '.metadata.labels'

The following label must be present:

{
  "openservicemesh.io/monitored-by": "osm"
}

If a namespace is not annotated with "openservicemesh.io/sidecar-injection": "enabled" or not labeled with "openservicemesh.io/monitored-by": "osm" the OSM Injector will not add Envoy sidecars.

Note

After osm namespace add is called only new pods will be injected with an Envoy sidecar. Existing pods must be restarted with kubectl rollout restart deployment ...

Verify OSM CRDs:

Check whether the cluster has the required CRDs:

kubectl get crds

We must have the following installed on the cluster:

  • egresses.policy.openservicemesh.io
  • httproutegroups.specs.smi-spec.io
  • ingressbackends.policy.openservicemesh.io
  • meshconfigs.config.openservicemesh.io
  • multiclusterservices.config.openservicemesh.io
  • tcproutes.specs.smi-spec.io
  • trafficsplits.split.smi-spec.io
  • traffictargets.access.smi-spec.io

Get the versions of the SMI CRDs installed with this command:

osm mesh list

Expected output:

MESH NAME   MESH NAMESPACE   VERSION   ADDED NAMESPACES
osm         kube-system      v0.11.1

MESH NAME   MESH NAMESPACE   SMI SUPPORTED
osm         kube-system      HTTPRouteGroup:v1alpha4,TCPRoute:v1alpha4,TrafficSplit:v1alpha2,TrafficTarget:v1alpha3

To list the OSM controller pods for a mesh, please run the following command passing in the mesh's namespace
        kubectl get pods -n <osm-mesh-namespace> -l app=osm-controller

OSM Controller v0.11.1 requires the following versions:

  • traffictargets.access.smi-spec.io - v1alpha3
  • httproutegroups.specs.smi-spec.io - v1alpha4
  • tcproutes.specs.smi-spec.io - v1alpha4
  • udproutes.specs.smi-spec.io - Not supported
  • trafficsplits.split.smi-spec.io - v1alpha2
  • *.metrics.smi-spec.io - v1alpha1

Certificate management

Information on how OSM issues and manages certificates to Envoy proxies running on application pods can be found on the OpenServiceMesh docs site.

Upgrading Envoy

When a new pod is created in a namespace monitored by the add-on, OSM will inject an envoy proxy sidecar in that pod. Information regarding how to update the envoy version can be found in the Upgrade Guide on the OpenServiceMesh docs site.