Open Service Mesh (OSM) AKS add-on Troubleshooting Guides
When you deploy the OSM AKS add-on, you could possibly experience problems associated with configuration of the service mesh. The following guide will assist you on how to troubleshoot errors and resolve common problems.
Verifying and Troubleshooting OSM components
Check OSM Controller Deployment, Pod, and Service
kubectl get deployment,pod,service -n kube-system --selector app=osm-controller
A healthy OSM Controller would look like this:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/osm-controller 2/2 2 2 3m4s
NAME READY STATUS RESTARTS AGE
pod/osm-controller-65bd8c445c-zszp4 1/1 Running 0 2m
pod/osm-controller-65bd8c445c-xqhmk 1/1 Running 0 16s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/osm-controller ClusterIP 10.96.185.178 <none> 15128/TCP,9092/TCP,9091/TCP 3m4s
service/osm-validator ClusterIP 10.96.11.78 <none> 9093/TCP 3m4s
Note
For the osm-controller services the CLUSTER-IP would be different. The service NAME and PORT(S) must be the same as the example above.
Check OSM Injector Deployment, Pod, and Service
kubectl get deployment,pod,service -n kube-system --selector app=osm-injector
A healthy OSM Injector would look like this:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/osm-injector 2/2 2 2 4m37s
NAME READY STATUS RESTARTS AGE
pod/osm-injector-5c49bd8d7c-b6cx6 1/1 Running 0 4m21s
pod/osm-injector-5c49bd8d7c-dx587 1/1 Running 0 4m37s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/osm-injector ClusterIP 10.96.236.108 <none> 9090/TCP 4m37s
Check OSM Bootstrap Deployment, Pod, and Service
kubectl get deployment,pod,service -n kube-system --selector app=osm-bootstrap
A healthy OSM Bootstrap would look like this:
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/osm-bootstrap 1/1 1 1 5m25s
NAME READY STATUS RESTARTS AGE
pod/osm-bootstrap-594ffc6cb7-jc7bs 1/1 Running 0 5m25s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/osm-bootstrap ClusterIP 10.96.250.208 <none> 9443/TCP,9095/TCP 5m25s
Check Validating and Mutating webhooks
kubectl get ValidatingWebhookConfiguration --selector app=osm-controller
A healthy OSM Validating Webhook would look like this:
NAME WEBHOOKS AGE
aks-osm-validator-mesh-osm 1 81m
kubectl get MutatingWebhookConfiguration --selector app=osm-injector
A healthy OSM Mutating Webhook would look like this:
NAME WEBHOOKS AGE
aks-osm-webhook-osm 1 102m
Check for the service and the CA bundle of the Validating webhook
kubectl get ValidatingWebhookConfiguration aks-osm-validator-mesh-osm -o json | jq '.webhooks[0].clientConfig.service'
A well configured Validating Webhook Configuration would look exactly like this:
{
"name": "osm-config-validator",
"namespace": "kube-system",
"path": "/validate-webhook",
"port": 9093
}
Check for the service and the CA bundle of the Mutating webhook
kubectl get MutatingWebhookConfiguration aks-osm-webhook-osm -o json | jq '.webhooks[0].clientConfig.service'
A well configured Mutating Webhook Configuration would look exactly like this:
{
"name": "osm-injector",
"namespace": "kube-system",
"path": "/mutate-pod-creation",
"port": 9090
}
Check the osm-mesh-config
resource
Check for the existence:
kubectl get meshconfig osm-mesh-config -n kube-system
Check the content of the OSM MeshConfig
kubectl get meshconfig osm-mesh-config -n kube-system -o yaml
apiVersion: config.openservicemesh.io/v1alpha1
kind: MeshConfig
metadata:
creationTimestamp: "0000-00-00A00:00:00A"
generation: 1
name: osm-mesh-config
namespace: kube-system
resourceVersion: "2494"
uid: 6c4d67f3-c241-4aeb-bf4f-b029b08faa31
spec:
certificate:
serviceCertValidityDuration: 24h
featureFlags:
enableEgressPolicy: true
enableMulticlusterMode: false
enableWASMStats: true
observability:
enableDebugServer: true
osmLogLevel: info
tracing:
address: jaeger.kube-system.svc.cluster.local
enable: false
endpoint: /api/v2/spans
port: 9411
sidecar:
configResyncInterval: 0s
enablePrivilegedInitContainer: false
envoyImage: mcr.microsoft.com/oss/envoyproxy/envoy:v1.18.3
initContainerImage: mcr.microsoft.com/oss/openservicemesh/init:v0.9.1
logLevel: error
maxDataPlaneConnections: 0
resources: {}
traffic:
enableEgress: true
enablePermissiveTrafficPolicyMode: true
inboundExternalAuthorization:
enable: false
failureModeAllow: false
statPrefix: inboundExtAuthz
timeout: 1s
useHTTPSIngress: false
osm-mesh-config
resource values:
Key | Type | Default Value | Kubectl Patch Command Examples |
---|---|---|---|
spec.traffic.enableEgress | bool | true |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enableEgress":true}}}' --type=merge |
spec.traffic.enablePermissiveTrafficPolicyMode | bool | true |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}' --type=merge |
spec.traffic.useHTTPSIngress | bool | false |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"useHTTPSIngress":true}}}' --type=merge |
spec.traffic.outboundPortExclusionList | array | [] |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundPortExclusionList":[6379,8080]}}}' --type=merge |
spec.traffic.outboundIPRangeExclusionList | array | [] |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"outboundIPRangeExclusionList":["10.0.0.0/32","1.1.1.1/24"]}}}' --type=merge |
spec.traffic.inboundPortExclusionList | array | [] |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"traffic":{"inboundPortExclusionList":[6379,8080]}}}' --type=merge |
spec.certificate.serviceCertValidityDuration | string | "24h" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"certificate":{"serviceCertValidityDuration":"24h"}}}' --type=merge |
spec.observability.enableDebugServer | bool | true |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"enableDebugServer":true}}}' --type=merge |
spec.observability.tracing.enable | bool | false |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"enable":true}}}}' --type=merge |
spec.observability.tracing.address | string | "jaeger.kube-system.svc.cluster.local" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"address": "jaeger.kube-system.svc.cluster.local"}}}}' --type=merge |
spec.observability.tracing.endpoint | string | "/api/v2/spans" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"endpoint":"/api/v2/spans"}}}}' --type=merge' --type=merge |
spec.observability.tracing.port | int | 9411 |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"port":9411}}}}' --type=merge |
spec.observability.tracing.osmLogLevel | string | "info" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"observability":{"tracing":{"osmLogLevel": "info"}}}}' --type=merge |
spec.sidecar.enablePrivilegedInitContainer | bool | false |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"enablePrivilegedInitContainer":true}}}' --type=merge |
spec.sidecar.logLevel | string | "error" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"logLevel":"error"}}}' --type=merge |
spec.sidecar.maxDataPlaneConnections | int | 0 |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"maxDataPlaneConnections":"error"}}}' --type=merge |
spec.sidecar.envoyImage | string | "mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"envoyImage":"mcr.microsoft.com/oss/envoyproxy/envoy:v1.19.1"}}}' --type=merge |
spec.sidecar.initContainerImage | string | "mcr.microsoft.com/oss/openservicemesh/init:v0.11.1" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"initContainerImage":"mcr.microsoft.com/oss/openservicemesh/init:v0.11.1"}}}' --type=merge |
spec.sidecar.configResyncInterval | string | "0s" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"sidecar":{"configResyncInterval":"30s"}}}' --type=merge |
spec.featureFlags.enableWASMStats | bool | "true" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableWASMStats":"true"}}}' --type=merge |
spec.featureFlags.enableEgressPolicy | bool | "true" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEgressPolicy":"true"}}}' --type=merge |
spec.featureFlags.enableMulticlusterMode | bool | "false" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableMulticlusterMode":"false"}}}' --type=merge |
spec.featureFlags.enableSnapshotCacheMode | bool | "false" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableSnapshotCacheMode":"false"}}}' --type=merge |
spec.featureFlags.enableAsyncProxyServiceMapping | bool | "false" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableAsyncProxyServiceMapping":"false"}}}' --type=merge |
spec.featureFlags.enableIngressBackendPolicy | bool | "true" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableIngressBackendPolicy":"true"}}}' --type=merge |
spec.featureFlags.enableEnvoyActiveHealthChecks | bool | "false" |
kubectl patch meshconfig osm-mesh-config -n kube-system -p '{"spec":{"featureFlags":{"enableEnvoyActiveHealthChecks":"false"}}}' --type=merge |
Check Namespaces
Note
The kube-system namespace will never participate in a service mesh and will never be labeled and/or annotated with the key/values below.
We use the osm namespace add
command to join namespaces to a given service mesh.
When a k8s namespace is part of the mesh (or for it to be part of the mesh) the following must be true:
View the annotations with
kubectl get namespace bookbuyer -o json | jq '.metadata.annotations'
The following annotation must be present:
{
"openservicemesh.io/sidecar-injection": "enabled"
}
View the labels with
kubectl get namespace bookbuyer -o json | jq '.metadata.labels'
The following label must be present:
{
"openservicemesh.io/monitored-by": "osm"
}
If a namespace is not annotated with "openservicemesh.io/sidecar-injection": "enabled"
or not labeled with "openservicemesh.io/monitored-by": "osm"
the OSM Injector will not add Envoy sidecars.
Note
After osm namespace add
is called only new pods will be injected with an Envoy sidecar. Existing pods must be restarted with kubectl rollout restart deployment ...
Verify OSM CRDs:
Check whether the cluster has the required CRDs:
kubectl get crds
We must have the following installed on the cluster:
- egresses.policy.openservicemesh.io
- httproutegroups.specs.smi-spec.io
- ingressbackends.policy.openservicemesh.io
- meshconfigs.config.openservicemesh.io
- multiclusterservices.config.openservicemesh.io
- tcproutes.specs.smi-spec.io
- trafficsplits.split.smi-spec.io
- traffictargets.access.smi-spec.io
Get the versions of the SMI CRDs installed with this command:
osm mesh list
Expected output:
MESH NAME MESH NAMESPACE VERSION ADDED NAMESPACES
osm kube-system v0.11.1
MESH NAME MESH NAMESPACE SMI SUPPORTED
osm kube-system HTTPRouteGroup:v1alpha4,TCPRoute:v1alpha4,TrafficSplit:v1alpha2,TrafficTarget:v1alpha3
To list the OSM controller pods for a mesh, please run the following command passing in the mesh's namespace
kubectl get pods -n <osm-mesh-namespace> -l app=osm-controller
OSM Controller v0.11.1 requires the following versions:
- traffictargets.access.smi-spec.io - v1alpha3
- httproutegroups.specs.smi-spec.io - v1alpha4
- tcproutes.specs.smi-spec.io - v1alpha4
- udproutes.specs.smi-spec.io - Not supported
- trafficsplits.split.smi-spec.io - v1alpha2
- *.metrics.smi-spec.io - v1alpha1
Certificate management
Information on how OSM issues and manages certificates to Envoy proxies running on application pods can be found on the OpenServiceMesh docs site.
Upgrading Envoy
When a new pod is created in a namespace monitored by the add-on, OSM will inject an envoy proxy sidecar in that pod. Information regarding how to update the envoy version can be found in the Upgrade Guide on the OpenServiceMesh docs site.
Feedback
Submit and view feedback for