General troubleshooting of the Istio service mesh add-on
This article discusses general strategies (that use kubectl, istioctl, and other tools) to troubleshoot issues that are related to the Istio service mesh add-on for Microsoft Azure Kubernetes Service (AKS). This article also provides a list of possible error messages, reasons for error occurrences, and recommendations to resolve these errors.
Prerequisites
The Kubernetes kubectl tool, or a similar tool to connect to the cluster
Note: To install kubectl by using Azure CLI, run the az aks install-cli command.
The Istio istioctl command-line tool
The Client URL (cURL) tool
Troubleshooting checklist: Using kubectl
The following troubleshooting steps use various kubectl commands to help you debug stuck pods or failures in the Istio daemon (Istiod).
Step 1: Get Istiod pod logs
Get the Istiod pod logs by running the following kubectl logs command:
kubectl logs --selector app=istiod --namespace aks-istio-system
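When you pass a label selector, kubectl logs returns only the last 10 lines of each pod by default, so the command above can miss older errors. The following sketch wraps the same command in a hypothetical helper function, istiod_problem_lines (the function name and the one-hour default window are assumptions for illustration). It requests the full log window and keeps only lines at the error or warn level, assuming that Istiod uses its default plain-text log format.

```shell
# Hypothetical helper: print Istiod log lines at the error or warn level.
# Usage: istiod_problem_lines [since]   (since defaults to 1h)
istiod_problem_lines() {
  since="${1:-1h}"
  # --tail=-1 overrides the 10-line default that applies when a selector is used.
  # --prefix labels each line with the pod and container that produced it.
  kubectl logs --selector app=istiod --namespace aks-istio-system \
    --since "$since" --tail=-1 --prefix \
    | grep -E '[[:space:]](error|warn)[[:space:]]' || true
}
```

For example, istiod_problem_lines 30m limits the search to the last 30 minutes.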
Step 2: Bounce (delete) a pod
If Istiod is stuck or unresponsive, you might need to restart its pod. Because a deployment manages Istiod, it's safe to delete the pod by running the kubectl delete command:
kubectl delete pods <istio-pod> --namespace aks-istio-system
The deployment automatically re-creates and redeploys the pod after you delete it, so deleting the pod is a simple way to restart it.
Note
Alternatively, you can restart the deployment directly by running the following kubectl rollout restart command:
kubectl rollout restart deployment <istiod-asm-revision> --namespace aks-istio-system
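A restart is only useful if the new pod becomes ready. As a minimal sketch, the hypothetical helper restart_istiod below (the function name and the 120-second default timeout are assumptions) chains the restart with kubectl rollout status so that the call returns a nonzero exit code if the rollout doesn't finish in time.

```shell
# Hypothetical helper: restart an Istiod deployment and wait for the rollout.
# Usage: restart_istiod <istiod-deployment-name> [timeout]
restart_istiod() {
  deployment="$1"
  timeout="${2:-120s}"
  kubectl rollout restart "deployment/$deployment" --namespace aks-istio-system &&
    kubectl rollout status "deployment/$deployment" --namespace aks-istio-system \
      --timeout="$timeout"
}
```

Pass the deployment name that kubectl get deployment --namespace aks-istio-system reports for your revision.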
Step 3: Check the status of resources
If Istiod isn't scheduled, or if the pod isn't responding, check the status of the deployment and the replica sets. To do this, run the kubectl get command:
kubectl get <resource-type> [[--selector app=istiod] | [<resource-name>]] --namespace aks-istio-system
To see the current resource status, add --output yaml; the status appears near the end of the output. To also see events that are associated with the resource's controller loop, run kubectl describe with the same arguments.
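To spot a deployment that isn't fully available without reading the whole status block, you can compare ready replicas against desired replicas. The following sketch uses a hypothetical helper, istiod_unready_deployments (the name and the output format are assumptions), that prints only the Istiod deployments that have fewer ready replicas than requested.

```shell
# Hypothetical helper: list Istiod deployments whose ready replica count is
# lower than the desired count.
istiod_unready_deployments() {
  kubectl get deployment --selector app=istiod --namespace aks-istio-system \
    --output jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.readyReplicas}{" "}{.spec.replicas}{"\n"}{end}' \
    | awk '
        # readyReplicas is omitted from the status when no replica is ready.
        NF == 2 { print $1 ": 0/" $2 " ready"; next }
        NF == 3 && $2 + 0 < $3 + 0 { print $1 ": " $2 "/" $3 " ready" }'
}
```

An empty result means that every Istiod deployment reports all of its replicas as ready.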
Step 4: Get custom resource definition types
To view the types of custom resource definitions (CRDs) that Istio uses, run the kubectl get command:
kubectl get crd | grep istio
Next, run the following kubectl get command to list all the resource names that are based on a particular CRD:
kubectl get <crd-type> --all-namespaces
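The two commands above can be combined into a loop that shows which Istio resource types are actually in use. This is a sketch under the assumption that every relevant CRD name ends in istio.io; the helper name count_istio_resources is hypothetical.

```shell
# Hypothetical helper: count the resources that exist for each Istio CRD.
count_istio_resources() {
  kubectl get crd --output name | grep 'istio\.io$' | while read -r crd; do
    # Strip the "customresourcedefinition.apiextensions.k8s.io/" prefix.
    type="${crd#*/}"
    count="$(kubectl get "$type" --all-namespaces --no-headers 2>/dev/null | grep -c . || true)"
    printf '%s\t%s\n' "$type" "$count"
  done
}
```

Each output line contains the CRD name and the number of resources of that type across all namespaces.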
Step 5: View the list of Istiod pods
View the list of Istiod pods by running the following kubectl get command:
kubectl get pod --namespace aks-istio-system --output yaml
Step 6: Get more information about the Envoy configuration
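If you have connectivity issues between pods, get more information about the Envoy configuration by running the following kubectl exec command against Envoy's admin port (15000). The example targets a pod that has the app=sleep label and a container that's named sleep, such as the Istio sleep sample application. Substitute the selector and container name of your own workload, and make sure that the container includes curl.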
kubectl exec --namespace <pod-namespace> \
"$(kubectl get pods \
--namespace <pod-namespace> \
--selector app=sleep \
--output jsonpath='{.items[0].metadata.name}')" \
--container sleep \
-- curl -s localhost:15000/clusters
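The /clusters output contains many statistics lines per endpoint, but connectivity problems usually show up in the health_flags entries. As a minimal sketch, the hypothetical filter unhealthy_envoy_endpoints below reads that output on standard input and keeps only the endpoints whose health_flags value is something other than healthy. It assumes Envoy's plain-text output, in which each line has the form cluster::address::stat::value.

```shell
# Hypothetical filter: read Envoy /clusters output on stdin and print the
# health_flags lines of endpoints that are not reported as healthy.
unhealthy_envoy_endpoints() {
  grep '::health_flags::' | grep -v '::health_flags::healthy$' || true
}
```

To use the filter, pipe the output of the kubectl exec command above into unhealthy_envoy_endpoints. No output means that Envoy considers every known endpoint healthy.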
Step 7: Get the sidecar logs for the source and destination sidecars
Retrieve the sidecar logs for the source and destination sidecars by running the following kubectl logs command two times (the first time for the source pod, and the second time for the destination pod):
kubectl logs <pod-name> --namespace <pod-namespace> --container istio-proxy
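Saving both logs to files makes it easier to compare the two sides of a failed request. The following sketch wraps the two calls in a hypothetical helper, collect_sidecar_logs; the function name and the file naming scheme are assumptions.

```shell
# Hypothetical helper: save the istio-proxy logs of a source pod and a
# destination pod to files in the current directory.
# Usage: collect_sidecar_logs <src-pod> <src-namespace> <dst-pod> <dst-namespace>
collect_sidecar_logs() {
  kubectl logs "$1" --namespace "$2" --container istio-proxy > "source-$1.log"
  kubectl logs "$3" --namespace "$4" --container istio-proxy > "destination-$3.log"
}
```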
Troubleshooting checklist: Using istioctl
The following troubleshooting steps describe how to collect information and debug your mesh environment by running various istioctl commands.
Warning
Some istioctl commands send requests to all sidecars.
Note
Before you begin, note that most istioctl commands require you to know the control plane revision. You can get this information from the suffix of either the Istiod deployments or the pods, or you can run the following istioctl tag list command:
istioctl tag list
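If you prefer to read the revision from the deployment name, the suffix can be extracted with kubectl and sed. This sketch assumes that the deployments follow the istiod-<revision> naming pattern; the helper name istiod_revisions is hypothetical.

```shell
# Hypothetical helper: print the revision suffix of each Istiod deployment,
# that is, the part of the deployment name that follows "istiod-".
istiod_revisions() {
  kubectl get deployment --selector app=istiod --namespace aks-istio-system \
    --output name | sed 's|^.*/istiod-||'
}
```

Two lines of output indicate that two revisions are installed, which is typically the case while an upgrade is in progress.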
Step 1: Make sure that Istio is installed correctly
To verify that you have a correct Istio add-on installation, run the following istioctl verify-install command:
istioctl verify-install --istioNamespace aks-istio-system --revision <tag>
Step 2: Analyze namespaces
To analyze all namespaces, or to analyze a specific namespace, run the following istioctl analyze command:
istioctl analyze --istioNamespace aks-istio-system \
--revision <tag> \
[--all-namespaces | --namespace <namespace-name>] \
[--failure-threshold {Info | Warning | Error}]
Step 3: Get the proxy status
To retrieve the proxy status, run the following istioctl proxy-status command:
istioctl proxy-status pod/<pod-name> \
--istioNamespace aks-istio-system \
--revision <tag> \
--namespace <pod-namespace>
Step 4: Download the proxy configuration
To download the proxy configuration, run the following istioctl proxy-config all command:
istioctl proxy-config all <pod-name> \
--istioNamespace aks-istio-system \
--namespace <pod-namespace> \
--output json
Note
Instead of using the all variant of the istioctl proxy-config command, you can use a narrower variant, such as bootstrap, cluster, endpoint, listener, route, or secret.
Step 5: Check the injection status
To check the injection status of the resource, run the following istioctl experimental check-inject command:
istioctl experimental check-inject --istioNamespace aks-istio-system \
--namespace <pod-namespace> \
{--labels <label-selector> | <pod-name> | deployment/<deployment-name>}
Step 6: Get a full bug report
A full bug report contains the most detailed information. However, it can also be time-consuming on a large cluster because it includes all pods. You can limit the bug report to certain namespaces. You can also limit the report to certain deployments, pods, or label selectors.
To retrieve a bug report, run the following istioctl bug-report command:
istioctl bug-report --istioNamespace aks-istio-system \
[--include <namespace-1>[, <namespace-2>[, ...]]]
Troubleshooting checklist: Miscellaneous issues
Step 1: Fix resource usage issues
If you encounter high memory consumption in Envoy, double-check your Envoy settings for statistics data collection. If you're customizing Istio metrics through MeshConfig, remember that certain metrics can have high cardinality and, therefore, create a higher memory footprint. Other fields in MeshConfig, such as concurrency, affect CPU usage and should be configured carefully.
By default, Istio adds information about all services that are in the cluster to every Envoy configuration. A Sidecar resource can limit the scope of this addition to workloads within specific namespaces only. For more information, see Watch out for this Istio proxy sidecar memory pitfall.
For example, the following Sidecar definition in the aks-istio-system namespace restricts the Envoy configuration for all proxies across the mesh to services in aks-istio-system and services in the same namespace as the proxy's own workload.
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: sidecar-restrict-egress
  namespace: aks-istio-system # Needs to be deployed in the root namespace.
spec:
  egress:
  - hosts:
    - "./*"
    - "aks-istio-system/*"
You can also try to use the Istio discoverySelectors option in MeshConfig. The discoverySelectors option contains an array of Kubernetes selectors, and it can restrict Istiod's awareness to specific namespaces (as opposed to all namespaces in the cluster). For more information, see Use discovery selectors to configure namespaces for your Istio service mesh.
Step 2: Fix traffic and security misconfiguration issues
To address common traffic management and security misconfiguration issues that Istio users frequently encounter, see Traffic management problems and Security problems on the Istio website.
For links to discussion about other issues, such as sidecar injection, observability, and upgrades, see Common problems on the Istio documentation site.
Step 3: Avoid CoreDNS overload
Issues that relate to CoreDNS overload might require you to change certain Istio DNS settings, such as the dnsRefreshRate field in the Istio MeshConfig definition.
Step 4: Fix pod and sidecar race conditions
If your application pod starts before the Envoy sidecar starts, the application might become unresponsive, or it might restart. For instructions about how to avoid this problem, see Pod or containers start with network issues if istio-proxy is not ready. Specifically, setting the holdApplicationUntilProxyStarts MeshConfig field under defaultConfig to true can help prevent these race conditions.
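As a rough sketch of what that setting can look like, the hypothetical helper hold_app_configmap below prints a ConfigMap manifest for one revision. It assumes that the add-on reads MeshConfig from a ConfigMap that's named istio-shared-configmap-<revision> in the aks-istio-system namespace, under the mesh key. Verify that convention against the add-on configuration documentation before you rely on it.

```shell
# Hypothetical helper: print a ConfigMap manifest that sets
# holdApplicationUntilProxyStarts for one add-on revision.
# Usage: hold_app_configmap <revision>
hold_app_configmap() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-shared-configmap-$1   # Assumed naming convention.
  namespace: aks-istio-system
data:
  mesh: |-
    defaultConfig:
      holdApplicationUntilProxyStarts: true
EOF
}
```

The helper only prints the manifest so that you can review it first. If a ConfigMap of that name already exists, merge the field into its existing mesh value instead of applying this manifest, because applying it as shown would replace any other settings under that key.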
Step 5: Configure a Service Entry when using an HTTP proxy for outbound traffic
If your cluster uses an HTTP proxy for outbound internet access, you'll need to configure a Service Entry. For more information, see HTTP proxy support in Azure Kubernetes Service.
Error messages
The following table contains a list of possible error messages (for deploying the add-on, enabling ingress gateways, and performing upgrades), the reason why an error occurred, and recommendations for resolving the error.
Error | Reason | Recommendations |
---|---|---|
Azure service mesh is not supported in this region | The feature isn't available in the region during preview (it's available in the public cloud but not the sovereign cloud). | Refer to public documentation about the feature on supported regions. |
Missing service mesh mode: {} | You didn't set the mode property in the service mesh profile of the managed cluster request. | In the ServiceMeshProfile field of the managedCluster API request, set the mode property to Istio. |
Invalid istio ingress mode: {} | You set an invalid value for the ingress mode when adding ingress within the service mesh profile. | Set the ingress mode in the API request to either External or Internal. |
Too many ingresses for type: {}. Only {} ingress gateway are allowed | You tried to create too many ingresses on the cluster. | Create, at most, one external ingress and one internal ingress on the cluster. |
Istio profile is missing even though Service Mesh mode is Istio | You enabled the Istio add-on without providing the Istio profile. | When you enable the Istio add-on, specify component-specific (ingress gateway, plug-in CA) information for the Istio profile and the particular revision. |
Istio based Azure service mesh is incompatible with feature %s | You tried to use another extension, add-on, or feature that's currently incompatible with the Istio add-on (for example, Open Service Mesh). | Before you enable the Istio add-on, disable the other feature first and clean up all corresponding resources. |
ServiceMeshProfile is missing required parameters: %s for plugin certificate authority | You didn't provide all the required parameters for plug-in CA. | Provide all required parameters for the plug-in certificate authority (CA) feature (for more information, see Set up Istio-based service mesh add-on with plug-in CA certificates). |
AzureKeyvaultSecretsProvider addon is required for Azure Service Mesh plugin certificate authority feature | You didn't enable the AKS Secrets-Store CSI Driver add-on before you used the plug-in CA. | Set up Azure Key Vault before you use the plug-in CA feature. |
'KeyVaultId': '%s' is not a valid Azure keyvault resource identifier. Please make sure that the format matches '/subscriptions//resourceGroups//providers/Microsoft.KeyVault/vaults/' | You used an invalid Azure Key Vault resource ID. | See the format that's mentioned in the error message to set a valid Azure Key Vault ID for the plug-in CA feature. |
Kubernetes version is missing in orchestrator profile | Your request is missing the Kubernetes version. Therefore, it can't do a version compatibility check. | Make sure that you provide the Kubernetes version in Istio add-on upgrade operations. |
Service mesh revision %s is not compatible with cluster version %s. To find information about mesh-cluster compatibility, use 'az aks mesh get-upgrades' | You tried to enable an Istio add-on revision that's incompatible with the current Kubernetes cluster version. | Use the az aks mesh get-upgrades Azure CLI command to learn which Istio add-on revisions are available for the current cluster. |
Kubernetes version %s not supported. Please upgrade to a supported cluster version first. To find compatibility information, use 'az aks mesh get-upgrades' | You're using an unsupported Kubernetes version. | Upgrade to a supported Kubernetes version. |
ServiceMeshProfile revision field must not be empty | You tried to upgrade the Istio add-on without specifying a revision. | Specify the revision and all other parameters (for more information, see Minor revision upgrade). |
Request exceeds maximum allowed number of revisions (%d) | You tried to do an upgrade operation even though there are already (%d) revisions installed. | Complete or roll back the upgrade operation before you upgrade to another revision. |
Mesh upgrade is in progress. Please complete or roll back the current upgrade before attempting to retrieve versioning and compatibility information | You tried to access revisioning and compatibility information before completing or rolling back the current upgrade operation. | Complete or roll back the current upgrade operation before you retrieve revisioning and compatibility information. |
References
For general tips about Istio debugging, see Istio diagnostic tools
Istio service mesh add-on minor revision upgrade troubleshooting
Istio service mesh add-on plug-in CA certificate troubleshooting
Third-party information disclaimer
The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.