Use this topic to help you troubleshoot and resolve security and identity-related issues in AKS Arc.
Get-AksHciCredential fails with "cannot find the path specified" error
The Get-AksHciCredential PowerShell cmdlet fails when it's run by a different admin user than the one who installed AksHci. The cmdlet creates a .kube directory and places the config file in it, but it fails with the following error:
Error: open C:\Users\<user>\.kube\config: The system cannot find the path specified.
To reproduce
- Install AksHci.
- Create a target cluster.
- Sign in to the machine as a different admin user (multi-admin feature).
- Run Get-AksHciCredential -Name $clusterName.
Expected behavior
Get-AksHciCredential should create a .kube directory in the user's home directory and place the config file in that directory.
To work around the issue, create a .kube directory in the user's home directory with the following command:
mkdir "$HOME/.kube"
After this step, Get-AksHciCredential should no longer fail.
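Putting the workaround together, a minimal PowerShell sketch ($clusterName is the cluster created earlier):
```powershell
# Create the .kube directory for the current admin user if it's missing, then retry
if (-not (Test-Path "$HOME/.kube")) { mkdir "$HOME/.kube" | Out-Null }
Get-AksHciCredential -Name $clusterName
```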
Error "Certificate expired - Unable to connect to the server: x509"
The target cluster is not accessible when control plane certificates fail to renew. When you try to reach the cluster, the kubectl command displays the following error:
certificate expired - Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-07-26T12:24:15-04:00 is after 2022-07-15T15:01:07Z
Note
This issue is fixed in the September 2022 release and later.
To reproduce
- Install AksHci.
- Install a target cluster.
- Turn off the cluster (VMs) for more than 4 days.
- Turn the cluster on again.
Symptoms and mitigation
The target cluster is unreachable. Any kubectl command run against the target cluster returns an error message similar to the following:
certificate expired - Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-07-26T12:24:15-04:00 is after 2022-07-15T15:01:07Z
To fix the issue:
Run the following command to renew the generated certificate manually:
Update-AksHciClusterCertificates -Name my-workload-cluster -fixKubeletCredentials
Generate new credentials:
Get-AksHciCredential -name <clustername>
After a few minutes, try a kubectl command again to see if the cluster is now available.
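The whole mitigation can be scripted; a minimal sketch, assuming the cluster is named my-workload-cluster as in the example above:
```powershell
# Renew the control plane certificates, refresh the kubeconfig, then verify
Update-AksHciClusterCertificates -Name my-workload-cluster -fixKubeletCredentials
Get-AksHciCredential -Name my-workload-cluster
Start-Sleep -Seconds 120   # give the control plane a few minutes to pick up the renewed certificates
kubectl get nodes          # should succeed once the cluster is reachable again
```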
Note
There is a known bug in AksHci version 1.0.14.x and earlier. If the control plane VM has a name pattern other than -control-plane-, the Update-AksHciClusterCertificates command may not work, and you must update the certificates by logging in to the control plane VM:
- Find the IP address of the target cluster control plane VM.
- Run the following command:
ssh -i (get-mocconfig).sshPrivateKey clouduser@<ip>
- List the cert tattoo files:
sudo ls /etc/kubernetes/pki/cert-tattoo-*
- Generate certificates using each file listed by the previous command:
sudo /usr/bin/cert-tattoo-provision CreateCertsWithAltNames <absolute-path-of-cert-tattoo-file>
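The last three steps can also be driven from your workstation over SSH. A sketch built from the commands above, with <ip> standing in for the control plane IP from the first step:
```powershell
# List the cert-tattoo files on the control plane VM, then provision certificates from each one
$files = ssh -i (get-mocconfig).sshPrivateKey clouduser@<ip> 'sudo ls /etc/kubernetes/pki/cert-tattoo-*'
foreach ($file in $files) {
    ssh -i (get-mocconfig).sshPrivateKey clouduser@<ip> "sudo /usr/bin/cert-tattoo-provision CreateCertsWithAltNames $file"
}
```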
Authentication handshake failed: x509: certificate signed by unknown authority
You may see this error when deploying a new AKS cluster or adding a node pool to an existing cluster.
- Check that the user who ran the command is the same user who installed AKS on Azure Stack HCI or Windows Server. For more information on granting access to multiple users, see Set up multiple administrators.
- If the user is the same and the error persists, follow these steps to resolve the issue:
- Delete the old management appliance certificate by removing $env:UserProfile\.wssd\kvactl\cloudconfig.
- Run Repair-AksHciCerts.
- Run Get-AksHciCluster to check that the issue is fixed.
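In PowerShell, that recovery sequence looks like this (a sketch; the cloudconfig path assumes the default profile location):
```powershell
# Remove the stale management appliance certificate, repair the certificates, then verify
Remove-Item "$env:UserProfile\.wssd\kvactl\cloudconfig"
Repair-AksHciCerts
Get-AksHciCluster
```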
Target cluster pod logs are not accessible - remote error: tls: internal error
The target cluster logs are not accessible. When you try to access pod logs in the target cluster, the following TLS error is displayed:
Error from server: Get "https://10.0.0.0:10250/containerLogs/kube-system/kube-apiserver-moc-l9iv8xjn3av/kube-apiserver": remote error: tls: internal error
Note
This is a known issue fixed as part of the 1.0.14.x (September 2022) release and later. Target clusters updated to this version should not experience this issue.
To reproduce
- Install AksHci.
- Install the target cluster.
- Don't upgrade the cluster for 60 days.
- Restart the cluster.
Symptoms and mitigation
The target cluster's pod logs are not accessible. Any kubectl logs command run against the target cluster returns an error message similar to the following:
Error from server: Get "https://10.0.0.0:10250/containerLogs/kube-system/kube-apiserver-moc-l9iv8xjn3av/kube-apiserver": remote error: tls: internal error
To fix the issue:
Run the following command to renew the generated certificate manually:
Update-AksHciClusterCertificates -Name my-workload -fixKubeletCredentials
Generate new credentials:
Get-AksHciCredential -name <clustername>
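Once the new credentials are in place, you can confirm the fix by requesting pod logs again, for example (the pod name is a placeholder):
```powershell
# Verify that log retrieval from the kubelet works again
kubectl get pods -n kube-system
kubectl logs <pod-name> -n kube-system
```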
KMS pod fails and the KMS pod logs contain errors
Some possible symptoms of this issue are:
- kubectl get secrets fails with an internal error.
- kubectl logs <kmspod-name> -n kube-system contains errors.
- Mounting secrets into volumes fails when you try to create pods.
- The API server fails to start.
Look in the KMS pod logs for errors by running the following command:
kubectl logs <kmspod-name> -n kube-system
If the logs return an error regarding an invalid token in the management cluster KMS pod, run the following command:
Update-AksHciCertificates
If there is an error regarding an invalid token in the target cluster KMS pod, run the following command:
Update-AksHciClusterCertificates -Name <cluster-name> -fixCloudCredential
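As a compact triage flow, here is a sketch of the checks and fixes above (the KMS pod name is a placeholder found via kubectl get pods):
```powershell
# Scan the KMS pod logs for token errors
kubectl logs <kmspod-name> -n kube-system | Select-String -Pattern 'token'
# If the invalid token is in the management cluster KMS pod:
Update-AksHciCertificates
# If the invalid token is in a target cluster KMS pod:
Update-AksHciClusterCertificates -Name <cluster-name> -fixCloudCredential
```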
Error 'System.Collections.Hashtable.generic_non_zero 1 [Error: Certificate has expired: Expired]'
The mocctl certificate expires if it isn't used for more than 60 days. AKS Arc uses the mocctl command-line tool to communicate with MocStack to perform Moc-related operations. The certificate that mocctl uses to communicate with cloudagent expires after 60 days. mocctl renews the certificate automatically when it's used close to its expiration (after ~42 days), but if the tool isn't used frequently, the certificate expires.
To reproduce the behavior, install AKS Arc and perform no operation involving the mocctl command for 60 days.
To fix the issue, log in again after the certificate expires by running the following PowerShell command:
Repair-MocLogin
Delete KVA certificate if expired after 60 days
The KVA certificate expires after 60 days if no upgrade is performed. Update-AksHci and any command involving kvactl throw the following error:
Error: failed to get new provider: failed to create azurestackhci session: Certificate has expired: Expired
To resolve this error, delete the expired certificate file and run Update-AksHci again on the node that's seeing the certificate expiry issue. You can use the following command to delete the file:
Remove-Item "$env:UserProfile\.wssd\kvactl\cloudconfig"
You can find a discussion of this issue at KVA certificate needs to be deleted if KVA Certificate expired after 60 days.
Special Active Directory permissions are needed for domain-joined Azure Stack HCI nodes
Users deploying and configuring Azure Kubernetes Service on Azure Stack HCI need Full Control permission to create AD objects in the Active Directory container in which the server and service objects are created.
To resolve the issue, elevate the user's permissions on that container, as sketched below.
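As an illustrative example only, Full Control can be granted on a container with the built-in dsacls tool; the container DN and account below are hypothetical placeholders for your environment:
```powershell
# Grant the deploying user Generic All (GA = Full Control) on the container
# that holds the AKS server and service objects; adjust the DN and account
dsacls "OU=AksHci,DC=contoso,DC=com" /G "CONTOSO\aksadmin:GA"
```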
Uninstall-AksHciAdAuth fails with the error '[Error from server (NotFound): secrets "keytab-akshci-scale-reliability" not found]'
If Uninstall-AksHciAdAuth displays this error, you can safely ignore it for now; the issue will be fixed in a future release.
kubectl logs return "error: You must be logged in to the server (the server has asked for the client to provide credentials)"
There is an issue with AKS Arc in which a cluster can stop returning logs. When this happens, running kubectl logs <pod_name> returns "error: You must be logged in to the server (the server has asked for the client to provide credentials)". AKS Arc rotates core Kubernetes certificates every 4 days, but sometimes the Kubernetes API server doesn't immediately reload its client certificate for communication with the kubelet when the certificates update.
To mitigate the issue, there are several options:
- Rerun kubectl logs. For example, run the following PowerShell command:
while (1) {kubectl logs <POD_NAME>; sleep 1}
- Restart the kube-apiserver container on each of the cluster's control planes. Restarting the API server doesn't impact running workloads. To restart the API server, follow these steps:
- Get the IP addresses of each control plane in your cluster:
kubectl get nodes -o wide
Run the following command:
ssh -i (get-akshciconfig).Moc.sshPrivateKey clouduser@<CONTROL_PLANE_IP> 'sudo crictl stop $(sudo crictl ps --name kube-apiserver -o json | jq -r .containers[0].id)'
- Optionally, though not recommended for production workloads, you can tell kube-apiserver not to verify the server certificate of the kubelet:
kubectl logs <POD_NAME> --insecure-skip-tls-verify-backend=true
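If your cluster has several control planes, the restart can be looped from PowerShell. A sketch; the IP list is a placeholder you fill in from kubectl get nodes -o wide:
```powershell
# Restart kube-apiserver on every control plane node
$key = (Get-AksHciConfig).Moc.sshPrivateKey
foreach ($ip in @('<CONTROL_PLANE_IP_1>', '<CONTROL_PLANE_IP_2>')) {
    ssh -i $key clouduser@$ip 'sudo crictl stop $(sudo crictl ps --name kube-apiserver -o json | jq -r .containers[0].id)'
}
```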