Troubleshoot control plane quorum loss
Follow this troubleshooting guide when multiple control plane nodes are offline or unavailable.
Prerequisites
- Install the latest version of the Azure CLI extensions for Operator Nexus, such as networkcloud.
- Gather the following information:
  - Subscription ID
  - Cluster name and resource group
  - Bare metal machine name
- Ensure you're logged in by running:
az login
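The prerequisite steps can be wrapped in a small shell function. This is a sketch only: the function name nexus_prereqs is hypothetical, it assumes the Operator Nexus CLI extension is named networkcloud, and the subscription ID is a placeholder you supply.

```shell
# Hypothetical helper (not part of the Azure CLI): prepare the CLI for this guide.
# Assumes the Azure CLI (az) is installed and the extension is named "networkcloud".
nexus_prereqs() {
  local subscription_id="$1"
  # Fail early if the Azure CLI is not on PATH
  if ! command -v az >/dev/null 2>&1; then
    echo "Azure CLI (az) is not installed" >&2
    return 1
  fi
  # Install the networkcloud extension, or upgrade it if it is already present
  az extension add --name networkcloud --upgrade || return 1
  # Interactive sign-in
  az login || return 1
  # Point subsequent az commands at the subscription that holds the cluster
  az account set --subscription "$subscription_id"
}

# Usage (placeholder value):
# nexus_prereqs "<Subscription_ID>"
```

The function stops at the first failing step so that later commands in this guide don't run against the wrong subscription or without the extension.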
Symptoms
- Kubernetes API isn't available
- Multiple control plane nodes are offline or unavailable
Procedure
- Identify the Nexus management node.
- To identify the management nodes, list the bare metal machines by running:
az networkcloud baremetalmachine list -g <ResourceGroup_Name>
- Log in to the identified server.
- Ensure the ironic-conductor service is present on this node by running:
sudo crictl ps -a | grep -i ironic-conductor
Example output:
testuser@<servername> [ ~ ]$ sudo crictl ps -a |grep -i ironic-conductor
<id> <id> 6 hours ago Running ironic-conductor 0 <id>
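- Determine the iDRAC IP address of the server.
Run the following command:
az networkcloud cluster list -g <RG_Name>
The command returns JSON that includes a bmcConnectionString for each bare metal machine. The iDRAC IP is the host portion of that Redfish URL (xx.xx.xx.xx in the following example):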
{
  "bmcConnectionString": "redfish+https://xx.xx.xx.xx/redfish/v1/Systems/System.Embedded.1",
  "bmcCredentials": {
    "username": "<username>"
  },
  "bmcMacAddress": "<bmcMacAddress>",
  "bootMacAddress": "<bootMacAddress>",
  "machineDetails": "extraDetails",
  "machineName": "<machineName>",
  "rackSlot": <rackSlot>,
  "serialNumber": "<serialNumber>"
}
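Because the iDRAC IP sits between the scheme and the first slash of bmcConnectionString, it can be extracted with plain shell parameter expansion. This is a minimal sketch: the function name idrac_ip_from_bmc is hypothetical, and it assumes the connection string has the redfish+https://<ip>/... form shown in the cluster output.

```shell
# Hypothetical helper (not part of the Azure CLI): print the iDRAC IP (host part)
# of a bmcConnectionString such as
#   redfish+https://xx.xx.xx.xx/redfish/v1/Systems/System.Embedded.1
idrac_ip_from_bmc() {
  local conn="$1"
  local rest
  # Strip everything up to and including "://" (the redfish+https scheme)
  rest="${conn#*://}"
  # Keep only the text before the first "/", which is the host/IP
  printf '%s\n' "${rest%%/*}"
}

# Example with a documentation-range address
idrac_ip_from_bmc "redfish+https://192.0.2.10/redfish/v1/Systems/System.Embedded.1"
# prints 192.0.2.10
```

If a connection string ever includes a port (for example 192.0.2.10:443), the helper returns the host and port together, since it only cuts at the scheme and the first slash.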