Troubleshoot control plane quorum loss
Follow the steps in this troubleshooting article when multiple control plane nodes are offline or unavailable.
Prerequisites
- Install the latest version of the appropriate Azure CLI extensions.
- Gather the following information:
- Subscription ID
- Cluster name and resource group
- Bare-metal machine name
- Ensure that you're signed in by using
az login
.
Symptoms
- The Kubernetes API isn't available.
- Multiple control plane nodes are offline or unavailable.
Procedure
Identify the Azure Operator Nexus management nodes:
To identify the management nodes, run
az networkcloud baremetalmachine list -g <ResourceGroup_Name>
.Sign in to the identified server.
Ensure that the ironic-conductor service is present on this node by using
crictl ps -a |grep -i ironic-conductor
. Here's example output:testuser@<servername> [ ~ ]$ sudo crictl ps -a |grep -i ironic-conductor <id> <id> 6 hours ago Running ironic-conductor 0 <id>
Determine the integrated Dell remote access controller (iDRAC) IP of the server:
Run the command
az networkcloud cluster list -g <RG_Name>
.The output of the command is JSON with the iDRAC IP.
{ "bmcConnectionString": "redfish+https://xx.xx.xx.xx/redfish/v1/Systems/System.Embedded.1", "bmcCredentials": { "username": "<username>" }, "bmcMacAddress": "<bmcMacAddress>", "bootMacAddress": "<bootMacAddress", "machineDetails": "extraDetails", "machineName": "<machineName>", "rackSlot": <rackSlot>, "serialNumber": "<serialNumber>" },
Access the integrated iDRAC graphical user interface (GUI) by using the IP in your browser to shut down affected management servers.
When all affected management servers are down, turn on the servers by using the iDRAC GUI.
The servers should now be restored.
Related content
- If you still have questions, contact Azure support.
- For more information about support plans, see Azure support plans.