Manage lifecycle of Bare Metal Machines
This article describes how to perform lifecycle management operations on Bare Metal Machines (BMM). These steps should be used for troubleshooting purposes to recover from failures or when taking maintenance actions. The commands to manage the lifecycle of the BMM include:
- Power off the BMM
- Start the BMM
- Restart the BMM
- Make the BMM unschedulable or schedulable
- Reimage the BMM
- Replace the BMM
Prerequisites
- Install the latest version of the appropriate CLI extensions
- Get the name of the resource group for the BMM
- Get the name of the bare metal machine that requires a lifecycle management operation
- Ensure that the target bare metal machine
poweredState
set toOn
andreadyState
set toTrue
- This prerequisite is not applicable for the
start
command
- This prerequisite is not applicable for the
Caution
Actions against management servers should not be run without consultation with Microsoft support personnel. Doing so could affect the integrity of the Operator Nexus Cluster.
Power off the BMM
This command will power-off
the specified bareMetalMachineName
.
az networkcloud baremetalmachine power-off \
--name "bareMetalMachineName" \
--resource-group "resourceGroupName"
Start the BMM
This command will start
the specified bareMetalMachineName
.
az networkcloud baremetalmachine start \
--name "bareMetalMachineName" \
--resource-group "resourceGroupName"
Restart the BMM
This command will restart
the specified bareMetalMachineName
.
az networkcloud baremetalmachine restart \
--name "bareMetalMachineName" \
--resource-group "resourceGroupName"
Make a BMM unschedulable (cordon)
You can make a BMM unschedulable by executing the cordon
command.
On the execution of the cordon
command,
Operator Nexus workloads aren't scheduled on the BMM when cordon is set; any attempt to create a workload on a cordoned
BMM results in the workload being set to pending
state. Existing workloads continue to run.
The cordon command supports an evacuate
parameter with the default False
value.
On executing the cordon
command, with the value True
for the evacuate
parameter, the workloads that are running on the BMM are stopped
and the BMM is set to pending
state.
az networkcloud baremetalmachine cordon \
--evacuate "True" \
--name "bareMetalMachineName" \
--resource-group "resourceGroupName"
The evacuate "True"
removes workloads from that node while evacuate "False"
only prevents the scheduling of new workloads.
Make a BMM schedulable (uncordon)
You can make a BMM schedulable
(usable) by executing the uncordon
command. All workloads in a pending
state on the BMM are restarted
when the BMM is uncordoned
.
az networkcloud baremetalmachine uncordon \
--name "bareMetalMachineName" \
--resource-group "resourceGroupName"
Reimage a BMM
You can restore the runtime version on a BMM by executing reimage
command. This process redeploys the runtime image on the target BMM and executes the steps to rejoin the cluster with the same identifiers. This action doesn't impact the tenant workload files on this BMM.
As a best practice, make sure the BMM's workloads are drained using the cordon
command, with evacuate "True"
, prior to executing the reimage
command.
Warning
Running more than one baremetalmachine replace or reimage command at the same time, or running a replace at the same time as a reimage, will leave servers in a nonworking state. Make sure one replace / reimage has fully completed before starting another one. In a future release, we plan to either add the ability to replace multiple servers at once or have the command return an error when attempting to do so.
az networkcloud baremetalmachine reimage \
–-name "bareMetalMachineName" \
--resource-group "resourceGroupName"
Replace BMM
Use Replace BMM
command when a server has encountered hardware issues requiring a complete or partial hardware replacement. After replacement of components such as motherboard or NIC replacement, the MAC address of BMM will change, however the IDrac IP address and hostname will remain the same.
Warning
Running more than one baremetalmachine replace or reimage command at the same time, or running a replace at the same time as a reimage, will leave servers in a nonworking state. Make sure one replace / reimage has fully completed before starting another one. In a future release, we plan to either add the ability to replace multiple servers at once or have the command return an error when attempting to do so.
az networkcloud baremetalmachine replace \
--name "bareMetalMachineName" \
--resource-group "resourceGroupName" \
--bmc-credentials password="{password}" username="{user}" \
--bmc-mac-address "00:00:4f:00:57:ad" \
--boot-mac-address "00:00:4e:00:58:af" \
--machine-name "name" \
--serial-number "BM1219XXX"
Feedback
Submit and view feedback for