This article explains how to replace a faulty or underperforming network device in Azure Operator Nexus Network Fabric (NNF). It covers devices such as the Top of Rack (TOR) switch, Customer Edge (CE) switch, Network Packet Broker (NPB), and the Management Switch. The replacement is performed using the Return Material Authorization (RMA) process. This process is designed to minimize service disruption and safely reintegrate the new hardware into the fabric.
Scenarios for device replacement
Device replacement may be required in the following situations:
- Inconsistent Performance (Flakiness): The device shows intermittent connectivity or performance degradation.
- Hardware Failure: The device experiences critical hardware malfunctions that can't be fixed through standard troubleshooting.
- Persistent Unreachability: The device is permanently unreachable despite repeated recovery attempts.
Prerequisites
To ensure a smooth and timely RMA process, verify the following prerequisites before initiating it:
- Azure CLI is installed and properly configured
- Permissions are granted to manage Microsoft.ManagedNetworkFabric resources
- The replacement device is powered on and physically connected
- The replacement device supports Zero Touch Provisioning (ZTP)
If the faulty device is stuck in continuous reboots because of a hardware issue, power it off before initiating the RMA process. Otherwise, the device disable action can fail.
Before initiating the RMA process, perform the following checks:
Interface Speed Validation
Confirm that the ma1 interface speed is set to 100 Mbps or higher.
If the speed is below 100 Mbps, increase it to prevent delays or potential timeouts during the RMA process.
Device Storage Check
Ensure the device has at least 3 GB of free space.
This space is required to download and stage the necessary image files.
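The two checks above can be expressed as a small pre-flight gate. The following Python sketch is illustrative only. It assumes you have already read the ma1 link speed (in Mbps) and the free storage (in GB) from the device yourself, and the helper name `preflight_issues` is hypothetical. It doesn't connect to the device.

```python
# Thresholds taken from the prerequisite checks in this article.
MIN_MA1_SPEED_MBPS = 100
MIN_FREE_SPACE_GB = 3.0


def preflight_issues(ma1_speed_mbps: float, free_space_gb: float) -> list[str]:
    """Return human-readable problems; an empty list means both checks pass.

    Hypothetical helper: the inputs are values you collected from the device
    yourself. This function does not connect to the device.
    """
    issues = []
    if ma1_speed_mbps < MIN_MA1_SPEED_MBPS:
        issues.append(
            f"ma1 speed is {ma1_speed_mbps} Mbps; raise it to at least "
            f"{MIN_MA1_SPEED_MBPS} Mbps to avoid delays or timeouts"
        )
    if free_space_gb < MIN_FREE_SPACE_GB:
        issues.append(
            f"only {free_space_gb} GB free; at least {MIN_FREE_SPACE_GB} GB "
            "is needed to download and stage image files"
        )
    return issues


if __name__ == "__main__":
    # Example: a 10 Mbps management link fails the speed check.
    for issue in preflight_issues(ma1_speed_mbps=10, free_space_gb=5.2):
        print("Pre-flight check failed:", issue)
```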
Device types supported
- Customer Edge (CE)
- Top of Rack (TOR)
- Management Switch (Mgmt Switch)
- Network Packet Broker (NPB)
Note
This workflow supports RMA for only one device at a time. Each POST action accepts input for a single device per request.
Steps to replace a device
Step 1: Disable administrative state
Use the following command to disable the administrative state of the device:
az networkfabric device update-admin-state \
--state Disable \
--resource-name "nf-device-name" \
--resource-group "resource-group-name"
This action sets the following states:
Device Administrative State: Disabled
Fabric Administrative State: EnabledDegraded
Note
The service doesn't permit this action if any of the following operations is in progress at the fabric level:
- Device upgrade
- Configuration push
- Secret or certificate updates
- Administrative lock
- Terminal Server (TS) reprovisioning
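The service rejects the Disable action while any of these fabric-level operations is running. An automation wrapper can therefore refuse to submit it early. This Python sketch assumes you track in-progress operations yourself as a set of string labels. The labels and the `disable_device` helper are hypothetical; only the `az` arguments come from the Step 1 command. By default it only builds the command (a dry run). Pass `execute=True` to run it through the Azure CLI.

```python
import subprocess

# Hypothetical labels for the fabric-level operations that block Disable.
BLOCKING_OPERATIONS = frozenset({
    "device-upgrade",
    "configuration-push",
    "secret-or-certificate-update",
    "administrative-lock",
    "ts-reprovisioning",
})


def disable_device(device: str, resource_group: str,
                   in_progress: set[str], execute: bool = False) -> list[str]:
    """Build (and optionally run) the Step 1 Disable command.

    Raises RuntimeError if a blocking fabric operation is in progress.
    """
    blockers = sorted(in_progress & BLOCKING_OPERATIONS)
    if blockers:
        raise RuntimeError(f"Disable not permitted while in progress: {blockers}")
    cmd = [
        "az", "networkfabric", "device", "update-admin-state",
        "--state", "Disable",
        "--resource-name", device,
        "--resource-group", resource_group,
    ]
    if execute:
        # Requires an authenticated Azure CLI session.
        subprocess.run(cmd, check=True)
    return cmd
```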
Step 2: Update the serial number
Execution conditions:
- Device Administrative State must be Disabled
- Fabric Administrative State must be EnabledDegraded
Once the replacement device is physically installed, update its serial number in the fabric resource:
az networkfabric device update \
--serial-number "replacement-serial-number" \
--resource-name "nf-device-name" \
--resource-group "resource-group-name"
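You can check this step's execution conditions before submitting the update. The following is a minimal Python sketch. It assumes you have already read both administrative states as plain strings, however you normally query them. `update_serial_command` is a hypothetical helper name.

```python
def update_serial_command(device: str, resource_group: str, serial: str,
                          device_state: str, fabric_state: str) -> list[str]:
    """Return the Step 2 argv, or raise ValueError if the conditions aren't met."""
    # Execution conditions from this article.
    if device_state != "Disabled":
        raise ValueError(
            f"device administrative state is {device_state!r}, expected 'Disabled'"
        )
    if fabric_state != "EnabledDegraded":
        raise ValueError(
            f"fabric administrative state is {fabric_state!r}, expected 'EnabledDegraded'"
        )
    return [
        "az", "networkfabric", "device", "update",
        "--serial-number", serial,
        "--resource-name", device,
        "--resource-group", resource_group,
    ]
```

The sketch passes the arguments as a list rather than a shell string, so the semicolons in the serial number never need shell quoting.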
Error recovery guidance:
- If the RMA fails because of an incorrect serial number, you can update the serial number again without opening a support ticket.
- If validation fails after the device bootstraps, the system returns the status Device Unable to Boot Up - Failed.
This action performs the following tasks:
- Updates the serial number stored in the Azure Resource Manager (ARM) resource
- Keeps the device in the Disabled state and the Fabric Administrative State in EnabledDegraded
Note
The serial number is expected in the format <Manufacturer;Model;Hardware Version;Serial Number>
For example, "Arista;DCS-7280DR3-XX;12.05;ABC23XXXXXX"
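You can check the four-field structure described in this note locally before you call the service. This Python sketch uses the hypothetical names `DeviceSerial` and `parse_serial`. It only validates the structure: it doesn't know which manufacturers, models, or hardware versions the service accepts.

```python
from typing import NamedTuple


class DeviceSerial(NamedTuple):
    manufacturer: str
    model: str
    hardware_version: str
    serial_number: str


def parse_serial(value: str) -> DeviceSerial:
    """Split a string of the form Manufacturer;Model;Hardware Version;Serial Number.

    Raises ValueError unless there are exactly four non-blank fields.
    """
    fields = value.split(";")
    if len(fields) != 4 or not all(field.strip() for field in fields):
        raise ValueError(
            f"expected 'Manufacturer;Model;Hardware Version;Serial Number', got {value!r}"
        )
    return DeviceSerial(*fields)


if __name__ == "__main__":
    parsed = parse_serial("Arista;DCS-7280DR3-XX;12.05;ABC23XXXXXX")
    print(parsed.model, parsed.serial_number)  # DCS-7280DR3-XX ABC23XXXXXX
```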
Step 3: Ensure the device is in ZTP mode
Verify that the replacement device is in ZTP mode. If not, configure the device for ZTP before continuing.
Note
ZTP enables automatic configuration retrieval during the RMA process.
Step 4: Initiate RMA process
Initiate the RMA process using the following command:
az networkfabric device update-admin-state \
--state RMA \
--resource-name "nf-device-name" \
--resource-group "resource-group-name"
The Network Fabric Controller pushes all required configuration files to the replacement device. If the operation fails because of a transient error, retry it until it succeeds.
For TOR and CE devices only, the device boots into its base configuration with the maintenance profile applied.
This action sets the following states:
Device Administrative State: UnderMaintenance
Fabric Administrative State: EnabledDegraded
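The article recommends retrying the RMA operation after transient failures until it succeeds. A bounded retry loop with backoff is one way to do this. This Python sketch makes a simplifying assumption: it treats every non-zero exit code from `az` as possibly transient, whereas in practice you should inspect the error before retrying. `run_with_retry`, `run_subprocess`, and their parameters are hypothetical. The runner and sleep functions are injectable, so you can exercise the loop without calling Azure.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run cmd and return its exit code (needs an authenticated Azure CLI)."""
    return subprocess.run(list(cmd)).returncode


def run_with_retry(cmd: Sequence[str],
                   runner: Callable[[Sequence[str]], int] = run_subprocess,
                   max_attempts: int = 5,
                   base_delay_s: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run cmd until it exits with 0; return the number of attempts used.

    Waits base_delay_s, then 2x, 4x, ... between failed attempts and raises
    RuntimeError once max_attempts have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # Exponential backoff before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"{' '.join(cmd)} failed after {max_attempts} attempts")


# Step 4 command from this article, with the same placeholder names.
RMA_COMMAND = [
    "az", "networkfabric", "device", "update-admin-state",
    "--state", "RMA",
    "--resource-name", "nf-device-name",
    "--resource-group", "resource-group-name",
]
```

From an authenticated session, call `run_with_retry(RMA_COMMAND)`. The 30-second starting delay and the five-attempt cap are arbitrary choices, not documented service limits.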
Step 5: Refresh configuration
This operation pushes the latest configuration to the device and applies to all device types. If a maintenance profile is already configured on the device (CE and TOR only), it's retained during this operation.
az networkfabric device refresh-configuration \
--resource-name "nf-device-name" \
--resource-group "resource-group-name"
This action keeps the device in the following states:
Device Administrative State: UnderMaintenance
Fabric Administrative State: EnabledDegraded
Step 6: Enable administrative state
Once configuration is applied successfully, bring the device back into active service:
az networkfabric device update-admin-state \
--state Enable \
--resource-name "nf-device-name" \
--resource-group "resource-group-name"
Once the device is fully healthy and synchronized with the fabric, this action sets the following states:
Device Administrative State: Enabled
Fabric Administrative State: Enabled
Note
If any other device in the fabric is still in the Disabled state, the Fabric Administrative State remains EnabledDegraded.
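The state transitions across Steps 1 through 6 can be summarized as a small state model. This Python sketch is a toy model for reasoning about the workflow, not the service's implementation. The action names (`Disable`, `UpdateSerial`, `RMA`, `RefreshConfiguration`, `Enable`), the `FabricModel` class, and the device names are hypothetical. The model reports the fabric as EnabledDegraded whenever any device is Disabled or UnderMaintenance. That rule matches the states listed in the steps and the note above, but the service may consider other conditions as well.

```python
from enum import Enum


class DeviceState(str, Enum):
    ENABLED = "Enabled"
    DISABLED = "Disabled"
    UNDER_MAINTENANCE = "UnderMaintenance"


# action -> (required current device state, resulting device state)
TRANSITIONS = {
    "Disable": (DeviceState.ENABLED, DeviceState.DISABLED),
    "UpdateSerial": (DeviceState.DISABLED, DeviceState.DISABLED),
    "RMA": (DeviceState.DISABLED, DeviceState.UNDER_MAINTENANCE),
    "RefreshConfiguration": (DeviceState.UNDER_MAINTENANCE, DeviceState.UNDER_MAINTENANCE),
    "Enable": (DeviceState.UNDER_MAINTENANCE, DeviceState.ENABLED),
}


class FabricModel:
    """Toy model of the administrative states described in this article."""

    def __init__(self, devices: list[str]):
        self.devices = {name: DeviceState.ENABLED for name in devices}

    @property
    def fabric_state(self) -> str:
        # Degraded while any device is Disabled or UnderMaintenance.
        if all(state is DeviceState.ENABLED for state in self.devices.values()):
            return "Enabled"
        return "EnabledDegraded"

    def apply(self, device: str, action: str) -> None:
        """Apply one workflow action, rejecting it if the device is in the wrong state."""
        required, result = TRANSITIONS[action]
        current = self.devices[device]
        if current is not required:
            raise ValueError(
                f"{action} requires {required.value}; {device} is {current.value}"
            )
        self.devices[device] = result


if __name__ == "__main__":
    fabric = FabricModel(["tor1", "tor2"])
    fabric.apply("tor2", "Disable")  # another device is still Disabled
    for action in ["Disable", "UpdateSerial", "RMA", "RefreshConfiguration", "Enable"]:
        fabric.apply("tor1", action)
    print(fabric.devices["tor1"].value, fabric.fabric_state)  # Enabled EnabledDegraded
```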
Summary
The RMA workflow in Network Fabric replaces a device through controlled state transitions and a full configuration synchronization. This helps maintain service continuity and operational consistency across the network.
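You can also generate the full command sequence as a dry-run runbook for review. In this Python sketch, `rma_runbook` is a hypothetical helper. It uses only the commands and flags shown in Steps 1, 2, 4, 5, and 6, and it prints them instead of running them. Step 3 (confirming ZTP mode) stays a manual check. Run each printed command only after the previous step has succeeded.

```python
import shlex


def rma_runbook(device: str, resource_group: str, serial: str) -> list[list[str]]:
    """Return the az commands for Steps 1, 2, 4, 5, and 6, in order.

    Step 3 (confirming ZTP mode) is a manual check on the device and isn't included.
    """
    target = ["--resource-name", device, "--resource-group", resource_group]
    admin = ["az", "networkfabric", "device", "update-admin-state", "--state"]
    return [
        admin + ["Disable"] + target,                                          # Step 1
        ["az", "networkfabric", "device", "update",
         "--serial-number", serial] + target,                                  # Step 2
        admin + ["RMA"] + target,                                              # Step 4
        ["az", "networkfabric", "device", "refresh-configuration"] + target,   # Step 5
        admin + ["Enable"] + target,                                           # Step 6
    ]


if __name__ == "__main__":
    for cmd in rma_runbook("nf-device-name", "resource-group-name",
                           "Arista;DCS-7280DR3-XX;12.05;ABC23XXXXXX"):
        print(shlex.join(cmd))
```

`shlex.join` quotes each argument for a POSIX shell, so the semicolons in the serial number don't split the printed command.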
Permitted and non-permitted actions when the fabric is in the EnabledDegraded state
When the fabric is in the EnabledDegraded state, some operations are permitted, some are technically allowed but should be deferred, and some aren't permitted. The following tables list each group.
Permitted operations
| Operation Category | Examples (APIs / CLI) | Allowed? | Notes / Recommended Practice |
|---|---|---|---|
| READ (non-mutating) | GET/List, Show for Fabric / Devices / ISDs / Networks; metrics & health | Allowed | Safe to monitor state, validate results, and track onboarding |
| RMA Device Replacement Actions | Disable + Update Serial + RMA + Refresh Config + Enable | Allowed | Follow standard Replace Device guide steps |
| Commit (configuration apply) | Start / Monitor Commit (Commit Workflow v2) | Allowed | Configuration is pushed to all devices except those in the Disabled state |
| VALIDATE (pre-flight checks) | Validate configuration / dry-runs | Allowed | Useful to catch issues before commit |
| Administrative Lock / Unlock | Lock / Unlock fabric | Allowed | No restrictions in this state |
Operations that are technically allowed but should be deferred
| Operation Category | Examples (APIs / CLI) | Allowed? | Notes / Recommended Practice |
|---|---|---|---|
| CREATE/UPDATE config (non-RMA config) | Add/Change ISDs, Networks, Route Policies, Prefs/vias, Taps, Communities | Technically allowed but defer if possible | Configuration doesn't reach Disabled devices until the RMA completes. After the RMA completes, the latest configuration is pushed to those devices. |
| DELETE (fabric config) | Remove ISDs, Networks, Policies, Taps | Technically allowed but defer if possible | Disabled devices may retain removed config until re-enabled. |
Non-permitted operations
| Operation Category | Examples (APIs/CLI) | Allowed? | Notes / Recommended Practice |
|---|---|---|---|
| Upgrades | Fabric/Device runtime upgrades | Not permitted | Schedule upgrades after RMA completes and fabric returns to Enabled |
| Secret rotation | Geneva action | Not permitted | TS reprovisioning and device RMA are treated as mutually exclusive operations: if one is active, the other can't be initiated. |
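Automation that runs while an RMA is in progress can encode the tables above as a lookup. A pipeline can then stop or warn before it submits an operation. In this Python sketch, the category keys and `guidance_for` are hypothetical labels for the table rows. The lookup covers only the EnabledDegraded state that the tables describe.

```python
from enum import Enum


class Guidance(Enum):
    ALLOWED = "Allowed"
    DEFER = "Technically allowed but defer if possible"
    NOT_PERMITTED = "Not permitted"


# Hypothetical keys for the operation categories in the tables above
# (EnabledDegraded fabric state only).
DEGRADED_STATE_GUIDANCE = {
    "read": Guidance.ALLOWED,
    "rma-device-replacement": Guidance.ALLOWED,
    "commit": Guidance.ALLOWED,
    "validate": Guidance.ALLOWED,
    "administrative-lock-unlock": Guidance.ALLOWED,
    "create-update-config": Guidance.DEFER,
    "delete-config": Guidance.DEFER,
    "upgrade": Guidance.NOT_PERMITTED,
    "secret-rotation": Guidance.NOT_PERMITTED,
}


def guidance_for(category: str) -> Guidance:
    """Return the table guidance for an operation category while the fabric is EnabledDegraded."""
    try:
        return DEGRADED_STATE_GUIDANCE[category]
    except KeyError:
        raise ValueError(f"unknown operation category: {category!r}") from None
```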