Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
This how-to guide provides a step-by-step template for upgrading a Nexus Fabric designed to assist users in managing a reproducible end-to-end upgrade through Azure APIs and standard operating procedures. Regular updates are crucial for maintaining system integrity and accessing the latest product improvements.
Overview
Overview of Fabric runtime upgrade template
Runtime bundle components: These components require operator consent for upgrades that may affect traffic behavior or necessitate device reboots. The network fabric's design allows for updates to be applied while maintaining continuous data traffic flow.
Runtime changes are categorized as follows:
- Operating system updates: Necessary to support new features or resolve issues.
- Base configuration updates: Initial settings applied during device bootstrapping.
- Configuration structure updates: Generated based on user input for conf
Prerequisites
Prerequisites for using this template to upgrade a Fabric
- Latest version of Azure CLI.
- Latest
managednetworkfabric
CLI extension. - Latest
networkcloud
CLI extension. - Subscription access to run the Azure Operator Nexus Network Fabric (NF) and Network Cloud (NC) CLI extension commands.
- Target Fabric must be healthy in a running state, with all Devices healthy.
Required Parameters
Parameters used in this document
- <ENVIRONMENT>: - Instance name
- <AZURE_REGION>: - Azure region of instance
- <CUSTOMER_SUB_NAME>: Subscription name
- <CUSTOMER_SUB_ID>: Subscription ID
- <NEXUS_VERSION>: Nexus release version (for example, 2504.1)
- <NNF_VERSION>: Operator Nexus Fabric release version (for example, 8.1)
- <NF_VERSION>: NF runtime version for upgrade (for example, 5.0.0)
- <NFC_NAME>: Associated Network Fabric Controller (NFC)
- <NFC_RG>: NFC Resource Group
- <NFC_RID>: NFC ARM ID
- <NFC_MRG>: NFC Managed Resource Group
- <NF_NAME>: Network Fabric Name
- <NF_RG>: Network Fabric Resource Group
- <NF_RID>: Network Fabric ARM ID
- <NF_DEVICE_NAME>: Network Fabric Device Name
- <NF_DEVICE_RID>: Network Fabric Device Resource ID
- <CM_NAME>: Associated Cluster Manager (CM)
- <CLUSTER_NAME>: Associated Cluster name
- <MISE_CID>: Microsoft.Identity.ServiceEssentials (MISE) Correlation ID in debug output for Device updates
- <CORRELATION_ID>: Operation Correlation ID in debug output for Device updates
- <ASYNC_URL>: Asynchronous (ASYNC) URL in debug output for Device updates
Deployment Data
Deployment data details
- Nexus: <NEXUS_VERSION>
- NC: <NC_VERSION>
- NF: <NF_VERSION>
- Subscription Name: <CUSTOMER_SUB_NAME>
- Subscription ID: <CUSTOMER_SUB_ID>
- Tenant ID: <CUSTOMER_SUB_TENANT_ID>
Debug information for Azure CLI commands
How to collect debug information for Azure CLI commands
Azure CLI deployment commands issued with --debug
contain the following information in the command output:
cli.azure.cli.core.sdk.policies: 'mise-correlation-id': '<MISE_CID>'
cli.azure.cli.core.sdk.policies: 'x-ms-correlation-request-id': '<CORRELATION_ID>'
cli.azure.cli.core.sdk.policies: 'Azure-AsyncOperation': '<ASYNC_URL>'
To view status of long running asynchronous operations, run the following command with az rest
:
az rest -m get -u '<ASYNC_URL>'
Command status information is returned along with detailed informational or error messages:
"status": "Accepted"
"status": "Succeeded"
"status": "Failed"
If any failures occur, report the <MISE_CID>, <CORRELATION_ID>, status code, and detailed messages when opening a support request.
Prechecks
Prechecks before starting Fabric upgrade
The following role permissions should be assigned to end users responsible for Fabric create, upgrade, and delete operations.
These permissions can be granted temporarily, limited to the duration required to perform the upgrade.
- Microsoft.NexusIdentity/identitySets/read
- Microsoft.NexusIdentity/identitySets/write
- Microsoft.NexusIdentity/identitySets/delete
- Ensure that
Role Based Access Control Administrator
is successfully activated. - Check in Azure portal from the following path:
Network Fabrics
->NF_NAME
->Access control (IAM)
->View my access
. - In current 'Role assignments', you should see the following two roles:
- Nexus Contributor
- Role Based Access Control Administrator
Validate the provisioning status for the Network Fabric Controller (NFC), Fabric, and Fabric Devices.
Log in to Azure CLI and select or set the
<CUSTOMER_SUB_ID>
:az login az account set --subscription <CUSTOMER_SUB_ID>
Check that the NFC is in Provisioned state:
az networkfabric controller show -g <NFC_RG> --resource-name <NFC_NAME> --subscription <CUSTOMER_SUB_ID> -o table
Check the NF status:
az networkfabric fabric show -g <NF_RG> --resource-name <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
Record down the
fabricVersion
andprovisioningState
.Check the Devices status.
az networkfabric device list -g <NF_RG> -o table --subscription <CUSTOMER_SUB_ID>
Note
If
provisioningState
is notSucceeded
, stop the upgrade until issues are resolved.Check
Microsoft.NexusIdentity
user Resource Provider (RP) is registered on the customer subscription:az provider show --namespace Microsoft.NexusIdentity -o table --subscription <CUSTOMER_SUB_ID> Namespace RegistrationPolicy RegistrationState ----------------------- -------------------- ------------------- Microsoft.NexusIdentity RegistrationRequired Registered
If not registered, run the following to register:
az provider register --namespace Microsoft.NexusIdentity --wait --subscription <CUSTOMER_SUB_ID> az provider show --namespace Microsoft.NexusIdentity -o table Namespace RegistrationPolicy RegistrationState ----------------------- -------------------- ------------------- Microsoft.NexusIdentity RegistrationRequired Registered
Minimum available disk space on each device must be more than 3.5 GB for a successful device upgrade.
Verify the available space on each Fabric Devices using the following Azure CLI command.
az networkfabric device run-ro --resource-name <NF_DEVICE_NAME> --resource-group <NF_RG> --ro-command "dir flash" --subscription <CUSTOMER_SUB_ID> --debug
Contact Microsoft support if there isn't enough space to perform the upgrade. Archived Extensible Operating System (EOS) images and support bundle files can be removed at the direction of support.
Check the Fabric's Network Packet Broker (NPB) for any orphaned
Network Taps
in Azure portal.- Select
Network Fabrics
underAzure Services
and then select the <NF_NAME>. - Click on the
Resource group
for the Fabric. - In the Resources list, filter on
Network Packet Broker
. - Click on the
Network Packet Broker
name in the list. - Click on
Network Taps
tab on theOverview
screen. - All
Network Taps
should beSucceeded
forConfiguration State
andProvisioning State
. - Look for any Taps with a red
X
, and a status ofNot Found
,Failed
, orError
.
Note
If any Taps show
Not Found
,Failed
, orError
status, stop the upgrade until issues are cleared. Provide this information to Microsoft Support when opening a support ticket for Tap issues.- Select
Run and validate the Fabric cable validation report. Follow Validate Cables for Nexus Network Fabric to set up and run the report
Note
Resolve any connection and cable issues before continuing the upgrade.
Review Operator Nexus Release notes for required checks and configuration updates not included in this document.
Upgrade Procedure
Fabric runtime upgrade procedure details
Verify current Fabric runtime version
How to check current cluster runtime version.
az networkfabric fabric list -g <NF_RG> --query "[].{name:name,fabricVersion:fabricVersion,configurationState:configurationState,provisioningState:provisioningState}" -o table --subscription <CUSTOMER_SUB_ID>
az networkfabric fabric show -g <NF_RG> --resource-name <NF_NAME> --subscription <CUSTOMER_SUB_ID>
Initiate Fabric upgrade
Start the upgrade with the following command:
az networkfabric fabric upgrade -g <NF_RG> --resource-name <NF_NAME> --subscription <CUSTOMER_SUB_ID> --action start --version "5.0.0"
{}
Note
Output showing {}
indicates successful execution of upgrade command.
The Fabric Resource Provider validates if the version upgrade is allowed from the existing Fabric version to the target version. Only N+1 major release upgrades are allowed (for example, 4.0.0->5.0.0).
On successful completion, the command puts the Fabric status into Under Maintenance
and prevents any other operation on the Fabric.
Follow device-specific workflow
Nexus Network Fabric Racks are composed of the following Devices types:
- Customer Edge (CE) Switches
- Management (MGMT) Switches
- Top Of Rack (TOR) Switches
- Network Packet Brokers (NPB)
Eight Rack environments have 30 Devices:
- Aggregate Rack - two CE, two NPB, two MGMT Switches (six Devices)
- Eight Compute Racks - Each Compute Rack has two TOR's and one MGMT Switch (24 Devices)
Four Rack environments have 17 Devices:
- Aggregate Rack - two CE's, one NPB, two MGMT Switches (five Devices)
- Four Compute Racks - Each Compute Rack has two TOR's and one MGMT Switch (12 Devices)
Important
The Devices must be upgraded in the following specific order to maintain networking service during the upgrade.
- Compute Rack odd numbered TOR upgrade together in parallel.
- Compute Rack even numbered TOR upgrade together in parallel.
- Compute Rack MGMT switches upgrade together in parallel.
- Aggregate Rack CEs upgrade one after the other in serial.
Important
After each CE upgrade, wait for a duration of five minutes to ensure that the recovery process is complete before proceeding to the next CE
- Aggregate Rack NPBs upgrade one after the other in serial.
- Aggregate Rack MGMT switches upgrade one after the other in serial.
Note
Wait for successful upgrade on all Devices in a group before moving to the next group.
Follow device-specific upgrade
Run the following command to upgrade the version on each Device:
az networkfabric device upgrade --version <NF_VERSION> -g <NF_RG> --resource-name <NF_DEVICE_NAME> --subscription <CUSTOMER_SUB_ID> --debug
As part of the upgrade, the Devices are put into maintenance mode. The Device drains all traffic and stops advertising routes so that the traffic flow to the device stops. At completion, the Nexus Network Fabric (NNF) service updates the Device resource version property to the new version.
Gather ASYNC URL and Correlation ID info for further troubleshooting if needed.
cli.azure.cli.core.sdk.policies: 'mise-correlation-id': '<MISE_CID>'
cli.azure.cli.core.sdk.policies: 'x-ms-correlation-request-id': '<CORRELATION_ID>'
cli.azure.cli.core.sdk.policies: 'Azure-AsyncOperation': '<ASYNC_URL>'
Provide this information to Microsoft Support when opening a support ticket for upgrade issues.
After Device upgrades are complete, make sure that all the Devices are showing with <NF_VERSION> by running the following command:
az networkfabric device list -g <NF_RG> --query "[].{name:name,version:version}" -o table --subscription <CUSTOMER_SUB_ID>
Complete Network Fabric Upgrade
Once all the Devices are upgraded, run the following command to take the Network Fabric out of maintenance state.
az networkfabric fabric upgrade --action Complete --version <NF_VERSION> -g <NF_RG> --resource-name <NF_NAME> --debug --subscription <CUSTOMER_SUB_ID>
How to troubleshoot Device update failures
- Collect any errors in the Azure CLI output.
- Collect device operation state from Azure portal or Azure CLI.
- Create Azure Support Request for any device upgrade failures and attach any errors along with ASYNC URL, correlation ID, and operation state of Fabric and Devices.
Post-upgrade tasks
Detailed steps for post-upgrade tasks
Review Operator Nexus release notes
Review the Operator Nexus release notes for any version specific actions required post-upgrade.
Validate Nexus Instance
Validate the health and status of all the Nexus Instance resources with the Nexus Instance Readiness Test (IRT).
If not using IRT, perform resource validation of all Nexus Instance components with Azure CLI:
# Check `ProvisioningState = Succeeded` in all resources
# NFC
az networkfabric controller list -g <NFC_RG> --subscription <CUSTOMER_SUB_ID> -o table
az customlocation list -g <NFC_MRG> --subscription <CUSTOMER_SUB_ID> -o table
# Fabric
az networkfabric fabric list -g <NF_RG> --subscription <CUSTOMER_SUB_ID> -o table
az networkfabric rack list -g <NF_RG> --subscription <CUSTOMER_SUB_ID> -o table
az networkfabric fabric device list -g <NF_RG> --subscription <CUSTOMER_SUB_ID> -o table
az networkfabric nni list -g <NF_RG> --fabric <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
az networkfabric acl list -g <NF_RG> --fabric <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
az networkfabric l2domain list -g <NF_RG> --fabric <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
# CM
az networkcloud clustermanager list -g <CM_RG> --subscription <CUSTOMER_SUB_ID> -o table
# Cluster
az networkcloud cluster list -g <CLUSTER_RG> --subscription <CUSTOMER_SUB_ID> -o table
az networkcloud baremetalmachine list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> --query "sort_by([]. {name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
az networkcloud storageappliance list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> -o table
# Tenant Workloads
az networkcloud virtualmachine list --sub <CUSTOMER_SUB_ID> --query "reverse(sort_by([?clusterId=='<CLUSTER_RID>'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineIdi,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
az networkcloud kubernetescluster list --sub <CUSTOMER_SUB_ID> --query "[?clusterId=='<CLUSTER_RID>'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
Note
IRT validation provides a complete functional test of networking and workloads across all components of the Nexus Instance. Simple validation does not provide functional testing.
Links
Reference Links for Fabric upgrade
Reference links for Fabric upgrade:
- Access the Azure portal
- Install Azure CLI
- Install CLI Extension
- Reference the Network Fabric Upgrade
- Reference the Nexus Instance Readiness Test (IRT)