Troubleshoot bare metal machine issues using the az networkcloud baremetalmachine run-data-extract
command
There may be situations where a user needs to investigate and resolve issues with an on-premises bare metal machine. Azure Operator Nexus provides a prescribed set of data extract commands via az networkcloud baremetalmachine run-data-extract
. These commands enable users to get diagnostic data from a bare metal machine.
The command produces an output file containing the results of the data extract located in the Cluster Manager's Azure Storage Account.
Prerequisites
- This article assumes that you've installed the Azure command line interface and the
networkcloud
command line interface extension. For more information, see How to Install CLI Extensions. - The target bare metal machine is on and has readyState set to True.
- The syntax for these commands is based on the 0.3.0+ version of the
az networkcloud
CLI. - Get the Cluster Managed Resource group name (cluster_MRG) that you created for Cluster resource.
Verify Storage Account access
Verify you have access to the Cluster Manager's storage account
- From Azure portal, navigate to Cluster Manager's Storage account.
- In the Storage account details, select Storage browser from the navigation menu on the left side.
- In the Storage browser details, select on Blob containers.
- If you encounter a
403 This request is not authorized to perform this operation.
while accessing the storage account, storage account’s firewall settings need to be updated to include the public IP address. - Request access by creating a support ticket via Portal on the Cluster Manager resource. Provide the public IP address that requires access.
Executing a run command
The run data extract command executes one or more predefined scripts to extract data from a bare metal machine.
Warning
Microsoft does not provide or support any Operator Nexus API calls that expect plaintext username and/or password to be supplied. Please note any values sent will be logged and are considered exposed secrets, which should be rotated and revoked. The Microsoft documented method for securely using secrets is to store them in an Azure Key Vault, if you have specific questions or concerns please submit a request via the Azure Portal.
The current list of supported commands are
SupportAssist/TSR collection for Dell troubleshooting
Command Name:hardware-support-data-collection
Arguments: Type of logs requestedSysInfo
- System InformationTTYLog
- Storage TTYLog dataDebug
- debug logs
Collect Microsoft Defender for Endpoints (MDE) agent information
Command Name:mde-agent-information
Arguments: NoneCollect MDE diagnostic support logs
Command Name:mde-support-diagnostics
Arguments: NoneCollect Dell Hardware Rollup Status
Command Name:hardware-rollup-status
Arguments: NoneGenerate Cluster CVE Report
Command Name:cluster-cve-report
Arguments: None
The command syntax is:
az networkcloud baremetalmachine run-data-extract --name "<machine-name>" \
--resource-group "<cluster_MRG>" \
--subscription "<subscription>" \
--commands '[{"arguments":["<arg1>","<arg2>"],"command":"<command1>"}]' \
--limit-time-seconds "<timeout>"
Specify multiple commands using json format in --commands
option. Each command
specifies command and arguments. For a command with multiple arguments, provide as a list to the arguments
parameter. See Azure CLI Shorthand for instructions on constructing the --commands
structure.
These commands can be long running so the recommendation is to set --limit-time-seconds
to at least 600 seconds (10 minutes). The Debug
option or running multiple extracts might take longer than 10 minutes.
In the response, the operation performs asynchronously and returns an HTTP status code of 202. See the Viewing the Output section for details on how to track command completion and view the output file.
Hardware Support Data Collection
This example executes the hardware-support-data-collection
command and get SysInfo
and TTYLog
logs from the Dell Server. The script executes a racadm supportassist collect
command on the designated baremetal machine. The resulting tar.gz file contains the zipped extract command file outputs in hardware-support-data-<timestamp>.zip
.
az networkcloud baremetalmachine run-data-extract --name "bareMetalMachineName" \
--resource-group "cluster_MRG" \
--subscription "subscription" \
--commands '[{"arguments":["SysInfo", "TTYLog"],"command":"hardware-support-data-collection"}]' \
--limit-time-seconds 600
hardware-support-data-collection
Output
====Action Command Output====
Executing hardware-support-data-collection command
Getting following hardware support logs: SysInfo,TTYLog
Job JID_814372800396 is running, waiting for it to complete ...
Job JID_814372800396 Completed.
---------------------------- JOB -------------------------
[Job ID=JID_814372800396]
Job Name=SupportAssist Collection
Status=Completed
Scheduled Start Time=[Not Applicable]
Expiration Time=[Not Applicable]
Actual Start Time=[Thu, 13 Apr 2023 20:54:40]
Actual Completion Time=[Thu, 13 Apr 2023 20:59:51]
Message=[SRV088: The SupportAssist Collection Operation is completed successfully.]
Percent Complete=[100]
----------------------------------------------------------
Deleting Job JID_814372800396
Collection successfully exported to /hostfs/tmp/runcommand/hardware-support-data-2023-04-13T21:00:01.zip
================================
Script execution result can be found in storage account:
https://cm2p9bctvhxnst.blob.core.windows.net/bmm-run-command-output/dd84df50-7b02-4d10-a2be-46782cbf4eef-action-bmmdataextcmd.tar.gz?se=2023-04-14T01%3A00%3A15Zandsig=ZJcsNoBzvOkUNL0IQ3XGtbJSaZxYqmtd%2BM6rmxDFqXE%3Dandsp=randspr=httpsandsr=bandst=2023-04-13T21%3A00%3A15Zandsv=2019-12-12
Example list of hardware support files collected
Archive: TSR20240227164024_FM56PK3.pl.zip
creating: tsr/hardware/
creating: tsr/hardware/spd/
creating: tsr/hardware/sysinfo/
creating: tsr/hardware/sysinfo/inventory/
inflating: tsr/hardware/sysinfo/inventory/sysinfo_CIM_BIOSAttribute.xml
inflating: tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml
inflating: tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml
inflating: tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml
inflating: tsr/hardware/sysinfo/inventory/sysinfo_CIM_Capabilities.xml
inflating: tsr/hardware/sysinfo/inventory/sysinfo_CIM_StatisticalData.xml
creating: tsr/hardware/sysinfo/lcfiles/
inflating: tsr/hardware/sysinfo/lcfiles/lclog_0.xml.gz
inflating: tsr/hardware/sysinfo/lcfiles/curr_lclog.xml
creating: tsr/hardware/psu/
creating: tsr/hardware/idracstateinfo/
inflating: tsr/hardware/idracstateinfo/avc.log
extracting: tsr/hardware/idracstateinfo/avc.log.persistent.1
[..snip..]
Collect MDE Agent Information
Data is collected with the mde-agent-information
command and formatted as JSON
to /hostfs/tmp/runcommand/mde-agent-information.json
. The JSON file is found
in the data extract zip file located in the storage account. The script executes a
sequence of mdatp
commands on the designated baremetal machine.
This example executes the mde-agent-information
command without arguments.
az networkcloud baremetalmachine run-data-extract --name "bareMetalMachineName" \
--resource-group "cluster_MRG" \
--subscription "subscription" \
--commands '[{"command":"mde-agent-information"}]' \
--limit-time-seconds 600
mde-agent-information
Output
====Action Command Output====
Executing mde-agent-information command
MDE agent is running, proceeding with data extract
Getting MDE agent information for bareMetalMachine
Writing to /hostfs/tmp/runcommand
================================
Script execution result can be found in storage account:
https://cmzhnh6bdsfsdwpbst.blob.core.windows.net/bmm-run-command-output/f5962f18-2228-450b-8cf7-cb8344fdss63b0-action-bmmdataextcmd.tar.gz?se=2023-07-26T19%3A07%3A22Z&sig=X9K3VoNWRFP78OKqFjvYoxubp65BbNTq%2BGnlHclI9Og%3D&sp=r&spr=https&sr=b&st=2023-07-26T15%3A07%3A22Z&sv=2019-12-12
Example JSON object collected
{
"diagnosticInformation": {
"realTimeProtectionStats": $real_time_protection_stats,
"eventProviderStats": $event_provider_stats
},
"mdeDefinitions": $mde_definitions,
"generalHealth": $general_health,
"mdeConfiguration": $mde_config,
"scanList": $scan_list,
"threatInformation": {
"list": $threat_info_list,
"quarantineList": $threat_info_quarantine_list
}
}
Collect MDE Support Diagnostics
Data collected from the mde-support-diagnostics
command uses the MDE Client Analyzer tool to bundle information from mdatp
commands and relevant log files. The storage account tgz
file will contain a zip
file named mde-support-diagnostics-<hostname>.zip
. The zip
should be sent along with any support requests to ensure the supporting teams can use the logs for troubleshooting and root cause analysis, if needed.
This example executes the mde-support-diagnostics
command without arguments.
az networkcloud baremetalmachine run-data-extract --name "bareMetalMachineName" \
--resource-group "cluster_MRG" \
--subscription "subscription" \
--commands '[{"command":"mde-support-diagnostics"}]' \
--limit-time-seconds 600
mde-support-diagnostics
Output
====Action Command Output====
Executing mde-support-diagnostics command
[2024-01-23 16:07:37.588][INFO] XMDEClientAnalyzer Version: 1.3.2
[2024-01-23 16:07:38.367][INFO] Top Command output: [/tmp/top_output_2024_01_23_16_07_37mel0nue0.txt]
[2024-01-23 16:07:38.367][INFO] Top Command Summary: [/tmp/top_summary_2024_01_23_16_07_370zh7dkqn.txt]
[2024-01-23 16:07:38.367][INFO] Top Command Outliers: [/tmp/top_outlier_2024_01_23_16_07_37aypcfidh.txt]
[2024-01-23 16:07:38.368][INFO] [MDE Diagnostic]
[2024-01-23 16:07:38.368][INFO] Collecting MDE Diagnostic
[2024-01-23 16:07:38.613][WARNING] mde is not running
[2024-01-23 16:07:41.343][INFO] [SLEEP] [3sec] waiting for agent to create diagnostic package
[2024-01-23 16:07:44.347][INFO] diagnostic package path: /var/opt/microsoft/mdatp/wdavdiag/5b1edef9-3b2a-45c1-a45d-9e7e4b6b869e.zip
[2024-01-23 16:07:44.347][INFO] Successfully created MDE diagnostic zip
[2024-01-23 16:07:44.348][INFO] Adding mde_diagnostic.zip to report directory
[2024-01-23 16:07:44.348][INFO] Collecting MDE Health
[...snip...]
================================
Script execution result can be found in storage account:
https://cmmj627vvrzkst.blob.core.windows.net/bmm-run-command-output/7c5557b9-b6b6-a4a4-97ea-752c38918ded-action-bmmdataextcmd.tar.gz?se=2024-01-23T20%3A11%3A32Z&sig=9h20XlZO87J7fCr0S1234xcyu%2Fl%2BVuaDh1BE0J6Yfl8%3D&sp=r&spr=https&sr=b&st=2024-01-23T16%3A11%3A32Z&sv=2019-12-12
After downloading the execution result file, the support files can be unzipped for analysis.
Example list of information collected by the MDE Client Analyzer
Archive: mde-support-diagnostics-rack1compute02.zip
inflating: mde_diagnostic.zip
inflating: process_information.txt
inflating: auditd_info.txt
inflating: auditd_log_analysis.txt
inflating: auditd_logs.zip
inflating: ebpf_kernel_config.txt
inflating: ebpf_enabled_func.txt
inflating: ebpf_syscalls.zip
inflating: ebpf_raw_syscalls.zip
inflating: messagess.zip
inflating: conflicting_processes_information.txt
[...snip...]
Hardware Rollup Status
Data is collected with the hardware-rollup-status
command and formatted as JSON to /hostfs/tmp/runcommand/rollupStatus.json
. The JSON file is found
in the data extract zip file located in the storage account. The data collected will show the health of the machine subsystems.
This example executes the hardware-rollup-status
command without arguments.
az networkcloud baremetalmachine run-data-extract --name "bareMetalMachineName" \
--resource-group "clusete_MRG" \
--subscription "subscription" \
--commands '[{"command":"hardware-rollup-status"}]' \
--limit-time-seconds 600
hardware-rollup-status
Output
====Action Command Output====
Executing hardware-rollup-status command
Getting rollup status logs for b37dev03a1c002
Writing to /hostfs/tmp/runcommand
================================
Script execution result can be found in storage account:
https://cmkfjft8twwpst.blob.core.windows.net/bmm-run-command-output/20b217b5-ea38-4394-9db1-21a0d392eff0-action-bmmdataextcmd.tar.gz?se=2023-09-19T18%3A47%3A17Z&sig=ZJcsNoBzvOkUNL0IQ3XGtbJSaZxYqmtd%3D&sp=r&spr=https&sr=b&st=2023-09-19T14%3A47%3A17Z&sv=2019-12-12
Example JSON Collected
{
"@odata.context" : "/redfish/v1/$metadata#DellRollupStatusCollection.DellRollupStatusCollection",
"@odata.id" : "/redfish/v1/Systems/System.Embedded.1/Oem/Dell/DellRollupStatus",
"@odata.type" : "#DellRollupStatusCollection.DellRollupStatusCollection",
"Description" : "A collection of DellRollupStatus resource",
"Members" :
[
{
"@odata.context" : "/redfish/v1/$metadata#DellRollupStatus.DellRollupStatus",
"@odata.id" : "/redfish/v1/Systems/System.Embedded.1/Oem/Dell/DellRollupStatus/iDRAC.Embedded.1_0x23_SubSystem.1_0x23_Current",
"@odata.type" : "#DellRollupStatus.v1_0_0.DellRollupStatus",
"CollectionName" : "CurrentRollupStatus",
"Description" : "Represents the subcomponent roll-up statuses.",
"Id" : "iDRAC.Embedded.1_0x23_SubSystem.1_0x23_Current",
"InstanceID" : "iDRAC.Embedded.1#SubSystem.1#Current",
"Name" : "DellRollupStatus",
"RollupStatus" : "Ok",
"SubSystem" : "Current"
},
{
"@odata.context" : "/redfish/v1/$metadata#DellRollupStatus.DellRollupStatus",
"@odata.id" : "/redfish/v1/Systems/System.Embedded.1/Oem/Dell/DellRollupStatus/iDRAC.Embedded.1_0x23_SubSystem.1_0x23_Voltage",
"@odata.type" : "#DellRollupStatus.v1_0_0.DellRollupStatus",
"CollectionName" : "VoltageRollupStatus",
"Description" : "Represents the subcomponent roll-up statuses.",
"Id" : "iDRAC.Embedded.1_0x23_SubSystem.1_0x23_Voltage",
"InstanceID" : "iDRAC.Embedded.1#SubSystem.1#Voltage",
"Name" : "DellRollupStatus",
"RollupStatus" : "Ok",
"SubSystem" : "Voltage"
},
[..snip..]
Generate Cluster CVE Report
Vulnerability data is collected with the cluster-cve-report
command and formatted as JSON to {year}-{month}-{day}-nexus-cluster-vulnerability-report.json
. The JSON file is found in the data extract zip file located in the storage account. The data collected will include vulnerability data per container image in the cluster.
This example executes the cluster-cve-report
command without arguments.
Note
The target machine must be a control-plane node or the action will not execute.
az networkcloud baremetalmachine run-data-extract --name "bareMetalMachineName" \
--resource-group "cluster_MRG" \
--subscription "subscription" \
--commands '[{"command":"cluster-cve-report"}]' \
--limit-time-seconds 600
cluster-cve-report
Output
====Action Command Output====
Nexus cluster vulnerability report saved.
================================
Script execution result can be found in storage account:
https://cmkfjft8twwpst.blob.core.windows.net/bmm-run-command-output/20b217b5-ea38-4394-9db1-21a0d392eff0-action-bmmdataextcmd.tar.gz?se=2023-09-19T18%3A47%3A17Z&sig=ZJcsNoBzvOkUNL0IQ3XGtbJSaZxYqmtd%3D&sp=r&spr=https&sr=b&st=2023-09-19T14%3A47%3A17Z&sv=2019-12-12
CVE Report Schema
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Vulnerability Report",
"type": "object",
"properties": {
"metadata": {
"type": "object",
"properties": {
"dateRetrieved": {
"type": "string",
"format": "date-time",
"description": "The date and time when the data was retrieved."
},
"platform": {
"type": "string",
"description": "The name of the platform."
},
"resource": {
"type": "string",
"description": "The name of the resource."
},
"runtimeVersion": {
"type": "string",
"description": "The version of the runtime."
},
"managementVersion": {
"type": "string",
"description": "The version of the management software."
},
"vulnerabilitySummary": {
"type": "object",
"properties": {
"criticalCount": {
"type": "integer",
"description": "Number of critical vulnerabilities."
},
"highCount": {
"type": "integer",
"description": "Number of high severity vulnerabilities."
},
"mediumCount": {
"type": "integer",
"description": "Number of medium severity vulnerabilities."
},
"lowCount": {
"type": "integer",
"description": "Number of low severity vulnerabilities."
},
"noneCount": {
"type": "integer",
"description": "Number of vulnerabilities with no severity."
},
"unknownCount": {
"type": "integer",
"description": "Number of vulnerabilities with unknown severity."
}
},
"required": ["criticalCount", "highCount", "mediumCount", "lowCount", "noneCount", "unknownCount"]
}
},
"required": ["dateRetrieved", "platform", "resource", "runtimeVersion", "managementVersion", "vulnerabilitySummary"]
},
"containers": {
"type": "object",
"additionalProperties": {
"type": "array",
"items": {
"type": "object",
"properties": {
"namespace": {
"type": "string",
"description": "The namespace of the container."
},
"digest": {
"type": "string",
"description": "The digest of the container image."
},
"os": {
"type": "object",
"properties": {
"family": {
"type": "string",
"description": "The family of the operating system."
}
},
"required": ["family"]
},
"summary": {
"type": "object",
"properties": {
"criticalCount": {
"type": "integer",
"description": "Number of critical vulnerabilities in this container."
},
"highCount": {
"type": "integer",
"description": "Number of high severity vulnerabilities in this container."
},
"lowCount": {
"type": "integer",
"description": "Number of low severity vulnerabilities in this container."
},
"mediumCount": {
"type": "integer",
"description": "Number of medium severity vulnerabilities in this container."
},
"noneCount": {
"type": "integer",
"description": "Number of vulnerabilities with no severity in this container."
},
"unknownCount": {
"type": "integer",
"description": "Number of vulnerabilities with unknown severity in this container."
}
},
"required": ["criticalCount", "highCount", "lowCount", "mediumCount", "noneCount", "unknownCount"]
},
"vulnerabilities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Title of the vulnerability."
},
"vulnerabilityID": {
"type": "string",
"description": "Identifier of the vulnerability."
},
"fixedVersion": {
"type": "string",
"description": "The version in which the vulnerability is fixed."
},
"installedVersion": {
"type": "string",
"description": "The currently installed version."
},
"referenceLink": {
"type": "string",
"format": "uri",
"description": "Link to the vulnerability details."
},
"publishedDate": {
"type": "string",
"format": "date-time",
"description": "The date when the vulnerability was published."
},
"score": {
"type": "number",
"description": "The CVSS score of the vulnerability."
},
"severity": {
"type": "string",
"description": "The severity level of the vulnerability."
},
"resource": {
"type": "string",
"description": "The resource affected by the vulnerability."
},
"target": {
"type": "string",
"description": "The target of the vulnerability."
},
"packageType": {
"type": "string",
"description": "The type of the package."
},
"exploitAvailable": {
"type": "boolean",
"description": "Indicates if an exploit is available for the vulnerability."
}
},
"required": ["title", "vulnerabilityID", "fixedVersion", "installedVersion", "referenceLink", "publishedDate", "score", "severity", "resource", "target", "packageType", "exploitAvailable"]
}
}
},
"required": ["namespace", "digest", "os", "summary", "vulnerabilities"]
}
}
}
},
"required": ["metadata", "containers"]
}
CVE Data Details
The CVE data is refreshed per container image every 24-hours based on Kubernetes resource instantiation or whenever there is a change to the Kubernetes resource referencing the image (whichever occurs first).
Viewing the Output
Note the provided link to the tar.gz zipped file from the command execution. The tar.gz file name identifies the file in the Storage Account of the Cluster Manager resource group. You can also use the link to directly access the output zip file. The tar.gz file also contains the zipped extract command file outputs. Download the output file from the storage blob to a local directory by specifying the directory path in the optional argument --output-directory
.
Note: Storage Account could be locked resulting in 403 This request is not authorized to perform this operation.
due to networking or firewall restrictions. Refer Verify Storage Account access for procedure to verify/request access.