Best practices for remote troubleshooting of Azure Sphere devices
Important
This is the Azure Sphere (Legacy) documentation. Azure Sphere (Legacy) is retiring on 27 September 2027, and users must migrate to Azure Sphere (Integrated) by this time. Use the Version selector located above the TOC to view the Azure Sphere (Integrated) documentation.
As you manage your devices remotely, you may encounter issues that prevent them from operating properly. This article provides questions and flowcharts to help you triage the situation and determine what went wrong. Working through this guide can reduce device downtime and help you restore normal operation quickly.
Note
Here is a preliminary checklist addressing connectivity infrastructure that you should walk through:
- Ensure your network infrastructure is configured to allow the necessary endpoints for Azure Sphere devices by following the instructions in Azure Sphere's OS networking requirements:
- To confirm the endpoints are properly configured, run the diagnostic checks in Solution design considerations.
- To determine if a device is connecting to Azure Sphere Security Services (AS3), run the command azsphere device list. Check the lastUpdateRequestUTC field, which shows the last time the device requested an update from Azure Sphere Security Services.
- If you are running custom NTP, ensure that your NTP server is up, that its time is within 24 hours of global time, and that it is set to the correct time zone.
- Check your application's Wi-Fi configuration settings.
- Check IoT Hub:
- Ensure your Azure Sphere Security Service certificate on IoT Hub is up to date.
- Check that IoT Hub servers are operational.
- Check that your devices are receiving enough power per your hardware solution's specifications.
- Check that Microsoft's Network Connectivity Status Indicator (NCSI) service is up and reachable at http://www.msftconnecttest.com/connecttest.txt.
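The lastUpdateRequestUTC check from the checklist above can be automated once the device records are available as parsed JSON. The following Python sketch is illustrative only. It assumes each record is a dictionary with a hypothetical deviceId key alongside lastUpdateRequestUTC, that timestamps are ISO 8601 strings that datetime.fromisoformat accepts, and that 24 hours is a reasonable default staleness threshold. None of these details are confirmed by this article, so adjust them to match the actual output of azsphere device list.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp into a timezone-aware UTC datetime."""
    # Older Python versions reject a trailing "Z", so rewrite it as an offset.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def stale_devices(devices, now, max_age=timedelta(hours=24)):
    """Return IDs of devices whose last update request is older than max_age.

    "deviceId" is a hypothetical key name; the 24-hour default is an
    illustrative threshold, not a documented value.
    """
    stale = []
    for device in devices:
        last_request = device.get("lastUpdateRequestUTC")
        # A missing or empty value is treated as "never checked in".
        if not last_request or now - parse_utc(last_request) > max_age:
            stale.append(device["deviceId"])
    return stale


# Hypothetical sample records for illustration only.
SAMPLE_DEVICES = [
    {"deviceId": "device-a", "lastUpdateRequestUTC": "2024-05-02T08:00:00Z"},
    {"deviceId": "device-b", "lastUpdateRequestUTC": "2024-04-28T08:00:00Z"},
    {"deviceId": "device-c", "lastUpdateRequestUTC": None},
]

if __name__ == "__main__":
    reference_time = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
    print(stale_devices(SAMPLE_DEVICES, reference_time))
```

Passing the reference time explicitly, rather than calling datetime.now inside the function, keeps the check repeatable when you rerun it against a saved device list.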
Before checking other aspects of device health, consider the following preliminary questions:
How many devices are impacted? Is this the only device, or are there other devices?
- If a small number of devices are impacted, obtain their device IDs, run azsphere tenant download-error-report in the CLI, and analyze the report. See Collect and interpret error data for information about how to interpret the report.
- If multiple devices are impacted, continue on to the next section.
Triage device health
The following are some areas of consideration to help you triage the situation.
Check your devices' connectivity by tracing through the following flowchart:
First, check your firewall settings. If you manage your firewall settings, check that your network settings comply with Azure Sphere's requirements by following the guidance in Azure Sphere OS networking requirements. For more information, see Troubleshoot network problem. If you do not manage your firewall settings, reach out to your firewall administrator for further guidance.
Next, look at northbound connectivity. If you use Wi-Fi to connect to the internet, answer the following questions:
- Are your devices in a crowded area? If they are, ensure that targeted scan is enabled in your settings. For more on targeted scan, see WifiConfig_SetTargetedScanEnabled Function. If your devices are not in a crowded area, reach out to Microsoft Support for further guidance.
- Do you use EAP-TLS? If yes, check with your provider about certificate lifecycle management and refer to EAP-TLS certificate renewal. If you do not use EAP-TLS, ensure that your SSID and password haven't been changed.
If you use cellular to connect to the internet, ask your systems integrators or cellular service provider if your devices are showing up on the network.
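The connectivity questions above can also be written down as a small decision helper, which may be convenient when the flowchart image is not at hand. This Python sketch is only one reading of the text in this section. The function name, parameter names, and the "wifi" and "cellular" labels are hypothetical, and the Ethernet branch is left out because this article covers it under physical access.

```python
def connectivity_next_steps(manages_firewall, link, crowded_area=False, uses_eap_tls=False):
    """Return the follow-up actions suggested by the connectivity questions.

    All names here are illustrative; "link" must be "wifi" or "cellular".
    """
    steps = []

    # Step 1: firewall settings.
    if manages_firewall:
        steps.append("Verify firewall settings against Azure Sphere OS networking requirements")
    else:
        steps.append("Ask your firewall administrator for guidance")

    # Step 2: northbound connectivity.
    if link == "wifi":
        if crowded_area:
            steps.append("Ensure targeted scan is enabled (WifiConfig_SetTargetedScanEnabled)")
        else:
            steps.append("Contact Microsoft Support")
        if uses_eap_tls:
            steps.append("Check EAP-TLS certificate lifecycle management with your provider")
        else:
            steps.append("Confirm the SSID and password have not changed")
    elif link == "cellular":
        steps.append("Ask your systems integrator or cellular provider whether devices appear on the network")
    else:
        raise ValueError("link must be 'wifi' or 'cellular'")

    return steps


if __name__ == "__main__":
    for step in connectivity_next_steps(True, "wifi", crowded_area=True):
        print("-", step)
```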
What's the scope of the issue? Trace through the following flowchart:
How many devices are encountering problems? If only a few devices are impacted, first check the Connectivity flowchart. Next, check the physical environment of the devices: are the devices unplugged, or has some change been made to the devices' hardware? If the devices are plugged in and no change has been made to their hardware, get two or three device IDs and check the tenant error logs by running the command azsphere tenant download-error-report. Check the Description field. If the description includes any of the following, check the customer application logs for further guidance:
- AppCrash
- AppUpdate
- AppExit
However, if the description includes any of the following, reach out to Microsoft Support:
- SystemAppCrash
- Kernel Panic
- Kernel Oops
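When an error report contains many rows, the two lists above can be applied programmatically. The following Python sketch makes some assumptions that this article does not confirm: that the report is a CSV file with a header row and a column named Description, and that simple substring matching on that column is adequate. The function names and the sample rows are hypothetical. Note that "SystemAppCrash" contains the text "AppCrash", so the system-level markers are checked first.

```python
import csv
from collections import Counter

# Markers taken from the two lists in this article.
APP_MARKERS = ("AppCrash", "AppUpdate", "AppExit")
SYSTEM_MARKERS = ("SystemAppCrash", "Kernel Panic", "Kernel Oops")

CHECK_APP_LOGS = "check application logs"
CONTACT_SUPPORT = "contact Microsoft Support"
UNCLASSIFIED = "unclassified"


def triage_description(description):
    """Map one Description value to a suggested follow-up action."""
    # System markers go first because "SystemAppCrash" also contains "AppCrash".
    if any(marker in description for marker in SYSTEM_MARKERS):
        return CONTACT_SUPPORT
    if any(marker in description for marker in APP_MARKERS):
        return CHECK_APP_LOGS
    return UNCLASSIFIED


def triage_report(lines):
    """Count suggested actions across the rows of a CSV error report.

    Assumption: the report has a header row with a "Description" column.
    "lines" is any iterable of text lines, such as an open file.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[triage_description(row.get("Description") or "")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rows for illustration; a real report has more columns.
    sample = [
        "DeviceId,Description",
        "device-a,AppCrash",
        "device-b,Kernel Panic",
        "device-c,SystemAppCrash",
    ]
    print(triage_report(sample))
```

To run it against a downloaded report, open the file with newline="" as the csv module recommends and pass the file object to triage_report.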
If all devices have been affected, follow these steps:
- Have devices recently taken an OS update? If they have, contact Microsoft Support. If they haven't taken an OS update, refer to the Connectivity flowchart. Depending on which software channel feed your device group is part of, you may have received an OS update notification. For more information on OS feeds, see Azure Sphere OS feeds.
- Have devices recently taken an application update? If they have, redeploy the application or roll back to a previous version. If they haven't, contact Microsoft Support. For more information on over-the-air updates, see About over-the-air updates.
In the case that you can get physical access to the devices
If you're able to get physical access to the devices, you may wish to take these local troubleshooting steps:
- Can you rule out connectivity issues at that specific location? For example, is the building having issues with connectivity?
- Check the Ethernet section of the Connectivity flowchart: If you use Ethernet to connect to the internet, check your switch port. If the switch port is lit, power cycle the device. If it is not lit, check your firewall settings.
- Are the devices unplugged, or has some change been made to the devices' hardware? For example, are the sensors overexerted, or is the USB connector broken?
- Run the command azsphere get-support-data.