Troubleshoot Azure IoT Operations Preview

Important

Azure IoT Operations Preview – enabled by Azure Arc is currently in PREVIEW. You shouldn't use this preview software in production environments.

See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

This article contains troubleshooting tips for Azure IoT Operations Preview.

Deployment and configuration issues

For general deployment and configuration troubleshooting, you can use the Azure CLI IoT Operations check and support commands.

These commands require Azure CLI version 2.46.0 or higher with the Azure IoT Operations extension installed.

  • Use az iot ops check to evaluate IoT Operations service deployment for health, configuration, and usability. The check command can help you find problems in your deployment and configuration.

  • Use az iot ops support create-bundle to collect logs and traces to help you diagnose problems. The support create-bundle command creates a standard support bundle zip archive you can review or provide to Microsoft Support.
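
You can combine the version prerequisite and both commands into a short pre-flight script. The following is a minimal sketch rather than an official procedure. The helper name version_ge is hypothetical, and it assumes GNU sort with the -V option. The commented usage assumes you're already signed in to Azure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (commented out so that sourcing this file has no side effects):
#   installed="$(az version --query '"azure-cli"' --output tsv)"
#   version_ge "$installed" "2.46.0" || echo "Upgrade the Azure CLI first." >&2
#   az iot ops check
#   az iot ops support create-bundle
```

Comparing with a version sort avoids the mistake of treating 2.5.0 as newer than 2.46.0, which a plain string comparison would make.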

Linked authorization failed error

If your deployment fails with the "code":"LinkedAuthorizationFailed" error, it means that you don't have Microsoft.Authorization/roleAssignments/write permissions on the resource group that contains your cluster.

To resolve this issue, either request the required permissions or make the following adjustments to your deployment steps:

  • If deploying with an Azure Resource Manager template, set the deployResourceSyncRules parameter to false.
  • If deploying with the Azure CLI, include the --disable-rsync-rules flag with the az iot ops init command.

Data Processor pipeline deployment status is failed

Your Data Processor pipeline deployment status is showing as Failed.

Find pipeline error codes

To find the pipeline error codes, use the following commands.

To list the Data Processor pipeline deployments, run the following command:

kubectl get pipelines -A

The output from the previous command looks like the following example:

NAMESPACE                NAME                           AGE
azure-iot-operations     passthrough-data-pipeline      2d20h
azure-iot-operations     reference-data-pipeline        2d20h
azure-iot-operations     contextualized-data-pipeline   2d20h

To view detailed information for a pipeline, run the following command:

kubectl describe pipelines passthrough-data-pipeline -n azure-iot-operations

The output from the previous command looks like the following example:

...
Status:
  Provisioning Status:
    Error:
      Code:  <ErrorCode>
      Message: <ErrorMessage>
    Status:        Failed
Events:            <none>
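
If you have several pipelines, you can filter the describe output down to the provisioning fields. The following is a minimal sketch that assumes the output layout matches the preceding example. The helper name provisioning_error is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl describe pipelines ...` output on stdin
# and prints only the Code, Message, and Status lines that appear under
# "Provisioning Status".
provisioning_error() {
  awk '
    /^ *Provisioning Status:/ { in_block = 1; next }
    /^[^ ]/                   { in_block = 0 }   # a top-level key ends the block
    in_block && /^ *(Code|Message|Status):/ {
      sub(/^ +/, "")       # drop leading indentation
      sub(/: +/, ": ")     # normalize spacing after the key
      print
    }
  '
}

# Example usage (commented out so that sourcing this file has no side effects):
#   kubectl get pipelines -n azure-iot-operations -o name | while read -r p; do
#     echo "== $p"
#     kubectl describe "$p" -n azure-iot-operations | provisioning_error
#   done
```

The filter works on the text output because this article only shows the describe layout. If you know the field paths of the pipeline resource, a JSONPath query is a more robust choice.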

Data is corrupted in the Microsoft Fabric lakehouse table

If data is corrupted in the Microsoft Fabric lakehouse table that your Data Processor pipeline is writing to, make sure that no other processes are writing to the table. If you write to the Microsoft Fabric lakehouse table from multiple sources, you might see corrupted data in the table.

Deployment issues with Data Processor

If you see deployment errors with Data Processor pods, make sure that when you created your Azure Key Vault you chose Vault access policy as the Permission model.

Data Processor pipeline edits aren't applied to messages

If edits you make to a pipeline aren't applied to messages, run the following commands to propagate the changes:

kubectl rollout restart deployment aio-dp-operator -n azure-iot-operations 

kubectl rollout restart statefulset aio-dp-runner-worker -n azure-iot-operations 

kubectl rollout restart statefulset aio-dp-reader-worker -n azure-iot-operations
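
A restart returns before the new pods are ready. If you want to confirm that each restart finished before you send test messages, you can wait on the rollout. The following is a minimal sketch. The helper names, the KUBECTL override, and the 300-second timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example, KUBECTL=echo for a dry run).
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="azure-iot-operations"

# Hypothetical helper: restart one workload, then block until the rollout
# completes or the assumed 300-second timeout expires.
restart_and_wait() {
  local kind="$1" name="$2"
  "$KUBECTL" rollout restart "$kind" "$name" -n "$NAMESPACE" &&
    "$KUBECTL" rollout status "$kind" "$name" -n "$NAMESPACE" --timeout=300s
}

# Hypothetical helper: restart the three Data Processor workloads in order,
# stopping at the first failure.
restart_data_processor() {
  restart_and_wait deployment aio-dp-operator &&
    restart_and_wait statefulset aio-dp-runner-worker &&
    restart_and_wait statefulset aio-dp-reader-worker
}
```

Run restart_data_processor after you source the script. Set KUBECTL=echo first to print the commands without running them.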

Data Processor pipeline processing pauses unexpectedly

A momentary loss of communication with the IoT MQ broker pods can pause the processing of data pipelines. You might also see errors such as service account token expired. If this happens, run the following commands:

kubectl rollout restart statefulset aio-dp-runner-worker -n azure-iot-operations
kubectl rollout restart statefulset aio-dp-reader-worker -n azure-iot-operations

Data Processor extension fails to uninstall

If the Data Processor extension fails to uninstall, run the following commands and try the uninstall operation again:

kubectl delete pod aio-dp-reader-worker-0 --grace-period=0 --force -n azure-iot-operations
kubectl delete pod aio-dp-runner-worker-0 --grace-period=0 --force -n azure-iot-operations

Troubleshoot Layered Network Management Preview

The troubleshooting guidance in this section is specific to Azure IoT Operations when you use the Azure IoT Layered Network Management Preview component. For more information, see How does Azure IoT Operations Preview work in layered network?.

Can't install Layered Network Management Preview on the parent level

The Layered Network Management operator installation fails, or you can't apply the custom resource for a Layered Network Management instance.

  1. Verify the regions are supported for public preview. Public preview supports eight regions. For more information, see Quickstart: Deploy Azure IoT Operations Preview.
  2. If there are any other errors when installing the Layered Network Management Arc extensions, follow the guidance included with the error. Try uninstalling and reinstalling the extension.
  3. Verify the Layered Network Management operator is in the Running and Ready state.
  4. If applying the custom resource with kubectl apply -f cr.yaml fails, the command output lists the reason for the error. For example, a CRD version mismatch or an incorrect entry in the CRD.

Can't Arc-enable the cluster through the parent level Layered Network Management Preview

If you repeatedly remove and onboard a cluster with the same machine, you might get an error while Arc-enabling the cluster on nested layers. For example, the error message might look like:

Error: We found an issue with outbound network connectivity from the cluster to the endpoints required for onboarding.
Please ensure to meet the following network requirements 'https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#meet-network-requirements'
If your cluster is behind an outbound proxy server, please ensure that you have passed proxy parameters during the onboarding of your cluster.

To resolve this issue:

  1. Run the following command:

    sudo systemctl restart systemd-networkd
    
  2. Reboot the host machine.

Other types of Arc-enablement failures

  1. Add the --debug parameter when running the connectedk8s command.
  2. Capture and investigate a network packet trace. For more information, see capture Layered Network Management packet trace.

Can't install IoT Operations on the isolated cluster

You can't install IoT Operations components on nested layers. For example, Layered Network Management on level 4 is running, but you can't install IoT Operations on level 3.

  1. Verify the nodes can access the Layered Network Management service running on the parent level. For example, run ping <IP-ADDRESS-L4-LNM> from the node.

  2. Verify the DNS queries are being resolved to the Layered Network Management service running on the parent level by using the following command:

    nslookup management.azure.com
    

    DNS should respond with the IP address of the Layered Network Management service.

  3. If the domain is being resolved correctly, verify the domain is added to the allowlist. For more information, see Check the allowlist of Layered Network Management.

  4. Capture and investigate a network packet trace. For more information, see capture Layered Network Management packet trace.
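
To check the DNS step for more than one domain, you can script the comparison between the nslookup answer and the Layered Network Management IP address. The following is a minimal sketch that assumes BIND-style nslookup output on Linux. The helper names answer_addresses and resolves_only_to are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads nslookup output on stdin and prints the answer
# addresses, that is, the "Address" lines that follow a "Name:" line. The
# resolver's own address, which is printed before the answer, is skipped.
answer_addresses() {
  awk '/^Name:/ { seen = 1; next } seen && /^Address/ { print $NF }'
}

# Hypothetical helper: succeeds only when there is at least one answer
# address and every answer address equals $1.
resolves_only_to() {
  local expected="$1" addrs
  addrs="$(answer_addresses)"
  [ -n "$addrs" ] && ! printf '%s\n' "$addrs" | grep -Fxv -q -- "$expected"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   nslookup management.azure.com | resolves_only_to "<IP-ADDRESS-L4-LNM>" \
#     || echo "management.azure.com doesn't resolve to the parent level" >&2
```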

A pod fails when installing IoT Operations on an isolated cluster

When you install the IoT Operations components on a cluster, the installation starts and proceeds. However, initialization of one or more of the components (pods) fails.

  1. Identify the failed pod:

    kubectl get pods -n azure-iot-operations
    
  2. Get details about the pod:

    kubectl describe pod [POD NAME] -n azure-iot-operations
    
  3. Check the information related to the container image. If the image download fails, check whether the domain name of the download path is on the allowlist. For example:

    Warning  Failed  3m14s  kubelet  Failed to pull image "…
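
To find the domain to look for on the allowlist, you can extract the registry host from the failed pull events. The following is a minimal sketch. The helper name failed_image_hosts is hypothetical, and it assumes the image reference starts with a registry host name followed by a slash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl describe pod` output on stdin and prints
# the unique registry host of each image named in a "Failed to pull image"
# event. Image references without a slash produce no output.
failed_image_hosts() {
  sed -n 's/.*Failed to pull image "\([^"/]*\)\/.*/\1/p' | sort -u
}

# Example usage (commented out so that sourcing this file has no side effects):
#   kubectl describe pod "<POD NAME>" -n azure-iot-operations | failed_image_hosts
```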
    

Check the allowlist of Layered Network Management Preview

Layered Network Management blocks traffic if the destination domain isn't on the allowlist.

  1. Run the following command to list the config maps.
    kubectl get cm -n azure-iot-operations
    
  2. The output should look like the following example:
    NAME                           DATA   AGE
    aio-lnm-level4-config          1      50s
    aio-lnm-level4-client-config   1      50s
    
  3. The config map whose name ends in -client-config contains the allowlist. Run:
    kubectl get cm aio-lnm-level4-client-config -n azure-iot-operations -o yaml
    
  4. All the allowed domains are listed in the output.
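
To check for a specific domain without reading the whole output, you can search the config map YAML. The following is a minimal sketch. The helper name domain_in_allowlist is hypothetical, and the check is a literal text match, so it doesn't detect a wildcard entry that would cover the domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the config map YAML on stdin and succeeds when
# the given domain appears in it as a literal substring.
domain_in_allowlist() {
  grep -F -q -- "$1"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   kubectl get cm aio-lnm-level4-client-config -n azure-iot-operations -o yaml \
#     | domain_in_allowlist management.azure.com \
#     && echo "listed" || echo "not listed"
```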

Capture Layered Network Management Preview packet trace

In some cases, you might suspect that the Layered Network Management instance at the parent level isn't forwarding network traffic to a particular endpoint, and that the failed connection is causing an issue for a service running on your node. The service you enabled might be trying to connect to a new endpoint after an update, or you might be installing a new Arc extension or service that requires connections to endpoints that aren't on the default allowlist. Usually, the error message identifies the failed connection. If it doesn't clearly identify the missing endpoint, you can capture the network traffic on the child node for detailed debugging.

Windows host

  1. Install Wireshark network traffic analyzer on the host.
  2. Run Wireshark and start capturing.
  3. Reproduce the installation or connection failure.
  4. Stop capturing.

Linux host

  1. Run the following command to start capturing:

    sudo tcpdump -W 5 -C 10 -i any -w AIO-deploy -Z root
    
  2. Reproduce the installation or connection failure.

  3. Stop capturing.

Analyze the packet trace

Use Wireshark to open the trace file. Look for connection failures or connections that received no response.

  1. Filter the packets with the ip.addr == [IP address] parameter. Enter the IP address of your custom DNS service.
  2. Review the DNS queries and responses, and check whether there's a domain name that isn't on the allowlist of Layered Network Management.
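
On a Linux host without Wireshark, you can get a rough list of the queried domain names from the command line instead. The following is a minimal sketch of that alternative, not a replacement for the Wireshark steps. The helper name queried_domains is hypothetical, and it assumes the default one-line text format that tcpdump prints for A and AAAA queries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tcpdump text output on stdin and prints the
# unique domain names that appear in A or AAAA queries (lines that contain
# "A? name." or "AAAA? name."). Responses and other traffic are ignored.
queried_domains() {
  sed -nE 's/.* (A|AAAA)\? ([^ ]+)\. .*/\2/p' | sort -u
}

# Example usage (commented out so that sourcing this file has no side effects):
#   sudo tcpdump -nn -r "<capture-file>" port 53 | queried_domains
```

You can then compare each name in the output against the allowlist.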