How to Design Azure-based Observability for Edge AI Systems (Docker on Ubuntu/WSL)

Sudhakar P 145 Reputation points
2025-04-22T11:04:41.1033333+00:00

We are building an observability system in Azure for edge devices deployed at 20–30 manufacturing locations across India. Each edge device runs Docker containers (Node.js, MongoDB, Triton server) on Ubuntu or Windows+WSL with GPU support. The devices are connected via dongles or customer internet.

We'd like to monitor:

System health (uptime, CPU/GPU/memory/disk usage)

Network & internet connectivity

Docker/container health & restarts

USB connection status

Logs from our C++ app (MVC), PLC, and Node.js services

Custom metrics from CSV reports

Dashboards per device & per customer

What’s the best approach to implement this with Azure Monitor, Log Analytics, or IoT tools?

Azure Internet of Things

1 answer

  1. Vinodh247 34,741 Reputation points MVP Volunteer Moderator
    2025-04-22T14:33:59.26+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Try the following approach:

    1. Edge data collection:
      • Use Telegraf for system and GPU metrics.
      • Use Fluent Bit for logs (C++, Node.js, PLC).
      • Use custom scripts for Docker health, USB status, network checks, and parsing CSVs.
    2. Data ingestion:
      • Send metrics and logs to Azure Monitor via Log Analytics.
      • Use the HTTP Data Collector API for custom metrics.
    3. Visualization:
      • Use Azure Workbooks for per-device and per-customer dashboards.
      • Optionally integrate with Power BI.
    4. Optional:
      • Use Azure IoT Hub if you need secure device management and bidirectional communication.

    This setup is lightweight, scalable, and Azure native.
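
As a rough illustration of the CSV parsing and Data Collector API steps above, here is a minimal Python sketch that turns a CSV report into JSON records and builds the signed request. The field names `DeviceId` and `CustomerId`, the log type `EdgeReport`, and the CSV columns are hypothetical placeholders; the workspace ID and shared key come from your own Log Analytics workspace. Microsoft's documentation marks the HTTP Data Collector API as deprecated in favor of the Logs Ingestion API, so check current guidance before building on it.

```python
import base64
import csv
import hashlib
import hmac
import io
import json
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def csv_to_records(csv_text, device_id, customer_id):
    """Turn each CSV row into a dict tagged with device and customer.

    DeviceId and CustomerId are hypothetical field names, chosen so that
    dashboards can later filter per device and per customer.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"DeviceId": device_id, "CustomerId": customer_id, **row}
        for row in reader
    ]


def build_signature(workspace_id, shared_key, date, content_length):
    """Build the SharedKey authorization header for the Data Collector API."""
    string_to_sign = (
        f"POST\n{content_length}\napplication/json\n"
        f"x-ms-date:{date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(shared_key),       # the workspace key is base64 text
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"


def build_request(workspace_id, shared_key, log_type, records, now=None):
    """Return (url, headers, body) for one POST of JSON records."""
    body = json.dumps(records).encode("utf-8")
    now = now or datetime.now(timezone.utc)
    date = format_datetime(now, usegmt=True)  # RFC 1123 date, e.g. "... GMT"
    url = (
        f"https://{workspace_id}.ods.opinsights.azure.com"
        "/api/logs?api-version=2016-04-01"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(workspace_id, shared_key, date, len(body)),
        "Log-Type": log_type,  # records land in a custom table named <log_type>_CL
        "x-ms-date": date,
    }
    return url, headers, body


def send(url, headers, body):
    """POST the request and return the HTTP status code."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

On a device, a scheduled script could call `csv_to_records` on each new report, pass the result to `build_request`, and then call `send`. Because the devices connect over dongles or customer internet, you would likely want to queue failed uploads locally and retry them later.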

