How to read and output the NPU utilization

John Huang 0 Reputation points
2024-06-18T05:32:45.61+00:00

Hi All

How can I use the following Microsoft Learn documentation to read and output the NPU utilization? Thank you.

https://learn.microsoft.com/en-us/windows/win32/api/activitycoordinatortypes/ne-activitycoordinatortypes-activity_coordinator_resource


Windows API - Win32
C#

1 answer

  1. Tong Xu - MSFT 2,116 Reputation points Microsoft Vendor
    2024-06-18T10:08:26.4133333+00:00

    Hello,

    Welcome to Microsoft Q&A!
    The linked documentation was updated very recently, on 6/12, and we have not tested it yet, so I can only show the older methods.
    I have already consulted the Task Manager engineers, and we cannot release the full source code. I apologize.

    First, check whether the device is an NPU or a GPU.
    1. You can use the DXCore adapter attribute GUIDs.

    // Compute-capable but not graphics-capable: treat the adapter as an NPU.
    if (myDxCoreAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE) &&
        !myDxCoreAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS))
    { _myDxAdapterclass = NPU; }
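
    As a rough illustration of the check above, the decision can be written as a small pure function. This is only a sketch: AdapterClass and ClassifyAdapter are hypothetical names, not part of DXCore, and treating every graphics-capable adapter as a GPU is an assumption. The two booleans stand for the results of the two IsAttributeSupported calls.

    ```cpp
    // Hypothetical helper types; DXCore itself does not define them.
    enum class AdapterClass { Unknown, Gpu, Npu };

    // supportsCompute:  result of IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE)
    // supportsGraphics: result of IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS)
    constexpr AdapterClass ClassifyAdapter(bool supportsCompute, bool supportsGraphics) {
      if (supportsGraphics) return AdapterClass::Gpu;  // assumption: graphics support means GPU
      if (supportsCompute) return AdapterClass::Npu;   // compute only: treated as an NPU
      return AdapterClass::Unknown;                    // neither attribute is supported
    }
    ```

    Keeping the decision separate from the DXCore calls makes it easy to check without NPU hardware.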
    

    2. Or check for Activity Coordinator resource support.

    Secondly, you need to obtain a device information set by calling SetupDiGetClassDevs.

    GUID COMPUTE_ACCELERATOR_CLASS_GUID = { 0xf01a9d53, 0x3ff6, 0x48d2, { 0x9f, 0x97, 0xc8, 0xa7, 0x00, 0x4b, 0xe1, 0x0c } };
    GUID* NPUDevClass = &COMPUTE_ACCELERATOR_CLASS_GUID;
    // Enumerate the present devices that belong to the compute accelerator class.
    HDEVINFO deviceInfo = SetupDiGetClassDevs(NPUDevClass, NULL, NULL, (DIGCF_PROFILE | DIGCF_PRESENT));
    

    You can find the GUID in the properties of the NPU device.
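
    When comparing the class GUID with the string form that tools usually display, it can help to see how the initializer fields map to text. The sketch below uses a portable stand-in struct with the same field layout as the Windows GUID type. Guid, kComputeAcceleratorClass and GuidToString are illustrative names, not Windows APIs.

    ```cpp
    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Portable stand-in with the same field layout as the Windows GUID struct.
    struct Guid {
      std::uint32_t Data1;
      std::uint16_t Data2;
      std::uint16_t Data3;
      std::uint8_t Data4[8];
    };

    // Same values as COMPUTE_ACCELERATOR_CLASS_GUID in the snippet above.
    constexpr Guid kComputeAcceleratorClass = {
        0xf01a9d53, 0x3ff6, 0x48d2,
        {0x9f, 0x97, 0xc8, 0xa7, 0x00, 0x4b, 0xe1, 0x0c}};

    // Formats a GUID as "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}" in lowercase hex.
    std::string GuidToString(const Guid& g) {
      char buf[39];  // 38 characters plus the terminating null
      std::snprintf(buf, sizeof(buf),
                    "{%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}",
                    static_cast<unsigned>(g.Data1), static_cast<unsigned>(g.Data2),
                    static_cast<unsigned>(g.Data3), static_cast<unsigned>(g.Data4[0]),
                    static_cast<unsigned>(g.Data4[1]), static_cast<unsigned>(g.Data4[2]),
                    static_cast<unsigned>(g.Data4[3]), static_cast<unsigned>(g.Data4[4]),
                    static_cast<unsigned>(g.Data4[5]), static_cast<unsigned>(g.Data4[6]),
                    static_cast<unsigned>(g.Data4[7]));
      return std::string(buf);
    }
    ```

    For the class GUID above, GuidToString(kComputeAcceleratorClass) gives "{f01a9d53-3ff6-48d2-9f97-c8a7004be10c}".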

    Then, after getting the Device Info Handle, use SetupDiGetDevicePropertyW to get the device information you want. The PhyId information can be obtained from the following Device Property Key:

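    // Both keys share the format ID {60b193cb-5276-4d0f-96fc-f173abad3ec6}; only the property ID differs.
    // Include <initguid.h> before <devpropdef.h> in one source file so that the keys are defined.
    DEFINE_DEVPROPKEY(DEVPKEY_Gpu_PhyId, 0x60b193cb, 0x5276, 0x4d0f, 0x96, 0xfc, 0xf1, 0x73, 0xab, 0xad, 0x3e, 0xc6, 3);
    DEFINE_DEVPROPKEY(DEVPKEY_GPU_LUID, 0x60b193cb, 0x5276, 0x4d0f, 0x96, 0xfc, 0xf1, 0x73, 0xab, 0xad, 0x3e, 0xc6, 2);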
    

    Finally, use PDH (or any other method) to read the performance counter data, then use the physical ID to classify the running-time data and do the math, the same way as for a GPU.
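
    To classify counter instances by physical ID, the instance name has to be split into its fields. Here is a minimal sketch of that step. It assumes the instance names follow the pattern pid_<n>_luid_0x<hex>_0x<hex>_phys_<n>_eng_<n>_engtype_<name>, and that the first LUID value is the high part and the second the low part. EngineInstance and ParseEngineInstance are hypothetical names.

    ```cpp
    #include <cstdint>
    #include <regex>
    #include <string>

    // Fields encoded in one "GPU Engine" counter instance name (hypothetical type).
    struct EngineInstance {
      int pid = 0;
      std::uint32_t luidHigh = 0;  // assumed order: high part first
      std::uint32_t luidLow = 0;   // then low part
      int phys = 0;                // physical adapter index
      int eng = 0;                 // engine index
      std::string engType;         // for example "3D"
    };

    // Returns true and fills 'out' when 'name' matches the assumed pattern.
    bool ParseEngineInstance(const std::string& name, EngineInstance& out) {
      static const std::regex re(
          R"(pid_(\d+)_luid_0x([0-9A-Fa-f]+)_0x([0-9A-Fa-f]+)_phys_(\d+)_eng_(\d+)_engtype_(.+))");
      std::smatch m;
      if (!std::regex_match(name, m, re)) {
        return false;
      }
      out.pid = std::stoi(m[1].str());
      out.luidHigh = static_cast<std::uint32_t>(std::stoul(m[2].str(), nullptr, 16));
      out.luidLow = static_cast<std::uint32_t>(std::stoul(m[3].str(), nullptr, 16));
      out.phys = std::stoi(m[4].str());
      out.eng = std::stoi(m[5].str());
      out.engType = m[6].str();
      return true;
    }
    ```

    With the parsed phys and LUID values, each instance can be matched against the values read from the device properties, and instances that belong to other adapters can be skipped.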

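    #pragma once
    #include <windows.h>  // must come before the PDH headers
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <regex>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>
    #include <pdh.h>
    #include <pdhmsg.h>
    #include <strsafe.h>
    #include <tchar.h>
    #pragma comment(lib, "pdh.lib")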
    // https://docs.microsoft.com/en-us/windows/win32/perfctrs/enumerating-process-objects
    std::vector<std::pair<int, int>> GetGPURunningTimeProcess() {
      std::vector<std::pair<int, int>> ret;
      DWORD counterListSize = 0;
      DWORD instanceListSize = 0;
      DWORD dwFlags = 0;
      const auto COUNTER_OBJECT = TEXT("GPU Engine");
      PDH_STATUS status = ERROR_SUCCESS;
      status = PdhEnumObjectItems(nullptr, nullptr, COUNTER_OBJECT, nullptr,
                                  &counterListSize, nullptr, &instanceListSize,
                                  PERF_DETAIL_WIZARD, dwFlags);
      if (status != PDH_MORE_DATA) {
        throw std::runtime_error("failed PdhEnumObjectItems()");
      }
      std::vector<TCHAR> counterList(counterListSize);
      std::vector<TCHAR> instanceList(instanceListSize);
      status = ::PdhEnumObjectItems(
          nullptr, nullptr, COUNTER_OBJECT, counterList.data(), &counterListSize,
          instanceList.data(), &instanceListSize, PERF_DETAIL_WIZARD, dwFlags);
      if (status != ERROR_SUCCESS) {
        throw std::runtime_error("failed PdhEnumObjectItems()");
      }
      for (TCHAR* pTemp = instanceList.data(); *pTemp != 0;
           pTemp += _tcslen(pTemp) + 1) {
        if (::_tcsstr(pTemp, TEXT("engtype_3D")) == NULL) {
          continue;
        }
        TCHAR buffer[1024];
        ::StringCchCopy(buffer, 1024, TEXT("\\GPU Engine("));
        ::StringCchCat(buffer, 1024, pTemp);
        ::StringCchCat(buffer, 1024, TEXT(")\\Running time"));
        HQUERY hQuery = NULL;
        status = ::PdhOpenQuery(NULL, 0, &hQuery);
        if (status != ERROR_SUCCESS) {
          continue;
        }
        HCOUNTER hCounter = NULL;
        status = ::PdhAddCounter(hQuery, buffer, 0, &hCounter);
        if (status != ERROR_SUCCESS) {
          ::PdhCloseQuery(hQuery);  // release the query on every early exit
          continue;
        }
        // Collect two samples before formatting the counter value.
        status = ::PdhCollectQueryData(hQuery);
        if (status == ERROR_SUCCESS) {
          status = ::PdhCollectQueryData(hQuery);
        }
        if (status != ERROR_SUCCESS) {
          ::PdhCloseQuery(hQuery);
          continue;
        }
        const DWORD dwFormat = PDH_FMT_LONG;
        PDH_FMT_COUNTERVALUE ItemBuffer;
        status =
            ::PdhGetFormattedCounterValue(hCounter, dwFormat, nullptr, &ItemBuffer);
        if (ERROR_SUCCESS != status) {
          ::PdhCloseQuery(hQuery);
          continue;
        }
        if (ItemBuffer.longValue > 0) {
    #ifdef _UNICODE
          std::wregex re(TEXT("pid_(\\d+)"));
          std::wsmatch sm;
          std::wstring str = pTemp;
    #else
          std::regex re(TEXT("pid_(\\d+)"));
          std::smatch sm;
          std::string str = pTemp;
    #endif
          if (std::regex_search(str, sm, re)) {
            int pid = std::stoi(sm[1]);
            ret.push_back({pid, ItemBuffer.longValue});
          }
        }
        ::PdhCloseQuery(hQuery);
      }
      return ret;
    }
    int64_t GetGPURunningTimeTotal() {
      int64_t total = 0;
      std::vector<std::pair<int, int>> list = GetGPURunningTimeProcess();
      for (const std::pair<int, int>& v : list) {
        if (v.second > 0) {
          total += v.second;
        }
      }
      return total;
    }
    // Returns the busy fraction in [0.0, 1.0] since the previous call.
    // The first call returns approximately 0.
    double GetGPUUsage() {
      static std::chrono::steady_clock::time_point prev_called =
          std::chrono::steady_clock::now();
      static int64_t prev_running_time = GetGPURunningTimeTotal();
      std::chrono::steady_clock::time_point now = std::chrono::steady_clock::now();
      std::chrono::steady_clock::duration elapsed = now - prev_called;
      // Elapsed wall-clock time in nanoseconds.
      int64_t elapsed_ns =
          std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
      int64_t running_time = GetGPURunningTimeTotal();
      // The x100 scaling assumes "Running time" is reported in 100 ns units,
      // so the result is a fraction of the elapsed time, not a percentage.
      double fraction = 0.0;
      if (elapsed_ns > 0) {
        fraction = (double)(running_time - prev_running_time) / elapsed_ns * 100;
      }
      // printf("fraction = (%lld - %lld) / %lld * 100 = %f\n", running_time,
      // prev_running_time, elapsed_ns, fraction);
      prev_called = now;
      prev_running_time = running_time;
      if (fraction > 1.0)
        fraction = 1.0;
      else if (fraction < 0.0)
        fraction = 0.0;
      return fraction;
    }
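
    The "classify by physical ID and do the math" part can be sketched without any Windows calls. The example below is an illustration only: SumRunningTimeByPhys and BusyFraction are hypothetical names, and it assumes the running time is reported in 100 ns units, as the scaling in the code above implies.

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    // Sums running time per physical adapter index.
    // Each sample is (phys index, running time in assumed 100 ns units).
    std::map<int, std::int64_t> SumRunningTimeByPhys(
        const std::vector<std::pair<int, std::int64_t>>& samples) {
      std::map<int, std::int64_t> totals;
      for (const auto& s : samples) {
        totals[s.first] += s.second;
      }
      return totals;
    }

    // Busy fraction in [0, 1] between two samples of one adapter's total.
    // prev100ns / curr100ns: running-time totals in 100 ns units.
    // elapsedNs: wall-clock time between the two samples in nanoseconds.
    double BusyFraction(std::int64_t prev100ns, std::int64_t curr100ns,
                        std::int64_t elapsedNs) {
      if (elapsedNs <= 0) {
        return 0.0;  // avoid dividing by zero on back-to-back calls
      }
      const double busyNs = static_cast<double>(curr100ns - prev100ns) * 100.0;
      return std::min(1.0, std::max(0.0, busyNs / static_cast<double>(elapsedNs)));
    }
    ```

    For example, a running-time increase of 5,000,000 units (0.5 s) over an elapsed 1,000,000,000 ns (1 s) gives BusyFraction(0, 5000000, 1000000000) = 0.5. Multiply by 100 if a percentage is needed.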
    

    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
