This article provides information about context monitoring, which was introduced in Windows 10 (WDDM 2.0).
Context monitoring allows for flexible synchronization between GPU engines, or across CPU cores and GPU engines. A monitored fence object is an advanced form of fence synchronization that allows either a CPU core or a GPU engine to signal or wait on a particular fence object.
The Direct3D runtime creates a monitored fence object by calling the user-mode driver's (UMD) pfnCreateSynchronizationObject2Cb callback with a D3DDDICB_CREATESYNCHRONIZATIONOBJECT2 structure. This structure's Info member is a D3DDDI_SYNCHRONIZATIONOBJECTINFO2 structure that describes the synchronization object to create. The runtime sets Info.Type to D3DDDI_MONITORED_FENCE to indicate that the Info.MonitoredFence structure is to be used during creation.
The created monitored fence object has the following attributes:
Upon creation, a monitored fence object is returned with the following information:
Item | Description |
---|---|
hSyncObject | Handle to the synchronization object. This handle is used in subsequent calls to Dxgkrnl. |
FenceValueCPUVirtualAddress | Read-only mapping of the 64-bit fence value for the CPU. On platforms that support I/O coherency, this address is mapped write-back (WB, cacheable) from the CPU's point of view; on other platforms, it's mapped uncached (UC). The CPU can track fence progress simply by reading this memory location, but it isn't allowed to write to it. To signal the fence, the CPU must call the SignalSynchronizationObjectFromCpuCb callback. Adapters that support IoMmu should use this address for GPU access; in this case, the address is mapped as read-write. |
FenceValueGPUVirtualAddress | Read/write mapping of the 64-bit fence value for the GPU. On platforms that support I/O coherency, this address is mapped as requiring it. The GPU is allowed to signal the fence by writing directly to this GPU virtual address. GPUs that use IoMmu shouldn't use this address. |
The fence value is a 64-bit value, and both of its virtual addresses are aligned on a 64-bit boundary. GPUs should declare whether they can atomically update 64-bit values, as visible to the CPU, through the added DXGK_VIDSCHCAPS::No64BitAtomics flag. If a GPU can update only 32-bit values atomically, the OS handles the fence wraparound case automatically. However, the OS imposes a restriction: outstanding wait and signal fence values can't be more than UINT_MAX/2 away from the last signaled fence value.
If a GPU engine isn't capable of writing to a monitored fence using its virtual address, the UMD uses the SignalSynchronizationObjectFromGpuCb callback to queue a software signal packet to the GPU context.
To signal the fence from the GPU, the UMD inserts a fence write command directly into a context's command stream, without going through kernel mode. The mechanism by which the kernel monitors fence progress depends on whether a particular GPU engine supports the basic or the advanced implementation of the monitored fence.
When a command buffer completes execution on the GPU, Dxgkrnl:
To wait on a monitored fence on a GPU engine, the UMD first needs to flush its pending command buffer and then call WaitForSynchronizationObjectFromGpuCb, specifying the fence object (hSyncObject) and the fence value to wait on. Dxgkrnl queues the dependency in its internal database and returns immediately, so that the UMD can continue to queue work behind the wait operation. Command buffers submitted after the wait operation aren't scheduled for execution until the wait operation is satisfied.
The SignalSynchronizationObjectFromCpuCb callback was added to allow the CPU to signal a monitored fence object. When the CPU signals a monitored fence object, Dxgkrnl updates the fence memory location with the signaled value. The new value is immediately visible to any user-mode reader, and any waits that it satisfies are released immediately.
A WaitForSynchronizationObjectFromCpuCb callback was added to allow the CPU to wait on a monitored fence object. Two forms of wait operations are available: