Compute Shaders on Downlevel Hardware
Direct3D 11 lets you use compute shaders on most Direct3D 10.x hardware, with some limitations. Compute shader technology is also known as DirectCompute. This topic discusses how to use compute shaders in a Direct3D 11 app running on Direct3D 10.x hardware.
Support for compute shaders on downlevel hardware is only for devices compatible with Direct3D 10.x. Compute shaders cannot be used on Direct3D 9.x hardware.
To check whether Direct3D 10.x hardware supports compute shaders, call ID3D11Device::CheckFeatureSupport. Pass the D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS value as the Feature parameter, a pointer to a D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS structure as the pFeatureSupportData parameter, and the size of that structure as the FeatureSupportDataSize parameter. If the Direct3D 10.x hardware supports compute shaders, CheckFeatureSupport sets the ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x member of D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS to TRUE.
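The check described above can be sketched in C++ roughly as follows. The helper names CanRunComputeShaders and QueryDownlevelComputeSupport are hypothetical and not part of the Direct3D API. The sketch assumes that feature level 11_0 and above always support compute shaders, so the reported option only matters for 10_0 and 10_1 devices. The Windows-specific part is guarded so that the decision logic also compiles on other platforms.

```cpp
#include <cstdint>

// Numeric values of D3D_FEATURE_LEVEL_10_0 and D3D_FEATURE_LEVEL_11_0,
// mirrored here so the decision logic does not depend on Windows headers.
constexpr std::uint32_t kFeatureLevel10_0 = 0xa000;
constexpr std::uint32_t kFeatureLevel11_0 = 0xb000;

// Hypothetical helper: decides whether compute shaders are usable, given the
// device feature level and the value reported in
// ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x.
constexpr bool CanRunComputeShaders(std::uint32_t featureLevel,
                                    bool downlevelOptionReported) {
    return featureLevel >= kFeatureLevel11_0       // Direct3D 11 hardware
        || (featureLevel >= kFeatureLevel10_0      // 10.x: optional support
            && downlevelOptionReported);           // 9.x: never supported
}

#if defined(_WIN32)
#include <d3d11.h>

// Hypothetical wrapper around the CheckFeatureSupport call described above.
inline bool QueryDownlevelComputeSupport(ID3D11Device* device) {
    D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS options = {};
    const HRESULT hr = device->CheckFeatureSupport(
        D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &options, sizeof(options));
    const bool reported = SUCCEEDED(hr) &&
        options.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x;
    return CanRunComputeShaders(
        static_cast<std::uint32_t>(device->GetFeatureLevel()), reported);
}
#endif
```

Separating the decision from the query keeps the feature-level rule in one place and lets it be checked without a device.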
The 10Level9 Reference section lists the differences between how various ID3D11Device and ID3D11DeviceContext methods behave at various 10Level9 feature levels.
- Unordered Access Views (UAVs)
- Shader Resource Views (SRVs)
- Thread Groups
- D3DCompile with D3DCOMPILE_SKIP_OPTIMIZATION
- Related topics
Unordered Access Views (UAVs)
Raw (RWByteAddressBuffer) and Structured (RWStructuredBuffer) Unordered Access Views are supported on downlevel hardware, with the following limitations:
- Only a single UAV may be bound to a pipeline at a time through ID3D11DeviceContext::CSSetUnorderedAccessViews.
- The base offset for a Raw UAV must be aligned on a 256-byte boundary (instead of the 16-byte alignment required for Direct3D 11 hardware).
Typed UAVs are not supported on downlevel hardware. This includes Texture1D, Texture2D, and Texture3D UAVs.
Pixel Shaders on downlevel hardware do not support unordered access.
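As a minimal sketch of the raw UAV alignment rule above, the following C++ helper checks a view's starting element before the view is created. IsRawUavOffsetAligned is a hypothetical name. The sketch assumes the offset is given as a first-element index in 32-bit elements, which is how raw buffer views (DXGI_FORMAT_R32_TYPELESS) are addressed.

```cpp
#include <cstdint>

constexpr std::uint32_t kRawElementBytes = 4;             // raw views use 32-bit elements
constexpr std::uint32_t kDownlevelRawUavAlignment = 256;  // Direct3D 10.x hardware
constexpr std::uint32_t kD3D11RawUavAlignment = 16;       // Direct3D 11 hardware

// Hypothetical helper: true if a raw UAV starting at element `firstElement`
// begins on a byte offset that satisfies the alignment for the target hardware.
constexpr bool IsRawUavOffsetAligned(std::uint32_t firstElement, bool downlevel) {
    return (static_cast<std::uint64_t>(firstElement) * kRawElementBytes) %
           (downlevel ? kDownlevelRawUavAlignment : kD3D11RawUavAlignment) == 0;
}
```

For example, a view starting at element 4 (byte offset 16) passes the Direct3D 11 check but fails the downlevel one, while element 64 (byte offset 256) passes both.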
Shader Resource Views (SRVs)
Raw and Structured Buffers are supported as Shader Resource Views on downlevel hardware for read-only access, as they are on Direct3D 11 hardware. These resource types are supported for Vertex Shaders, Geometry Shaders, and Pixel Shaders, as well as Compute Shaders.
Thread Groups
A compute shader can execute on many threads in parallel, within a thread group.
Thread groups are supported on downlevel hardware, with the following limitations:
Thread Group Dimensions
Thread groups defined for downlevel hardware are limited to X and Y dimensions of 768. This is less than the maximum values of 1024 for Direct3D 11 hardware. The maximum Z dimension of 64 is unchanged.
The total number of threads in the group (X × Y × Z) is limited to 768. This is less than the limit of 1024 for Direct3D 11 hardware.
If these numbers are exceeded, shader compilation will fail.
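Because exceeding these limits only surfaces as a compilation failure, it can help to validate a planned group size up front. The following C++ is a sketch of that idea. ThreadGroupLimits and FitsThreadGroupLimits are hypothetical names, and the helper covers only the size limits stated in this section.

```cpp
#include <cstdint>

// Per-dimension and total thread limits for a thread group.
struct ThreadGroupLimits {
    std::uint32_t maxX, maxY, maxZ, maxTotal;
};

constexpr ThreadGroupLimits kDownlevelLimits = {768, 768, 64, 768};    // Direct3D 10.x
constexpr ThreadGroupLimits kD3D11Limits     = {1024, 1024, 64, 1024}; // Direct3D 11

// Hypothetical helper: true if a numthreads(x, y, z) declaration stays within
// both the per-dimension limits and the total thread count limit.
constexpr bool FitsThreadGroupLimits(std::uint32_t x, std::uint32_t y,
                                     std::uint32_t z, ThreadGroupLimits lim) {
    return x >= 1 && y >= 1 && z >= 1
        && x <= lim.maxX && y <= lim.maxY && z <= lim.maxZ
        && static_cast<std::uint64_t>(x) * y * z <= lim.maxTotal;
}
```

A 32 × 32 × 1 group (1024 threads) fits the Direct3D 11 limits but not the downlevel ones, whereas 32 × 24 × 1 (768 threads) fits both.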
Two-Dimensional Thread Indices
A particular thread within a thread group is indexed using a 3D vector given by (x,y,z).
For compute shaders operating on downlevel hardware, thread groups only support two dimensions. This means that the Z value in the 3D vector must always be 1.
This limitation specifically applies to the following:
- ID3D11DeviceContext::Dispatch: The ThreadGroupCountZ argument must be 1.
- ID3D11DeviceContext::DispatchIndirect: This function is not supported on downlevel hardware.
- numthreads: The Z value must be 1.
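One way to live with the two-dimensional restriction is to fold a logical Z group count into Y on the CPU side and recover both indices from the folded Y group index. This is a workaround sketched here for illustration, not something the original text prescribes. All names are hypothetical, and the per-dimension cap of 65535 groups is the value of D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION.

```cpp
#include <cstdint>

constexpr std::uint32_t kMaxGroupsPerDimension = 65535;

// Arguments in the order expected by ID3D11DeviceContext::Dispatch.
struct DispatchArgs {
    std::uint32_t x, y, z;
};

// True if countY * countZ still fits in a single Dispatch dimension.
constexpr bool CanFoldZIntoY(std::uint32_t countY, std::uint32_t countZ) {
    return static_cast<std::uint64_t>(countY) * countZ <= kMaxGroupsPerDimension;
}

// Folds the logical Z group count into Y so ThreadGroupCountZ is 1.
// Call only when CanFoldZIntoY(countY, countZ) is true.
constexpr DispatchArgs FoldZIntoY(std::uint32_t countX, std::uint32_t countY,
                                  std::uint32_t countZ) {
    return DispatchArgs{countX, countY * countZ, 1};
}

// Recover the logical indices from the folded Y group index. The same
// arithmetic would be done in the shader, with countY passed as a constant.
constexpr std::uint32_t LogicalY(std::uint32_t foldedY, std::uint32_t countY) {
    return foldedY % countY;
}
constexpr std::uint32_t LogicalZ(std::uint32_t foldedY, std::uint32_t countY) {
    return foldedY / countY;
}
```

With a logical grid of 8 × 4 × 3 groups, the folded dispatch is 8 × 12 × 1, and folded Y index 7 maps back to logical (y, z) = (3, 1).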
Thread Group Shared Memory (TGSM)
Thread Group Shared Memory is limited to 16 KB on downlevel hardware. This is less than the 32 KB that is available to Direct3D 11 hardware.
A Compute Shader thread may write only to its own region of TGSM. This writable region is at most 256 bytes, and the maximum decreases as the number of threads declared for the group increases.
The following table defines the per-thread maximum size of a TGSM region for the number of threads in the group:
|Number of Threads in Group|Maximum TGSM Size Per Thread|
A Compute Shader thread may read the TGSM from any location.
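The size limits above can be turned into a quick budget check. The C++ below is a sketch under two assumptions: each thread owns an equally sized writable region, and 16 KB means 16 × 1024 bytes. FitsDownlevelTgsm is a hypothetical name. The check is necessary rather than sufficient, because the exact per-thread maximum for a given thread count comes from the table above.

```cpp
#include <cstdint>

constexpr std::uint32_t kDownlevelTgsmBytes = 16 * 1024;  // Direct3D 10.x hardware
constexpr std::uint32_t kD3D11TgsmBytes     = 32 * 1024;  // Direct3D 11 hardware
constexpr std::uint32_t kMaxWriteRegionBytes = 256;       // upper bound per thread

// Hypothetical helper: true if `threads` regions of `bytesPerThread` each
// respect the 256-byte per-thread bound and the 16 KB total on downlevel
// hardware. Passing this check does not guarantee the layout is accepted.
constexpr bool FitsDownlevelTgsm(std::uint32_t threads,
                                 std::uint32_t bytesPerThread) {
    return bytesPerThread <= kMaxWriteRegionBytes
        && static_cast<std::uint64_t>(threads) * bytesPerThread
               <= kDownlevelTgsmBytes;
}
```

For instance, 64 threads with 256 bytes each use exactly 16 KB and pass, while 65 threads with 256 bytes each exceed the total and fail.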
D3DCompile with D3DCOMPILE_SKIP_OPTIMIZATION
D3DCompile returns E_NOTIMPL when you pass cs_4_0 as the shader target along with the D3DCOMPILE_SKIP_OPTIMIZATION compile option. The cs_5_0 shader target works with D3DCOMPILE_SKIP_OPTIMIZATION.
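A simple way to avoid the E_NOTIMPL result is to clear the flag whenever the target is cs_4_0, as in the C++ sketch below. FlagsForTarget and StrEq are hypothetical helpers. The constant mirrors the value of D3DCOMPILE_SKIP_OPTIMIZATION from d3dcompiler.h so the snippet compiles without Windows headers. Only cs_4_0 is handled, since that is the only target the text names.

```cpp
#include <cstdint>

// Mirrors D3DCOMPILE_SKIP_OPTIMIZATION (1 << 2) from d3dcompiler.h.
constexpr std::uint32_t kSkipOptimization = 1u << 2;

// Compile-time string equality, written recursively to stay C++11-friendly.
constexpr bool StrEq(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || StrEq(a + 1, b + 1));
}

// Hypothetical helper: returns the compile flags to pass to D3DCompile,
// with the skip-optimization bit removed for the cs_4_0 target.
constexpr std::uint32_t FlagsForTarget(const char* target, std::uint32_t flags) {
    return StrEq(target, "cs_4_0") ? (flags & ~kSkipOptimization) : flags;
}
```

The result would be passed as the Flags1 argument of D3DCompile. Other flags, such as D3DCOMPILE_DEBUG, pass through unchanged.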