Understanding all_resources_bound in HLSL

The latest version of fxc includes a new switch, /all_resources_bound, documented here. The API equivalent is the D3DCOMPILE_ALL_RESOURCES_BOUND constant, described here. But what does it do?

Direct3D 12 is much more flexible in how resource descriptors are laid out for the shader to consume. A descriptor may be left invalid (“unbound”) as long as the shader never accesses it. The HLSL compiler must therefore be inhibited from making optimizations that could read a resource the original code would never have touched.

Let's work through an example. Say I have this pixel shader.

Texture2D<float4> t0;
uint count;

float4 main() : SV_Target {
  float4 result = 0;
  for (uint i = 0; i < count; ++i) {
    result += t0[float2(0,0)];
  }
  return result;
}

The expression t0[float2(0,0)] reads from a texture resource. This is generally an expensive operation, so a common optimization is to do the read once before entering the loop, then reuse the value on every iteration.

For shader model 5.1, the compiler is conservative by default and will not move the texture load out of the loop. For example, the app might never set the texture to a valid descriptor when count is zero; hoisting the read would then touch the unbound texture, which is exactly the problem the shader author was guarding against. So the read occurs only inside the loop.

mov r0.xyzw, l(0,0,0,0)
mov r1.x, l(0)
loop
  uge r1.y, r1.x, CB0[0][0].x
  breakc_nz r1.y
  ld r2.xyzw, l(0, 0, 0, 0), T0[0].xyzw
  add r0.xyzw, r0.xyzw, r2.xyzw
  iadd r1.x, r1.x, l(1)
endloop
mov o0.xyzw, r0.xyzw

But if the application always runs with all resources bound, which is a very common case depending on your resource management strategy, then the /all_resources_bound switch allows this kind of optimization to take place.

ld r0.xyzw, l(0, 0, 0, 0), T0[0].xyzw
mov r1.xyzw, l(0,0,0,0)
mov r2.x, l(0)
loop
  uge r2.y, r2.x, CB0[0][0].x
  breakc_nz r2.y
  add r1.xyzw, r0.xyzw, r1.xyzw
  iadd r2.x, r2.x, l(1)
endloop
mov o0.xyzw, r1.xyzw
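
At the source level, the listing above corresponds roughly to the shader below. This is a hand-written sketch for illustration, not decompiler output; the local name texel is my own, and I index with uint2 here since that is the type Texture2D's operator[] takes.

```hlsl
Texture2D<float4> t0;
uint count;

float4 main() : SV_Target {
  // The read now happens unconditionally, even when count is zero.
  // That is only safe if t0 is always bound to a valid descriptor.
  float4 texel = t0[uint2(0, 0)];

  float4 result = 0;
  for (uint i = 0; i < count; ++i) {
    result += texel;  // reuse the hoisted value; no texture access in the loop
  }
  return result;
}
```

The unconditional read is what /all_resources_bound gives the compiler permission to introduce.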

Enjoy!

Note: a more sophisticated transformation for potentially unbound resources would guard a single read with the loop's entry condition, then reuse the value across the loop iterations. fxc doesn't do that today, although the hardware driver can still take advantage of the opportunity. You can also write the shader this way by hand, and the compiler will not alter that shape.
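
For reference, here is a minimal sketch of what that hand-coded guarded read could look like for the example shader. The local name texel is illustrative, and the uint2 index is my choice (it is the type Texture2D's operator[] takes); treat it as one possible shape rather than the canonical one.

```hlsl
Texture2D<float4> t0;
uint count;

float4 main() : SV_Target {
  float4 result = 0;

  // Guard the single read with the same condition that admits the
  // first loop iteration, so t0 is never touched when count is zero.
  if (count > 0) {
    float4 texel = t0[uint2(0, 0)];
    for (uint i = 0; i < count; ++i) {
      result += texel;  // reuse the value read once above
    }
  }
  return result;
}
```

Written this way, the shader performs one read at most and still never accesses the texture when it may be unbound, so it does not depend on /all_resources_bound.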