Detecting Write after Read Hazards with the GPU debugger
This post assumes you have read the introduction and prerequisite: Visual Studio Race Detection for C++ AMP. I will also assume that you have enabled all the race detection options of the Exceptions dialog, as described in that prerequisite blog post. In this post, I am going to share an example that demonstrates detection of write after read hazards between threads belonging to the same tile.
To show an example of a write after read hazard, we’re going to use the Matrix Multiplication sample. This time let’s comment out the second tidx.barrier.wait() statement, on line 141, to see what happens in its absence:
126: for (int i = 0; i < N; i += tile_size)
127: {
128:     tile_static _type localB[tile_size][tile_size];
129:     tile_static _type localA[tile_size][tile_size];
130:
131:     localA[localIdx[0]][localIdx[1]] = av_a(globalIdx[0], i + localIdx[1]);
132:     localB[localIdx[0]][localIdx[1]] = av_b(i + localIdx[0], globalIdx[1]);
133:
134:     tidx.barrier.wait();
135:
136:     for (unsigned k = 0; k < tile_size; k++)
137:     {
138:         temp_c += localA[localIdx[0]][k] * localB[k][localIdx[1]];
139:     }
140:
141:     // tidx.barrier.wait();
142: }
Running this program with all race detection options turned on, the debugger stops with a write-after-read hazard warning triggered by thread [0,0][0,0].
This particular hazard can be hard to see, because the previous access is reported at a later line (line 138) than the current instruction (line 131).
As you might have guessed, the conflicting instruction was executed in an earlier iteration of the loop. The problem is that thread [0,0][0,0] (the current thread) is loading the next set of data and overwriting the previous data, while thread [0,0][0,3] (the conflicting thread) might not yet be done processing the previous set. The tile barrier that was commented out ensures that all threads in the tile finish an iteration of the loop before any of them moves on to the next iteration.
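To make that interleaving concrete, here is a minimal CPU-only sketch in standard C++ (not C++ AMP) that replays one possible schedule for a single tile. Everything in it is an assumption made for illustration: the 4x4 matrices, the 2x2 tile, the helper names (run_tile, reference, make_a, make_b), and the schedule in which thread (0,0) always runs first. With a 2x2 tile the conflicting threads are (0,1) and (1,0) rather than [0,3]. It does not use the GPU debugger; it only shows why the missing barrier can change the result.

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;   // matrix dimension (assumed for this sketch)
constexpr int TS = 2;  // tile_size (assumed; the sample uses a larger tile)

using Matrix = std::array<std::array<int, N>, N>;
using Tile = std::array<std::array<int, TS>, TS>;

// Replays the loop at lines 126-142 for the tile at the origin, so that
// localIdx == globalIdx. Thread (0,0) is always scheduled first.
// temp_c[r][c] stands in for the private temp_c of thread (r,c).
Tile run_tile(const Matrix& a, const Matrix& b, bool second_barrier) {
    Tile localA{}, localB{}, temp_c{};
    bool ran_ahead = false;  // true if thread (0,0) already loaded for this i

    for (int i = 0; i < N; i += TS) {
        // Lines 131-132: each thread loads its element of the next tile.
        for (int r = 0; r < TS; ++r)
            for (int c = 0; c < TS; ++c) {
                if (ran_ahead && r == 0 && c == 0) continue;
                localA[r][c] = a[r][i + c];
                localB[r][c] = b[i + r][c];
            }
        ran_ahead = false;
        // Line 134: first barrier, so every load is complete from here on.

        for (int r = 0; r < TS; ++r)
            for (int c = 0; c < TS; ++c) {
                // Lines 136-139: accumulate the partial dot product.
                for (int k = 0; k < TS; ++k)
                    temp_c[r][c] += localA[r][k] * localB[k][c];

                // Without line 141, thread (0,0) may start the next
                // iteration and overwrite data the other threads still need.
                if (!second_barrier && r == 0 && c == 0 && i + TS < N) {
                    localA[0][0] = a[0][i + TS];
                    localB[0][0] = b[i + TS][0];
                    ran_ahead = true;
                }
            }
        // Line 141: with the second barrier, nobody runs ahead of this point.
    }
    return temp_c;
}

// Straightforward product restricted to the same tile, for comparison.
Tile reference(const Matrix& a, const Matrix& b) {
    Tile c{};
    for (int r = 0; r < TS; ++r)
        for (int col = 0; col < TS; ++col)
            for (int k = 0; k < N; ++k)
                c[r][col] += a[r][k] * b[k][col];
    return c;
}

// Hypothetical input data, chosen so that neighbouring tiles differ.
Matrix make_a() {
    Matrix m{};
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) m[r][c] = r * N + c + 1;
    return m;
}

Matrix make_b() {
    Matrix m{};
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) m[r][c] = r + c + 1;
    return m;
}
```

Calling run_tile(make_a(), make_b(), true) gives the same tile as reference(), while passing false gives different values for threads (0,1) and (1,0), the two threads that read the elements thread (0,0) overwrote. On real hardware the outcome depends on timing, which is why the hazard may go unnoticed without the race detection options.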
If your code is in a loop, remember that it might switch between writing and reading the same memory locations from one iteration to the next. You can unroll the loop to make such patterns easier to spot. In our case above, the access switches from reading (line 138) to writing (line 131).
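The unrolling advice can be sketched as a small standard C++ helper. This is a simplified illustration, not how the debugger works: the Step, Hazard, unroll and find_hazards names are hypothetical, and the model treats every access as touching the same tile_static data. It lists the loop body as writes, reads and barriers tagged with the line numbers from the listing, repeats it, and reports any read and write that are not separated by a barrier.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Op { Write, Read, Barrier };

struct Step {
    Op op;
    int line;  // line number in the listing
};

struct Hazard {
    std::string kind;
    int earlier_line;  // access that happened first
    int later_line;    // access that conflicts with it
};

// Repeats the loop body `times` times, as manual unrolling would.
std::vector<Step> unroll(const std::vector<Step>& body, int times) {
    std::vector<Step> out;
    for (int t = 0; t < times; ++t)
        out.insert(out.end(), body.begin(), body.end());
    return out;
}

// Flags a read followed by a write (or a write followed by a read) with no
// barrier in between. Simplification: all accesses are assumed to alias.
std::vector<Hazard> find_hazards(const std::vector<Step>& seq) {
    std::vector<Hazard> out;
    const Step* last = nullptr;  // last access since the most recent barrier
    for (const Step& s : seq) {
        if (s.op == Op::Barrier) {
            last = nullptr;
            continue;
        }
        if (last != nullptr && last->op != s.op) {
            out.push_back({last->op == Op::Read ? "write-after-read"
                                                : "read-after-write",
                           last->line, s.line});
        }
        last = &s;
    }
    return out;
}
```

Given the body {Write 131, Barrier 134, Read 138}, a single copy reports nothing, but two copies report one write-after-read pair with the earlier access at line 138 and the later one at line 131, matching what the debugger showed. Appending Barrier 141 to the body makes the report empty again.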