Matrix Transpose using C++ AMP

Matrix transpose is a common operation on matrices. Here is a sample that implements it using C++ AMP.

main – Entry point

In main, we call a driver function for each implementation of matrix transpose.

test_transpose_func

This is the driver function. It generates the input data, invokes the user function that implements the C++ AMP kernel, and finally verifies the results computed using C++ AMP. The input and output data are encapsulated in array_view objects, which are then passed to the user function to compute the results.
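To make the generate, invoke, verify flow concrete, here is a minimal CPU-only sketch in standard C++. It is not the sample's code: the name test_transpose_func_cpu, the transpose_func signature, and the use of std::vector in place of array_view are all assumptions made so the sketch compiles without amp.h.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical "user function" signature (an assumption, not the sample's):
// reads a rows x cols row-major input and fills a cols x rows row-major output.
using transpose_func = std::function<void(const std::vector<int>& in,
                                          std::vector<int>& out,
                                          std::size_t rows,
                                          std::size_t cols)>;

// Hypothetical driver: generate input, invoke the user function, verify.
bool test_transpose_func_cpu(std::size_t rows, std::size_t cols,
                             const transpose_func& user_func)
{
    // Generate input with distinct values so a misplaced element is detected.
    std::vector<int> in(rows * cols);
    for (std::size_t i = 0; i < in.size(); ++i) {
        in[i] = static_cast<int>(i);
    }

    // Pre-fill the output with a sentinel that never appears in the input.
    std::vector<int> out(rows * cols, -1);

    // Invoke the implementation under test.
    user_func(in, out, rows, cols);

    // Verify: element (r, c) of the input must land at (c, r) of the output.
    if (out.size() != rows * cols) {
        return false;
    }
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (out[c * rows + r] != in[r * cols + c]) {
                return false;
            }
        }
    }
    return true;
}
```

Testing with a non-square matrix is a deliberate choice in a driver like this, because a square input can hide mistakes in how the output stride is computed.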

transpose_simple

This is a user function specified in main and called from the driver function. It implements a C++ AMP kernel that transposes a matrix. The parallel_for_each call spawns one GPU thread per element of the matrix. Each thread reads the element of the input array_view at its own thread index and writes it to the output array_view at the transposed thread index.
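The per-thread index mapping can be illustrated with a CPU-only sketch in standard C++, where each loop iteration plays the role of one GPU thread. This is not the C++ AMP kernel from the sample; the name transpose_simple_cpu and the flat row-major std::vector layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-only stand-in for the kernel: each (row, col) iteration
// corresponds to one GPU thread with that thread index.
// `in` is rows x cols in row-major order; `out` becomes cols x rows.
template <typename T>
void transpose_simple_cpu(const std::vector<T>& in, std::vector<T>& out,
                          std::size_t rows, std::size_t cols)
{
    assert(in.size() == rows * cols);
    out.resize(rows * cols);

    for (std::size_t row = 0; row < rows; ++row) {
        for (std::size_t col = 0; col < cols; ++col) {
            // Read at the thread index (row, col) and write at the
            // transposed index (col, row). The output has `rows` columns.
            out[col * rows + row] = in[row * cols + col];
        }
    }
}
```

Because every thread writes to a distinct destination, no synchronization between threads is needed in this version.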

transpose_tiled_even

This is a user function specified in main and called from the driver function. It implements a C++ AMP kernel that transposes the matrix using tiling. For this function to transpose correctly, each matrix dimension must be evenly divisible by the tile size. There is one GPU thread per element of the matrix, and the threads are grouped into two-dimensional tiles of size “tile_size * tile_size”. Each thread reads an element from the input array_view, waits on a barrier until all the threads in the tile have completed their reads, and then updates the output array_view. The source (input) index is tiled_index::global. The destination (output) index is calculated from tiled_index::tile_origin and tiled_index::local as follows:

index<2> idxdst(transpose(tidx.tile_origin) + tidx.local);
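To see why transpose(tile_origin) + local yields a correct transpose, here is a CPU-only emulation in standard C++. It is a sketch under assumptions, not the sample's kernel: the name transpose_tiled_even_cpu, the TileSize template parameter, and the per-tile scratch array (standing in for tile-shared storage, which the description above does not spell out) are all hypothetical. The two inner loop nests stand in for the code before and after the barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-only emulation of the tiled kernel.
// `in` is rows x cols in row-major order; `out` becomes cols x rows.
template <typename T, std::size_t TileSize>
void transpose_tiled_even_cpu(const std::vector<T>& in, std::vector<T>& out,
                              std::size_t rows, std::size_t cols)
{
    assert(in.size() == rows * cols);
    // Both matrix dimensions must be evenly divisible by the tile size.
    assert(rows % TileSize == 0 && cols % TileSize == 0);
    out.resize(rows * cols);

    // Assumed per-tile scratch storage shared by the threads of one tile.
    T tile[TileSize][TileSize];

    for (std::size_t origin_r = 0; origin_r < rows; origin_r += TileSize) {
        for (std::size_t origin_c = 0; origin_c < cols; origin_c += TileSize) {
            // Before the barrier: each thread reads in[global], where
            // global = tile_origin + local, and stores it transposed
            // within the tile.
            for (std::size_t lr = 0; lr < TileSize; ++lr) {
                for (std::size_t lc = 0; lc < TileSize; ++lc) {
                    tile[lc][lr] = in[(origin_r + lr) * cols + (origin_c + lc)];
                }
            }

            // (Barrier: all reads in the tile are complete at this point.)

            // After the barrier: destination = transpose(tile_origin) + local.
            for (std::size_t lr = 0; lr < TileSize; ++lr) {
                for (std::size_t lc = 0; lc < TileSize; ++lc) {
                    const std::size_t dst_r = origin_c + lr;
                    const std::size_t dst_c = origin_r + lc;
                    out[dst_r * rows + dst_c] = tile[lr][lc];
                }
            }
        }
    }
}
```

In this arrangement the local index is not swapped when forming the destination, so threads that are neighbors within a tile row also write to neighboring output elements; the transposition inside the tile happens through the scratch array instead.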

transpose_tiled_pad

This is a user function specified in main and called from the driver function. For details of the purpose and design of this method, please see the blog post on tiled_extent::pad.

transpose_tiled_truncate_option_a

This is a user function specified in main and called from the driver function. For details of the purpose and design of this method, please see the blog post on tiled_extent::truncate.

transpose_tiled_truncate_option_b

This is a user function specified in main and called from the driver function. For details of the purpose and design of this method, please see the blog post on tiled_extent::truncate.

 

Download the sample

Please download the attached sample project for the matrix transpose that I described here, run it on your hardware, and try to understand what the code does and learn from it. As always, you will need Visual Studio 11.


transpose.zip