C++ AMP in a nutshell
With Visual Studio 2012, you are able to get your hands on a new technology that enables you to tap into the power of heterogeneous hardware and specifically take advantage of accelerators such as the GPU for compute purposes: C++ AMP.
So you'll create an empty C++ project, add a new code file, stick a #include <amp.h> at the top, then add a using namespace concurrency; and then what? I suggest you try typing in our example C++ AMP matrix multiplication code and running it on your DirectX 11 hardware, or download one of our many C++ AMP samples, including the simplistic "Hello World" code.
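For a feel of what such matrix multiplication code looks like, here is a minimal sketch, not the sample from the linked post. The function name multiply_matrices and the serial fallback for compilers other than Visual C++ are assumptions made for illustration; the C++ AMP path uses array_view, index<2>, parallel_for_each, and restrict(amp) from the list of constructs in this post.

```cpp
#include <vector>
#ifdef _MSC_VER
#include <amp.h>   // C++ AMP ships only with Visual C++
#endif

// Computes c = a * b for row-major matrices:
// a is M x W, b is W x N, c is M x N.
// multiply_matrices is a hypothetical name chosen for this sketch.
void multiply_matrices(const std::vector<float>& a,
                       const std::vector<float>& b,
                       std::vector<float>& c,
                       int M, int N, int W)
{
#ifdef _MSC_VER
    using namespace concurrency;

    // Wrap the host vectors; data is copied to the accelerator on demand.
    array_view<const float, 2> av(M, W, a);
    array_view<const float, 2> bv(W, N, b);
    array_view<float, 2> cv(M, N, c);
    cv.discard_data();  // c is only written, so its old contents need not be copied

    // One logical thread per element of c.
    parallel_for_each(cv.extent, [=](index<2> idx) restrict(amp) {
        int row = idx[0];
        int col = idx[1];
        float sum = 0.0f;
        for (int k = 0; k < W; ++k)
            sum += av(row, k) * bv(k, col);
        cv[idx] = sum;
    });

    cv.synchronize();   // copy the result back into c
#else
    // Assumed fallback (not C++ AMP): the same computation run serially on the CPU.
    for (int row = 0; row < M; ++row) {
        for (int col = 0; col < N; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < W; ++k)
                sum += a[row * W + k] * b[k * N + col];
            c[row * N + col] = sum;
        }
    }
#endif
}
```

The caller sizes c to M * N elements before the call. The lambda captures the array_view objects by value, which is how C++ AMP expects them to be passed into restrict(amp) code.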
Then, you can play around with all the new constructs we've added, using the description of each in the following posts:
- accelerator
- accelerator_view
- index<N>
- extent<N>
- array<T,N>
- array_view<T,N>
- parallel_for_each
- restrict
- tiled parallel_for_each including tiled_index and tiled_extent
- tile_static
- tile_barrier
Beyond the core API above, there is even more for you to explore:
- graphics namespace (textures and short vector types)
- Interop with DirectX
- global copy operations and continuations
- Atomic operation library
- Math library
- Exceptions
- Three debug functions: printf, errorf, abort
- Twenty direct3d functions
If the above is not enough reading, you can read the 130 pages of the C++ AMP open specification.
Beyond the API and language, there is fantastic Visual Studio integration (IntelliSense, code navigation, project and build system, etc.), and in particular I encourage you to explore the support for:
- debugging
- profiling (and in general the C++ AMP performance guidance)
You'll no doubt have questions and feedback, and we truly look forward to taking those in our dedicated Native Parallelism forum. Enjoy!
Comments
Anonymous
March 24, 2013
Hi Daniel, thanks for sharing these. I am currently converting CUDA-style code to C++ AMP style and am wondering how to replace several CUDA constructs. Can you advise on what in C++ AMP can replace the following CUDA code?
- align(16)
- Cuda::HostMemoryReference1D<int>
- DeviceMemoryLinear2D<int>
- DeviceMemoryPitched3D<int>
- cuda_safe_call() related:
  5.1 cuda_safe_call( cudaGetLastError() );
  5.2 cuda_safe_call( cudaUnbindTexture(...) );
  5.3 cuda_safe_call( cudaDeviceSynchronize() );
- cudaBindTexture()
Thanks in advance!
Anonymous
March 25, 2013
Looking forward to the answers to the previous questions.
Anonymous
March 26, 2013
- align(16) >> The equivalent of this capability in the MS VC++ compiler is __declspec(align(#)) which allows you to control the alignment of user-defined data.
- Cuda::HostMemoryReference1D<int> >> The C++ AMP concurrency::array_view type enables multidimensional views over existing CPU memory.
- DeviceMemoryLinear2D<int> >> The C++ AMP concurrency::array and concurrency::array_view types are equivalent abstractions of multidimensional data containers.
- DeviceMemoryPitched3D<int> >> There is no direct equivalent of this in C++ AMP. However, if you want to use multidimensional data with a specific pitch, you can achieve the same by using the "section" capability of the array and array_view types in C++ AMP. Note that when doing this, you are responsible for defining the pitch, unlike CUDA pitched allocations, where the CUDA runtime determines the pitch.
- cuda_safe_call() related:
  5.1 cuda_safe_call( cudaGetLastError() ); >> AFAIK cuda_safe_call is just a macro for better error diagnostics for CUDA API calls in debug mode. C++ AMP uses exceptions for runtime errors, and when compiling C++ AMP programs in debug mode, you automatically get detailed debug diagnostics to help you better understand the error.
  5.2 cuda_safe_call( cudaUnbindTexture(...) ); and cudaBindTexture() >> Please refer to our blog post on textures to learn about texture capabilities in C++ AMP.
  5.3 cuda_safe_call( cudaDeviceSynchronize() ); >> accelerator_view::wait is the equivalent C++ AMP API.
Please feel free to ask any further questions on our MSDN forum.
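To illustrate the alignment point in the reply above, here is a minimal sketch. The type Float4, the macro ALIGN16, and the helper is_aligned16 are hypothetical names used only for illustration. The standard C++11 alignas spelling for other compilers is an addition that the reply does not mention; the reply itself names only __declspec(align(#)).

```cpp
#include <cstdint>

// ALIGN16 is a hypothetical macro for this sketch: it selects the
// Visual C++ spelling named in the reply, or standard C++11 alignas elsewhere.
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 alignas(16)
#endif

// Float4 is a hypothetical user-defined type whose instances
// start on a 16-byte boundary.
struct ALIGN16 Float4 {
    float x, y, z, w;
};

// Returns true when the address is a multiple of 16 bytes.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

A stack or static instance of Float4 then satisfies is_aligned16(&value). Heap allocations of over-aligned types may need extra care, depending on the compiler and language version.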
Anonymous
March 26, 2013
Hi Amit, many thanks for your detailed answer. :)