What’s new for C++ AMP in Visual Studio 2013

2013-06-28

Since the first release of C++ AMP in Visual Studio 2012 nearly 8 months ago, we have been working hard to bring you the next set of C++ AMP features. The BUILD 2013 day 2 keynote demo provided a snapshot of C++ AMP in Visual Studio 2013. In this post, we delve into the C++ AMP features available in Visual Studio 2013 Preview.

Support for shared CPU/GPU memory

CPU/GPU data transfer is now significantly more efficient on accelerators that share physical memory with the CPU, because redundant copying of data between GPU and CPU memory has been eliminated. Depending on how the code was written, C++ AMP applications that run on integrated GPUs and WARP accelerators should see no (or significantly reduced) time spent on copying data. This feature is available only on Windows 8.1 and is turned on by default for WARP and some integrated GPUs. Developers can also opt into the feature programmatically through a set of APIs.
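
As a rough illustration of the programmatic opt-in, the sketch below asks for CPU read/write access on an array only when the default accelerator reports shared memory support. It is not a definitive recipe. The function name scaled and the macro USE_CPP_AMP are hypothetical, introduced here only so the same file also builds with compilers that lack C++ AMP. The C++ AMP names (supports_cpu_shared_memory, access_type_read_write, access_type_auto, and the array constructor that takes an access_type) are written as recalled from the amp.h header in Visual Studio 2013 and should be checked against that header.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

#if defined(USE_CPP_AMP)  // hypothetical opt-in macro: define it when building with Visual C++ 2013 or later
#include <amp.h>
#endif

// Hypothetical helper: returns a copy of `in` with every element multiplied by `k`.
std::vector<float> scaled(const std::vector<float>& in, float k) {
    assert(in.size() <= static_cast<std::size_t>(INT_MAX));  // C++ AMP extents are int-based
    std::vector<float> out(in.size());
    if (in.empty()) return out;  // avoid creating a zero-sized extent

#if defined(USE_CPP_AMP)
    using namespace concurrency;
    accelerator acc;  // the default accelerator

    // Request CPU read/write access only if the accelerator shares memory
    // with the CPU; otherwise leave the choice to the runtime.
    const access_type cpu_access = acc.supports_cpu_shared_memory
                                       ? access_type_read_write
                                       : access_type_auto;

    array<float, 1> data(extent<1>(static_cast<int>(in.size())),
                         in.begin(), in.end(), acc.default_view, cpu_access);

    // Arrays are captured by reference, scalars by value.
    parallel_for_each(data.extent, [&data, k](index<1> i) restrict(amp) {
        data[i] *= k;
    });

    concurrency::copy(data, out.begin());
#else
    // Portable fallback so the sketch builds without C++ AMP.
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * k;
#endif
    return out;
}
```

Calling scaled({1.0f, 2.0f, 3.0f}, 2.0f) returns a vector holding 2, 4 and 6 on either path. Whether the shared memory path actually avoids a copy depends on the accelerator and on Windows 8.1 being present, as described above.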

Enhanced support for textures

In Visual Studio 2013, we added a number of features to enhance support for textures. The added features include:

Access to hardware texture sampling capabilities
Support for staging textures
Redesigned texture_view (to be more consistent with the array_view design)
A more complete and performant set of texture copy APIs including section copy
Better interop support for textures including a much bigger set of DXGI formats
Support for mipmaps

Improved C++ AMP debugging experience

The debugging experience for C++ AMP code has been improved on multiple fronts. We had previously announced a series of improvements, including:

Availability of C++ AMP GPU debugging on Windows 7 & Windows Server 2008 R2 platforms and
Availability of remote GPU hardware debugging on Nvidia GPUs.

Apart from these, in Visual Studio 2013 we enabled the following features:

Side-by-side CPU/GPU debugging. Currently, mixed-mode debugging is available on Windows 8.1 for the WARP accelerator.
Ability to debug using the WARP accelerator instead of the single-threaded REF accelerator. Using WARP provides a much faster debugging experience.

Faster C++ AMP runtime

We have worked to improve the performance of the C++ AMP runtime in order to provide even faster application performance. The work includes

Reduced parallel_for_each launch overheads
Optimized texture copy performance
Optimized performance of copying small data sizes between the CPU and accelerator

Array_view API improvements

In Visual Studio 2013, the following improvements have been made to the array_view abstraction:

Ability to create array_view without a data source
Ability to synchronize to a specific accelerator
Performant array_view indexing operators on CPU
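
To make the first item concrete, here is a minimal sketch, under stated assumptions, of an array_view created from an extent alone, with no host container behind it. A kernel fills it and the result is then copied out. The function name squares and the macro USE_CPP_AMP are hypothetical and exist only so the file also builds where C++ AMP is unavailable. The array_view constructor that takes just an extent is written as recalled from the Visual Studio 2013 amp.h header.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(USE_CPP_AMP)  // hypothetical opt-in macro: define it when building with Visual C++ 2013 or later
#include <amp.h>
#endif

// Hypothetical helper: returns {0*0, 1*1, ..., (n-1)*(n-1)}.
std::vector<int> squares(int n) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
    if (n == 0) return out;  // avoid creating a zero-sized extent

#if defined(USE_CPP_AMP)
    using namespace concurrency;

    // An array_view with no data source: the runtime allocates storage on
    // demand, so there is no host buffer to copy in before the kernel runs.
    extent<1> ext(n);
    array_view<int, 1> av(ext);

    // array_view objects are captured by value in restrict(amp) lambdas.
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] = i[0] * i[0];
    });

    // Bring the results back into ordinary host memory.
    concurrency::copy(av, out.begin());
#else
    // Portable fallback so the sketch builds without C++ AMP.
    for (int i = 0; i < n; ++i) out[static_cast<std::size_t>(i)] = i * i;
#endif
    return out;
}
```

For example, squares(4) yields 0, 1, 4 and 9. In Visual Studio 2012 the same computation would have needed a host container or an explicit array to back the view.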

Additional changes

Apart from the changes listed above, we also took time to refine other parts of C++ AMP. These changes include:

New APIs to enable clean AMP runtime shutdown
Improved the accuracy and helpfulness of C++ AMP runtime exception messages
Improved the accuracy of ETW events for better profiling experience
Ability to lock/unlock accelerator_views to allow safe access to shared resources between C++ AMP and Direct3D APIs.

We are excited to bring you the next set of C++ AMP features, and in the coming weeks we will discuss them in depth. We hope you will take the time to download Visual Studio 2013 Preview and send us your feedback, comments and questions – below or in our MSDN forum.

Comments

Anonymous
June 28, 2013
Hi Boby. Great to see the C++ AMP improvements. The shared memory support is particularly exciting. Wondering if that is enabled by using something new from D3D 11.2? Secondly, is there info on supported hardware platforms from Intel and AMD for this feature?
Anonymous
June 28, 2013
This all sounds good, but is there any movement on a Mac implementation of C++/AMP? We have a portable C++ app that runs on Mac and Windows, and would like a GPU acceleration solution, and C++/AMP might be the choice if it gets implemented on the Mac.
Anonymous
June 29, 2013
To answer my own question, the shared memory support is likely using the ability to map GPU default buffers without going through a staging buffer, which is being introduced in D3D 11.2. I had not realized this has been introduced when I first read the docs :)
Anonymous
July 01, 2013
Yes, shared memory support in C++ AMP uses the ability to map D3D11_USAGE_DEFAULT buffers for CPU access which is being introduced with WDDM 1.3 (supported on Windows 8.1 and later OS versions only). This facility is available for D3D11 feature level 10+.
Anonymous
July 01, 2013
@small_mountain_0705: there are a couple of proof-of-concept implementations of C++ AMP on top of OpenCL in the wild, but nothing that you can depend on for production. That is the latest information we have. We are continuing to work with our partners and encouraging them to release support for C++ AMP on other platforms, and we will release an updated open specification to reflect these changes.
Anonymous
July 02, 2013
Is shared memory supported on Windows Server 8.1 aka blue ?
Anonymous
July 02, 2013
Yes, shared memory is supported on Windows 8.1 Server.
Anonymous
July 08, 2013
hey guys, great work on bringing parallel programming and gpgpu to the masses. Much appreciated. Any News on the ppl?
Anonymous
July 08, 2013
Hi heff, The biggest news in the PPL (specifically, PPL tasks) in this release was support for cross-platform and user-defined schedulers. One of the biggest beneficiaries of this is the Casablanca project, which is built by the same team. We also fixed a number of bugs – you’ll care about it mostly if you’re writing Windows Store apps. Not everything we wanted to include in the product made the cut, but the good news is that we’re going to release a set of additions to the PPL called the PPL Power Pack that will contain some oft-requested features. We’re putting some finishing touches on it now, and will announce it on this blog soon, so stay tuned. Artur
Anonymous
July 10, 2013
Hi, thank you for the answer ... that's great news. You guys are doing a great job.
Anonymous
July 31, 2013
Hi, Could you give more details on the "Reduced parallel_for_each launch overheads"? How much faster can one expect the new version to be?
Anonymous
August 06, 2013
@ola, For this release, we focused on reducing the part of the launch overhead that comes from the runtime. Our micro benchmarks (which measure those specific parts of the runtime overhead) showed a significant (25 to 50%) improvement over the previous release. However, due to additional elements such as DirectX and IHV driver overheads, end-to-end scenarios will not see the same improvement we saw in our micro benchmarks. The performance improvement in an end-to-end scenario depends on the number of parallel_for_each (p_f_e) invocations made. As you batch more p_f_e invocations together, you should see noticeable performance improvements from the runtime changes we made.
Anonymous
August 08, 2013
The comment has been removed
Anonymous
August 17, 2013
Hi there, are there any plans to support 64-bit integers or other types like char or short?
Anonymous
August 19, 2013
@AMP, As we plan for the next version, we will be considering support for types like char, short, etc. Could you explain your scenario and how support for other types would help you? This would aid in prioritizing features for the next release. Thanks. You can email your scenario to bobyg AT Microsoft Dot com.
Anonymous
August 19, 2013
@Mahmoud Can you contact me via email (bobyg AT Microsoft dot com), I would need more details regarding your request and we can have that conversation offline. Thanks
Anonymous
September 18, 2013
Are you requiring VS 2013 for the new version of AMP? If so, why do you continue to put roadblocks in front of your technology?
Anonymous
September 18, 2013
@James, The new version of C++ AMP needs the 2013 version of the Visual C++ compiler and runtime. Additionally, shared memory is a Windows 8.1 and Windows Server 2012 R2 only feature (since it has OS dependencies). We are interested to hear more about how you are blocked from using the newer C++ AMP. Thanks
Anonymous
February 11, 2014
The comment has been removed
Anonymous
February 11, 2014
I guess I rushed into it. Today I took the original MatrixMultiply AMP sample code and ran it after building with 2012 and 2013, and it ran with exactly the same speed. Now I need to go back to my own code and dig deeper to understand what in it makes it slower with 2013... Oh, what fun! :-)
Anonymous
February 12, 2014
Thanks Victor for clarifying. It saved us some of our time :). Your previous comment was indeed surprising and we were about to double check that at our end. Please feel free to post any further question you might have while trying to understand AMP behavior in your implementation of matrix multiplication.
Anonymous
May 07, 2014
The comment has been removed
