June 2014

Volume 29 Number 6

Windows with C++ : High-Performance Window Layering Using the Windows Composition Engine

Kenny Kerr | June 2014

I’ve been fascinated by layered windows since I first came across them in Windows XP. The ability to escape the rectangular, or near-rectangular, bounds of a traditional desktop window always seemed so intriguing to me. And then Windows Vista happened. That much-maligned release of Windows hinted at the start of something far more exciting and far more flexible. Windows Vista started something we’ve only begun to appreciate now that Windows 8 is here, but it also marked the slow decline of the layered window.

Windows Vista introduced a service called the Desktop Window Manager. The name was and continues to be misleading. Think of it as the Windows composition engine or compositor. This composition engine completely changed the way application windows are rendered on the desktop. Rather than allowing each window to render directly to the display, or display adapter, every window renders to an off-screen surface or buffer. The system allocates one such surface per top-level window and all GDI, Direct3D and, of course, Direct2D graphics are rendered to these surfaces. These off-screen surfaces are called redirection surfaces because GDI drawing commands and even Direct3D swap chain presentation requests are redirected or copied (within the GPU) to the redirection surface.

At some point, independent of any given window, the composition engine decides it’s time to compose the desktop given the latest batch of changes. This involves composing all of these redirection surfaces together, adding the non-client areas (often called window chrome), perhaps adding some shadows and other effects, and presenting the final result to the display adapter.

This composition process has many wonderful benefits I’ll explain over the next few months as I explore Windows composition in more detail, but it also has one potentially serious restriction in that these redirection surfaces are opaque. Most applications are quite happy with this and it certainly makes a lot of sense from a performance perspective, because alpha blending is expensive. But this leaves layered windows out in the cold.

If I want a layered window, I have to take a performance hit. I describe the specific architectural limitations in my column, “Layered Windows with Direct2D” (msdn.microsoft.com/magazine/ee819134). To summarize, layered windows are processed by the CPU, primarily to support hit testing of alpha-blended pixels. This means the CPU needs a copy of the pixels that make up the layered window’s surface area. Either I render on the CPU, which tends to be a lot slower than GPU rendering, or I render on the GPU, in which case I must pay the bus bandwidth tax because everything I render must be copied from video memory to system memory. In the aforementioned column, I also show how I might make the most of Direct2D to squeeze as much performance as possible out of the system because only Direct2D lets me make the choice between CPU and GPU rendering. The kicker is that even though the layered window necessarily resides in system memory, the composition engine immediately copies it to video memory such that the actual composition of the layered window is still hardware-accelerated.
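
To put a rough number on that bus bandwidth tax, here is a back-of-the-envelope sketch. The helper names are hypothetical, and the window size and update rate used in the note that follows are illustrative assumptions rather than measurements; the 4 bytes per pixel corresponds to a 32-bit surface.

```cpp
#include <cstdint>

// Bytes needed for one frame of a 32-bit (4 bytes per pixel) surface.
inline std::uint64_t FrameBytes(std::uint32_t width, std::uint32_t height)
{
  return static_cast<std::uint64_t>(width) * height * 4u;
}

// Bytes that would cross the bus each second if every rendered frame had
// to be copied from video memory to system memory.
inline std::uint64_t CopyBytesPerSecond(std::uint32_t width,
                                        std::uint32_t height,
                                        std::uint32_t framesPerSecond)
{
  return FrameBytes(width, height) * framesPerSecond;
}
```

Assuming a 1920x1080 layered window updated 60 times per second, FrameBytes(1920, 1080) is 8,294,400 bytes and CopyBytesPerSecond(1920, 1080, 60) is 497,664,000 bytes, so roughly half a gigabyte would be copied every second for that one window.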

While I can’t offer you any hope of traditional layered windows returning to prominence, I do have some good news. Traditional layered windows offer two specific features of interest. The first is per-pixel alpha blending. Whatever I render to the layered window will be alpha blended with the desktop and with whatever is behind the window at any given moment. The second is the ability for User32 to hit test layered windows based on pixel alpha values, allowing mouse messages to fall through if the pixel at a particular point is transparent. As of Windows 8 and 8.1, User32 hasn’t changed significantly, but what has changed is the ability to support per-pixel alpha blending purely on the GPU and without the cost of transmitting the window surface to system memory. This means I can now produce the effect of a layered window without compromising performance, provided I don’t need per-pixel hit testing. The whole window will hit test uniformly. Setting aside hit testing, this excites me because it’s something the system can obviously do, but it’s just never been possible for applications to tap into this power. If this sounds intriguing to you, then read on and I’ll show you how it’s done.

The key to making this happen involves embracing the Windows composition engine. The composition engine first surfaced in Windows Vista as the Desktop Window Manager with its limited API and its popular translucent Aero glass effect. Then Windows 8 came along and introduced the DirectComposition API. This is just a more extensive API for the same composition engine. With the Windows 8 release, Microsoft finally allowed third-party developers to tap into the power of this composition engine that’s been around for such a long time. And, of course, you’ll need to embrace a Direct3D-powered graphics API such as Direct2D. But first you need to deal with that opaque redirection surface.

As I mentioned earlier, the system allocates one redirection surface for each top-level window. As of Windows 8, you can now create a top-level window and request that it be created without a redirection surface. Strictly speaking, this has nothing to do with layered windows, so don’t use the WS_EX_LAYERED extended window style. (Support for layered windows actually gained a minor improvement in Windows 8, but I’ll take a closer look at that in an upcoming column.) Instead, you need to use the WS_EX_NOREDIRECTIONBITMAP extended window style that tells the composition engine not to allocate a redirection surface for the window. I’ll start with a simple and traditional desktop window. Figure 1 provides an example of filling in a WNDCLASS structure, registering the window class, creating the window and pumping window messages. Nothing new here, but these fundamentals continue to be essential. The window variable goes unused, but you’ll need that in a moment. You can copy this into a Visual C++ project inside Visual Studio, or just compile it from the command prompt as follows:

cl /W4 /nologo Sample.cpp

Figure 1 Creating a Traditional Window

#ifndef UNICODE
#define UNICODE
#endif
#include <windows.h>
#pragma comment(lib, "user32.lib")
int __stdcall wWinMain(HINSTANCE module, HINSTANCE, PWSTR, int)
{
  WNDCLASS wc = {};
  wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
  wc.hInstance     = module;
  wc.lpszClassName = L"window";
  wc.style         = CS_HREDRAW | CS_VREDRAW;
  wc.lpfnWndProc =
  [] (HWND window, UINT message, WPARAM wparam, 
  LPARAM lparam) -> LRESULT
  {
    if (WM_DESTROY == message)
    {
      PostQuitMessage(0);
      return 0;
    }
    return DefWindowProc(window, message, wparam, lparam);
  };   
  RegisterClass(&wc);
  HWND const window = CreateWindow(wc.lpszClassName, L"Sample",
                                   WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                                   CW_USEDEFAULT, CW_USEDEFAULT,
                                   CW_USEDEFAULT, CW_USEDEFAULT,
                                   nullptr, nullptr, module, nullptr);
  MSG message;
  while (BOOL result = GetMessage(&message, 0, 0, 0))
  {
    if (-1 != result) DispatchMessage(&message);
  }
}

Figure 2 shows you what this looks like on my desktop. Notice there’s nothing unusual here. While the example provides no painting and rendering commands, the window’s client area is opaque and the composition engine adds the non-client area, the border and title bar. Applying the WS_EX_NOREDIRECTIONBITMAP extended window style to get rid of the opaque redirection surface that represents this client area is a simple matter of switching out the CreateWindow function for the CreateWindowEx function with its leading parameter that accepts extended window styles:

HWND const window = CreateWindowEx(WS_EX_NOREDIRECTIONBITMAP,
                                   wc.lpszClassName, L"Sample",
                                   WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                                   CW_USEDEFAULT, CW_USEDEFAULT,
                                   CW_USEDEFAULT, CW_USEDEFAULT,
                                   nullptr, nullptr, module, nullptr);

Figure 2 A Traditional Window on the Desktop

The only things that changed are the addition of the leading argument, the WS_EX_NOREDIRECTIONBITMAP extended window style and, of course, the use of the CreateWindowEx function instead of the simpler CreateWindow function. The results on the desktop are, however, far more radical. Figure 3 shows what this looks like on my desktop. Notice the window’s client area is now completely transparent. Moving the window around will illustrate this fact. I can even have a video playing in the background and it won’t be obscured in the least. On the other hand, the entire window hit tests uniformly and the window focus isn’t lost when clicking within the client area. That’s because the subsystem responsible for hit testing and mouse input isn’t aware the client area is transparent.

Figure 3 A Window Without a Redirection Surface

Of course, the next question is how can you possibly render anything to the window if there’s no redirection surface to provide to the composition engine? The answer comes from the DirectComposition API and its deep integration with the DirectX Graphics Infrastructure (DXGI). It’s the same technique that powers the Windows 8.1 XAML implementation to provide incredibly high-performance composition of content within a XAML application. The Internet Explorer Trident rendering engine also uses DirectComposition extensively for touch panning and zooming, as well as CSS3 animations, transitions and transforms.

I’m just going to use it to compose a swap chain that supports transparency with premultiplied alpha values on a per-pixel basis and blend it with the rest of the desktop. Traditional DirectX applications typically create a swap chain with the DXGI factory’s CreateSwapChainForHwnd method. This swap chain is backed by a pair or collection of buffers that effectively would be swapped during presentation, allowing the application to render the next frame while the previous frame is copied. The swap chain surface the application renders to is an opaque off-screen buffer. When the application presents the swap chain, DirectX copies the contents from the swap chain’s back buffer to the window’s redirection surface. As I mentioned earlier, the composition engine eventually composes all of the redirection surfaces together to produce the desktop as a whole.

In this case, the window has no redirection surface, so the DXGI factory’s CreateSwapChainForHwnd method can’t be used. However, I still need a swap chain to support Direct3D and Direct2D rendering. That’s what the DXGI factory’s CreateSwapChainForComposition method is for. I can use this method to create a windowless swap chain, along with its buffers. Presenting this swap chain doesn’t copy the bits to the redirection surface (which doesn’t exist); instead, it makes the buffers available to the composition engine directly. The composition engine can then use this surface in lieu of the window’s redirection surface. Because this surface isn’t opaque, and its pixel format fully supports per-pixel premultiplied alpha values, the result is pixel-perfect alpha blending on the desktop. It’s also incredibly fast because there’s no unnecessary copying within the GPU and certainly no copies over the bus to system memory.

That’s the theory. Now it’s time to make it happen. DirectX is all about the essentials of COM, so I’m going to use the Windows Runtime C++ Template Library ComPtr class template for managing interface pointers. I’ll also need to include and link to the DXGI, Direct3D, Direct2D and DirectComposition APIs. The following code shows you how this is done:

#include <wrl.h>
using namespace Microsoft::WRL;
#include <dxgi1_3.h>
#include <d3d11_2.h>
#include <d2d1_2.h>
#include <d2d1_2helper.h>
#include <dcomp.h>
#pragma comment(lib, "dxgi")
#pragma comment(lib, "d3d11")
#pragma comment(lib, "d2d1")
#pragma comment(lib, "dcomp")

I normally include these in my precompiled header. In that case, I’d omit the using directive and only include that in my application’s source file.

I hate code samples where the error handling overwhelms and distracts from the specifics of the topic itself, so I’ll also tuck this away with an exception class and an HR function to check for errors. You can find a simple implementation in Figure 4, but you can decide on your own error-handling policy, of course.

Figure 4 Turning HRESULT Errors into Exceptions

struct ComException
{
  HRESULT result;
  ComException(HRESULT const value) :
    result(value)
  {}
};
void HR(HRESULT const result)
{
  if (S_OK != result)
  {
    throw ComException(result);
  }
}
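
The HR function in Figure 4 treats every value other than S_OK as an error, which includes success codes such as S_FALSE. Whether those should throw is a policy decision. As a minimal sketch of a more lenient policy, the following variant throws only for failure codes, which are the negative values that the SDK’s FAILED macro detects. The names HResult, kOk, kFalse, kFail, ComError and Check are hypothetical stand-ins defined locally so the sketch compiles without the Windows headers; in a real project the types and constants come from windows.h.

```cpp
#include <cstdint>

// Local stand-ins (assumptions for this sketch) mirroring the Windows SDK:
// HRESULT is a signed 32-bit value, and failure codes have the high bit set.
using HResult = std::int32_t;
constexpr HResult kOk    = 0;                                   // S_OK
constexpr HResult kFalse = 1;                                   // S_FALSE
constexpr HResult kFail  = static_cast<HResult>(0x80004005u);   // E_FAIL

struct ComError
{
  HResult result;
};

// Same test the FAILED macro performs: negative means failure.
constexpr bool Failed(HResult const result)
{
  return result < 0;
}

// Throws for failure codes only and passes success codes back to the
// caller, which can then inspect values such as S_FALSE if it cares.
inline HResult Check(HResult const result)
{
  if (Failed(result))
  {
    throw ComError{ result };
  }
  return result;
}
```

With this policy, Check(kFalse) returns the value to the caller instead of throwing, while Check(kFail) still throws.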

Now I can start to assemble the rendering stack, and that naturally begins with a Direct3D device. I’ll run through this quickly because I’ve already described the DirectX infrastructure in detail in my March 2013 column, “Introducing Direct2D 1.1” (msdn.microsoft.com/magazine/dn198239). Here’s the Direct3D 11 interface pointer:

ComPtr<ID3D11Device> direct3dDevice;

That’s the interface pointer for the device, and the D3D11CreateDevice function may be used to create the device:

HR(D3D11CreateDevice(nullptr,    // Adapter
                     D3D_DRIVER_TYPE_HARDWARE,
                     nullptr,    // Module
                     D3D11_CREATE_DEVICE_BGRA_SUPPORT,
                     nullptr, 0, // Highest available feature level
                     D3D11_SDK_VERSION,
                     &direct3dDevice,
                     nullptr,    // Actual feature level
                     nullptr));  // Device context

There’s nothing too surprising here. I’m creating a Direct3D device backed by a GPU. The D3D11_CREATE_DEVICE_BGRA_SUPPORT flag enables interoperability with Direct2D. The DirectX family is held together by DXGI, which provides common GPU resource management facilities across the various DirectX APIs. Therefore, I must query the Direct3D device for its DXGI interface:

ComPtr<IDXGIDevice> dxgiDevice;
HR(direct3dDevice.As(&dxgiDevice));

The ComPtr As method is just a wrapper for the QueryInterface method. With the Direct3D device created, I can then create the swap chain that will be used for composition. To do that, I first need to get a hold of the DXGI factory:

ComPtr<IDXGIFactory2> dxFactory;
HR(CreateDXGIFactory2(
  DXGI_CREATE_FACTORY_DEBUG,
  __uuidof(dxFactory),
  reinterpret_cast<void **>(dxFactory.GetAddressOf())));

Here, I’m opting for extra debugging information—an invaluable aid during development. The hardest part of creating a swap chain is figuring out how to describe the desired swap chain to the DXGI factory. This debugging information is immensely helpful in fine-tuning the necessary DXGI_SWAP_CHAIN_DESC1 structure:

DXGI_SWAP_CHAIN_DESC1 description = {};

This initializes the structure to all zeroes. I can then begin to fill in any interesting properties:

description.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;     
description.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
description.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
description.BufferCount      = 2;                              
description.SampleDesc.Count = 1;                              
description.AlphaMode        = DXGI_ALPHA_MODE_PREMULTIPLIED;

The particular format, a 32-bit pixel format with 8 bits for each color channel along with a premultiplied 8-bit alpha component, isn’t your only option, but provides the best performance and compatibility across devices and APIs.

The swap chain’s buffer usage must be set to allow render target output to be directed to it. This is necessary so the Direct2D device context can create a bitmap to target the DXGI surface with drawing commands. The Direct2D bitmap itself is merely an abstraction backed by the swap chain.

Composition swap chains only support the flip-sequential swap effect. This is how the swap chain relates to the composition engine in lieu of a redirection surface. In the flip model, all buffers are shared directly with the composition engine. The composition engine can then compose the desktop directly from the swap chain back buffer without additional copying. This is usually the most efficient model. It’s also required for composition, so that’s what I use. The flip model also necessarily requires at least two buffers, but doesn’t support multisampling, so BufferCount is set to two and SampleDesc.Count is set to one. This count is the number of multisamples per pixel. Setting this to one effectively disables multisampling.

Finally, the alpha mode is critical. This would normally be ignored for opaque swap chains, but in this case I really do want transparency behavior to be included. Premultiplied alpha values typically provide the best performance, and it’s also the only option supported by the flip model.

The final ingredient before I can create the swap chain is to determine the desired size of the buffers. Normally, when calling the CreateSwapChainForHwnd method, I can ignore the size and the DXGI factory will query the window for the size of the client area. In this case, DXGI has no idea what I plan to do with the swap chain, so I need to tell it specifically what size it needs to be. With the window created, this is a simple matter of querying the window’s client area and updating the swap chain description accordingly:

RECT rect = {};
GetClientRect(window, &rect);
description.Width  = rect.right - rect.left;  
description.Height = rect.bottom - rect.top;
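
Because a composition swap chain can’t be described with a zero width or height, it may be worth guarding against an empty client area, which I’d expect to see if the window is minimized or collapsed when the size is queried. The following is a minimal sketch of such a guard. ClientRect is a local stand-in for the Win32 RECT structure, and BufferSize and SwapChainSizeFor are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Local stand-in for the Win32 RECT structure (an assumption for this
// sketch so that it compiles without windows.h).
struct ClientRect
{
  long left, top, right, bottom;
};

struct BufferSize
{
  std::uint32_t width, height;
};

// Computes the swap chain buffer size for a client rectangle, clamping
// each dimension to at least one pixel so the description stays valid.
inline BufferSize SwapChainSizeFor(ClientRect const & rect)
{
  long const width  = rect.right - rect.left;
  long const height = rect.bottom - rect.top;

  return { static_cast<std::uint32_t>(std::max(width, 1L)),
           static_cast<std::uint32_t>(std::max(height, 1L)) };
}
```

The resulting width and height would then be assigned to description.Width and description.Height in place of the direct subtraction.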

I can now create the composition swap chain with this description and a pointer to the Direct3D device. Either the Direct3D or the DXGI interface pointer may be used:

ComPtr<IDXGISwapChain1> swapChain;
HR(dxFactory->CreateSwapChainForComposition(dxgiDevice.Get(),
                                            &description,
                                            nullptr, // Don't restrict
                                            swapChain.GetAddressOf()));

Now that the swap chain is created, I can use any Direct3D or Direct2D graphics rendering code to draw the application, using alpha values as needed to create the desired transparency. There’s nothing new here, so I’ll refer you to my March 2013 column again for the specifics of rendering to a swap chain with Direct2D. Figure 5 provides a simple example if you’re following along. Just don’t forget to support per-monitor DPI awareness as I described in my February 2014 column, “Write High-DPI Apps for Windows 8.1” (msdn.microsoft.com/magazine/dn574798).

Figure 5 Drawing to the Swap Chain with Direct2D

// Create a single-threaded Direct2D factory with debugging information
ComPtr<ID2D1Factory2> d2Factory;
D2D1_FACTORY_OPTIONS const options = { D2D1_DEBUG_LEVEL_INFORMATION };
HR(D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED,
                     options,
                     d2Factory.GetAddressOf()));
// Create the Direct2D device that links back to the Direct3D device
ComPtr<ID2D1Device1> d2Device;
HR(d2Factory->CreateDevice(dxgiDevice.Get(),
                           d2Device.GetAddressOf()));
// Create the Direct2D device context that is the actual render target
// and exposes drawing commands
ComPtr<ID2D1DeviceContext> dc;
HR(d2Device->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE,
                                 dc.GetAddressOf()));
// Retrieve the swap chain's back buffer
ComPtr<IDXGISurface2> surface;
HR(swapChain->GetBuffer(
    0, // index
    __uuidof(surface),
    reinterpret_cast<void **>(surface.GetAddressOf())));
// Create a Direct2D bitmap that points to the swap chain surface
D2D1_BITMAP_PROPERTIES1 properties = {};
properties.pixelFormat.alphaMode = D2D1_ALPHA_MODE_PREMULTIPLIED;
properties.pixelFormat.format    = DXGI_FORMAT_B8G8R8A8_UNORM;
properties.bitmapOptions         = D2D1_BITMAP_OPTIONS_TARGET |
                                   D2D1_BITMAP_OPTIONS_CANNOT_DRAW;
ComPtr<ID2D1Bitmap1> bitmap;
HR(dc->CreateBitmapFromDxgiSurface(surface.Get(),
                                   properties,
                                   bitmap.GetAddressOf()));
// Point the device context to the bitmap for rendering
dc->SetTarget(bitmap.Get());
// Draw something
dc->BeginDraw();
dc->Clear();
ComPtr<ID2D1SolidColorBrush> brush;
D2D1_COLOR_F const brushColor = D2D1::ColorF(0.18f,  // red
                                             0.55f,  // green
                                             0.34f,  // blue
                                             0.75f); // alpha
HR(dc->CreateSolidColorBrush(brushColor,
                             brush.GetAddressOf()));
D2D1_POINT_2F const ellipseCenter = D2D1::Point2F(150.0f,  // x
                                                  150.0f); // y
D2D1_ELLIPSE const ellipse = D2D1::Ellipse(ellipseCenter,
                                           100.0f,  // x radius
                                           100.0f); // y radius
dc->FillEllipse(ellipse,
                brush.Get());
HR(dc->EndDraw());
// Make the swap chain available to the composition engine
HR(swapChain->Present(1,   // sync
                      0)); // flags
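
The coordinates in Figure 5 are expressed in Direct2D’s device-independent pixels (DIPs), whereas the swap chain buffers are sized in physical pixels. If the device context’s DPI is set to match the monitor (for example, with its SetDpi method), the two units diverge and a conversion is needed when positioning content relative to the client area. A minimal sketch of that conversion follows; the function names are hypothetical and the DPI values in the note below are illustrative.

```cpp
// A DIP is defined as 1/96 of an inch, so 96 DPI is the scale at which
// one DIP equals one physical pixel.
inline float PixelsToDips(float const pixels, float const dpi)
{
  return pixels * 96.0f / dpi;
}

inline float DipsToPixels(float const dips, float const dpi)
{
  return dips * dpi / 96.0f;
}
```

On a 192 DPI display, for instance, a client area 300 pixels wide is 150 DIPs wide, so the ellipse centered at 150 DIPs would sit at the right edge rather than in the middle of such a window.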

Now I can finally begin to use the DirectComposition API to bring it all together. While the Windows composition engine deals with rendering and composition of the desktop as a whole, the DirectComposition API allows you to use this same technology to compose the visuals for your applications. Applications compose together different elements, called visuals, to produce the appearance of the application window itself. These visuals can be animated and transformed in a variety of ways to produce rich and fluid UIs. The composition process itself is also performed along with the composition of the desktop as a whole, so more of your application’s presentation is taken off your application threads for improved responsiveness.

DirectComposition is primarily about composing different bitmaps together. As with Direct2D, the concept of a bitmap here is more of an abstraction to allow different rendering stacks to cooperate and produce smooth and appealing application UXes.

Like Direct3D and Direct2D, DirectComposition is a DirectX API that’s backed and powered by the GPU. A DirectComposition device is created by pointing back to the Direct3D device, in much the same way a Direct2D device points back to the underlying Direct3D device. I use the same Direct3D device I previously used to create the swap chain and Direct2D render target to create a DirectComposition device:

ComPtr<IDCompositionDevice> dcompDevice;
HR(DCompositionCreateDevice(
   dxgiDevice.Get(),
   __uuidof(dcompDevice),
   reinterpret_cast<void **>(dcompDevice.GetAddressOf())));

The DCompositionCreateDevice function expects the Direct3D device’s DXGI interface and returns an IDCompositionDevice interface pointer to the newly created DirectComposition device. The DirectComposition device acts as a factory for other DirectComposition objects and provides the all-important Commit method that commits the batch of rendering commands to the composition engine for eventual composition and presentation.

Next up, I need to create a composition target that associates the visuals that will be composed with the destination, which is the application window:

ComPtr<IDCompositionTarget> target;
HR(dcompDevice->CreateTargetForHwnd(window,
                                    true, // Top most
                                    target.GetAddressOf()));

The CreateTargetForHwnd method’s first parameter is the window handle returned by the CreateWindowEx function. The second parameter indicates whether the visuals are displayed on top of the window’s child windows (true) or behind them (false). The result is an IDCompositionTarget interface pointer whose only method is called SetRoot. It lets me set the root visual in a possible tree of visuals to be composed together. I don’t need a whole visual tree, but I need at least one visual object, and for that I can once again turn to the DirectComposition device:

ComPtr<IDCompositionVisual> visual;
HR(dcompDevice->CreateVisual(visual.GetAddressOf()));

The visual contains a reference to a bitmap and provides a set of properties that affect how that visual will be rendered and composed relative to other visuals in the tree and the target itself. I already have the content I want this visual to carry forward to the composition engine. It’s the swap chain I created earlier:

HR(visual->SetContent(swapChain.Get()));

The visual is ready and I can simply set it as the root of the composition target:

HR(target->SetRoot(visual.Get()));

Finally, once the shape of the visual tree has been established, I can simply inform the composition engine that I’m done by calling the Commit method on the DirectComposition device:

HR(dcompDevice->Commit());

For this particular application, where the visual tree doesn’t change, I only need to call Commit once at the beginning of the application and never again. I originally assumed the Commit method needed to be called after presenting the swap chain, but this isn’t the case because swap chain presentation isn’t synchronized with changes to the visual tree.

Figure 6 shows what the application window looks like now that Direct2D has rendered to the swap chain and DirectComposition has provided the partly transparent swap chain to the composition engine.

Figure 6 Direct2D Drawing on a DirectComposition Surface

I’m excited to finally have a solution for an old problem: the ability to produce high-performance windows that are alpha blended with the rest of the desktop. I’m excited about the possibilities the DirectComposition API enables and what it means for the future of application UX design and development in native code.

Want to draw your own window chrome? No problem; just replace the WS_OVERLAPPEDWINDOW window style with the WS_POPUP window style when creating the window. Happy coding!


Kenny Kerr is a computer programmer based in Canada, as well as an author for Pluralsight and a Microsoft MVP. He blogs at kennykerr.ca and you can follow him on Twitter at twitter.com/kennykerr.

Thanks to the following technical expert for reviewing this article: Leonardo Blanco (Microsoft)