December 2009

Volume 24 Number 12

Windows with C++ - Layered Windows with Direct2D

By Kenny Kerr | December 2009

In my third installment on Direct2D, I’m going to show off some of its unmatched power when it comes to interoperability. Instead of exhaustively detailing all the various interoperability options that Direct2D provides, I’m going to walk you through a practical application: layered windows. Layered windows are one of those Windows features that have been around for a long time but haven’t evolved much and thus require special care to use effectively with modern graphics technologies.

In this article I’m going to assume you have a basic familiarity with Direct2D programming. If not, I recommend you read my previous articles from the June and September issues that introduced the fundamentals of programming and drawing with Direct2D.

Originally, layered windows served a few different purposes. In particular, they could be used to easily and efficiently produce visual effects and flicker-free rendering. In the days when GDI was the predominant method for producing graphics, this was a real bonus. In today’s hardware-accelerated world, however, it is no longer compelling because layered windows still belong to the world of User32/GDI and have not been updated in any significant way to support DirectX, the Microsoft platform for high-performance and high-quality graphics.

Layered windows do provide the unique ability to compose a window on the desktop using per-pixel alpha blending, which cannot be achieved in any other way with the Windows SDK.

I should mention that there are really two types of layered window. The distinction comes down to whether you need per-pixel opacity control or you just need to control the opacity of the window as a whole. This article is about the former, but if you really just need to control the opacity of a window, you can do so by simply calling the SetLayeredWindowAttributes function after creating the window to set the alpha value.

Verify(SetLayeredWindowAttributes(windowHandle,
  0, // no color key
  180, // alpha value
  LWA_ALPHA));

This assumes you’ve created the window with the WS_EX_LAYERED extended style or applied it after the fact using the SetWindowLong function. Figure 1 provides an example of such a window. The benefit should be obvious: you don’t need to change anything about the way your application paints the window, as the Desktop Window Manager (DWM) will automatically blend the window appropriately. Per-pixel opacity is the flip side: you need to draw absolutely everything yourself. Of course, if you’re using a brand-new rendering technology such as Direct2D, that’s not a problem!

Figure 1 Window with Alpha Value

So what’s involved? Well, at a fundamental level it is straightforward. First, you need to fill in an UPDATELAYEREDWINDOWINFO structure. This structure provides the position and size of a layered window as well as a GDI device context (DC) that defines the surface of the window—and therein lies the problem. DCs belong to the old world of GDI and are far from the world of DirectX and hardware acceleration. More on that in a moment.

Besides being full of pointers to structures that you need to allocate yourself, the UPDATELAYEREDWINDOWINFO structure isn’t fully documented in the Windows SDK, making it less than obvious to use. In all, you need to allocate five structures. There’s the source position identifying the location of the bitmap to copy from the DC. There’s the window position identifying where the window will be positioned on the desktop once updated. There’s the size of the bitmap to copy, which also defines the size of the window:

POINT sourcePosition = {};
POINT windowPosition = {};
SIZE size = { 600, 400 };

Then there’s the BLENDFUNCTION structure that defines how the layered window will be blended with the desktop. This is a surprisingly versatile structure that is often overlooked, but can be quite helpful. Normally you might populate it as follows:

BLENDFUNCTION blend = {};
blend.SourceConstantAlpha = 255;
blend.AlphaFormat = AC_SRC_ALPHA;

The AC_SRC_ALPHA constant just indicates that the source bitmap has an alpha channel, which is the most common scenario.

The SourceConstantAlpha, however, is interesting in that you can use it in much the same way you might use the SetLayeredWindowAttributes function to control the opacity of the window as a whole. When it is set to 255, the layered window will just use the per-pixel alpha values, but you can adjust it all the way to zero, or fully transparent, to produce effects such as fading the window in or out without the cost of redrawing. It should now be obvious why the BLENDFUNCTION structure is named as it is: the resulting alpha-blended window is a function of this structure’s value.
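
To make the arithmetic concrete, here is a minimal, portable sketch of how a constant alpha might combine with per-pixel alpha for a pre-multiplied source, following the formulas given in the BLENDFUNCTION documentation. The Bgra type and the Scale and BlendOver helpers are hypothetical names used only for illustration, and the rounding is an assumption; the system's own implementation may differ.

```cpp
#include <cassert>
#include <cstdint>

// One pre-multiplied 32-bpp pixel in BGRA byte order (hypothetical helper type).
struct Bgra {
  std::uint8_t b, g, r, a;
};

// Multiply a channel by an 8-bit factor, treating 255 as 1.0 (rounded).
inline std::uint8_t Scale(std::uint8_t value, std::uint8_t factor) {
  return static_cast<std::uint8_t>((value * factor + 127) / 255);
}

// Blend a pre-multiplied source pixel over a destination pixel:
//   src' = src * constantAlpha
//   dst' = src' + (1 - src'.alpha) * dst
inline Bgra BlendOver(Bgra src, Bgra dst, std::uint8_t constantAlpha) {
  // A pre-multiplied color channel never exceeds its alpha.
  assert(src.b <= src.a && src.g <= src.a && src.r <= src.a);

  // Apply the window-wide alpha to every channel, including alpha itself.
  const Bgra s = { Scale(src.b, constantAlpha), Scale(src.g, constantAlpha),
                   Scale(src.r, constantAlpha), Scale(src.a, constantAlpha) };

  // Whatever the source does not cover is contributed by the destination.
  const std::uint8_t inverse = static_cast<std::uint8_t>(255 - s.a);

  Bgra result;
  result.b = static_cast<std::uint8_t>(s.b + Scale(dst.b, inverse));
  result.g = static_cast<std::uint8_t>(s.g + Scale(dst.g, inverse));
  result.r = static_cast<std::uint8_t>(s.r + Scale(dst.r, inverse));
  result.a = static_cast<std::uint8_t>(s.a + Scale(dst.a, inverse));
  return result;
}
```

With a constant alpha of 255 an opaque source pixel replaces the destination, and with zero the destination shows through unchanged, which is why stepping SourceConstantAlpha over time fades the whole window without redrawing it.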

Last, there’s the UPDATELAYEREDWINDOWINFO structure that ties it all together:

UPDATELAYEREDWINDOWINFO info = {};
info.cbSize = sizeof(UPDATELAYEREDWINDOWINFO);
info.pptSrc = &sourcePosition;
info.pptDst = &windowPosition;
info.psize = &size;
info.pblend = &blend;
info.dwFlags = ULW_ALPHA;

This should be pretty self-explanatory at this point, with the only undocumented member being the dwFlags variable. A value of ULW_ALPHA, which should look familiar if you’ve used the older UpdateLayeredWindow function before, just indicates that the blend function should be used.

Finally, you need to provide the handle to the source DC and call the UpdateLayeredWindowIndirect function to update the window:

info.hdcSrc = sourceDC;

Verify(UpdateLayeredWindowIndirect(
  windowHandle, &info));

And that’s it. The window won’t receive any WM_PAINT messages. Any time you need to show or update the window, just call the UpdateLayeredWindowIndirect function. To keep all of this boilerplate code out of the way, I’m going to use the LayeredWindowInfo wrapper class shown in Figure 2 in the rest of this article.

Figure 2 LayeredWindowInfo Wrapper Class

class LayeredWindowInfo {
  const POINT m_sourcePosition;
  POINT m_windowPosition;
  CSize m_size;
  BLENDFUNCTION m_blend;
  UPDATELAYEREDWINDOWINFO m_info;

public:

  LayeredWindowInfo(
    __in UINT width,
    __in UINT height) :
    m_sourcePosition(),
    m_windowPosition(),
    m_size(width, height),
    m_blend(),
    m_info() {

      m_info.cbSize = sizeof(UPDATELAYEREDWINDOWINFO);
      m_info.pptSrc = &m_sourcePosition;
      m_info.pptDst = &m_windowPosition;
      m_info.psize = &m_size;
      m_info.pblend = &m_blend;
      m_info.dwFlags = ULW_ALPHA;

      m_blend.SourceConstantAlpha = 255;
      m_blend.AlphaFormat = AC_SRC_ALPHA;
  }

  void Update(
    __in HWND window,
    __in HDC source) {

    m_info.hdcSrc = source;

    Verify(UpdateLayeredWindowIndirect(window, &m_info));
  }

  UINT GetWidth() const { return m_size.cx; }

  UINT GetHeight() const { return m_size.cy; }
};

Figure 3 provides a basic skeleton for a layered window using ATL/WTL and the LayeredWindowInfo wrapper class from Figure 2. The first thing to notice is that there’s no need to call UpdateWindow since this code doesn’t use WM_PAINT. Instead it immediately calls the Render method, which in turn is required to perform some drawing and to provide a DC to LayeredWindowInfo’s Update method. How that drawing occurs and where the DC comes from is where it gets interesting.

Figure 3 Layered Window Skeleton

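class LayeredWindow :
  public CWindowImpl<LayeredWindow, 
  CWindow, CWinTraits<WS_POPUP, WS_EX_LAYERED>> {

  LayeredWindowInfo m_info;

public:

  // WTL message map (reconstructed): route WM_DESTROY to OnDestroy
  BEGIN_MSG_MAP(LayeredWindow)
    MSG_WM_DESTROY(OnDestroy)
  END_MSG_MAP()

  LayeredWindow() :
    m_info(600, 400) {

    Verify(0 != __super::Create(0)); // parent
    ShowWindow(SW_SHOW);
    Render(); // no UpdateWindow: the window never receives WM_PAINT
  }

  void Render() {
    // Do some drawing here

    m_info.Update(m_hWnd,
      /* source DC goes here */);
  }

  void OnDestroy() {
    PostQuitMessage(1);
  }
};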

The GDI/GDI+ Way

I’ll first show you how it was done in GDI/GDI+. First you need to create a pre-multiplied 32-bits-per-pixel (bpp) bitmap using a blue-green-red-alpha (BGRA) color channel byte order. Pre-multiplied just means that the color channel values have already been multiplied by the alpha value. This tends to provide better performance for alpha blending images, but it means you need to reverse the process by dividing the color values by the alpha value to get their true color values. In GDI terminology, this is called a 32-bpp device-independent bitmap (DIB) and is created by filling out a BITMAPINFO structure and passing it to the CreateDIBSection function (see Figure 4).
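
As a rough illustration of what pre-multiplication means for a single pixel, the following portable sketch converts between straight and pre-multiplied color values. The Bgra type and both function names are hypothetical, and the rounding used is an assumption rather than anything GDI prescribes.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bpp pixel in BGRA byte order (hypothetical helper type).
struct Bgra {
  std::uint8_t b, g, r, a;
};

// Straight color -> pre-multiplied: scale each color channel by alpha / 255.
inline Bgra Premultiply(Bgra straight) {
  auto scale = [&](std::uint8_t channel) {
    return static_cast<std::uint8_t>((channel * straight.a + 127) / 255);
  };
  return { scale(straight.b), scale(straight.g), scale(straight.r), straight.a };
}

// Pre-multiplied -> straight color: divide each color channel by alpha / 255.
inline Bgra Unpremultiply(Bgra pre) {
  if (pre.a == 0) {
    return { 0, 0, 0, 0 }; // fully transparent: the color is not recoverable
  }
  // A valid pre-multiplied channel never exceeds its alpha.
  assert(pre.b <= pre.a && pre.g <= pre.a && pre.r <= pre.a);
  auto unscale = [&](std::uint8_t channel) {
    return static_cast<std::uint8_t>((channel * 255 + pre.a / 2) / pre.a);
  };
  return { unscale(pre.b), unscale(pre.g), unscale(pre.r), pre.a };
}
```

Because the division discards precision at low alpha values, a round trip is not always lossless, which is one reason to keep bitmaps pre-multiplied once they are in that form.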

Figure 4 Creating a DIB

BITMAPINFO bitmapInfo = {};
bitmapInfo.bmiHeader.biSize = 
  sizeof(bitmapInfo.bmiHeader);
bitmapInfo.bmiHeader.biWidth = 
  m_info.GetWidth();
bitmapInfo.bmiHeader.biHeight = 
  -static_cast<LONG>(m_info.GetHeight()); // negative: top-down
bitmapInfo.bmiHeader.biPlanes = 1;
bitmapInfo.bmiHeader.biBitCount = 32;
bitmapInfo.bmiHeader.biCompression = 
  BI_RGB;

void* bits = 0;

CBitmap bitmap(CreateDIBSection(
  0, // no DC palette
  &bitmapInfo,
  DIB_RGB_COLORS,
  &bits,
  0, // no file mapping object
  0)); // no file offset

There are a lot of details here, but they aren’t relevant to the discussion. This API function goes back a long way. What you should take note of is that I’ve specified a negative height for the bitmap. The BITMAPINFOHEADER structure defines either a bottom-up or a top-down bitmap. If the height is positive you’ll end up with a bottom-up bitmap, and if it’s negative you’ll get a top-down bitmap. Top-down bitmaps have their origin in the upper-left corner, whereas bottom-up bitmaps have their origin in the lower-left corner.

Although not strictly necessary in this case, I tend to use top-down bitmaps as that is the format used by most of the modern imaging components in Windows and thus improves interoperability. This also leads to a positive stride, which can be calculated as follows:

UINT stride = (width * 32 + 31) / 32 * 4;
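
The following portable sketch generalizes the stride formula to other bit depths and shows how the row order affects addressing. DibStride and RowOffset are hypothetical helper names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per row of a DIB: rows are padded to a multiple of 4 bytes.
inline std::size_t DibStride(std::size_t width, std::size_t bitsPerPixel) {
  return (width * bitsPerPixel + 31) / 32 * 4;
}

// Byte offset of row y from the start of the bits, where y == 0 is the
// row at the top of the image regardless of how the bitmap is stored.
inline std::size_t RowOffset(std::size_t y, std::size_t height,
                             std::size_t stride, bool topDown) {
  assert(y < height);
  // Top-down bitmaps store the top row first; bottom-up store it last.
  return (topDown ? y : height - 1 - y) * stride;
}
```

For the 600 x 400, 32-bpp bitmap used here the stride is 2,400 bytes with no padding. In a top-down bitmap the top row sits at offset zero, whereas in a bottom-up bitmap it would be the last row in memory.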

At this point you have enough information to start drawing in the bitmap through the bits pointer. Of course, unless you’re completely insane you’ll want to use some drawing functions, but unfortunately most of those provided by GDI don’t support the alpha channel. That’s where GDI+ comes in.
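
For the curious, drawing through the bits pointer by hand might look something like the sketch below, which fills a rectangle of a top-down 32-bpp buffer with a pre-multiplied color. FillRectangle is a hypothetical helper; it works on any memory block with the right layout, so in a real program the bits and stride would come from CreateDIBSection and the stride formula above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill the rectangle [left, right) x [top, bottom) of a top-down 32-bpp
// buffer with one pre-multiplied BGRA color. The caller is responsible
// for keeping the rectangle inside the bitmap.
inline void FillRectangle(void* bits, std::size_t stride,
                          std::size_t left, std::size_t top,
                          std::size_t right, std::size_t bottom,
                          std::uint8_t b, std::uint8_t g,
                          std::uint8_t r, std::uint8_t a) {
  // Pre-multiplied colors never have a channel larger than alpha.
  assert(b <= a && g <= a && r <= a);

  std::uint8_t* const first = static_cast<std::uint8_t*>(bits);

  for (std::size_t y = top; y < bottom; ++y) {
    // Each row starts 'stride' bytes after the previous one.
    std::uint8_t* pixel = first + y * stride + left * 4;

    for (std::size_t x = left; x < right; ++x, pixel += 4) {
      pixel[0] = b; // bytes are stored blue, green, red, alpha
      pixel[1] = g;
      pixel[2] = r;
      pixel[3] = a;
    }
  }
}
```

Anything beyond solid fills quickly becomes tedious to write this way, which is the motivation for handing the job to a drawing library.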

Although you could pass the bitmap data directly to GDI+, let’s instead create a DC for it since you’ll need it anyway to pass to the UpdateLayeredWindowIndirect function. To create the DC, call the aptly named CreateCompatibleDC function, which creates a memory DC that is compatible with the desktop. You can then call the SelectObject function to select the bitmap into the DC. The GdiBitmap wrapper class in Figure 5 wraps all of this up and provides some extra housekeeping.

Figure 5 DIB Wrapper Class

class GdiBitmap {
  const UINT m_width;
  const UINT m_height;
  const UINT m_stride;
  void* m_bits;
  HBITMAP m_oldBitmap;

  CDC m_dc;
  CBitmap m_bitmap;

public:

  GdiBitmap(__in UINT width,
            __in UINT height) :
    m_width(width),
    m_height(height),
    m_stride((width * 32 + 31) / 32 * 4),
    m_bits(0),
    m_oldBitmap(0) {

    BITMAPINFO bitmapInfo = { };
    bitmapInfo.bmiHeader.biSize = 
      sizeof(bitmapInfo.bmiHeader);
    bitmapInfo.bmiHeader.biWidth = 
      width;
    bitmapInfo.bmiHeader.biHeight = 
      -static_cast<LONG>(height); // negative: top-down
    bitmapInfo.bmiHeader.biPlanes = 1;
    bitmapInfo.bmiHeader.biBitCount = 32;
    bitmapInfo.bmiHeader.biCompression = 
      BI_RGB;

    m_bitmap.Attach(CreateDIBSection(
      0, // device context
      &bitmapInfo,
      DIB_RGB_COLORS,
      &m_bits,
      0, // file mapping object
      0)); // file offset
    if (0 == m_bits) {
      throw bad_alloc();
    }

    if (0 == m_dc.CreateCompatibleDC()) {
      throw bad_alloc();
    }

    m_oldBitmap = m_dc.SelectBitmap(m_bitmap);
  }

  ~GdiBitmap() {
    // Restore the DC's original bitmap before the members are destroyed
    m_dc.SelectBitmap(m_oldBitmap);
  }

  UINT GetWidth() const {
    return m_width;
  }

  UINT GetHeight() const {
    return m_height;
  }

  UINT GetStride() const {
    return m_stride;
  }

  void* GetBits() const {
    return m_bits;
  }

  HDC GetDC() const {
    return m_dc;
  }
};

The GDI+ Graphics class, which provides methods for drawing to some device, can be constructed with the bitmap’s DC. Figure 6 shows how the LayeredWindow class from Figure 3 can be updated to support rendering with GDI+. Once you have all of the boilerplate GDI code out of the way, it’s quite straightforward. The window’s size is passed to the GdiBitmap constructor and the bitmap’s DC is passed to the Graphics constructor and the Update method. Although straightforward, neither GDI nor GDI+ is hardware-accelerated (for the most part), nor does either provide particularly powerful rendering functionality.

Figure 6 GDI Layered Window

class LayeredWindow :
  public CWindowImpl< ... {

  LayeredWindowInfo m_info;
  GdiBitmap m_bitmap;
  Graphics m_graphics;

public:

  LayeredWindow() :
    m_info(600, 400),
    m_bitmap(m_info.GetWidth(), m_info.GetHeight()),
    m_graphics(m_bitmap.GetDC()) {
    // Create, show, and render the window as in Figure 3
  }

  void Render() {
    // Do some drawing with m_graphics object

    m_info.Update(m_hWnd,
      m_bitmap.GetDC());
  }
};

The Architecture Problem

By contrast, this is all it takes to create a layered window with Windows Presentation Foundation (WPF):

class LayeredWindow : Window {
  public LayeredWindow() {
    WindowStyle = WindowStyle.None;
    AllowsTransparency = true;

    // Do some drawing here
  }
}

Although incredibly simple, it belies the complexity involved and the architectural limitations of using layered windows. No matter how you sugarcoat it, layered windows must follow the architectural principles outlined thus far in this article. Although WPF may be able to use hardware acceleration for its rendering, the results still need to be copied to a pre-multiplied BGRA bitmap selected into a compatible DC before the display is updated via a call to the UpdateLayeredWindowIndirect function. Since WPF is not exposing anything more than a bool variable, it has to make certain choices on your behalf that you have no control over. Why does that matter? It comes down to hardware.

A graphics processing unit (GPU) prefers dedicated memory to achieve the best performance. This means that if you need to manipulate an existing bitmap, it needs to be copied from system memory (RAM) to GPU memory, which tends to be much slower than copying between two locations in system memory. The converse is also true: if you create and render a bitmap using the GPU, then decide to copy it to system memory, that’s an expensive copy operation.

Normally this should not occur as bitmaps rendered by the GPU are typically sent directly to the display device. In the case of layered windows, the bitmap must travel back to system memory since User32/GDI resources involve both kernel-mode and user-mode resources that require access to the bitmap. Consider, for example, the fact that User32 needs to hit test layered windows. Hit testing of a layered window is based on the alpha values of the bitmap, allowing mouse messages through if the pixel at a particular point is transparent. As a result, a copy of the bitmap is required in system memory to allow this to happen. Once the bitmap has been copied by UpdateLayeredWindowIndirect, it is sent straight back to the GPU so the DWM can compose the desktop.
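
The hit-testing rule is easy to picture with a small portable sketch. This is only an illustration of the idea described above, not User32's actual implementation, and the HitTest name and parameters are hypothetical. It assumes a top-down 32-bpp BGRA bitmap in system memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the window should receive a mouse message at (x, y),
// or false if the message should fall through to whatever lies beneath.
inline bool HitTest(const void* bits, std::size_t stride,
                    std::size_t width, std::size_t height,
                    long x, long y) {
  assert(bits != nullptr);

  // Points outside the bitmap are outside the window.
  if (x < 0 || y < 0 ||
      static_cast<std::size_t>(x) >= width ||
      static_cast<std::size_t>(y) >= height) {
    return false;
  }

  const std::uint8_t* pixel = static_cast<const std::uint8_t*>(bits) +
    static_cast<std::size_t>(y) * stride +
    static_cast<std::size_t>(x) * 4;

  // Byte 3 of a BGRA pixel is alpha; zero means fully transparent.
  return pixel[3] != 0;
}
```

The point of the sketch is that the test reads individual pixels with the CPU, which is only practical when a copy of the bitmap lives in system memory.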

Besides the expense of copying memory back and forth, forcing the GPU to synchronize with the CPU is costly as well. Unlike typical CPU-bound operations, GPU operations tend to all be performed asynchronously, which provides great performance when batching a stream of rendering commands. Every time we need to cross paths with the CPU, it forces batched commands to be flushed and the CPU to wait until the GPU has completed, leading to less than optimal performance.

This all means that you need to be careful about these roundtrips and the frequency and costs involved. If the scenes being rendered are sufficiently complex, then the performance of hardware acceleration can easily outweigh the cost of copying the bitmaps. On the other hand, if the rendering is not very costly and can be performed by the CPU, you might find that opting for no hardware acceleration will ultimately provide better performance. These choices aren’t easy to make. Some GPUs don’t even have dedicated memory and instead use a portion of system memory, which reduces the cost of the copy.

The catch is that neither GDI nor WPF gives you a choice. In the case of GDI, you’re stuck with the CPU. In the case of WPF, you’re forced into using whatever rendering approach WPF uses, which is typically hardware acceleration via Direct3D.

Then Direct2D came along.

Direct2D to GDI/DC

Direct2D was designed to render to whatever target you choose. If it’s a window or Direct3D texture, Direct2D does this directly on the GPU without involving any copying. If it’s a Windows Imaging Component (WIC) bitmap, Direct2D similarly renders directly using the CPU instead. Whereas WPF strives to put much of its rendering on the GPU and uses a software rasterizer as a fallback, Direct2D provides the best of both worlds with unparalleled immediate mode rendering on the GPU for hardware acceleration, and highly optimized rendering on the CPU when a GPU is either not available or not desired.

As you can imagine, there are quite a few ways to render a layered window with Direct2D. Let’s take a look at a few and I’ll point out the recommended approaches depending on whether you want to use hardware acceleration.

First, you could just rip out the GDI+ Graphics class from Figure 6 and replace it with a Direct2D DC render target. This might make sense if you have a legacy application with a lot invested in GDI, but it’s definitely not the most efficient solution. Instead of rendering directly to the DC, Direct2D renders first to an internal WIC bitmap, then copies the result to the DC. Although faster than GDI+, this nevertheless involves extra copying that could be avoided if you didn’t need to use a DC for rendering.

To use this approach, start by initializing a D2D1_RENDER_TARGET_PROPERTIES structure. This tells Direct2D the format of the bitmap to use for its render target. Recall that it needs to be a pre-multiplied BGRA pixel format. This is expressed with a D2D1_PIXEL_FORMAT structure and can be defined as follows:

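const D2D1_PIXEL_FORMAT format = D2D1::PixelFormat(
  DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED);

const D2D1_RENDER_TARGET_PROPERTIES properties = D2D1::RenderTargetProperties(
  D2D1_RENDER_TARGET_TYPE_DEFAULT, format);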

You can now create the DC render target using the Direct2D factory object:

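CComPtr<ID2D1DCRenderTarget> target;

// d2dFactory: the application's ID2D1Factory (variable name assumed)
Verify(d2dFactory->CreateDCRenderTarget(&properties, &target));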

Finally, you need to tell the render target to which DC to send its drawing commands:

const RECT rect = {0, 0, static_cast<LONG>(bitmap.GetWidth()),
  static_cast<LONG>(bitmap.GetHeight())};

Verify(target->BindDC(bitmap.GetDC(), &rect));

At this point you can draw with Direct2D as usual between BeginDraw and EndDraw method calls, and then call the Update method as before with the bitmap’s DC. The EndDraw method ensures that all drawing has been flushed to the bound DC.

Direct2D to WIC

Now if you can avoid the GDI DC entirely and just use a WIC bitmap directly, you can achieve the best possible performance without hardware acceleration. To use this approach start by creating a pre-multiplied BGRA bitmap directly with WIC:

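CComPtr<IWICImagingFactory> factory;
Verify(factory.CoCreateInstance(CLSID_WICImagingFactory));

CComPtr<IWICBitmap> bitmap;

Verify(factory->CreateBitmap(
  m_info.GetWidth(), m_info.GetHeight(),
  GUID_WICPixelFormat32bppPBGRA, WICBitmapCacheOnLoad, &bitmap));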

Next you need to once again initialize a D2D1_RENDER_TARGET_PROPERTIES structure in much the same way as before, except that you must also tell Direct2D that the render target needs to be GDI-compatible:

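const D2D1_PIXEL_FORMAT format = D2D1::PixelFormat(
  DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED);

const D2D1_RENDER_TARGET_PROPERTIES properties = 
  D2D1::RenderTargetProperties(
  D2D1_RENDER_TARGET_TYPE_DEFAULT,
  format,
  0.0f, // default dpi
  0.0f, // default dpi
  D2D1_RENDER_TARGET_USAGE_GDI_COMPATIBLE);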

You can now create the WIC render target using the Direct2D factory object:

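CComPtr<ID2D1RenderTarget> target;

// d2dFactory: the application's ID2D1Factory (variable name assumed)
Verify(d2dFactory->CreateWicBitmapRenderTarget(bitmap, properties, &target));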

But what exactly does D2D1_RENDER_TARGET_USAGE_GDI_COMPATIBLE do? It’s a hint to Direct2D that you will query the render target for the ID2D1GdiInteropRenderTarget interface:

CComPtr<ID2D1GdiInteropRenderTarget> interopTarget;
Verify(target.QueryInterface(&interopTarget));

For simplicity and efficiency of implementation, querying for this interface will always succeed. It is only when you try to use it, however, that it will fail if you didn’t specify your desires up front.

The ID2D1GdiInteropRenderTarget interface has just two methods: GetDC and ReleaseDC. To optimize cases where hardware acceleration is used, these methods are restricted to being used between calls to the render target’s BeginDraw and EndDraw methods. GetDC will flush the render target before returning the DC. Since the interop interface’s methods need to be paired, it makes sense to wrap them in a C++ class as shown in Figure 7.

Figure 7 Render Target DC Wrapper Class

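class RenderTargetDC {
  ID2D1GdiInteropRenderTarget* m_renderTarget;
  HDC m_dc;

public:

  RenderTargetDC(ID2D1GdiInteropRenderTarget* renderTarget) :
    m_renderTarget(renderTarget),
    m_dc(0) {

    // Keep the rendered contents when handing out the DC
    Verify(m_renderTarget->GetDC(
      D2D1_DC_INITIALIZE_MODE_COPY,
      &m_dc));
  }

  ~RenderTargetDC() {
    RECT rect = {};
    m_renderTarget->ReleaseDC(&rect);
  }

  operator HDC() const {
    return m_dc;
  }
};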

The window’s Render method can now be updated to use the RenderTargetDC, as shown in Figure 8. The nice thing about this approach is that all of the code that is specific to creating a WIC render target is tucked away in the CreateDeviceResources method. Next I’ll show you how to create a Direct3D render target to gain hardware acceleration, but in either case, the Render method shown in Figure 8 stays the same. This makes it possible for your application to fairly easily switch render target implementations without changing all your drawing code.

Figure 8 GDI-Compatible Render Method

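void Render() {
  if (!m_target) CreateDeviceResources();
  m_target->BeginDraw();
  // Do some drawing here
  {
    RenderTargetDC dc(m_interopTarget);
    m_info.Update(m_hWnd, dc);
  }
  const HRESULT hr = m_target->EndDraw();

  if (D2DERR_RECREATE_TARGET == hr) {
    DiscardDeviceResources(); // assumed helper: releases the render targets
  }
  else {
    Verify(hr);
  }
}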
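
The design idea, keeping the drawing code fixed while the resource-creation step varies, can be sketched in portable C++ as follows. Everything here is hypothetical and stands in for the real Direct2D types: Target plays the role of the render target, and the function passed to the constructor plays the role of CreateDeviceResources.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Stand-in for a render target. A real window would hold an
// ID2D1RenderTarget created from a WIC bitmap or a DXGI surface.
struct Target {
  std::string backend; // which implementation created it
  int drawCalls;       // how many times Render has drawn into it
};

class Window {
  std::function<Target()> m_create; // the only backend-specific piece
  Target m_target{};
  bool m_hasTarget = false;

  void CreateDeviceResources() {
    m_target = m_create();
    m_hasTarget = true;
  }

public:
  explicit Window(std::function<Target()> create) :
    m_create(std::move(create)) {}

  // Identical for every backend: create resources on demand, then draw.
  const Target& Render() {
    if (!m_hasTarget) {
      CreateDeviceResources();
    }
    ++m_target.drawCalls;
    return m_target;
  }
};
```

Switching from software to hardware rendering then amounts to supplying a different creation function, for example `Window software([] { return Target{"WIC bitmap", 0}; });` versus one that returns a DXGI-backed target.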

Direct2D to Direct3D/DXGI

To obtain hardware-accelerated rendering, you need to use Direct3D. Because you’re not rendering directly to an HWND via ID2D1HwndRenderTarget, which would gain hardware acceleration automatically, you need to create the Direct3D device yourself and connect the dots in the underlying DirectX Graphics Infrastructure (DXGI) so that you can get GDI-compatible results.

DXGI is a relatively new subsystem that lives on a layer below Direct3D to abstract Direct3D from the underlying hardware and provide a high-performance gateway for interop scenarios. Direct2D also takes advantage of this new API to simplify the move to future versions of Direct3D. To use this approach, start by creating a Direct3D hardware device. This is the device that represents the GPU that will perform the rendering. Here I’m using the Direct3D 10.1 API as this is required by Direct2D at present:

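CComPtr<ID3D10Device1> device;

Verify(D3D10CreateDevice1(
  0, // adapter
  D3D10_DRIVER_TYPE_HARDWARE,
  0, // reserved
  D3D10_CREATE_DEVICE_BGRA_SUPPORT,
  D3D10_FEATURE_LEVEL_10_0, D3D10_1_SDK_VERSION, &device)); // feature level assumed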

The D3D10_CREATE_DEVICE_BGRA_SUPPORT flag is crucial for Direct2D interoperability, and the BGRA pixel format should by now look familiar. In a traditional Direct3D application, you might create a swap chain and retrieve its back buffer as a texture to render into before presenting the rendered window. Since you’re using Direct3D for rendering only and not for presentation, you can simply create a texture resource directly. A texture is a Direct3D resource for storing texels, which are the Direct3D equivalent of pixels. Although Direct3D provides 1-, 2- and 3-dimensional textures, all you need is a 2D texture, which most closely maps to a 2D surface (see Figure 9).

Figure 9 A 2D Texture

D3D10_TEXTURE2D_DESC description = {};
description.ArraySize = 1;
description.BindFlags = 
  D3D10_BIND_RENDER_TARGET;
description.Format = 
  DXGI_FORMAT_B8G8R8A8_UNORM;
description.Width = GetWidth();
description.Height = GetHeight();
description.MipLevels = 1;
description.SampleDesc.Count = 1;
description.MiscFlags = 
  D3D10_RESOURCE_MISC_GDI_COMPATIBLE;

CComPtr<ID3D10Texture2D> texture;

Verify(device->CreateTexture2D(
  &description,
  0, // no initial data
  &texture));

The D3D10_TEXTURE2D_DESC structure describes the texture to create. The D3D10_BIND_RENDER_TARGET constant indicates that the texture is bound as the output buffer, or render target, of the Direct3D pipeline. The DXGI_FORMAT_B8G8R8A8_UNORM constant ensures that Direct3D will produce the correct pixel format for GDI. Finally, the D3D10_RESOURCE_MISC_GDI_COMPATIBLE constant instructs the underlying DXGI surface to offer a GDI DC through which the results of rendering can be obtained. Direct2D exposes this DC through the ID2D1GdiInteropRenderTarget interface I discussed in the previous section.

As I mentioned, Direct2D is capable of rendering to a Direct3D surface via the DXGI API to avoid tying the API to any particular version of Direct3D. This means you need to get the Direct3D texture’s underlying DXGI surface interface to pass to Direct2D:

CComPtr<IDXGISurface> surface;
Verify(texture.QueryInterface(&surface));

At this point you can use the Direct2D factory object to create a DXGI surface render target:

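CComPtr<ID2D1RenderTarget> target;

// d2dFactory: the application's ID2D1Factory (variable name assumed)
Verify(d2dFactory->CreateDxgiSurfaceRenderTarget(surface, &properties, &target));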

The render target properties are the same as those I described in the previous section. Just remember to use the correct pixel format and request GDI compatibility. You can then query for the ID2D1GdiInteropRenderTarget interface and use the same Render method from Figure 8.

And that’s all there is to it. If you want to render your layered window with hardware acceleration, use a Direct3D texture. Otherwise use a WIC bitmap. These two approaches will provide the best possible performance with the least amount of copying.

Be sure to check out the DirectX blog and, in particular, Ben Constable’s August 2009 article on componentization and interoperability.
