July 2014

Volume 29 Number 7

DirectX Factor : Breaking the Z Barrier with Direct2D Effects

Charles Petzold

As children, we learn how to paint before we learn how to read and write, and we undoubtedly glean a few lessons from the experience. We discover that the temporal process of painting is reflected in the layering of paint on the canvas: What we paint earlier may be partially covered up and obscured by what we paint later.

For that reason, even someone completely unfamiliar with the mechanics of computer graphics can probably guess how the image in Figure 1 was rendered: Obviously, the background was colored gray first; next came the blue triangle, followed by the green, and last by the red, which is in front of everything else. It’s no surprise the process of rendering figures from rear to front is known as the “painter’s algorithm.”

Figure 1 Three Overlapping Triangles

The three triangles in Figure 1 might also be pieces of colored construction paper arranged in a stack. If you were to add more and more triangles to the stack, they would build up into a pile, and what started out as a two-dimensional surface would acquire a third dimension.

Even in 2D graphics, there’s a rudimentary concept of a Z axis—a virtual space orthogonal to the two-dimensional screen or canvas. The layering of flat 2D objects is governed by the “Z order” of the figures. In XAML-based environments, for example, the Canvas.ZIndex attached property determines which elements seem to sit on top of others, but it really just controls the order in which the elements are rendered on the screen.

The problem is this: In 2D graphics, a Z index always applies to the entire figure. You can’t use this type of Z ordering to draw three figures like those in Figure 2, with the first on top of the second, the second on top of the third, but the third on top of the first.

Figure 2 Mutually Overlapping Triangles

That change in one corner of one triangle seems slight, but what a world of difference it represents! The image in Figure 2 might be easy to assemble with construction paper, but it’s not so easy with painting—either in real life or in 2D graphics programming. One of the triangles must be rendered in two parts, either with carefully calculated coordinates or with clipping based on one of the other triangles.

Effects and the GPU

The rendering of Figure 2 can be helped enormously by borrowing some concepts from the world of 3D graphics.

Such figures can’t have uniform Z indices; instead, the figures must be allowed to have variable Z coordinates over their entire surfaces. The drawing process can then maintain a collection of Z coordinates (called a Z buffer or a depth buffer) that encompasses every pixel of the rendering surface. As each figure is rendered, the Z coordinate of each pixel of the figure is compared with the corresponding Z coordinate in the depth buffer. If the pixel is closer to the viewer than the Z coordinate stored in the depth buffer, the pixel is drawn and its Z coordinate replaces the one in the depth buffer. If not, the pixel is ignored.
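Here’s a minimal CPU-side sketch in C++ of the per-pixel test just described (the DepthBuffer class and its methods are purely illustrative, not part of any Direct2D API; on the GPU this same comparison runs in parallel across pixels):

#include <vector>

class DepthBuffer
{
public:
  DepthBuffer(int width, int height)
    : m_width(width),
      m_depths(width * height, 1.0f)   // initialize to 1 = furthest away
  {
  }
  // Returns true (and records the new depth) if a pixel at depth z
  // is closer to the viewer than whatever was drawn there before.
  bool TestAndSet(int x, int y, float z)
  {
    float& stored = m_depths[y * m_width + x];
    if (z < stored)      // smaller z means closer to the viewer
    {
      stored = z;
      return true;       // draw this pixel
    }
    return false;        // pixel is hidden behind an earlier figure
  }
private:
  int m_width;
  std::vector<float> m_depths;
};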

This sounds computationally costly—not only for the comparison itself but for the calculation of Z coordinates for every pixel of each graphical figure—and that’s an accurate evaluation. That’s why it’s an ideal job to hand over to the parallel computational capabilities of the modern GPU.

The calculation of Z coordinates for every pixel of a figure is conceptually rather easy if the figure happens to be a triangle—and keep in mind that any polygon can be decomposed into triangles. All that’s necessary is to give each of the three vertices of the triangle a 3D coordinate point, and then any point within the triangle can be calculated as a weighted average of the three vertex coordinates. (This involves barycentric coordinates, which are not the only contribution to computer graphics from the German mathematician August Ferdinand Möbius.)

That same interpolation process can shade the triangle, as well. If each vertex is assigned a specific color, then any pixel within that triangle is a weighted average of those three colors, as shown in Figure 3.
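Here’s a short C++ sketch of that interpolation, assuming a simple Float3 type that serves for both 3D points and RGB colors; the weights are the barycentric coordinates of the pixel within the triangle:

struct Float3 { float x, y, z; };
// Weighted average of three vertex values with barycentric weights
// w0 + w1 + w2 == 1; the same function interpolates positions and colors
Float3 Interpolate(Float3 v0, Float3 v1, Float3 v2,
                   float w0, float w1, float w2)
{
  return Float3 { w0 * v0.x + w1 * v1.x + w2 * v2.x,
                  w0 * v0.y + w1 * v1.y + w2 * v2.y,
                  w0 * v0.z + w1 * v1.z + w2 * v2.z };
}
// Barycentric weights of point (px, py) in the 2D triangle (a, b, c),
// computed as ratios of signed triangle areas
void Barycentric(float px, float py,
                 float ax, float ay, float bx, float by, float cx, float cy,
                 float& w0, float& w1, float& w2)
{
  float area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay);
  w0 = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area;
  w1 = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area;
  w2 = 1.0f - w0 - w1;
}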

Figure 3 The ThreeTriangles Program Display

That type of color gradient is also an important feature of 3D programming because it allows triangles to be shaded to resemble curved surfaces. But it’s not a type of gradient that’s common in conventional 2D programming.

The image in Figure 3 was created by a downloadable program called ThreeTriangles that runs under Windows 8.1 and Windows Phone 8.1. (The solution was created in Visual Studio 2013 Update 2 using the new Universal App template that allows sharing lots of code between Windows 8.1 and Windows Phone 8.1.)

The graphics in the ThreeTriangles program are done entirely in Direct2D, using a feature of Direct2D called effects or (when you code them yourself) custom effects. Using custom effects, you can get much closer to authentic 3D programming than is otherwise possible with Direct2D.

When writing a custom effect for Direct2D, you acquire a privilege normally restricted to 3D programmers: You can write code that executes on the GPU. This code takes the form of little programs called shaders, written in the High Level Shading Language (HLSL), which resembles C. These shaders are compiled by Visual Studio into compiled shader object (.cso) files during the normal project build and then run on the GPU when the program executes.

Indeed, Direct2D effects are sometimes described as little more than wrappers for shaders! Custom effects are the only way you can use shaders within the context of Direct2D programming to achieve 3D-like images.

Three different types of shaders are available for use by Direct2D effects:

  • A vertex shader that performs operations on vertices. Each triangle has three vertices. The vertices always involve a coordinate point but might include other information, such as color.
  • A pixel shader that performs operations on all the pixels within these triangles. Any information supplied with the vertices is automatically interpolated over the surface of the triangle in preparation for the pixel shader.
  • A compute shader that uses the GPU to perform heavy parallel processing. I won’t be discussing the compute shader in this article.

The shaders used for Direct2D effects have somewhat different requirements than shaders associated with Direct3D programming, but many of the concepts are the same.

Built-in Effects and Custom Effects

Direct2D includes about 40 built-in effects, which perform various image-processing manipulations on bitmaps, such as blurring, sharpening or various types of color manipulation.

Each of these built-in effects is identified by a class ID you use to create an effect of that type. For example, suppose you want to use the color-matrix effect, which allows specifying a transform to alter the colors in a bitmap. You’ll probably declare an object of type ID2D1Effect as a private field in your rendering class:

Microsoft::WRL::ComPtr<ID2D1Effect> m_colorMatrixEffect;

In the CreateDeviceDependentResources method, you can create this effect by referencing the documented class ID:

d2dContext->CreateEffect(
  CLSID_D2D1ColorMatrix, &m_colorMatrixEffect);

At that point, you can call SetInput on the effect object to set a bitmap, and SetValue to specify a transform matrix. You render this color-shifted bitmap by calling:

d2dContext->DrawImage(m_colorMatrixEffect.Get());
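For instance, here’s a hedged sketch of the configuration step preceding that DrawImage call (m_bitmap is an assumed ID2D1Bitmap1 field; the matrix shown swaps the red and blue channels):

// Supply the bitmap to be color-shifted
m_colorMatrixEffect->SetInput(0, m_bitmap.Get());
// A 5 x 4 matrix that swaps the red and blue channels
D2D1_MATRIX_5X4_F matrix = D2D1::Matrix5x4F(
  0, 0, 1, 0,    // red input contributes to blue output
  0, 1, 0, 0,    // green passes through
  1, 0, 0, 0,    // blue input contributes to red output
  0, 0, 0, 1,    // alpha passes through
  0, 0, 0, 0);   // no constant offset
m_colorMatrixEffect->SetValue(
  D2D1_COLORMATRIX_PROP_COLOR_MATRIX, matrix);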

All the built-in effects involve bitmap input, and one of the features of Direct2D effects is that you can chain them together to apply a series of effects to a bitmap.
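Here’s a sketch of such a chain, assuming a second effect created from the built-in CLSID_D2D1GaussianBlur: the blur’s output becomes the color-matrix effect’s input, and drawing the last effect in the chain renders the entire pipeline:

// m_blurEffect created earlier with CLSID_D2D1GaussianBlur
m_blurEffect->SetInput(0, m_bitmap.Get());
m_blurEffect->SetValue(
  D2D1_GAUSSIANBLUR_PROP_STANDARD_DEVIATION, 3.0f);
// The blurred image feeds the color-matrix effect
m_colorMatrixEffect->SetInputEffect(0, m_blurEffect.Get());
// Drawing the last effect runs the whole chain
d2dContext->DrawImage(m_colorMatrixEffect.Get());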

If you’re interested in writing your own custom effects, there’s an invaluable Windows 8.1 Visual Studio solution called Direct2D Custom Image Effects Sample that includes three separate projects to demonstrate the three types of shaders available for Direct2D effects. All three programs require bitmaps as input.

Therefore, you’d be forgiven for assuming that Direct2D effects always perform operations on bitmap input. But this isn’t so. The ThreeTriangles program that created the image in Figure 3 doesn’t require bitmap input.

You’d also be forgiven for assuming Direct2D effects involve just one type of shader. Certainly, the built-in effects seem to involve either a vertex shader or a pixel shader, but not both. However, the ThreeTriangles program is different in this respect, as well: It defines a custom effect that uses both a vertex shader and a pixel shader.

Register, Create, Draw

Because Direct2D effects are designed to be preregistered and created from a class ID, a custom effect needs to offer that same ability. The custom effect in the ThreeTriangles program is a class named SimpleTriangleEffect, which defines a static method for registering the class. This method is called by the constructor of the ThreeTrianglesRenderer class, but the effect could be registered anywhere in the program:

SimpleTriangleEffect::RegisterEffectAsync(d2dFactory);

This registration method is asynchronous because it needs to load the compiled shader files, and the only method provided for this purpose in the DirectXHelper class is ReadDataAsync.
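As a rough sketch (my assumption of how such a method could be structured, using the DX::ReadDataAsync and DX::ThrowIfFailed helpers from the project template; the actual code is in the download), the method might chain the two file loads and finish with a synchronous registration step:

// Assumes static byte-vector fields holding the shader bytecode, and a
// synchronous Register helper that calls RegisterEffectFromString
// with the effect's identifying XML
concurrency::task<void> SimpleTriangleEffect::RegisterEffectAsync(
  ID2D1Factory1* d2dFactory)
{
  return DX::ReadDataAsync(L"SimpleTriangleEffectVertexShader.cso")
    .then([](std::vector<byte> vertexShaderCode)
  {
    m_vertexShaderCode = vertexShaderCode;
    return DX::ReadDataAsync(L"SimpleTriangleEffectPixelShader.cso");
  })
    .then([d2dFactory](std::vector<byte> pixelShaderCode)
  {
    m_pixelShaderCode = pixelShaderCode;
    DX::ThrowIfFailed(SimpleTriangleEffect::Register(d2dFactory));
  });
}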

Just like when using a built-in effect, the ThreeTrianglesRenderer class declares an ID2D1Effect object as a private field in its header file:

Microsoft::WRL::ComPtr<ID2D1Effect> m_simpleTriangleEffect;

The CreateDeviceDependentResources method creates the custom effect the same way as a built-in effect:

d2dContext->CreateEffect(
  CLSID_SimpleTriangleEffect, &m_simpleTriangleEffect);

The earlier registration of the custom effect associated that class ID with the effect.

The SimpleTriangleEffect has no input. (That’s part of what makes it “simple”!) The effect is rendered just like a built-in effect:

d2dContext->DrawImage(m_simpleTriangleEffect.Get());

Perhaps the simplicity of using this custom effect suggests that the complexity resides within the effect class itself. A custom effect such as SimpleTriangleEffect must implement the ID2D1EffectImpl (effect implementation) interface. An effect can consist of multiple passes, which are called transforms, and each one is usually represented by an implementation of ID2D1DrawTransform. If a single class is used for both interfaces—which is the case with SimpleTriangleEffect—then it needs to implement IUnknown (three methods), ID2D1EffectImpl (three methods), ID2D1TransformNode (one method), ID2D1Transform (three methods), and ID2D1DrawTransform (one method).
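Here’s a hedged sketch of what the declaration of such a dual-purpose class might look like, with method signatures taken from the documented interfaces (the real class is in the downloadable code):

// Sketch of the class declaration; requires d2d1effectauthor.h
class SimpleTriangleEffect : public ID2D1EffectImpl, public ID2D1DrawTransform
{
public:
  // ID2D1EffectImpl
  IFACEMETHODIMP Initialize(ID2D1EffectContext* effectContext,
                            ID2D1TransformGraph* transformGraph);
  IFACEMETHODIMP PrepareForRender(D2D1_CHANGE_TYPE changeType);
  IFACEMETHODIMP SetGraph(ID2D1TransformGraph* transformGraph);
  // ID2D1TransformNode
  IFACEMETHODIMP_(UINT32) GetInputCount() const;
  // ID2D1Transform
  IFACEMETHODIMP MapOutputRectToInputRects(const D2D1_RECT_L* outputRect,
                                           D2D1_RECT_L* inputRects,
                                           UINT32 inputRectsCount) const;
  IFACEMETHODIMP MapInputRectsToOutputRect(const D2D1_RECT_L* inputRects,
                                           const D2D1_RECT_L* inputOpaqueSubRects,
                                           UINT32 inputRectCount,
                                           D2D1_RECT_L* outputRect,
                                           D2D1_RECT_L* outputOpaqueSubRect);
  IFACEMETHODIMP MapInvalidRect(UINT32 inputIndex,
                                D2D1_RECT_L invalidInputRect,
                                D2D1_RECT_L* invalidOutputRect) const;
  // ID2D1DrawTransform
  IFACEMETHODIMP SetDrawInfo(ID2D1DrawInfo* drawInfo);
  // IUnknown
  IFACEMETHODIMP QueryInterface(REFIID riid, void** ppOutput);
  IFACEMETHODIMP_(ULONG) AddRef();
  IFACEMETHODIMP_(ULONG) Release();
private:
  LONG m_refCount = 1;
};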

That’s a considerable amount of overhead, in addition to some XML that identifies the effect and its author when the effect is first registered. Fortunately, for simple effects—and this one certainly qualifies—many of the effect methods can have fairly easy implementations. The most important jobs of the effect class involve loading and registering the compiled shader code (and associating the shaders with GUIDs for later reference), and defining a vertex buffer, which must also be associated with a GUID.
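For the shader-loading job, the effect’s Initialize method receives an ID2D1EffectContext. A sketch of handing the compiled shaders to Direct2D (the GUID names and bytecode fields are assumptions on my part) looks something like this:

// Each GUID names a shader for later reference from SetDrawInfo
HRESULT hr = effectContext->LoadVertexShader(
  GUID_SimpleTriangleVertexShader,
  m_vertexShaderCode.data(),
  static_cast<UINT32>(m_vertexShaderCode.size()));
if (SUCCEEDED(hr))
{
  hr = effectContext->LoadPixelShader(
    GUID_SimpleTrianglePixelShader,
    m_pixelShaderCode.data(),
    static_cast<UINT32>(m_pixelShaderCode.size()));
}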

From Vertex Buffer …

A vertex buffer is a collection of vertices assembled for processing. Each vertex always includes a 2D or 3D coordinate point, but usually other items as well. The data associated with each vertex and how it’s organized is called the “layout” of the vertex buffer and, overall, the ThreeTriangles program defines three different—but equivalent—data types to describe this vertex layout.

The first representation of this vertex data is shown in Figure 4. This is a simple structure named Vertex that includes a 3D coordinate point and an RGB color. An array of these structures defines the three triangles displayed by the program. (This array is hardcoded in the required Initialize method of the SimpleTriangleEffect class; in a real program the effect class would allow an array of vertices to be input to the effect.)

Figure 4 Vertex Definition in SimpleTriangleEffect

// Define Vertex for simple initialization
struct Vertex
{
  float x;
  float y;
  float z;
  float r;
  float g;
  float b;
};
// Each triangle has three points and three colors
static Vertex vertices [] =
{
  // Triangle 1
  {    0, -1000, 0.0f, 1, 0, 0 },
  {  985,  -174, 0.5f, 0, 1, 0 },
  {  342,   940, 1.0f, 0, 0, 1 },
  // Triangle 2
  {  866,   500, 0.0f, 1, 0, 0 },
  { -342,   940, 0.5f, 0, 1, 0 },
  { -985,  -174, 1.0f, 0, 0, 1 },
  // Triangle 3
  { -866,   500, 0.0f, 1, 0, 0 },
  { -643,  -766, 0.5f, 0, 1, 0 },
  {  643,  -766, 1.0f, 0, 0, 1 }
};
// Define layout for the effect
static const D2D1_INPUT_ELEMENT_DESC vertexLayout [] =
{
  { "MESH_POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0 },
  { "COLOR", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12 },
};

The x and y values are based on sines and cosines of angles in increments of 40 degrees, with a radius of 1,000. However, notice that the z coordinates are all set between 0 and 1: the red vertices have a z value of 0, the green vertices 0.5, and the blue vertices 1. More on this a little later.

Following that array is another small array, which defines the vertex information in the more formal manner required for creating and registering the vertex buffer.
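Here’s a hedged sketch of how the effect’s Initialize method might create and register the vertex buffer from those two arrays (GUID_SimpleTriangleVertexBuffer is an assumed name; registering the buffer under a GUID lets multiple instances of the effect share it):

D2D1_VERTEX_BUFFER_PROPERTIES vbProperties =
{
  0,                                 // inputCount: no bitmap inputs
  D2D1_VERTEX_USAGE_STATIC,          // the vertices never change
  reinterpret_cast<BYTE*>(vertices),
  sizeof(vertices)
};
D2D1_CUSTOM_VERTEX_BUFFER_PROPERTIES customVbProperties =
{
  m_vertexShaderCode.data(),         // shader with the matching input signature
  static_cast<UINT32>(m_vertexShaderCode.size()),
  vertexLayout,                      // the D2D1_INPUT_ELEMENT_DESC array
  ARRAYSIZE(vertexLayout),
  sizeof(Vertex)                     // stride of one vertex: 24 bytes
};
HRESULT hr = effectContext->CreateVertexBuffer(
  &vbProperties,
  &GUID_SimpleTriangleVertexBuffer,
  &customVbProperties,
  &m_vertexBuffer);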

The vertex buffer and the vertex shader are both referenced in the SetDrawInfo method of SimpleTriangleEffect. Every time the effect is rendered, these nine vertices are passed to the vertex shader.
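Under the same assumed GUID names, a sketch of that SetDrawInfo method might look like this; the D2D1_VERTEX_OPTIONS_USE_DEPTH_BUFFER flag shown here is discussed later in this article:

IFACEMETHODIMP SimpleTriangleEffect::SetDrawInfo(ID2D1DrawInfo* drawInfo)
{
  D2D1_VERTEX_RANGE vertexRange = { 0, 9 };  // nine vertices, three triangles
  HRESULT hr = drawInfo->SetPixelShader(GUID_SimpleTrianglePixelShader);
  if (SUCCEEDED(hr))
  {
    hr = drawInfo->SetVertexProcessing(
      m_vertexBuffer.Get(),
      D2D1_VERTEX_OPTIONS_USE_DEPTH_BUFFER,  // enables the Z test
      nullptr,                               // default blending
      &vertexRange,
      &GUID_SimpleTriangleVertexShader);
  }
  return hr;
}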

… To Vertex Shader …

Figure 5 shows the vertex shader for the SimpleTriangleEffect. It consists of three structures and a function called main. The main function is called for every vertex in the vertex buffer; in this case, that’s only nine vertices, but often there are many more.

Figure 5 The SimpleTriangleEffectVertexShader.hlsl File

// Per-vertex data input to the vertex shader
struct VertexShaderInput
{
  float3 position : MESH_POSITION;
  float3 color : COLOR0;
};
// Per-vertex data output from the vertex shader
struct VertexShaderOutput
{
  float4 clipSpaceOutput : SV_POSITION;
  float4 sceneSpaceOutput : SCENE_POSITION;
  float3 color : COLOR0;
};
// Information provided for Direct2D vertex shaders
cbuffer ClipSpaceTransforms : register(b0)
{
  float2x1 sceneToOutputX;
  float2x1 sceneToOutputY;
}
// Called for each vertex
VertexShaderOutput main(VertexShaderInput input)
{
  // Output structure
  VertexShaderOutput output;
  // Append a 'w' value of 1 to the 3D input position
  output.sceneSpaceOutput = float4(input.position.xyz, 1);
  // Standard calculations
  output.clipSpaceOutput.x =
    output.sceneSpaceOutput.x * sceneToOutputX[0] +
    output.sceneSpaceOutput.w * sceneToOutputX[1];
  output.clipSpaceOutput.y =
    output.sceneSpaceOutput.y * sceneToOutputY[0] +
    output.sceneSpaceOutput.w * sceneToOutputY[1];
  output.clipSpaceOutput.z = output.sceneSpaceOutput.z;
  output.clipSpaceOutput.w = output.sceneSpaceOutput.w;
  // Transfer the color
  output.color = input.color;
  return output;
}

Each of the three structures contains fields identified by an HLSL data type, a member name, and an uppercase semantic that indicates the role of the particular field.

The structure named VertexShaderInput is the input to main, and it’s the same as the layout of the vertex buffer you’ve just seen, but with HLSL data types for the 3D position and the RGB color.

The structure named VertexShaderOutput defines the output of main. The first two fields are required for Direct2D effects. (A third required field would be present if the effect involved an input bitmap.) The field I’ve called sceneSpaceOutput is based on the input coordinate. Some effects change that coordinate; this effect does not, and simply turns the 3D input coordinate into a 4D homogenous coordinate with a w value of 1:

output.sceneSpaceOutput = float4(input.position.xyz, 1);

The vertex shader output also includes a non-required field called color, which is simply set from the input color:

output.color = input.color;

The required output field I’ve called clipSpaceOutput describes each vertex coordinate in terms of the normalized coordinates used in 3D. These coordinates are the same as the coordinates generated from the camera projection transforms I described in last month’s installment of this column. In these clipSpaceOutput coordinates, x values range from –1 at the left of the screen to 1 at the right; y values range from –1 at the bottom to 1 at the top; and z values range from 0 for coordinates closest to the viewer to 1 for coordinates furthest away. As the name of the field implies, these normalized coordinates are used for clipping the 3D scene to the screen.

To assist you in calculating these clipSpaceOutput coordinates, a third structure is automatically provided for you that I’ve called ClipSpaceTransforms. These are four numbers based on the pixel width and height of the screen, and any device context transforms that are in effect when DrawImage renders the effect.

However, the provided transforms are only for x and y coordinates, and that’s why I defined z coordinates in the original vertex buffer to have values between 0 and 1. Another approach is to use an actual camera projection transform in the vertex shader (as I’ll demonstrate in a future column).

These z values are also automatically used in a depth buffer so that pixels with lower z coordinates obscure pixels with higher z values. But this only occurs if the SetDrawInfo method in the effect class calls SetVertexProcessing with the D2D1_VERTEX_OPTIONS_USE_DEPTH_BUFFER flag. (This happens to also result in COM errors appearing in the Output window of Visual Studio while the program is running, but that also happens with the Microsoft sample Direct2D effect code.)

… To Pixel Shader

Every time the effect is rendered (and in the general case, that’s at the frame rate of the video display), the vertex shader is called for every vertex in the vertex buffer, in this case nine times.

The output from the vertex shader has the same format as the input to the pixel shader. As you can see in the pixel shader in Figure 6, the PixelShaderInput structure is the same as the VertexShaderOutput structure in the vertex shader.

Figure 6 The SimpleTriangleEffectPixelShader.hlsl File

// Per-pixel data input to the pixel shader
struct PixelShaderInput
{
  float4 clipSpaceOutput : SV_POSITION;
  float4 sceneSpaceOutput : SCENE_POSITION;
  float3 color : COLOR0;
};
// Called for each pixel
float4 main(PixelShaderInput input) : SV_TARGET
{
  // Simply return color with opacity of 1
  return float4(input.color, 1);
}

However, the pixel shader is called for every pixel in the triangles, and all the fields of the structure have been interpolated over the surface of that triangle. The main function in the pixel shader must return a four-component color that includes opacity, so the interpolated RGB color is simply modified by appending an opacity field. That color is output to the display.

Here’s an interesting variation for the pixel shader: The z coordinates of the sceneSpaceOutput field range from 0 to 1, so it’s possible to visualize the depth of each triangle by using that coordinate to construct a gray shade, and return it from the main function:

float z = input.sceneSpaceOutput.z;
return float4(z, z, z, 1);

Enhancements?

The SimpleTriangleEffect cuts some corners. It should be made more versatile by including a method to set the vertex input. Some other features wouldn’t hurt either: The vertex shader is a great place to perform matrix transforms—such as rotation or camera transforms—because the matrix multiplications are executed on the GPU.
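As a sketch of one way to animate the triangles (the SetRotation method, the constant layout and the saved m_drawInfo field are my assumptions, not part of the downloadable code), the effect could push a rotation matrix to the vertex shader each frame through a constant buffer; a matching cbuffer would have to be added to the vertex shader, which would multiply each input position by the matrix:

#include <DirectXMath.h>
// Constant buffer layout matching a hypothetical cbuffer in the vertex shader
struct VertexShaderConstants
{
  DirectX::XMFLOAT4X4 transform;
};
// Called once per frame with a new rotation angle; m_drawInfo is
// assumed to be the ID2D1DrawInfo object saved in SetDrawInfo
void SimpleTriangleEffect::SetRotation(float radians)
{
  VertexShaderConstants constants;
  DirectX::XMStoreFloat4x4(&constants.transform,
                           DirectX::XMMatrixRotationZ(radians));
  DX::ThrowIfFailed(m_drawInfo->SetVertexShaderConstantBuffer(
    reinterpret_cast<BYTE*>(&constants), sizeof(constants)));
}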

Few programmers are capable of resisting the temptation to implement code enhancements, particularly ones that turn a static image into an animated one.


Charles Petzold is a longtime contributor to MSDN Magazine and the author of “Programming Windows, 6th Edition” (Microsoft Press, 2013), a book about writing applications for Windows 8. His Web site is charlespetzold.com.

Thanks to the following Microsoft technical expert for reviewing this article: Doug Erickson
Doug Erickson is a lead programming writer for Microsoft’s OSG developer documentation team. When not writing and developing DirectX graphics code and content, he is reading articles like Charles’, because that’s how he likes to spend his free time. Well, that, and riding motorcycles.