Getting Started with the Stream-Output Stage

This section describes how to use a geometry shader with the stream output stage.

Compile a Geometry Shader

This geometry shader (GS) calculates a face normal for each triangle, and outputs position, normal and texture coordinate data.

struct GSPS_INPUT
{
    float4 Pos : SV_POSITION;
    float3 Norm : TEXCOORD0;
    float2 Tex : TEXCOORD1;
};

// View, Projection and Explode are assumed to be declared elsewhere in the effect file.
[maxvertexcount(12)]
void GS( triangle GSPS_INPUT input[3], inout TriangleStream<GSPS_INPUT> TriStream )
{
    GSPS_INPUT output;

    // Calculate the face normal
    float3 faceEdgeA = input[1].Pos.xyz - input[0].Pos.xyz;
    float3 faceEdgeB = input[2].Pos.xyz - input[0].Pos.xyz;
    float3 faceNormal = normalize( cross(faceEdgeA, faceEdgeB) );
    float3 ExplodeAmt = faceNormal*Explode;

    // Calculate the face center
    float3 centerPos = (input[0].Pos.xyz + input[1].Pos.xyz + input[2].Pos.xyz)/3.0;
    float2 centerTex = (input[0].Tex + input[1].Tex + input[2].Tex)/3.0;
    centerPos += faceNormal*Explode;

    // Output the pyramid: one side face per edge of the input triangle
    for( int i=0; i<3; i++ )
    {
        output.Pos = input[i].Pos + float4(ExplodeAmt,0);
        output.Pos = mul( output.Pos, View );
        output.Pos = mul( output.Pos, Projection );
        output.Norm = input[i].Norm;
        output.Tex = input[i].Tex;
        TriStream.Append( output );

        int iNext = (i+1)%3;
        output.Pos = input[iNext].Pos + float4(ExplodeAmt,0);
        output.Pos = mul( output.Pos, View );
        output.Pos = mul( output.Pos, Projection );
        output.Norm = input[iNext].Norm;
        output.Tex = input[iNext].Tex;
        TriStream.Append( output );

        output.Pos = float4(centerPos,1) + float4(ExplodeAmt,0);
        output.Pos = mul( output.Pos, View );
        output.Pos = mul( output.Pos, Projection );
        output.Norm = faceNormal;
        output.Tex = centerTex;
        TriStream.Append( output );

        // Close this side face so that the next one starts a new strip
        TriStream.RestartStrip();
    }

    // Output the base of the pyramid with reversed winding
    for( int i=2; i>=0; i-- )
    {
        output.Pos = input[i].Pos + float4(ExplodeAmt,0);
        output.Pos = mul( output.Pos, View );
        output.Pos = mul( output.Pos, Projection );
        output.Norm = -input[i].Norm;
        output.Tex = input[i].Tex;
        TriStream.Append( output );
    }
    TriStream.RestartStrip();
}
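
The face-normal and face-center arithmetic in the shader can be checked on the CPU. The following C++ sketch repeats the same steps under stated assumptions: Vec3, Face, computeFace and explode are hypothetical helper names that stand in for HLSL float3 math, and none of them are part of Direct3D.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Minimal 3-component vector; a stand-in for HLSL float3 (hypothetical helper type).
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f && "a degenerate triangle has no face normal");
    return v * (1.0f / len);
}

// Result of the per-triangle calculation.
struct Face { Vec3 normal; Vec3 center; };

// Same arithmetic as the shader: the normal comes from two edges of the
// triangle, and the center is the average of the three vertex positions.
Face computeFace(const std::array<Vec3, 3>& p) {
    Vec3 edgeA = p[1] - p[0];
    Vec3 edgeB = p[2] - p[0];
    Vec3 normal = normalize(cross(edgeA, edgeB));
    Vec3 center = (p[0] + p[1] + p[2]) * (1.0f / 3.0f);
    return {normal, center};
}

// Moves a point along the face normal, as the shader does with ExplodeAmt.
Vec3 explode(Vec3 point, Vec3 normal, float amount) {
    return point + normal * amount;
}
```

For the triangle (0,0,0), (1,0,0), (0,1,0), this gives the normal (0,0,1) and the center (1/3, 1/3, 0). The direction of the normal depends on the order of the vertices, so swapping two vertices flips it.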

Keeping that code in mind, consider that a geometry shader looks much like a vertex or pixel shader, but with the following exceptions: the type returned by the function, the input parameter declarations, and the intrinsic function.

Function return type

The function itself returns void. The attribute that precedes the function declares the maximum number of vertices that the shader can output. In this case,

    [maxvertexcount(12)]

defines the output to be a maximum of 12 vertices.

Input parameter declarations

This function takes two input parameters:

    triangle GSPS_INPUT input[3], inout TriangleStream<GSPS_INPUT> TriStream

The first parameter is an array of vertices (3 in this case) defined by a GSPS_INPUT structure (which defines per-vertex data as a position, a normal and a texture coordinate). The first parameter also uses the triangle keyword, which means the input assembler stage must output data to the geometry shader as one of the triangle primitive types (triangle list or triangle strip).

The second parameter is a triangle stream defined by the type TriangleStream<GSPS_INPUT>. This means the parameter is an array of triangles, each of which is made up of three vertices (that contain the data from the members of GSPS_INPUT).

Use the triangle keyword and the TriangleStream type to identify an individual triangle or a stream of triangles in a GS.

Intrinsic function

The lines of code in the shader function use common-shader-core HLSL intrinsic functions, except for the calls to Append and RestartStrip. These two functions are available only to a geometry shader. Append adds the output vertex to the current strip; RestartStrip ends the current strip and starts a new one. A new strip is implicitly created in every invocation of the GS stage.
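
To see how Append and RestartStrip relate to the declared maximum of 12 vertices, the call pattern can be replayed on the CPU. The sketch below uses MockTriangleStream, a hypothetical class that is not a Direct3D type and only counts vertices per strip. It assumes that each side face of the pyramid and the base are each closed with RestartStrip.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side stand-in for an HLSL TriangleStream (hypothetical; not a Direct3D type).
// It records only how many vertices each strip received.
class MockTriangleStream {
public:
    explicit MockTriangleStream(std::size_t maxVertexCount)
        : maxVertexCount_(maxVertexCount) {}

    // Append adds one vertex to the current strip.
    void Append() {
        assert(totalVertices() < maxVertexCount_ && "exceeds the declared maxvertexcount");
        if (strips_.empty() || restartPending_) {
            strips_.push_back(0);
            restartPending_ = false;
        }
        ++strips_.back();
    }

    // RestartStrip ends the current strip; the next Append starts a new one.
    void RestartStrip() { restartPending_ = true; }

    std::size_t stripCount() const { return strips_.size(); }

    std::size_t totalVertices() const {
        std::size_t total = 0;
        for (std::size_t n : strips_) total += n;
        return total;
    }

    // A strip of n vertices yields n - 2 triangles (none if n < 3).
    std::size_t totalTriangles() const {
        std::size_t total = 0;
        for (std::size_t n : strips_) total += (n >= 3) ? n - 2 : 0;
        return total;
    }

private:
    std::size_t maxVertexCount_;
    std::vector<std::size_t> strips_;
    bool restartPending_ = false;
};

// Replays the Append/RestartStrip call pattern of the GS function.
MockTriangleStream emitPyramid() {
    MockTriangleStream stream(12);
    for (int i = 0; i < 3; ++i) {   // three side faces, three vertices each
        stream.Append();
        stream.Append();
        stream.Append();
        stream.RestartStrip();
    }
    for (int i = 2; i >= 0; --i) {  // base triangle
        stream.Append();
    }
    stream.RestartStrip();
    return stream;
}
```

Under these assumptions, each invocation produces four strips of three vertices, which is 12 vertices and four triangles, so the declared maximum of 12 is exactly enough.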

The rest of the shader looks very similar to a vertex or pixel shader. The geometry shader uses a structure to declare input parameters and marks the position member with the SV_POSITION semantic to tell the hardware that this is positional data. The input structure identifies the other two input parameters as texture coordinates (even though one of them will contain a face normal). You could use your own custom semantic for the face normal if you prefer.

Having designed the geometry shader, call D3DCompile to compile as shown in the following code example.

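ID3DBlob* pShader = NULL;

// srcDataSize is assumed to hold the length of the HLSL source in bytes;
// sizeof( pSrcData ) would only return the size of a pointer.
HRESULT hr = D3DCompile( pSrcData, srcDataSize,
  "Tutorial13.fx", NULL, NULL, "GS", "gs_4_0",
  dwShaderFlags, 0, &pShader, NULL );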

Just like vertex and pixel shaders, you need a shader flag to tell the compiler how you want the shader compiled (for debugging, optimized for speed, and so on), the entry point function, and the shader model to validate against. This example creates a geometry shader built from the Tutorial13.fx file, by using the GS function. The shader is compiled for shader model 4.0.

Create a Geometry-Shader Object with Stream Output

Once you know that you will be streaming the data from the geometry shader, and you have successfully compiled the shader, the next step is to call ID3D11Device::CreateGeometryShaderWithStreamOutput to create the geometry shader object.

But first, you need to declare the stream output (SO) stage input signature. This signature matches or validates the GS outputs and the SO inputs at the time of object creation. The following code is an example of the SO declaration.

D3D11_SO_DECLARATION_ENTRY pDecl[] =
{
    // stream, semantic name, semantic index, start component, component count, output slot
    { 0, "SV_POSITION", 0, 0, 4, 0 },   // output all components of position
    { 0, "TEXCOORD", 0, 0, 3, 0 },      // output the first 3 components of the normal
    { 0, "TEXCOORD", 1, 0, 2, 0 },      // output the first 2 texture coordinates
};

D3D11Device->CreateGeometryShaderWithStreamOutput( pShaderBytecode, ShaderBytecodesize, pDecl,
    sizeof(pDecl) / sizeof(pDecl[0]), NULL, 0, 0, NULL, &pStreamOutGS );

This function takes several parameters including:

  • A pointer to the compiled geometry shader (or vertex shader if no geometry shader will be present and data will be streamed out directly from the vertex shader). For information about how to get this pointer, see Getting a Pointer to a Compiled Shader.
  • A pointer to an array of declarations that describe the input data for the stream output stage. (See D3D11_SO_DECLARATION_ENTRY.) You can supply up to 64 declarations, one for each different type of element to be output from the SO stage. The array of declaration entries describes the data layout regardless of whether only a single buffer or multiple buffers are to be bound for stream output.
  • The number of entries in the declaration array.
  • A pointer to the geometry shader object that is created (see ID3D11GeometryShader).

In this situation, the buffer-stride array is NULL (with a stride count of 0), the index of the stream to be sent to the rasterizer is 0, and the class linkage interface is NULL.

The stream output declaration defines the way that data is written to a buffer resource. You can add as many components as you want to the output declaration. Use the SO stage to write to a single buffer resource or many buffer resources. For a single buffer, the SO stage can write many different elements per-vertex. For multiple buffers, the SO stage can only write a single element of per-vertex data to each buffer.
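
Because the declaration determines what is written per vertex, it also determines how many bytes each vertex occupies in a buffer. The sketch below estimates that size. SoEntry is a hypothetical stand-in that mirrors the fields of D3D11_SO_DECLARATION_ENTRY so that the code compiles without the Direct3D headers, and vertexStrideBytes assumes that the elements are packed back to back as 32-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in mirroring the fields of D3D11_SO_DECLARATION_ENTRY (hypothetical type).
struct SoEntry {
    unsigned stream;
    const char* semanticName;
    unsigned semanticIndex;
    unsigned startComponent;
    unsigned componentCount;
    unsigned outputSlot;
};

// Bytes written per vertex to the buffer bound at `slot`.
// Assumption: each component is one 32-bit value and elements are tightly packed.
std::size_t vertexStrideBytes(const std::vector<SoEntry>& decl, unsigned slot) {
    std::size_t components = 0;
    for (const SoEntry& e : decl) {
        assert(e.startComponent + e.componentCount <= 4 &&
               "an element has at most four components");
        if (e.outputSlot == slot) {
            components += e.componentCount;
        }
    }
    return components * 4;  // 4 bytes per 32-bit component
}

// The position, normal and texture-coordinate layout used in this section.
std::vector<SoEntry> exampleDeclaration() {
    return {
        {0, "SV_POSITION", 0, 0, 4, 0},
        {0, "TEXCOORD",    0, 0, 3, 0},
        {0, "TEXCOORD",    1, 0, 2, 0},
    };
}
```

For the layout in this section, the estimate is 4 + 3 + 2 = 9 components, or 36 bytes per vertex in slot 0.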

To use the SO stage without using a geometry shader, call ID3D11Device::CreateGeometryShaderWithStreamOutput and pass a pointer to a vertex shader to the pShaderBytecode parameter.

Set the Output Targets

The last step is to set the SO stage buffers. Data can be streamed into one or more buffers in memory for use later. The following code shows how to create a single buffer that can be used for vertex data, as well as for the SO stage to stream data into.

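ID3D11Buffer *m_pBuffer;
int m_nBufferSize = 1000000;

D3D11_BUFFER_DESC bufferDesc = {};
bufferDesc.ByteWidth = m_nBufferSize;
bufferDesc.Usage = D3D11_USAGE_DEFAULT;
// Bindable both as a vertex buffer and as a stream output target
bufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER | D3D11_BIND_STREAM_OUTPUT;
D3D11Device->CreateBuffer( &bufferDesc, NULL, &m_pBuffer );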

Create a buffer by calling ID3D11Device::CreateBuffer. This example illustrates default usage, which gives the GPU read and write access to the resource; the SO stage needs this because the GPU writes the streamed data. The binding flags identify the pipeline stages that the resource can be bound to. Any resource used by the SO stage must also be created with the bind flag D3D11_BIND_STREAM_OUTPUT.

Once the buffer is successfully created, bind it to the SO stage by calling ID3D11DeviceContext::SOSetTargets:

UINT offset[1] = { 0 };
// D3D11DeviceContext is assumed to be the ID3D11DeviceContext pointer for the device
D3D11DeviceContext->SOSetTargets( 1, &m_pBuffer, offset );

This call takes the number of buffers, a pointer to the buffers, and an array of offsets (one offset into each of the buffers that indicates where to begin writing data). Data is written to these streaming-output buffers when a draw function is called. An internal variable keeps track of the position at which to begin writing data to the streaming-output buffers, and that variable continues to increment until SOSetTargets is called again and a new offset value is specified.

All data written out to the target buffers will be 32-bit values.
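
The way the write position advances across draw calls can be pictured with a small model. The sketch below is a toy, not Direct3D behavior: SoTargetModel is a hypothetical type, bind stands in for SOSetTargets, and draw stands in for one draw call. It assumes that output is written as whole triangles of three vertices and that a triangle that does not fit completely is not written.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of one stream-output target (hypothetical type, not a Direct3D interface).
struct SoTargetModel {
    std::size_t capacityBytes;
    std::size_t writeOffset;  // where the next draw call starts writing

    // Models SOSetTargets: binding the buffer sets the starting offset.
    void bind(std::size_t offsetBytes) {
        assert(offsetBytes <= capacityBytes);
        writeOffset = offsetBytes;
    }

    // Models one draw call that streams out `triangles` triangles.
    // Returns the number of triangles that fit and advances the write offset.
    std::size_t draw(std::size_t triangles, std::size_t vertexStrideBytes) {
        assert(vertexStrideBytes > 0);
        const std::size_t triangleBytes = 3 * vertexStrideBytes;
        const std::size_t room = (capacityBytes - writeOffset) / triangleBytes;
        const std::size_t written = triangles < room ? triangles : room;
        writeOffset += written * triangleBytes;
        return written;
    }
};
```

With a 36-byte vertex, which is what a tightly packed position, normal and texture coordinate would occupy, a triangle takes 108 bytes. In this model, two draws of 400 triangles each leave the offset at 86,400 bytes, a new bind(0) resets it, and the 1,000,000-byte buffer has room for 9,259 whole triangles.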
