Bits per scalar element in C++ AMP textures

Textures have historically been used in the graphics pipeline to add surface details to objects such as 3D models. These surface details can be stored using a rich variety of texture formats, ranging from compact formats like RGBA with 8 bits for each component to more detailed formats which use 32 bits per component. The more bits a format uses per component, the more distinct colors the texture can represent.

To support this large number of formats, the texture hardware can automatically access individual components based on the format when reading out or writing back each texture element. For example, if you are using an RGBA texture, each 8-bit component is automatically extracted by the hardware into a 32-bit form when you read from the texture, while the original data continues to be stored in the compact form.

This capability to access 8-bit and 16-bit data is handy not only in the area of graphics but also in computation. In this blog post, we will learn how to use this capability using concurrency::graphics::texture in C++ AMP.

Component size in C++ AMP

In C++ AMP, type T in texture<T, N> specifies the type of the texture element. Individual texture elements are referred to as texels. Each texel can further have multiple components. For example: a texture<uint_4,2> uses texels of type uint_4 and each texel has 4 components of type uint.

Note that we use the terms component and scalar element interchangeably to refer to the individual components of a texel in our posts on textures.

By default, C++ AMP creates textures using 32 bits for each component. To create a texture which uses a different number of bits for each component, you need to specify the bits_per_scalar_element at the time of creation. Each combination of type and bits_per_scalar_element causes C++ AMP to pick a different underlying texture format. The code snippet below shows how to create a texture with 8-bit integer components. Notice that the type of the texture still says ‘unsigned int’; the type of the data as seen by your code is unchanged.

/* create texture of 8 bit ints */
const int size = 10;
extent<1> ex(size);
texture<unsigned int, 1> tex(ex, 8U /* bits_per_scalar_element */);

This capability can be used to access byte level data automatically inside a parallel_for_each.

You can also create this texture with initial data as follows:

/* Initial data to contain all ‘a’s */
vector<unsigned char> vec(size, 'a');

/* Create texture with initial data of 8 bit ints */
extent<1> ex(size);
texture<unsigned int, 1> tex(ex, vec.data(),
size * 1U /* data_length in bytes */,
8U /* bits_per_scalar_element */);
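
The data_length argument depends on the texel count, the number of components per texel, and the bits_per_scalar_element. A minimal sketch of that arithmetic in plain standard C++ follows; the helper name texture_data_length is our own invention for illustration and is not part of the C++ AMP API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of C++ AMP): number of bytes of initial data
// for a texture with `texels` texels, `components` scalar elements per texel,
// and `bits` bits per scalar element.
inline std::size_t texture_data_length(std::size_t texels,
                                       unsigned int components,
                                       unsigned int bits)
{
    // C++ AMP textures have 1, 2 or 4 components per texel.
    assert(components == 1 || components == 2 || components == 4);
    // bits_per_scalar_element is always a whole number of bytes.
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    return texels * components * (bits / 8);
}
```

For the snippet above, texture_data_length(10, 1, 8) gives the same 10 bytes as size * 1U; the same texture with the default 32 bits would need 40 bytes.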

Let us look at another example to access character data…

Textures of ‘chars’

‘char’ data cannot be directly captured and accessed in your kernel. In a previous blog post, we offered utilities to help you work around this restriction and access char arrays inside the parallel_for_each.

// Original problem

vector<char> data(size);
array_view<char> d_data(size, data.data()); // THIS WON’T COMPILE!!

parallel_for_each(extent<1>(size),
[=] (index<1> idx) restrict(amp)
{
// Read each character in the vector
... = d_data[idx];

});

The utilities we showed you helped you extract chars out of integer data on the accelerator. Another way is to use textures to extract the chars automatically, as shown in the sample below.

texture<int, 1> d_data(size, 8U);

/* code to initialize d_data is elided */

parallel_for_each(extent<1>(size), [&d_data] (index<1> idx) restrict(amp)
{
// Read each character in the vector.

// Each 8 bit char is automatically extracted into an int
int element = d_data[idx];

});

Accessing 32 bit RGBA data using texture

Let’s look at another example which uses a texture with multiple components. Suppose our data is stored using the 32-bit RGBA format, where each component is represented by 8 bits. Without textures, we would need to extract each byte manually, as shown below.

/* contains raw image data. */
/* Each set of 4 bytes represent r,g,b,a values */
vector<unsigned int> image(image_height * image_width);

extent<2> image_extent(image_height, image_width);

/* array_view over the packed 32-bit pixels */
array_view< unsigned int, 2> image_av(image_extent, image);
parallel_for_each(image_extent,
[image_av](index<2> idx) restrict(amp)
{
/* Extract each component when reading from the buffer. */
/* On a little-endian host, the first byte in memory (r) is the lowest byte. */
unsigned int color = image_av[idx];
unsigned int r = (color) & 0xFF;
unsigned int g = (color >> 8) & 0xFF;
unsigned int b = (color >> 16) & 0xFF;
unsigned int a = (color >> 24) & 0xFF;

/* use in computation */

});

Now let us implement the same code using textures. Notice how the individual bytes are automatically extracted and we do not need to shift out the bytes ourselves.

/* contains raw image data. */
/* Each set of 4 bytes represent r,g,b,a values */
vector<unsigned int> image(image_height * image_width);

extent<2> image_extent(image_height, image_width);

/* texture of four 8-bit ‘uints’ */
texture<uint_4, 2> image_texture(image_extent, image.data(),
/*data_length in bytes*/
image_extent.size() * 4U,
8U /* bits_per_scalar_element */);

parallel_for_each(image_extent,
[&image_texture](index<2> idx) restrict(amp)
{
/* 4 bytes are automatically extracted when reading */
uint_4 color = image_texture[idx];
unsigned int r = color.r;
unsigned int g = color.g;
unsigned int b = color.b;
unsigned int a = color.a;

/* use in computation */
});
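
To make the automatic extraction concrete, here is a small host-side sketch in plain standard C++ of what reading one texel conceptually does. It assumes the four bytes of a texel sit in memory in r, g, b, a order; the names rgba_u32 and unpack_rgba8 are hypothetical stand-ins for illustration and are not C++ AMP types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for uint_4: four 32-bit components.
struct rgba_u32 { unsigned int r, g, b, a; };

// Widen one texel stored as four consecutive bytes (assumed r, g, b, a in
// memory order) into four 32-bit values, as an 8-bit-per-component read would.
inline rgba_u32 unpack_rgba8(const std::uint8_t* texel)
{
    assert(texel != nullptr);
    return rgba_u32{ texel[0], texel[1], texel[2], texel[3] };
}
```

Reading bytes by position, rather than shifting a 32-bit word, keeps the sketch independent of the host's byte order.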

Rules for each data type

Now that we have learnt how to use bits_per_scalar_element with integer data, let's learn the rules for the other data types. The table below summarizes the valid combinations of data type and bits_per_scalar_element.

Texture data type*	Valid bits_per_scalar_element values
int, uint, int_2, uint_2, int_4, uint_4	8, 16, 32
float, float_2, float_4	16, 32
double, double_2	64
norm, unorm, norm_2, unorm_2, norm_4, unorm_4	8, 16

* Note that 3-component textures are not allowed in C++ AMP. Refer to the introduction on textures.

Let us look at each data type in detail.

1. int / uint

Textures of int and uint type are created with bits_per_scalar_element set to 32 by default. We can alternatively create the texture with 8 or 16 bits_per_scalar_element, as shown in the examples above. 8-bit and 16-bit integers can represent the same range of values as the char and short data types in traditional host-side code. When read from a texture, they are automatically extracted into a 32-bit integer on the accelerator.
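
The widening can be sketched on the host in plain standard C++. The sketch assumes that narrow int components are sign-extended and narrow uint components are zero-extended, which is how signed and unsigned integer texture formats typically behave; the function names are ours, not C++ AMP's.

```cpp
#include <cassert>

// Value seen for an 8- or 16-bit component of an int texture:
// the stored bits are sign-extended to 32 bits (assumed behavior).
inline int widen_signed(unsigned int stored, unsigned int bits)
{
    assert(bits == 8 || bits == 16);
    const int range = 1 << bits;  // 256 or 65536
    const int value = static_cast<int>(stored & static_cast<unsigned int>(range - 1));
    return value >= range / 2 ? value - range : value;
}

// Value seen for an 8- or 16-bit component of a uint texture:
// the stored bits are zero-extended to 32 bits.
inline unsigned int widen_unsigned(unsigned int stored, unsigned int bits)
{
    assert(bits == 8 || bits == 16);
    return stored & ((1u << bits) - 1u);
}
```

So the byte 0xFF reads back as -1 from an int texture and as 255 from a uint texture, while plain ASCII characters such as 'a' (97) read the same either way.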

2. float

Textures of float type are also created with bits_per_scalar_element set to 32 by default. This can be changed to 16 by specifying the bits_per_scalar_element during construction. A 16-bit float is a half-precision floating-point number. It is used to store floating-point data when the higher precision is not needed. IEEE 754-2008 defines this as the binary16 format. We will look at 16-bit floats in detail in a future post.
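
As a rough illustration of the binary16 layout (1 sign bit, 5 exponent bits with a bias of 15, 10 fraction bits), here is a host-side decoder sketch in plain standard C++. The function name half_to_float is hypothetical; on the accelerator this conversion is done for you when you read the texture.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754-2008 binary16 bit pattern into a 32-bit float.
inline float half_to_float(std::uint16_t h)
{
    const bool negative = (h & 0x8000u) != 0;
    const int exponent = (h >> 10) & 0x1F;  // 5 exponent bits, bias 15
    const int fraction = h & 0x3FF;         // 10 fraction bits

    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: fraction * 2^-10 * 2^-14
        magnitude = std::ldexp(static_cast<float>(fraction), -24);
    } else if (exponent == 0x1F) {
        // All exponent bits set: infinity or NaN
        magnitude = fraction == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1 + fraction / 1024) * 2^(exponent - 15)
        magnitude = std::ldexp(static_cast<float>(fraction + 1024), exponent - 25);
    }
    return negative ? -magnitude : magnitude;
}
```

For example, the pattern 0x3C00 decodes to 1.0f, and the largest finite half, 0x7BFF, decodes to 65504.0f.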

3. double

Textures of doubles use a bits_per_scalar_element set to 64. You cannot construct a texture of doubles with any other value for bits_per_scalar_element.

4. norm/unorm

In a previous post, we introduced the norm and unorm types. When used outside a texture, these types behave like a regular float whose value is restricted to [-1.0f, 1.0f] for norm and [0.0f, 1.0f] for unorm.

When stored in a texture, they represent fixed-point numbers with values in [-1.0f, 1.0f] and [0.0f, 1.0f], respectively. For unorm, all bits set to 0 maps to 0.0f and all bits set to 1 maps to 1.0f. Between 0.0f and 1.0f, the remaining bit patterns represent evenly spaced values, as many as the number of bits allows. Textures of norm and unorm types can be created with bits_per_scalar_element set to 8 or 16, and the value must always be specified explicitly during construction.

Let us try to understand these types with a simpler example of a 2-bit unorm (which is not actually supported). Two bits allow for 4 unique values: 00, 01, 10 and 11. 00 maps to 0.0f and 11 maps to 1.0f. The other 2 values are evenly spaced between 0 and 1, as shown below.

00 0/3
01 1/3
10 2/3
11 3/3
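
The same mapping works for any bit count: the stored code is divided by the largest code the bits can hold. A minimal host-side sketch in plain standard C++ follows; decode_unorm is a hypothetical name used only for illustration.

```cpp
#include <cassert>

// Map an n-bit unorm code to a float in [0.0f, 1.0f]: code / (2^n - 1).
inline float decode_unorm(unsigned int stored, unsigned int bits)
{
    assert(bits >= 1 && bits <= 16);
    const unsigned int max_code = (1u << bits) - 1u;  // all bits set to 1
    assert(stored <= max_code);
    return static_cast<float>(stored) / static_cast<float>(max_code);
}
```

With bits set to 2 this reproduces the table above; with bits set to 8, the code 255 maps to 1.0f and each step is 1/255.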

What happens when you break the rules...

The only valid values for bits_per_scalar_element are 8, 16, 32 and 64. Using any other value will result in the runtime_exception shown below:

runtime_exception (80070057): Invalid _Bits_per_scalar_element argument - it can only be 8, 16, 32, or 64.

Trying to set bits_per_scalar_element as 64 for a texture which is not of double based type will result in the runtime_exception shown below:

runtime_exception (80070057): Invalid _Bits_per_scalar_element argument - 64 is only valid for texture of double based short vector types.

As shown in the table, not all values for bits_per_scalar_element are supported for all data types. Trying to create a texture with an invalid combination of data type and bits_per_scalar_element will result in the unsupported_feature exception shown below:

unsupported_feature (80004005): The combination of the short vector type and bits-per-scalar_element for the texture is not supported.
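
The three failure cases can be modeled as a small host-side check in plain standard C++. This is only a sketch of the rules described in this post: the enum and function names are hypothetical, and the order in which the checks run is our assumption, inferred from the messages above rather than taken from the runtime.

```cpp
#include <cassert>

// Hypothetical classification of the texture's scalar type.
enum class scalar_kind { integer, floating, double_precision, normalized };

// Hypothetical outcome of creating a texture with a given bit count.
enum class creation_result {
    ok,
    invalid_bits,            // runtime_exception: not 8, 16, 32 or 64
    bits64_needs_double,     // runtime_exception: 64 on a non-double type
    unsupported_combination  // unsupported_feature
};

inline creation_result check_bits_per_scalar_element(scalar_kind kind,
                                                     unsigned int bits)
{
    if (bits != 8 && bits != 16 && bits != 32 && bits != 64)
        return creation_result::invalid_bits;
    if (bits == 64 && kind != scalar_kind::double_precision)
        return creation_result::bits64_needs_double;

    bool supported = false;
    switch (kind) {
    case scalar_kind::integer:          supported = bits == 8 || bits == 16 || bits == 32; break;
    case scalar_kind::floating:         supported = bits == 16 || bits == 32; break;
    case scalar_kind::double_precision: supported = bits == 64; break;
    case scalar_kind::normalized:       supported = bits == 8 || bits == 16; break;
    }
    return supported ? creation_result::ok
                     : creation_result::unsupported_combination;
}
```

A check like this could run on the host before constructing the texture, so that an invalid combination is caught before it turns into an exception.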

Summary

Apart from the obvious advantage of enabling code to access partial-word data, using a smaller number of bits for each component also helps in reducing storage requirements on the accelerator. If your computation requires data of lower precision, you can store it using 8 bits or 16 bits instead of spending an entire 32 bits on each component.

This concludes our introduction to bits_per_scalar_element in textures. In a future blog post, we will discuss how to write to textures with a different bits_per_scalar_element. As always, please feel free to share your thoughts and ask questions below or in our MSDN concurrency forum.

Comments

Anonymous
April 17, 2012
Thanks for this nice explanation of texture in AMP. But I still don't understand something: Would using texture instead of others ways to extract 8 or 16 bits pixels data give a performance advantage (at least on most GPU)? For the moment, I use something like blogs.msdn.com/.../c-amp-it-s-got-character-but-no-char.aspx but for unsigned short (16 bits image format). Therefore, even if the Texture way make the code cleaner, I would try to switch to texture only if I get a performance advantage.
Anonymous
April 18, 2012
PYB_42, There are other reasons to choose between texture and array_view. Since you specifically ask about performance, here is how you can expect accessing sub words to behave on most hardware. The performance of reading sub words from an array_view using the read_uchar (blogs.msdn.com/.../c-amp-it-s-got-character-but-no-char.aspx) or alike is comparable to the performance of reading from a texture. However, writes can be much faster using textures since you can avoid the atomic operations that serialize writes to the same word using write_uchar or alike. Writes to sub words in textures are guaranteed to be thread safe. The one limitation of using textures is that 1D textures can contain a maximum of 16K elements. If your data is large and the data abstraction has to be one dimensional, it may not fit.
Anonymous
April 18, 2012
Thanks for the details. I will try to switch to texture, at least for the output image data.

Last updated on 2012-04-17
