C++ AMP contains several APIs in the Concurrency::graphics namespace that you can use to access the texture support on GPUs. Some common scenarios are:
You can use the texture class as a data container for computation and exploit the spatial locality of the texture cache and layouts of GPU hardware. Spatial locality is the property of data elements being physically close to each other.
The runtime provides efficient interoperability with non-compute shaders. Pixel, vertex, tessellation, and hull shaders frequently consume or produce textures that you can use in your C++ AMP computations.
The graphics APIs in C++ AMP provide alternative ways to access sub-word packed buffers. Textures that have formats that represent texels (texture elements) that are composed of 8-bit or 16-bit scalars allow access to such packed data storage.
Notiz
C++ AMP headers are deprecated starting with Visual Studio 2022 version 17.0.
Including any AMP headers will generate build errors. Define _SILENCE_AMP_DEPRECATION_WARNINGS before including any AMP headers to silence the warnings.
The norm and unorm Types
The norm and unorm types are scalar types that limit the range of float values; this is known as clamping. These types can be explicitly constructed from other scalar types. In casting, the value is first cast to float and then clamped to the respective region that's allowed by norm [-1.0, 1.0] or unorm [0.0, 1.0]. Casting from +/- infinity returns +/-1. Casting from NaN is undefined. A norm can be implicitly constructed from a unorm and there is no loss of data. The implicit conversion operator to float is defined on these types. Binary operators are defined between these types and other built-in scalar types such as float and int: +, -, *, /, ==, !=, >, <, >=, <=. The compound assignment operators are also supported: +=, -=, *=, /=. The unary negation operator (-) is defined for norm types.
Short Vector Library
The Short Vector Library provides some of the functionality of the Vector Type that's defined in HLSL and is typically used to define texels. A short vector is a data structure that holds one to four values of the same type. The supported types are double, float, int, norm, uint, and unorm. The type names are shown in the following table. For each type, there is also a corresponding typedef that doesn't have an underscore in the name. The types that have the underscores are in the Concurrency::graphics Namespace. The types that don't have the underscores are in the Concurrency::graphics::direct3d Namespace so that they are clearly separated from the similarly-named fundamental types such as __int8 and __int16.
Type
Length 2
Length 3
Length 4
double
double_2
double2
double_3
double3
double_4
double4
float
float_2
float2
float_3
float3
float_4
float4
int
int_2
int2
int_3
int3
int_4
int4
norm
norm_2
norm2
norm_3
norm3
norm_4
norm4
uint
uint_2
uint2
uint_3
uint3
uint_4
uint4
unorm
unorm_2
unorm2
unorm_3
unorm3
unorm_4
unorm4
Operators
If an operator is defined between two short vectors, then it is also defined between a short vector and a scalar. Also, one of these must be true:
The scalar's type must be the same as the short vector's element type.
The scalar's type can be implicitly converted to the vector's element type by using only one user-defined conversion.
The operation is carried component-wise between each component of the short vector and the scalar. Here are the valid operators:
Operator type
Valid types
Binary operators
Valid on all types: +, -, *, /,
Valid on integer types: %, ^, |, &, <<, >>
The two vectors must have the same size, and the result is a vector of the same size.
Relational operators
Valid on all types: == and !=
Compound assignment operator
Valid on all types: +=, -=, *=, /=
Valid on integer types: %=, ^=, |=, &=, <<=, >>=
Increment and decrement operators
Valid on all types: ++, --
Both prefix and postfix are valid.
Bitwise NOT operator (~)
Valid on integer types.
Unary - operator
Valid on all types except unorm and uint.
Swizzling Expressions
The Short Vector Library supports the vector_type.identifier accessor construct to access the components of a short vector. The identifier, which is known as a swizzling expression, specifies the components of the vector. The expression can be an l-value or an r-value. Individual characters in the identifier may be: x, y, z, and w; or r, g, b, and a. "x" and "r" mean the zero-th component, "y" and "g" mean the first component, and so on. (Notice that "x" and "r" cannot be used in the same identifier.) Therefore, "rgba" and "xyzw" return the same result. Single-component accessors such as "x" and "y" are scalar value types. Multi-component accessors are short vector types. For example, if you construct an int_4 vector that's named fourInts and has the values 2, 4, 6, and 8, then fourInts.y returns the integer 4 and fourInts.rg returns an int_2 object that has the values 2 and 4.
Texture Classes
Many GPUs have hardware and caches that are optimized to fetch pixels and texels and to render images and textures. The texture<T,N> class, which is a container class for texel objects, exposes the texture functionality of these GPUs. A texel can be:
An int, uint, float, double, norm, or unorm scalar.
A short vector that has two or four components. The only exception is double_4, which is not allowed.
The texture object can have a rank of 1, 2, or 3. The texture object can be captured only by reference in the lambda of a call to parallel_for_each. The texture is stored on the GPU as Direct3D texture objects. For more information about textures and texels in Direct3D, see Introduction to Textures in Direct3D 11.
The texel type you use might be one of the many texture formats that are used in graphics programming. For example, an RGBA format could use 32 bits, with 8 bits each for the R, G, B, and A scalar elements. The texture hardware of a graphics card can access the individual elements based on the format. For example, if you are using the RGBA format, the texture hardware can extract each 8-bit element into a 32-bit form. In C++ AMP, you can set the bits per scalar element of your texel so that you can automatically access the individual scalar elements in the code without using bit-shifting.
Instantiating Texture Objects
You can declare a texture object without initialization. The following code example declares several texture objects.
C++
#include<amp.h>#include<amp_graphics.h>usingnamespace concurrency;
usingnamespace concurrency::graphics;
voiddeclareTextures(){
// Create a 16-texel texture of int.
texture<int, 1> intTexture1(16);
texture<int, 1> intTexture2(extent<1>(16));
// Create a 16 x 32 texture of float_2.
texture<float_2, 2> floatTexture1(16, 32);
texture<float_2, 2> floatTexture2(extent<2>(16, 32));
// Create a 2 x 4 x 8 texture of uint_4.
texture<uint_4, 3> uintTexture1(2, 4, 8);
texture<uint_4, 3> uintTexture2(extent<3>(2, 4, 8));
}
You can also use a constructor to declare and initialize a texture object. The following code example instantiates a texture object from a vector of float_4 objects. The bits per scalar element is set to the default. You cannot use this constructor with norm, unorm, or the short vectors of norm and unorm, because they do not have a default bits per scalar element.
C++
#include<amp.h>#include<amp_graphics.h>#include<vector>usingnamespace concurrency;
usingnamespace concurrency::graphics;
voidinitializeTexture(){
std::vector<int_4> texels;
for (int i = 0; i < 768 * 1024; i++) {
int_4 i4(i, i, i, i);
texels.push_back(i4);
}
texture<int_4, 2> aTexture(768, 1024, texels.begin(), texels.end());
}
You can also declare and initialize a texture object by using a constructor overload that takes a pointer to the source data, the size of source data in bytes, and the bits per scalar element.
C++
voidcreateTextureWithBPC(){ // Create the source data.float source[1024* 2];
for (int i = 0; i <1024* 2; i++) {
source[i] = (float)i;
}
// Initialize the texture by using the size of source in bytes // and bits per scalar element.
texture<float_2, 1> floatTexture(1024, source, (unsignedint)sizeof(source), 32U);
}
The textures in these examples are created on the default view of the default accelerator. You can use other overloads of the constructor if you want to specify an accelerator_view object. You cannot create a texture object on a CPU accelerator.
There are limits on the size of each dimension of the texture object, as shown in the following table. A run-time error is generated if you exceed the limits.
voidreadTexture(){
std::vector<int_2> src;
for (int i = 0; i <16 *32; i++) {
int_2 i2(i, i);
src.push_back(i2);
}
std::vector<int_2> dst(16* 32);
array_view<int_2, 2> arr(16, 32, dst);
arr.discard_data();
const texture<int_2, 2> tex9(16, 32, src.begin(), src.end());
parallel_for_each(tex9.extent, [=, &tex9] (index<2> idx) restrict(amp) { // Use the subscript operator.
arr[idx].x += tex9[idx].x; // Use the function () operator.
arr[idx].x += tex9(idx).x; // Use the get method.
arr[idx].y += tex9.get(idx).y; // Use the function () operator.
arr[idx].y += tex9(idx[0], idx[1]).y;
});
arr.synchronize();
}
The following code example demonstrates how to store texture channels in a short vector, and then access the individual scalar elements as properties of the short vector.
C++
voidUseBitsPerScalarElement(){ // Create the image data. // Each unsigned int (32-bit) represents four 8-bit scalar elements(r,g,b,a values).constint image_height = 16;
constint image_width = 16;
std::vector<unsignedint> image(image_height* image_width);
extent<2> image_extent(image_height, image_width);
// By using uint_4 and 8 bits per channel, each 8-bit channel in the data source is // stored in one 32-bit component of a uint_4.
texture<uint_4, 2> image_texture(image_extent, image.data(), image_extent.size()* 4U, 8U);
// Use can access the RGBA values of the source data by using swizzling expressions of the uint_4.
parallel_for_each(image_extent,
[&image_texture](index<2> idx) restrict(amp)
{ // 4 bytes are automatically extracted when reading.
uint_4 color = image_texture[idx];
unsignedint r = color.r;
unsignedint g = color.g;
unsignedint b = color.b;
unsignedint a = color.a;
});
}
The following table lists the valid bits per channel for each sort vector type.
Texture data type
Valid bits per scalar element
int, int_2, int_4
uint, uint_2, uint_4
8, 16, 32
int_3, uint_3
32
float, float_2, float_4
16, 32
float_3
32
double, double_2
64
norm, norm_2, norm_4
unorm, unorm_2, unorm, 4
8, 16
Writing to Texture Objects
Use the texture::set method to write to texture objects. A texture object can be readonly or read/write. For a texture object to be readable and writeable, the following conditions must be true:
T has only one scalar component. (Short vectors are not allowed.)
T is not double, norm, or unorm.
The texture::bits_per_scalar_element property is 32.
If all three are not true, then the texture object is readonly. The first two conditions are checked during compilation. A compilation error is generated if you have code that tries to write to a readonly texture object. The condition for texture::bits_per_scalar_element is detected at run time, and the runtime generates the unsupported_feature exception if you try to write to a readonly texture object.
The following code example writes values to a texture object.
You can copy between texture objects by using the copy function or the copy_async function, as shown in the following code example.
C++
voidcopyHostArrayToTexture(){ // Copy from source array to texture object by using the copy function.float floatSource[1024* 2];
for (int i = 0; i <1024* 2; i++) {
floatSource[i] = (float)i;
}
texture<float_2, 1> floatTexture(1024);
copy(floatSource, (unsignedint)sizeof(floatSource), floatTexture);
// Copy from source array to texture object by using the copy function.char charSource[16* 16];
for (int i = 0; i <16* 16; i++) {
charSource[i] = (char)i;
}
texture<int, 2> charTexture(16, 16, 8U);
copy(charSource, (unsignedint)sizeof(charSource), charTexture);
// Copy from texture object to source array by using the copy function.
copy(charTexture, charSource, (unsignedint)sizeof(charSource));
}
You can also copy from one texture to another by using the texture::copy_to method. The two textures can be on different accelerator_views. When you copy to a writeonly_texture_view object, the data is copied to the underlying texture object. The bits per scalar element and the extent must be the same on the source and destination texture objects. If those requirements are not met, the runtime throws an exception.
Texture View Classes
C++ AMP introduces the texture_view Class in Visual Studio 2013. Texture views support the same texel types and ranks as the texture Class, but unlike textures, they provide access to additional hardware features such as texture sampling and mipmaps. Texture views support read-only, write-only, and read-write access to the underlying texture data.
Read-only access is provided by the texture_view<const T, N> template specialization, which supports elements that have 1, 2, or 4 components, texture sampling, and dynamic access to a range of mipmap levels that are determined when the view is instantiated.
Write-only access is provided by the non-specialized template class texture_view<T, N>, which supports elements that have either 2 or 4 components and can access one mipmap level that's determined when the view is instantiated. It does not support sampling.
Read-write access is provided by the non-specialized template class texture_view<T, N>, which, like textures, supports elements that have only one component; the view can access one mipmap level that's determined when it is instantiated. It does not support sampling.
Texture views are analogous to array views, but do not provide the automatic data management and movement functionality that the array_view Class provides over the array class. A texture_view can only be accessed on the accelerator view where the underlying texture data resides.
writeonly_texture_view Deprecated
For Visual Studio 2013, C++ AMP introduces better support for hardware texture features such as sampling and mipmaps, which could not be supported by the writeonly_texture_view Class. The newly introduced texture_view class supports a superset of the functionality in writeonly_texture_view; as a result, writeonly_texture_view is deprecated.
We recommend—at least for new code—that you use texture_view to access functionality that was formerly provided by writeonly_texture_view. Compare the following two code examples that write to a texture object that has two components (int_2). Notice that in both cases, the view, wo_tv4, must be captured by value in the lambda expression. Here is the example that uses the new texture_view class:
As you can see, the two code examples are nearly identical when all you are doing is writing to the primary mipmap level. If you used writeonly_texture_view in existing code and you're not planning to enhance that code, you don't have to change it. However, if you're thinking about bringing that code forward, we suggest that you rewrite it to use texture_view because the enhancements in it support new hardware texture features. Read on for more information about these new capabilities.
Declaring a texture_view is similar to declaring an array_view that's associated with an array. The following code example declares several texture objects and texture_view objects that are associated with them.
C++
#include<amp.h>#include<amp_graphics.h>usingnamespace concurrency;
usingnamespace concurrency::graphics;
voiddeclareTextureViews(){
// Create a 16-texel texture of int, with associated texture_views.
texture<int, 1> intTexture(16);
texture_view<constint, 1> intTextureViewRO(intTexture); // read-only
texture_view<int, 1> intTextureViewRW(intTexture); // read-write// Create a 16 x 32 texture of float_2, with associated texture_views.
texture<float_2, 2> floatTexture(16, 32);
texture_view<const float_2, 2> floatTextureViewRO(floatTexture); // read-only
texture_view<float_2, 2> floatTextureViewRO(floatTexture); // write-only// Create a 2 x 4 x 8 texture of uint_4, with associated texture_views.
texture<uint_4, 3> uintTexture(2, 4, 8);
texture_view<const uint_4, 3> uintTextureViewRO(uintTexture); // read-only
texture_view<uint_4, 3> uintTextureViewWO(uintTexture); // write-only
}
Notice how a texture view whose element type is non-const and has one component is read-write, but a texture view whose element type is non-const but has more than one componenent are write-only. Texture views of const element types are always read-only, but if the element type is non-const, then the number of components in the element determines whether it is read-write (1 component) or write-only (multiple components).
The element type of a texture_view—its const-ness and also the number of components it has—also plays a role in determining whether the view supports texture sampling, and how mipmap levels can be accessed:
Type
Components
Read
Write
Sampling
Mipmap access
texture_view<const T, N>
1, 2, 4
Yes
No (1)
Yes
Yes, indexable. Range is determined at instantiation.
Texture_view<T, N>
1
2, 4
Yes
No (2)
Yes
Yes
No (1)
No (1)
Yes, one level. Level is determined at instantiation.
Yes, one level. Level is determined at instantiation.
From this table, you can see that read-only texture views fully support the new capabilities in exchange for not being able to write to the view. Writable texture views are limited in that they can only access one mipmap level. Read-write texture views are even more specialized than writable ones, because they add the requirement that the element type of the texture view has only one component. Notice that sampling is not supported for writable texture views because it's a read-oriented operation.
Reading from Texture View Objects
Reading unsampled texture data through a texture view is just like reading it from the texture itself, except that textures are captured by reference, whereas texture views are captured by value. The following two code examples demonstrate; first, by using texture only:
Texture views whose elements are based on floating-point types—for example, float, float_2, or float_4—can also be read by using texture sampling to take advantage of hardware support for various filtering modes and addressing modes. C++ AMP supports the two filtering modes that are most common in compute scenarios—point-filtering (nearest-neighbor) and linear-filtering (weighted average)—and four addressing modes—wrapped, mirrored, clamped, and border. For more information about addressing modes, see address_mode Enumeration.
In addition to modes that C++ AMP supports directly, you can access other filtering modes and addressing modes of the underlying platform by using the interop APIs to adopt a texture sampler that was created by using the platform APIs directly. For example, Direct3D supports other filtering modes such as anisotropic filtering, and can apply a different addressing mode to each dimension of a texture. You could create a texture sampler whose coordinates are wrapped vertically, mirrored horizontally, and sampled with anisotropic filtering by using the Direct3D APIs, and then leverage the sampler in your C++ AMP code by using the make_sampler interop API. For more information see Texture Sampling in C++ AMP on the Parallel Programming in Native Code blog.
Texture views also support the reading of mipmaps. Read-only texture views (those that have a const element type) offer the most flexibility because a range of mip-levels that is determined at instantiation can be dynamically sampled, and because elements that have 1, 2, or 4 components are supported. Read-write texture views that have elements that have one component also support mipmaps, but only of a level that's determined at instantiation. For more information, see Texture with Mipmaps on the Parallel Programming in Native Code blog.
Writing to Texture View Objects
Use the texture_view::get Method to write to the underlying texture through the texture_view object. A texture view can be read-only, read-write, or write-only. For a texture view to be writable it must have an element type that is non-const; for a texture view to be readable and writable, its element type must also have only one component. Otherwise, the texture view is read-only. You can only access one mipmap level of a texture at a time through a texture view, and the level is specified when the view is instantiated.
This example shows how to write to the second-most detailed mipmap level of a texture that has 4 mipmap levels. The most detailed mipmap level is level 0.
C++
// Create a texture that has 4 mipmap levels : 16x16, 8x8, 4x4, 2x2
texture<int, 2> tex(extent<2>(16, 16), 16U, 4);
// Create a writable texture view to the second mipmap level :4x4
texture_view<int, 2> w_view(tex, 1);
parallel_for_each(w_view.extent, [=](index<2> idx) restrict(amp)
{
w_view.set(idx, 123);
});
Interoperability
The C++ AMP runtime supports interoperability between texture<T,1> and the ID3D11Texture1D interface, between texture<T,2> and the ID3D11Texture2D interface, and between texture<T,3> and the ID3D11Texture3D interface. The get_texture method takes a texture object and returns an IUnknown interface. The make_texture method takes an IUnknown interface and an accelerator_view object and returns a texture object.
Azure HPC is a purpose-built cloud capability for HPC & AI workload, using leading-edge processors and HPC-class InfiniBand interconnect, to deliver the best application performance, scalability, and value. Azure HPC enables users to unlock innovation, productivity, and business agility, through a highly available range of HPC & AI technologies that can be dynamically allocated as your business and technical needs change. This learning path is a series of modules that help you get started on Azure HPC - you