Use SIMD-accelerated numeric types

SIMD (single instruction, multiple data) provides hardware support for performing an operation on multiple pieces of data in parallel using a single instruction. In .NET, there's a set of SIMD-accelerated types in the System.Numerics namespace. Because SIMD operations are parallelized at the hardware level, they increase the throughput of vectorized computations, which are common in mathematical, scientific, and graphics apps.

.NET SIMD-accelerated types

The .NET SIMD-accelerated types include the following types:

  • The Vector2, Vector3, and Vector4 types, which represent vectors with 2, 3, and 4 Single values.

  • Two matrix types of Single values: Matrix3x2, which represents a 3x2 matrix, and Matrix4x4, which represents a 4x4 matrix.

  • The Plane type, which represents a plane in three-dimensional space using Single values.

  • The Quaternion type, which represents a vector that is used to encode three-dimensional physical rotations using Single values.

  • The Vector<T> type, which represents a vector of a specified numeric type and provides a broad set of operators that benefit from SIMD support. The number of elements in a Vector<T> instance is fixed for the lifetime of an application, but that number, exposed as Vector<T>.Count, depends on the CPU of the machine running the code.

    Note

    The Vector<T> type is not included in the .NET Framework. You must install the System.Numerics.Vectors NuGet package to get access to this type.
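
As a brief illustration of the Quaternion and Plane types listed above, the following sketch rotates a unit vector and measures a point's signed distance from a plane. It assumes C# 9 or later top-level statements; the variable names (rotation, ground, height) are chosen for illustration only.

```csharp
using System;
using System.Numerics;

// Rotate the X unit vector by 90 degrees about the Z axis.
Quaternion rotation = Quaternion.CreateFromAxisAngle(Vector3.UnitZ, (float)(Math.PI / 2));
Vector3 rotated = Vector3.Transform(Vector3.UnitX, rotation);

// A plane through the origin whose normal points along +Y.
var ground = new Plane(Vector3.UnitY, 0f);

// Dot product of the plane with a point: for a unit normal this is
// the signed distance of the point from the plane.
float height = Plane.DotCoordinate(ground, new Vector3(3f, 2f, -1f));

Console.WriteLine(rotated); // approximately (0, 1, 0), subject to float rounding
Console.WriteLine(height);
```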

The SIMD-accelerated types are implemented in such a way that they can be used with non-SIMD-accelerated hardware or JIT compilers. To take advantage of SIMD instructions, your 64-bit apps must be run by a runtime that uses the RyuJIT compiler. The RyuJIT compiler is included in .NET Core and in .NET Framework 4.6 and later. SIMD support is only provided when targeting 64-bit processors.

How to use SIMD?

Before executing custom SIMD algorithms, you can check whether the host machine supports SIMD by using Vector.IsHardwareAccelerated, which returns a Boolean. A value of true doesn't guarantee that SIMD acceleration is enabled for a specific type, but it indicates that acceleration is supported for some types.
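
A minimal sketch of such a check might look like the following. It assumes C# 9 or later top-level statements, and the codePath variable is a hypothetical stand-in for whatever branching your app performs.

```csharp
using System;
using System.Numerics;

// True when the JIT can map Vector<T> operations to SIMD instructions
// on the current machine; false means the types still work, but in software.
bool accelerated = Vector.IsHardwareAccelerated;

// Hypothetical branch: pick a vectorized or a scalar implementation.
string codePath = accelerated ? "vectorized" : "scalar";

Console.WriteLine($"Hardware accelerated: {accelerated}");
Console.WriteLine($"Selected code path: {codePath}");
```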

Simple Vectors

The most primitive SIMD-accelerated types in .NET are the Vector2, Vector3, and Vector4 types, which represent vectors with 2, 3, and 4 Single values. The example below uses Vector2 to add two vectors.

using System.Numerics;

// Component-wise addition of two 2D vectors.
var v1 = new Vector2(0.1f, 0.2f);
var v2 = new Vector2(1.1f, 2.2f);
var vResult = v1 + v2;

You can also use the .NET vector types to compute other quantities, such as the dot product of two vectors, the distance between them, or a clamped copy of a vector.

var v1 = new Vector2(0.1f, 0.2f);
var v2 = new Vector2(1.1f, 2.2f);
var vResult1 = Vector2.Dot(v1, v2);
var vResult2 = Vector2.Distance(v1, v2);
var vResult3 = Vector2.Clamp(v1, Vector2.Zero, Vector2.One);

Matrix

The Matrix3x2 type, which represents a 3x2 matrix, and the Matrix4x4 type, which represents a 4x4 matrix, can be used for matrix-related calculations. The example below multiplies a matrix by its transpose using SIMD.

var m1 = new Matrix4x4(
            1.1f, 1.2f, 1.3f, 1.4f,
            2.1f, 2.2f, 3.3f, 4.4f,
            3.1f, 3.2f, 3.3f, 3.4f,
            4.1f, 4.2f, 4.3f, 4.4f);

var m2 = Matrix4x4.Transpose(m1);
var mResult = Matrix4x4.Multiply(m1, m2);
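
Matrix3x2 is typically used for 2D affine transforms. The following sketch, which assumes C# 9 or later top-level statements and uses illustrative variable names, combines a scale with a translation and applies the result to a point. System.Numerics treats vectors as rows, so the left-hand matrix in a product is applied first.

```csharp
using System;
using System.Numerics;

// Scale by 2, then translate by (1, 0). The left operand is applied first.
Matrix3x2 scaleThenMove = Matrix3x2.CreateScale(2f) * Matrix3x2.CreateTranslation(1f, 0f);

// Apply the combined transform to the point (1, 1).
Vector2 transformed = Vector2.Transform(new Vector2(1f, 1f), scaleThenMove);

// transformed.X is 3 and transformed.Y is 2: (1 * 2 + 1, 1 * 2 + 0).
Console.WriteLine(transformed);
```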

Vector<T>

The Vector<T> type lets you work with longer vectors. The number of elements in a Vector<T> instance is fixed while the app runs, but that number, exposed as Vector<T>.Count, depends on the CPU of the machine running the code.
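
To see how Vector<T>.Count varies with the element type on a given machine, a small sketch such as the following can help. It assumes C# 9 or later top-level statements; the variable names are illustrative. Because every Vector<T> occupies the same number of bytes, wider element types yield fewer elements per vector.

```csharp
using System;
using System.Numerics;

// Vector<byte>.Count equals the vector width in bytes on this machine.
int byteLanes = Vector<byte>.Count;
int floatLanes = Vector<float>.Count;   // 4-byte elements
int doubleLanes = Vector<double>.Count; // 8-byte elements

Console.WriteLine($"Vector<byte>.Count   = {byteLanes}");
Console.WriteLine($"Vector<float>.Count  = {floatLanes}");
Console.WriteLine($"Vector<double>.Count = {doubleLanes}");
```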

The following example demonstrates how to calculate the element-wise sum of two arrays using Vector<T>.

double[] Sum(double[] left, double[] right)
{
    if (left is null)
    {
        throw new ArgumentNullException(nameof(left));
    }

    if (right is null)
    {
        throw new ArgumentNullException(nameof(right));
    }

    if (left.Length != right.Length)
    {
        throw new ArgumentException($"{nameof(left)} and {nameof(right)} are not the same length");
    }

    int length = left.Length;
    double[] result = new double[length];

    // Number of trailing elements that don't fill a whole vector.
    // NOTE: Vector<T>.Count is a JIT-time constant and will get optimized accordingly.
    int remaining = length % Vector<double>.Count;

    // Vectorized loop: process Vector<double>.Count elements per iteration.
    for (int i = 0; i < length - remaining; i += Vector<double>.Count)
    {
        var v1 = new Vector<double>(left, i);
        var v2 = new Vector<double>(right, i);
        (v1 + v2).CopyTo(result, i);
    }

    // Scalar loop: process the leftover elements one at a time.
    for (int i = length - remaining; i < length; i++)
    {
        result[i] = left[i] + right[i];
    }

    return result;
}
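
The same vector-plus-scalar-tail pattern also applies to reductions. The following sketch, which is not part of the original sample, totals the elements of one array. The SumAll name is hypothetical, and the code assumes C# 9 or later top-level statements. It accumulates partial sums in a Vector<double> and then collapses them with Vector.Dot against a vector of ones.

```csharp
using System;
using System.Numerics;

double[] values = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
double total = SumAll(values);
Console.WriteLine(total); // 55

// Hypothetical helper: sums an array using Vector<double> where possible.
static double SumAll(double[] values)
{
    int width = Vector<double>.Count;
    var accumulator = Vector<double>.Zero;
    int i = 0;

    // Vectorized loop: add one full vector of elements per iteration.
    for (; i <= values.Length - width; i += width)
    {
        accumulator += new Vector<double>(values, i);
    }

    // Horizontal sum: the dot product with all ones adds up the lanes.
    double sum = Vector.Dot(accumulator, Vector<double>.One);

    // Scalar loop: add the elements that didn't fill a whole vector.
    for (; i < values.Length; i++)
    {
        sum += values[i];
    }

    return sum;
}
```

Note that a vectorized reduction adds floating-point values in a different order than a simple loop, so results can differ slightly for inputs that aren't exactly representable.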

Remarks

Applying SIMD often removes one bottleneck only to expose the next, such as memory throughput. In general, the performance benefit of SIMD varies with the specific scenario, and in some cases SIMD code can even perform worse than simpler non-SIMD code.