Using fused operators to improve performance

Some DirectML operators support a concept known as fusion. Operator fusion is a way to improve performance by merging one operator (typically an activation function) into a different operator so that they're executed together, without requiring a roundtrip to memory.

When to fuse activations

Fused activations are a performance optimization. An extremely common scenario in many machine learning (ML) models is to apply a nonlinearity (an activation function) to the output of each layer in the model.

Ordinarily, this requires a roundtrip to graphics memory. For example, if a Convolution is followed by a non-fused Relu activation, then the GPU must wait for the results of the Convolution to be written into GPU memory before it can begin computing the Relu activation layer. Because the compute workload of most activation functions tends to be small, this roundtrip to graphics memory can be a major performance bottleneck.

Operator fusion allows the activation function (Relu in the above example) to be performed as part of the preceding operator (Convolution, for example). This allows the GPU to compute the activation function without waiting for the results of the preceding operator to be written into memory, which improves performance.

Because fused activations produce the same result and are often faster, we recommend that you eliminate activation layers by fusing them into their preceding operator wherever possible.

How to fuse activations

Operators that support fused activations have an additional optional member in their operator description struct, const DML_OPERATOR_DESC* FusedActivation. Convolution, for example, supports fused activation, and it has a corresponding FusedActivation member in its operator description (see DML_CONVOLUTION_OPERATOR_DESC).

struct DML_CONVOLUTION_OPERATOR_DESC
{
    const DML_TENSOR_DESC* InputTensor;
    const DML_TENSOR_DESC* FilterTensor;
    _Maybenull_ const DML_TENSOR_DESC* BiasTensor;
    const DML_TENSOR_DESC* OutputTensor;
    DML_CONVOLUTION_MODE Mode;
    DML_CONVOLUTION_DIRECTION Direction;
    UINT DimensionCount;
    _Field_size_(DimensionCount) const UINT* Strides;
    _Field_size_(DimensionCount) const UINT* Dilations;
    _Field_size_(DimensionCount) const UINT* StartPadding;
    _Field_size_(DimensionCount) const UINT* EndPadding;
    _Field_size_(DimensionCount) const UINT* OutputPadding;
    UINT GroupCount;
    _Maybenull_ const DML_OPERATOR_DESC* FusedActivation;
};

To fuse an activation, construct a DML_OPERATOR_DESC that describes the type of activation to be fused. For example, to fuse a Relu function, the correct operator type would be DML_OPERATOR_ACTIVATION_RELU.

Note

When constructing the operator description for the activation function, you must set the activation function's InputTensor and OutputTensor members to NULL.

Example

DML_ACTIVATION_LEAKY_RELU_OPERATOR_DESC leakyReluDesc;
leakyReluDesc.InputTensor = nullptr;
leakyReluDesc.OutputTensor = nullptr;
leakyReluDesc.Alpha = 0.01f;

DML_OPERATOR_DESC activationDesc = { DML_OPERATOR_ACTIVATION_LEAKY_RELU, &leakyReluDesc };

DML_CONVOLUTION_OPERATOR_DESC convDesc;
// ...
convDesc.FusedActivation = &activationDesc;
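
Continuing the example, the convolution description (now carrying the fused activation) is wrapped in a DML_OPERATOR_DESC and then created and compiled in the usual way. The snippet below is a minimal sketch; it assumes that the remaining convolution fields have been filled in, and that dmlDevice is a placeholder name for an already-created IDMLDevice*.

// Wrap the convolution description; the fused Leaky Relu travels with it.
DML_OPERATOR_DESC convOpDesc = { DML_OPERATOR_CONVOLUTION, &convDesc };

// Create and compile the operator. The fused activation is applied as part of
// the convolution, with no separate activation dispatch or memory roundtrip.
IDMLOperator* dmlOperator = nullptr;
HRESULT hr = dmlDevice->CreateOperator(&convOpDesc, IID_PPV_ARGS(&dmlOperator));

IDMLCompiledOperator* dmlCompiledOperator = nullptr;
if (SUCCEEDED(hr))
{
    hr = dmlDevice->CompileOperator(
        dmlOperator,
        DML_EXECUTION_FLAG_NONE,
        IID_PPV_ARGS(&dmlCompiledOperator));
}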

For a complete example, see the DirectMLSuperResolution sample, which uses fused activations to improve performance.

Operators that support fused activation

The list below is based on constants from the DML_OPERATOR_TYPE enumeration. Each constant in that topic links to the appropriate description structure to use.

  • DML_OPERATOR_BATCH_NORMALIZATION
  • DML_OPERATOR_BATCH_NORMALIZATION_TRAINING
  • DML_OPERATOR_CONVOLUTION
  • DML_OPERATOR_ELEMENT_WISE_ADD1
  • DML_OPERATOR_GEMM
  • DML_OPERATOR_MEAN_VARIANCE_NORMALIZATION
  • DML_OPERATOR_MEAN_VARIANCE_NORMALIZATION1
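
The pattern is the same for each of these operators. For instance, DML_GEMM_OPERATOR_DESC also ends with a FusedActivation member. The snippet below is a minimal sketch of fusing a Relu into a GEMM; the tensor descriptions (aTensorDesc, bTensorDesc, outputTensorDesc) are placeholder names assumed to be defined elsewhere.

// Relu takes no extra parameters; its tensor descriptions must remain NULL for fusion.
DML_ACTIVATION_RELU_OPERATOR_DESC reluDesc = {};

DML_OPERATOR_DESC fusedReluDesc = { DML_OPERATOR_ACTIVATION_RELU, &reluDesc };

DML_GEMM_OPERATOR_DESC gemmDesc = {};
gemmDesc.ATensor = &aTensorDesc;
gemmDesc.BTensor = &bTensorDesc;
gemmDesc.CTensor = nullptr;                 // optional C tensor not used here
gemmDesc.OutputTensor = &outputTensorDesc;
gemmDesc.TransA = DML_MATRIX_TRANSFORM_NONE;
gemmDesc.TransB = DML_MATRIX_TRANSFORM_NONE;
gemmDesc.Alpha = 1.0f;
gemmDesc.Beta = 0.0f;
gemmDesc.FusedActivation = &fusedReluDesc;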

Activations that are supported for fusion

The list below is based on constants from the DML_OPERATOR_TYPE enumeration. Each constant in that topic links to the appropriate description structure to use.

  • DML_OPERATOR_ACTIVATION_LINEAR
  • DML_OPERATOR_ACTIVATION_SIGMOID
  • DML_OPERATOR_ACTIVATION_HARD_SIGMOID
  • DML_OPERATOR_ACTIVATION_TANH
  • DML_OPERATOR_ACTIVATION_SCALED_TANH
  • DML_OPERATOR_ACTIVATION_RELU
  • DML_OPERATOR_ACTIVATION_LEAKY_RELU
  • DML_OPERATOR_ACTIVATION_THRESHOLDED_RELU
  • DML_OPERATOR_ACTIVATION_ELU
  • DML_OPERATOR_ACTIVATION_CELU
  • DML_OPERATOR_ACTIVATION_SCALED_ELU
  • DML_OPERATOR_ACTIVATION_SOFTPLUS
  • DML_OPERATOR_ACTIVATION_PARAMETRIC_SOFTPLUS
  • DML_OPERATOR_ACTIVATION_SOFTSIGN
  • DML_OPERATOR_ACTIVATION_IDENTITY
  • DML_OPERATOR_ACTIVATION_SHRINK
  • DML_OPERATOR_ACTIVATION_GELU
  • DML_OPERATOR_ELEMENT_WISE_CLIP (for Convolution and GEMM only)

Any activation operators that aren't in this list aren't supported for fusion.

See also