DML_GATHER_ND1_OPERATOR_DESC structure (directml.h)

Gathers elements from the input tensor, using the indices tensor to remap indices to entire subblocks of the input. The operator behaves as in the following pseudocode, where "..." stands for a series of coordinates whose exact form depends on the batch, input, and indices dimension counts.

output[batch, ...] = input[batch, indices[batch, ...], ...]

Syntax

struct DML_GATHER_ND1_OPERATOR_DESC {
  const DML_TENSOR_DESC *InputTensor;
  const DML_TENSOR_DESC *IndicesTensor;
  const DML_TENSOR_DESC *OutputTensor;
  UINT                  InputDimensionCount;
  UINT                  IndicesDimensionCount;
  UINT                  BatchDimensionCount;
};

Members

InputTensor

Type: const DML_TENSOR_DESC*

The tensor to read from.

IndicesTensor

Type: const DML_TENSOR_DESC*

The tensor containing the indices. The DimensionCount of this tensor must match InputTensor.DimensionCount. The last dimension of the IndicesTensor is actually the number of coordinates per index tuple, and it cannot exceed InputTensor.DimensionCount. For example, an indices tensor of Sizes {1,4,5,2} with IndicesDimensionCount = 3 means a 4x5 array of 2-coordinate tuples that index into InputTensor.

Starting with DML_FEATURE_LEVEL_3_0, this operator supports negative index values when using a signed integral type with this tensor. Negative indices are interpreted as being relative to the end of the respective dimension. For example, an index of -1 refers to the last element along that dimension.
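The negative-index rule above can be sketched on the CPU as follows. This is only an illustration: NormalizeIndex is a hypothetical helper, not part of directml.h, and the assertion on out-of-range values is an assumption of the sketch, since this page does not say how the operator treats them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a DirectML API): maps an index that may be
// negative onto the range [0, dimensionSize) of one dimension.
inline uint32_t NormalizeIndex(int64_t index, uint32_t dimensionSize)
{
    // A negative index counts back from the end of the dimension,
    // so -1 becomes dimensionSize - 1.
    if (index < 0)
    {
        index += dimensionSize;
    }

    // Assumption of this sketch: anything still out of range is an error.
    assert(index >= 0 && index < dimensionSize);
    return static_cast<uint32_t>(index);
}
```

For a dimension of size 5, NormalizeIndex(-1, 5) yields 4, the last element along that dimension.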

OutputTensor

Type: const DML_TENSOR_DESC*

The tensor to write the results to. The DimensionCount and DataType of this tensor must match those of InputTensor. The expected OutputTensor.Sizes are the concatenation of the leading segment of IndicesTensor.Sizes and the trailing segment of InputTensor.Sizes, which yields the following.

indexTupleSize = IndicesTensor.Sizes[IndicesTensor.DimensionCount - 1]
OutputTensor.Sizes = {
    1...,
    IndicesTensor.Sizes[(IndicesTensor.DimensionCount - IndicesDimensionCount) .. (IndicesTensor.DimensionCount - 1)],
    InputTensor.Sizes[(InputTensor.DimensionCount - InputDimensionCount + BatchDimensionCount + indexTupleSize) .. InputTensor.DimensionCount]
}

The dimensions are right-aligned, with leading 1 values prepended if needed to satisfy OutputTensor.DimensionCount.

Here's an example.

InputTensor.Sizes = {3,4,5,6,7}
InputDimensionCount = 5
IndicesTensor.Sizes = {1,1, 1,2,3}
IndicesDimensionCount = 3 // can be thought of as a {1,2} array of 3-coordinate tuples

// The {1,2} comes from the indices tensor (ignoring last dimension which is the tuple size),
// and the {6,7} comes from input tensor, ignoring the first 3 dimensions
// since the index tuples are 3 elements (from the indices tensor last dimension).
OutputTensor.Sizes = {1, 1,2,6,7}
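The size computation can be checked with a small CPU-side sketch. GatherNdOutputSizes is a hypothetical helper, not a DirectML function. Where the trailing input segment starts (after the ignored leading dimensions, the batch dimensions, and the indexed dimensions) is inferred from the worked examples on this page, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a DirectML API): computes the expected
// OutputTensor.Sizes for the given input and indices sizes.
std::vector<uint32_t> GatherNdOutputSizes(
    const std::vector<uint32_t>& inputSizes,
    const std::vector<uint32_t>& indicesSizes,
    uint32_t inputDimensionCount,
    uint32_t indicesDimensionCount,
    uint32_t batchDimensionCount)
{
    const size_t rank = inputSizes.size();
    assert(rank >= 1 && indicesSizes.size() == rank);
    assert(inputDimensionCount >= 1 && inputDimensionCount <= rank);
    assert(indicesDimensionCount >= 1 && indicesDimensionCount <= rank);

    // The last indices dimension is the number of coordinates per tuple.
    const uint32_t indexTupleSize = indicesSizes.back();
    assert(batchDimensionCount < indicesDimensionCount);
    assert(batchDimensionCount + indexTupleSize <= inputDimensionCount);

    std::vector<uint32_t> meaningful;

    // Leading segment: the meaningful indices dimensions (batches included),
    // without the final tuple-size dimension.
    for (size_t i = rank - indicesDimensionCount; i + 1 < rank; ++i)
    {
        meaningful.push_back(indicesSizes[i]);
    }

    // Trailing segment: the input dimensions that the tuples do not index.
    const size_t firstUnindexed =
        rank - inputDimensionCount + batchDimensionCount + indexTupleSize;
    for (size_t i = firstUnindexed; i < rank; ++i)
    {
        meaningful.push_back(inputSizes[i]);
    }

    // Right-align by prepending 1s up to the common DimensionCount.
    assert(meaningful.size() <= rank);
    std::vector<uint32_t> outputSizes(rank - meaningful.size(), 1u);
    outputSizes.insert(outputSizes.end(), meaningful.begin(), meaningful.end());
    return outputSizes;
}
```

Calling GatherNdOutputSizes({3,4,5,6,7}, {1,1,1,2,3}, 5, 3, 0) reproduces the {1, 1,2,6,7} result from the example above.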

InputDimensionCount

Type: UINT

The number of actual input dimensions within the InputTensor after ignoring any irrelevant leading ones, ranging [1, InputTensor.DimensionCount]. For example, given InputTensor.Sizes = {1,1,4,6} and InputDimensionCount = 3, the actual meaningful dimensions are {1,4,6}.

IndicesDimensionCount

Type: UINT

The number of actual index dimensions within the IndicesTensor after ignoring any irrelevant leading ones, ranging [1, IndicesTensor.DimensionCount]. For example, given IndicesTensor.Sizes = {1,1,4,6} and IndicesDimensionCount = 3, the actual meaningful index dimensions are {1,4,6}.

BatchDimensionCount

Type: UINT

The number of dimensions within each tensor (InputTensor, IndicesTensor, OutputTensor) that are considered independent batches, ranging within both [0, InputTensor.DimensionCount) and [0, IndicesTensor.DimensionCount). The batch count can be 0, implying a single batch. For example, given IndicesTensor.Sizes = {1,3,4,5,6,7}, IndicesDimensionCount = 5, and BatchDimensionCount = 2, there are batches {3,4} and meaningful indices {5,6,7}.
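How the two counts partition a Sizes array can be sketched as below. DimensionSplit and SplitDimensions are hypothetical names used only for illustration. The sketch also requires the batch count to be smaller than the meaningful dimension count, which is a stricter check than the ranges stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types and helper (not part of directml.h).
struct DimensionSplit
{
    std::vector<uint32_t> batch;      // dimensions treated as independent batches
    std::vector<uint32_t> remaining;  // meaningful dimensions after the batches
};

DimensionSplit SplitDimensions(
    const std::vector<uint32_t>& sizes,
    uint32_t dimensionCount,        // InputDimensionCount or IndicesDimensionCount
    uint32_t batchDimensionCount)
{
    assert(dimensionCount >= 1 && dimensionCount <= sizes.size());
    assert(batchDimensionCount < dimensionCount);  // assumption of this sketch

    // Skip the irrelevant leading dimensions, then take the batch
    // dimensions, then everything that is left.
    const auto firstMeaningful = sizes.end() - dimensionCount;
    const auto firstNonBatch = firstMeaningful + batchDimensionCount;

    DimensionSplit split;
    split.batch.assign(firstMeaningful, firstNonBatch);
    split.remaining.assign(firstNonBatch, sizes.end());
    return split;
}
```

With sizes {1,3,4,5,6,7}, a dimension count of 5, and a batch count of 2, the split gives batch = {3,4} and remaining = {5,6,7}, matching the example above.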

Remarks

DML_GATHER_ND1_OPERATOR_DESC adds BatchDimensionCount, and is equivalent to DML_GATHER_ND_OPERATOR_DESC when BatchDimensionCount = 0.

Examples

Example 1. 1D remapping

InputDimensionCount: 2
IndicesDimensionCount: 2
BatchDimensionCount: 0

InputTensor: (Sizes:{2,2}, DataType:FLOAT32)
    [[0,1],[2,3]]

IndicesTensor: (Sizes:{2,1}, DataType:UINT32)
    [[1],[0]]

// output[y, x] = input[indices[y], x]
OutputTensor: (Sizes:{2,2}, DataType:FLOAT32)
    [[2,3],[0,1]]

Example 2. 2D remapping with batch count

InputDimensionCount: 3
IndicesDimensionCount: 3
BatchDimensionCount: 1

// 3 batches.
InputTensor: (Sizes:{1, 3,2,2}, DataType:FLOAT32)
    [
        [[[0,1],[2,3]],   // batch 0
         [[4,5],[6,7]],   // batch 1
         [[8,9],[10,11]]] // batch 2
    ]

// A 3x2 array of 2D tuples indexing into InputTensor.
// e.g. a tuple of <1,0> in batch 1 corresponds to input value 6.
IndicesTensor: (Sizes:{1, 3,2,2}, DataType:UINT32)
    [
        [[[0,0],[1,1]],
         [[1,1],[0,0]],
         [[0,1],[1,0]]]
    ]

// output[batch, x] = input[batch, indices[batch, x, 0], indices[batch, x, 1]]
OutputTensor: (Sizes:{1,1, 3,2}, DataType:FLOAT32)
    [[
        [[0,3],
         [7,4],
         [9,10]]
    ]]
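Both examples can be reproduced with a CPU reference sketch. GatherNdReference is a hypothetical function written for this illustration, not a DirectML API, and it does not describe how DirectML implements the operator. It assumes packed row-major buffers, FLOAT32 data, and INT64 indices, and it asserts on out-of-range indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not a DirectML API).
std::vector<float> GatherNdReference(
    const std::vector<float>& input,
    const std::vector<uint32_t>& inputSizes,
    const std::vector<int64_t>& indices,
    const std::vector<uint32_t>& indicesSizes,
    uint32_t inputDimensionCount,
    uint32_t indicesDimensionCount,
    uint32_t batchDimensionCount)
{
    assert(inputDimensionCount >= 1 && inputDimensionCount <= inputSizes.size());
    assert(indicesDimensionCount >= 1 && indicesDimensionCount <= indicesSizes.size());

    // Keep only the meaningful trailing dimensions. Leading 1s do not
    // change the layout of a packed buffer.
    const std::vector<uint32_t> in(inputSizes.end() - inputDimensionCount, inputSizes.end());
    const std::vector<uint32_t> idx(indicesSizes.end() - indicesDimensionCount, indicesSizes.end());
    const size_t batchDims = batchDimensionCount;
    const size_t tupleSize = idx.back();
    assert(batchDims < idx.size() && batchDims + tupleSize <= in.size());

    size_t batchCount = 1, lookupsPerBatch = 1, inputBatchStride = 1, sliceSize = 1;
    for (size_t i = 0; i < batchDims; ++i)
    {
        assert(in[i] == idx[i]);  // batch sizes must agree
        batchCount *= in[i];
    }
    for (size_t i = batchDims; i + 1 < idx.size(); ++i) lookupsPerBatch *= idx[i];
    for (size_t i = batchDims; i < in.size(); ++i) inputBatchStride *= in[i];
    for (size_t i = batchDims + tupleSize; i < in.size(); ++i) sliceSize *= in[i];

    assert(input.size() == batchCount * inputBatchStride);
    assert(indices.size() == batchCount * lookupsPerBatch * tupleSize);

    std::vector<float> output(batchCount * lookupsPerBatch * sliceSize);
    for (size_t b = 0; b < batchCount; ++b)
    {
        for (size_t j = 0; j < lookupsPerBatch; ++j)
        {
            const size_t lookup = b * lookupsPerBatch + j;

            // Fold the index tuple into one row-major offset within the batch.
            size_t offset = 0;
            for (size_t k = 0; k < tupleSize; ++k)
            {
                const int64_t size = in[batchDims + k];
                int64_t coordinate = indices[lookup * tupleSize + k];
                if (coordinate < 0) coordinate += size;  // relative to the end
                assert(coordinate >= 0 && coordinate < size);
                offset = offset * static_cast<size_t>(size) + static_cast<size_t>(coordinate);
            }

            // Copy the whole subblock that the tuple selects.
            const float* source = input.data() + b * inputBatchStride + offset * sliceSize;
            std::copy(source, source + sliceSize, output.begin() + lookup * sliceSize);
        }
    }
    return output;
}
```

Feeding in the Example 1 tensors returns the packed values 2, 3, 0, 1, and the Example 2 tensors (with BatchDimensionCount = 1) return 0, 3, 7, 4, 9, 10.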

Availability

This operator was introduced in DML_FEATURE_LEVEL_3_0.

Tensor constraints

  • IndicesTensor, InputTensor, and OutputTensor must have the same DimensionCount.
  • InputTensor and OutputTensor must have the same DataType.

Tensor support

DML_FEATURE_LEVEL_4_1 and above

Tensor | Kind | Supported dimension counts | Supported data types
InputTensor | Input | 1 to 8 | FLOAT64, FLOAT32, FLOAT16, INT64, INT32, INT16, INT8, UINT64, UINT32, UINT16, UINT8
IndicesTensor | Input | 1 to 8 | INT64, INT32, UINT64, UINT32
OutputTensor | Output | 1 to 8 | FLOAT64, FLOAT32, FLOAT16, INT64, INT32, INT16, INT8, UINT64, UINT32, UINT16, UINT8

DML_FEATURE_LEVEL_3_0 and above

Tensor | Kind | Supported dimension counts | Supported data types
InputTensor | Input | 1 to 8 | FLOAT32, FLOAT16, INT32, INT16, INT8, UINT32, UINT16, UINT8
IndicesTensor | Input | 1 to 8 | INT64, INT32, UINT64, UINT32
OutputTensor | Output | 1 to 8 | FLOAT32, FLOAT16, INT32, INT16, INT8, UINT32, UINT16, UINT8

Requirements

Requirement | Value
Minimum supported client | Windows 10 Build 20348
Minimum supported server | Windows 10 Build 20348
Header | directml.h