`vector_avg` aggregate function

Applies to: Databricks Runtime 18.1 and above

Computes the element-wise average of vectors in an aggregate. Returns a vector where each element is the arithmetic mean of the corresponding elements across all input vectors.

Syntax

vector_avg(vectors) [FILTER ( WHERE cond ) ]

Arguments

vectors: A column of ARRAY<FLOAT> expressions representing vectors. All vectors must have the same dimension.
cond: An optional boolean expression filtering the rows used for aggregation.
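
To illustrate the optional FILTER clause, the following sketch computes one centroid over all rows and another over a subset in the same query. The table vector_data and its columns category and embedding are borrowed from the examples on this page; the category value 'A' is only an assumption about the data.

```sql
-- Sketch: FILTER restricts which rows feed one particular aggregate.
-- Assumes a table vector_data(category STRING, embedding ARRAY<FLOAT>).
SELECT vector_avg(embedding) AS overall_centroid,
       vector_avg(embedding) FILTER (WHERE category = 'A') AS centroid_a
  FROM vector_data;
```

Putting the condition in FILTER rather than in WHERE keeps the unfiltered aggregate in the same result row.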

Returns

An ARRAY<FLOAT> value with the same dimension as the input vectors. Each element in the result is the average of the corresponding elements across all input vectors.

NULL vectors and non-NULL vectors that contain a NULL element are ignored in the aggregation. The function returns NULL if every value in the group is NULL or contains a NULL element, and returns an empty array [] if all input vectors are empty.
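
The NULL-handling rules can be tried with a small inline table. This is a minimal sketch: the alias t and the column name v are made up for illustration, and the comments restate the rules above rather than captured output.

```sql
-- Sketch: four input rows, two of which should be ignored per the rules above.
SELECT vector_avg(v) AS centroid
  FROM VALUES
    (ARRAY(1.0F, 2.0F, 3.0F)),                  -- contributes
    (CAST(NULL AS ARRAY<FLOAT>)),               -- NULL vector: ignored
    (ARRAY(CAST(NULL AS FLOAT), 4.0F, 5.0F)),   -- contains a NULL element: ignored
    (ARRAY(3.0F, 4.0F, 5.0F))                   -- contributes
    AS t(v);
```

If the function behaves as documented, only the first and last rows contribute, so the result is their element-wise mean.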

Notes

Only ARRAY<FLOAT> is supported; other types such as ARRAY<DOUBLE> or ARRAY<DECIMAL> raise an error.
All input vectors must have the same dimension; otherwise the function raises VECTOR_DIMENSION_MISMATCH.
A non-NULL vector that contains a NULL element is treated as NULL.
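
When embeddings are stored as ARRAY<DOUBLE>, one possible workaround is an explicit cast before aggregating. The sketch below assumes a hypothetical table doc_embeddings with a column embedding_double. It also assumes that narrowing each element to single precision is acceptable for the use case.

```sql
-- Sketch: hypothetical table doc_embeddings(category STRING, embedding_double ARRAY<DOUBLE>).
-- The explicit cast produces the ARRAY<FLOAT> input type that vector_avg accepts,
-- at the cost of narrowing each element to single precision.
SELECT category,
       vector_avg(CAST(embedding_double AS ARRAY<FLOAT>)) AS centroid
  FROM doc_embeddings
  GROUP BY category;
```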

Common error conditions

VECTOR_DIMENSION_MISMATCH
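
Before aggregating, it can help to check whether the input holds vectors of more than one length. The following sketch uses the built-in size function on the vector_data table from the examples on this page; the output column names are illustrative only.

```sql
-- Sketch: list the distinct vector lengths present in the input.
-- More than one row in the result signals a potential dimension mismatch.
SELECT size(embedding) AS dimension, count(*) AS num_vectors
  FROM vector_data
  WHERE embedding IS NOT NULL
  GROUP BY size(embedding)
  ORDER BY dimension;
```

For a grouped aggregation, adding the grouping column (for example category) to the SELECT list and GROUP BY narrows the check to each group.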

Examples

-- Element-wise average per category (with GROUP BY)
> SELECT category, vector_avg(embedding) AS centroid
    FROM vector_data
    GROUP BY category
    ORDER BY category;
  category: A, centroid: [3.0, 6.0, 9.0]
  category: B, centroid: [2.0, 4.0, 6.0]

-- Scalar aggregation (no GROUP BY)
> SELECT vector_avg(embedding) AS overall_centroid FROM vector_data;
  overall_centroid: [2.5, 5.0, 7.5]

Last updated on 2026-04-13