Applies to:
Databricks Runtime 18.1 and above
Computes the element-wise average of vectors in an aggregate. Returns a vector where each element is the arithmetic mean of the corresponding elements across all input vectors.
Syntax
vector_avg(vectors) [FILTER ( WHERE cond ) ]
Arguments
- vectors: A column of ARRAY<FLOAT> expressions representing vectors. All vectors must have the same dimension.
- cond: An optional boolean expression filtering the rows used for aggregation.
Returns
An ARRAY<FLOAT> value with the same dimension as the input vectors. Each element in the result is the average of the corresponding elements across all input vectors.
NULL vectors and non-NULL vectors containing a NULL element are ignored in the aggregation. Returns NULL if every value in the group is ignored for one of these reasons. Returns an empty array [] if all input vectors are empty.
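The return rules above can be modeled outside of SQL. The following is a minimal Python sketch of the documented behavior, not the Databricks implementation. The name `vector_avg_model` is hypothetical, `ValueError` stands in for the VECTOR_DIMENSION_MISMATCH error, and the sketch assumes the dimension check applies only to vectors that are not ignored. Python floats are 64-bit, so results may differ in the last digits from the 32-bit FLOAT values the SQL function returns.

```python
from typing import List, Optional, Sequence

Vector = Sequence[Optional[float]]


def vector_avg_model(vectors: Sequence[Optional[Vector]]) -> Optional[List[float]]:
    """Illustrative model of the documented vector_avg semantics (hypothetical helper)."""
    # Ignore NULL vectors and vectors that contain a NULL element.
    valid = [v for v in vectors if v is not None and all(x is not None for x in v)]
    if not valid:
        # Every value in the group was ignored, so the result is NULL.
        return None
    dim = len(valid[0])
    # Assumption: only vectors that survive the NULL filter are dimension-checked.
    if any(len(v) != dim for v in valid):
        raise ValueError("VECTOR_DIMENSION_MISMATCH")
    # Element-wise arithmetic mean; yields [] when all vectors are empty.
    return [sum(column) / len(valid) for column in zip(*valid)]
```

For example, `vector_avg_model([[1.0, 2.0], None, [3.0, 6.0]])` returns `[2.0, 4.0]`, because the NULL vector is skipped and the two remaining vectors are averaged element by element.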
Notes
- Only ARRAY<FLOAT> is supported; other types such as ARRAY<DOUBLE> or ARRAY<DECIMAL> raise an error.
- All input vectors must have the same dimension; otherwise the function raises VECTOR_DIMENSION_MISMATCH.
- A non-NULL vector that contains a NULL element is treated as NULL.
Error conditions
- VECTOR_DIMENSION_MISMATCH: Raised when the input vectors do not all have the same dimension.
Examples
-- Element-wise average per category (with GROUP BY)
> SELECT category, vector_avg(embedding) AS centroid
FROM vector_data
GROUP BY category
ORDER BY category;
category: A, centroid: [3.0, 6.0, 9.0]
category: B, centroid: [2.0, 4.0, 6.0]
-- Scalar aggregation (no GROUP BY)
> SELECT vector_avg(embedding) AS overall_centroid FROM vector_data;
overall_centroid: [2.5, 5.0, 7.5]