vector_normalize

Normalizes a float vector to unit length using the specified norm degree. Degree defaults to 2.0 (Euclidean norm) if unspecified.

For the corresponding Databricks SQL function, see vector_normalize function.

Syntax

from pyspark.sql import functions as dbf

dbf.vector_normalize(vector=<vector>, degree=<degree>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| vector | pyspark.sql.Column or column name | Input vector column. |
| degree | pyspark.sql.Column or column name, optional | Norm degree (1.0 for L1, 2.0 for L2, float('inf') for infinity norm). Defaults to 2.0. |

Returns

pyspark.sql.Column: The normalized vector as an array of floats.

Examples

from pyspark.sql import functions as dbf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField

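# Build a one-row DataFrame with a single array<float> column named 'v'.
schema = StructType([StructField('v', ArrayType(FloatType()))])
df = spark.createDataFrame([([3.0, 4.0],)], schema)
# Normalize with degree 2.0: the L2 norm of [3, 4] is 5, so each element is divided by 5.
df.select(dbf.vector_normalize('v', dbf.lit(2.0).cast('float'))).first()[0]
# [0.6..., 0.8...]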
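
To make the arithmetic behind the norm degrees concrete, the following plain-Python sketch divides each element by the vector's p-norm. It is an illustration of the math only, not the Databricks implementation: the helper name `normalize` is hypothetical, and the handling of empty and all-zero vectors is an assumption, since this page does not specify how `vector_normalize` treats them.

```python
import math

def normalize(vector, degree=2.0):
    """Hypothetical reference helper: divide each element by the p-norm."""
    if not vector:
        # Assumption: an empty vector is returned as an empty list.
        return []
    if degree == math.inf:
        # Infinity norm: the largest absolute value in the vector.
        norm = max(abs(x) for x in vector)
    else:
        # p-norm: (sum of |x|^p) raised to the power 1/p.
        norm = sum(abs(x) ** degree for x in vector) ** (1.0 / degree)
    if norm == 0:
        # Assumption: an all-zero vector is returned unchanged.
        return list(vector)
    return [x / norm for x in vector]

print(normalize([3.0, 4.0]))            # L2 norm is 5: [0.6, 0.8]
print(normalize([3.0, 4.0], 1.0))       # L1 norm is 7: roughly [0.43, 0.57]
print(normalize([3.0, 4.0], math.inf))  # infinity norm is 4: [0.75, 1.0]
```

The degree 2.0 result matches the `[0.6..., 0.8...]` output shown above; with the other degrees, the same input is scaled so that its L1 norm or its largest absolute element equals 1.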