Normalizes a float vector to unit length using the specified norm degree. The degree defaults to 2.0 (Euclidean norm) if unspecified.
For the corresponding Databricks SQL function, see the vector_normalize function.
Syntax
from pyspark.sql import functions as dbf
dbf.vector_normalize(vector=<vector>, degree=<degree>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| vector | pyspark.sql.Column or column name | Input vector column. |
| degree | pyspark.sql.Column or column name, optional | Norm degree (1.0 for L1, 2.0 for L2, float('inf') for infinity norm). Defaults to 2.0. |
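To make the three norm degrees listed above concrete, the following plain-Python sketch computes the norm that each degree corresponds to. It does not call Spark, and `p_norm` is a hypothetical helper written for illustration, not part of the PySpark or Databricks API. It assumes the standard mathematical definition of the p-norm.

```python
import math


def p_norm(vector, degree=2.0):
    """Return the p-norm of a sequence of numbers, with p = degree.

    Hypothetical reference helper; assumes the textbook p-norm definition.
    """
    magnitudes = [abs(x) for x in vector]
    if math.isinf(degree):
        # Infinity norm: the largest absolute component.
        return max(magnitudes, default=0.0)
    # General case: (sum of |x|^p)^(1/p).
    return sum(m ** degree for m in magnitudes) ** (1.0 / degree)


print(p_norm([3.0, 4.0], 1.0))           # L1 norm: 7.0
print(p_norm([3.0, 4.0], 2.0))           # L2 norm: 5.0
print(p_norm([3.0, 4.0], float("inf")))  # infinity norm: 4.0
```

Normalizing a vector then amounts to dividing each component by the norm chosen through `degree`.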
Returns
pyspark.sql.Column: The normalized vector as an array of floats.
Examples
from pyspark.sql import functions as dbf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
schema = StructType([StructField('v', ArrayType(FloatType()))])
df = spark.createDataFrame([([3.0, 4.0],)], schema)
df.select(dbf.vector_normalize('v', dbf.lit(2.0).cast('float'))).first()[0]
# [0.6..., 0.8...]
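The arithmetic behind the example above can be checked without a Spark session. The sketch below is a plain-Python reference for the default degree of 2.0 only; `to_float32` and `normalize_l2_reference` are hypothetical names introduced here. Rounding to 32-bit floats is an assumption based on the `FloatType` schema, and it likely explains why the result prints as `0.6...` and `0.8...` rather than exactly `0.6` and `0.8`. How the Databricks function treats zero-length or null vectors is not stated on this page, so this sketch does not attempt to model it.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def normalize_l2_reference(vector):
    """Divide each component by the Euclidean (L2) norm of the vector.

    Hypothetical reference helper; raises ZeroDivisionError for a zero
    vector because the page does not specify that case.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    return [to_float32(x / norm) for x in vector]


result = normalize_l2_reference([3.0, 4.0])
print(result)  # two values close to 0.6 and 0.8, with float32 rounding error
```

The norm of `[3.0, 4.0]` is 5.0, so the components become 3/5 and 4/5 before rounding.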