vector_cosine_similarity

Returns the cosine similarity between two float vectors. The vectors must have the same dimension.

For the corresponding Databricks SQL function, see vector_cosine_similarity function.

Syntax

from pyspark.sql import functions as dbf

dbf.vector_cosine_similarity(left=<left>, right=<right>)

Parameters

Parameter   Type                                 Description
left        pyspark.sql.Column or column name    First vector column.
right       pyspark.sql.Column or column name    Second vector column.

Returns

pyspark.sql.Column: Cosine similarity as a float value.

Examples

from pyspark.sql import functions as dbf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField

# Two array<float> columns holding vectors of the same dimension (3).
schema = StructType([
    StructField('a', ArrayType(FloatType())),
    StructField('b', ArrayType(FloatType())),
])
df = spark.createDataFrame([([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])], schema)

# Compute the cosine similarity of columns 'a' and 'b' and read the single result.
df.select(dbf.vector_cosine_similarity('a', 'b')).first()[0]
# 0.974631...
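
To sanity-check the value above without a Spark session, the same quantity can be computed in plain Python from the standard definition of cosine similarity: the dot product divided by the product of the Euclidean norms. This is a minimal sketch, not the Spark implementation. The helper name cosine_similarity is hypothetical, and raising ValueError on a dimension mismatch is an assumption made for this sketch; it does not describe how vector_cosine_similarity itself reports mismatched dimensions.

```python
import math

def cosine_similarity(a, b):
    """Reference cosine similarity for two equal-length sequences of numbers.

    Hypothetical helper for illustration only; not part of pyspark.
    """
    # Assumption for this sketch: reject mismatched dimensions explicitly.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has norm 0, so this division raises ZeroDivisionError.
    return dot / (norm_a * norm_b)

result = cosine_similarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)  # 0.974631...
```

For these inputs the dot product is 32 and the norms are sqrt(14) and sqrt(77), so the result is 32 / sqrt(1078), which agrees with the value shown in the example.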