Returns the cosine similarity between two float vectors. The vectors must have the same dimension.
For the corresponding Databricks SQL function, see vector_cosine_similarity function.
Syntax
from pyspark.sql import functions as dbf
dbf.vector_cosine_similarity(left=<left>, right=<right>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| left | pyspark.sql.Column or column name | First vector column. |
| right | pyspark.sql.Column or column name | Second vector column. |
Returns
pyspark.sql.Column: Cosine similarity as a float value.
Examples
from pyspark.sql import functions as dbf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField

# Two array<float> columns holding vectors of the same dimension.
schema = StructType([
    StructField('a', ArrayType(FloatType())),
    StructField('b', ArrayType(FloatType())),
])
df = spark.createDataFrame([([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])], schema)
df.select(dbf.vector_cosine_similarity('a', 'b')).first()[0]
# 0.974631...
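To sanity-check the value in the example, the standard cosine similarity formula (dot product divided by the product of the two Euclidean norms) can be evaluated without Spark. The sketch below is a plain-Python reference only; `cosine_similarity` is a hypothetical helper, not part of PySpark, and its handling of zero-length vectors (returning `None`) is an assumption of this sketch rather than documented behavior of `vector_cosine_similarity`.

```python
import math


def cosine_similarity(left, right):
    """Plain-Python reference: dot(left, right) / (|left| * |right|).

    Hypothetical helper for illustration; not part of PySpark.
    """
    if len(left) != len(right):
        raise ValueError("vectors must have the same dimension")

    dot = sum(x * y for x, y in zip(left, right))
    norm_left = math.sqrt(sum(x * x for x in left))
    norm_right = math.sqrt(sum(y * y for y in right))

    if norm_left == 0.0 or norm_right == 0.0:
        # Assumption of this sketch only: the ratio is undefined for a
        # zero vector, so return None instead of dividing by zero.
        return None

    return dot / (norm_left * norm_right)


# Same vectors as the example above: 32 / (sqrt(14) * sqrt(77)).
result = cosine_similarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)
```

The printed value should be close to 0.97463, agreeing with the Spark example to the digits shown there. Small differences in later digits are expected, since the Spark columns in the example hold 32-bit floats while Python uses 64-bit floats.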