corr

Returns a new Column containing the Pearson correlation coefficient of col1 and col2.
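For reference, the standard textbook definition of the Pearson correlation coefficient is given below. The notation is an assumption introduced here and is not taken from this page: x_i and y_i stand for the values of col1 and col2 in row i, n is the number of rows, and the barred symbols are the column means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value lies between -1 and 1, where 1 indicates a perfect increasing linear relationship and -1 a perfect decreasing one.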

Syntax

from pyspark.sql import functions as sf

sf.corr(col1, col2)

Parameters

Parameter   Type                                 Description
col1        pyspark.sql.Column or column name    First column used to calculate the correlation.
col2        pyspark.sql.Column or column name    Second column used to calculate the correlation.

Returns

pyspark.sql.Column: the Pearson correlation coefficient of the values in the two columns.

Examples

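from pyspark.sql import functions as sf

# Assumes an active SparkSession is available as `spark`.
a = range(20)
b = [2 * x for x in range(20)]  # b is exactly 2 * a
df = spark.createDataFrame(zip(a, b), ["a", "b"])
# Each argument can be a column name ("a") or a Column object (df.b).
df.agg(sf.corr("a", df.b)).show()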
+----------+
|corr(a, b)|
+----------+
|       1.0|
+----------+
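
The result of 1.0 above can be checked by hand. The following is a minimal plain-Python sketch of the textbook Pearson formula applied to the same data; it is not how Spark computes the value internally, and the helper name `pearson` is hypothetical, introduced only for illustration.

```python
import math


def pearson(xs, ys):
    """Textbook Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration; not part of PySpark.
    """
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator terms).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # This sketch refuses constant input; Spark's behaviour in that
        # case is not covered here.
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Same data as the example above: b is an exact linear function of a.
a = range(20)
b = [2 * x for x in a]
r = pearson(a, b)
print(r)
```

Because b is an exact positive multiple of a, the two columns are perfectly linearly related, which is why the coefficient comes out as 1. Negating one of the columns would flip the sign.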