Returns a new Column containing the Pearson correlation coefficient of col1 and col2.
Syntax
from pyspark.sql import functions as sf
sf.corr(col1, col2)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col1 | pyspark.sql.Column or column name | First column to calculate correlation. |
| col2 | pyspark.sql.Column or column name | Second column to calculate correlation. |
Returns
pyspark.sql.Column: Pearson Correlation Coefficient of these two column values.
Examples
from pyspark.sql import functions as sf
a = range(20)
b = [2 * x for x in range(20)]
df = spark.createDataFrame(zip(a, b), ["a", "b"])
df.agg(sf.corr("a", df.b)).show()
+----------+
|corr(a, b)|
+----------+
|       1.0|
+----------+
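To make the result above easier to check by hand, the following is a minimal pure-Python sketch of the textbook Pearson formula (covariance divided by the product of the standard deviations). The helper name `pearson` is hypothetical and is not part of PySpark. The sketch assumes equal-length, non-constant inputs with no nulls, and it is not a description of how Spark computes the value internally.

```python
import math


def pearson(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Same data as the example above: b is an exact linear function of a.
a = list(range(20))
b = [2 * x for x in a]
r = pearson(a, b)
print(r)
```

Because b is a positive multiple of a, the coefficient should come out at (or within floating-point error of) 1.0, in line with the value shown in the table above.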