crosstab (DataFrameStatFunctions)

Computes a pair-wise frequency table of the given columns, also known as a contingency table. The first column of each row contains the distinct values of `col1`, and the remaining column names are the distinct values of `col2`. The first column is named `$col1_$col2`, that is, the two column names joined by an underscore (for example, `c1_c2` for columns `c1` and `c2`). Pairs with no occurrences have a count of zero. `DataFrame.crosstab` and `DataFrameStatFunctions.crosstab` are aliases of each other.

Syntax

crosstab(col1, col2)

Parameters

Parameter	Type	Description
`col1`	str	The name of the first column. Its distinct values fill the first column of the result, one per row.
`col2`	str	The name of the second column. Its distinct values become the remaining column names of the resulting `DataFrame`.

Returns

`DataFrame`: the contingency table of `col1` and `col2`.

Examples

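# Assumes an active SparkSession is available as `spark`.
df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
# Count each (c1, c2) pair, then sort by the first column for a stable display order.
df.stat.crosstab("c1", "c2").sort("c1_c2").show()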
# +-----+---+---+---+
# |c1_c2| 10| 11|  8|
# +-----+---+---+---+
# |    1|  0|  2|  0|
# |    3|  1|  0|  0|
# |    4|  0|  0|  2|
# +-----+---+---+---+
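
To make the shape of the result easier to reason about, the following is a minimal plain-Python sketch of the same counting logic. It is an illustration only, not how Spark implements `crosstab`: the helper name `crosstab_sketch` and its parameters are hypothetical, it treats all values as strings, and it sorts rows and column names so that the output lines up with the sorted example above.

```python
from collections import Counter


def crosstab_sketch(rows, col1_name="c1", col2_name="c2"):
    """Hypothetical helper: build a contingency table from (col1, col2) pairs."""
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((str(a), str(b)) for a, b in rows)

    # Distinct col1 values become rows; distinct col2 values become column names.
    row_keys = sorted({a for a, _ in counts})
    col_keys = sorted({b for _, b in counts})

    # The first column is named "<col1>_<col2>".
    header = [f"{col1_name}_{col2_name}"] + col_keys

    # A Counter returns 0 for pairs that never occur, matching the zero counts.
    table = [[a] + [counts[(a, b)] for b in col_keys] for a in row_keys]
    return header, table


header, table = crosstab_sketch([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)])
print(header)
for row in table:
    print(row)
```

Because the column names are compared as strings in this sketch, `10` and `11` sort before `8`, which is the same column order shown in the example output above.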

Last updated on 2026-04-17