Computes a pair-wise frequency table of the given columns, also known as a contingency table. The first column of each row contains the distinct values of col1, and the remaining column names are the distinct values of col2. The first column is named $col1_$col2, that is, the two column names joined by an underscore (for example, c1_c2). Pairs with no occurrences have a count of zero. DataFrame.crosstab and DataFrameStatFunctions.crosstab are aliases of each other.
Syntax
crosstab(col1, col2)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col1 | str | The name of the first column. Distinct items make up the first column of each row. |
| col2 | str | The name of the second column. Distinct items make up the column names of the resulting DataFrame. |
Returns
DataFrame
Examples
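# Assumes an active SparkSession is already bound to the name spark (as in a notebook or the pyspark shell).
df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])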
df.stat.crosstab("c1", "c2").sort("c1_c2").show()
# +-----+---+---+---+
# |c1_c2| 10| 11|  8|
# +-----+---+---+---+
# |    1|  0|  2|  0|
# |    3|  1|  0|  0|
# |    4|  0|  0|  2|
# +-----+---+---+---+
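
The counting rules described above can also be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: the helper name crosstab_rows is hypothetical, the input is assumed to be a list of dicts, and keys are converted to strings and sorted as strings only so that the column order matches the example output above.

```python
from collections import Counter


def crosstab_rows(rows, col1, col2):
    """Hypothetical helper: build a contingency table from a list of dicts."""
    # Count every (col1 value, col2 value) pair, keyed by string form.
    pair_counts = Counter((str(r[col1]), str(r[col2])) for r in rows)

    # Distinct col1 values become rows; distinct col2 values become columns.
    row_keys = sorted({k1 for k1, _ in pair_counts})
    col_keys = sorted({k2 for _, k2 in pair_counts})

    # The first column is named after both inputs, joined by an underscore.
    header = f"{col1}_{col2}"

    table = []
    for k1 in row_keys:
        out = {header: k1}
        for k2 in col_keys:
            # Counter returns 0 for pairs that never occurred.
            out[k2] = pair_counts[(k1, k2)]
        table.append(out)
    return table


data = [(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)]
rows = [{"c1": a, "c2": b} for a, b in data]
result = crosstab_rows(rows, "c1", "c2")
for row in result:
    print(row)
```

Each printed dict corresponds to one row of the table: the c1_c2 entry holds the col1 value, and every other key is a distinct col2 value with its pair count, zero where the pair never occurs.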