freqItems (DataFrameStatFunctions)

Finds frequent items for columns, possibly with false positives. Uses the frequent element count algorithm described by Karp, Schenker, and Papadimitriou. DataFrame.freqItems and DataFrameStatFunctions.freqItems are aliases of each other.
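The counting scheme can be shown with a small pure-Python sketch. It follows the textbook formulation of the Karp, Schenker, and Papadimitriou algorithm. At most `int(1 / support)` candidate counters are kept, and whenever a new item pushes the count over that limit, every counter is decremented and the ones that reach zero are dropped. The function name `karp_frequent_items` is hypothetical. This is a simplified single-machine illustration, not Spark's internal implementation.

```python
from typing import Dict, Hashable, Iterable, List


def karp_frequent_items(stream: Iterable[Hashable], support: float = 0.01) -> List[Hashable]:
    """Hypothetical sketch of the Karp-Schenker-Papadimitriou counter scheme.

    Returns a superset of the items whose frequency exceeds support * n.
    The result may contain false positives.
    """
    if not 0 < support <= 1:
        raise ValueError("support must be in (0, 1]")
    max_keys = int(1 / support)  # upper bound on the number of candidates kept
    counts: Dict[Hashable, int] = {}
    for item in stream:
        counts[item] = counts.get(item, 0) + 1
        if len(counts) > max_keys:
            # Too many candidates: decrement every counter and drop those hitting zero.
            counts = {k: c - 1 for k, c in counts.items() if c > 1}
    return list(counts)


# Column c1 from the Examples section.
print(sorted(karp_frequent_items([1, 1, 3, 4, 4], support=0.3)))  # [1, 3, 4]
```

With `support=0.3` there are three counters, and the column has only three distinct values, so nothing is ever evicted. As a result, `3` is returned even though it occurs in only 20% of the rows. This is the kind of false positive the description above refers to.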

Syntax

freqItems(cols, support=None)

Parameters

Parameter	Type	Description
`cols`	list or tuple	Names of the columns to calculate frequent items for.
`support`	float, optional	Minimum fraction of rows in which an item must appear to be considered frequent. Default is 1% (0.01). Must be greater than 1e-4.
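The `support` threshold can be made concrete with an exact brute-force count. This assumes that "frequent" means occurring in more than `support * n` of the `n` rows, which is the threshold the Karp et al. guarantee is stated against. The helper `exact_frequent_items` is hypothetical and is only meant for comparison. `freqItems` does not compute exact counts, so its output can contain extra items.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def exact_frequent_items(values: Iterable[Hashable], support: float = 0.01) -> List[Hashable]:
    """Hypothetical exact baseline: items occurring in more than support * n rows."""
    values = list(values)
    counts = Counter(values)
    threshold = support * len(values)
    return [item for item, count in counts.items() if count > threshold]


# Column c1 from the Examples section, with n = 5 rows.
c1 = [1, 1, 3, 4, 4]
print(sorted(exact_frequent_items(c1, support=0.3)))  # threshold 1.5 -> [1, 4]
```

With the default `support=0.01`, the threshold for five rows is 0.05, so every value in such a small column qualifies. This agrees with the `[1, 3, 4]` shown for `c1` in the example output.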

Returns

DataFrame: a single row containing, for each input column, an array column named `<column>_freqItems`.

Notes

This method is meant for exploratory data analysis. There is no guarantee of backward compatibility for the schema of the resulting DataFrame.

Examples

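from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
result = df.stat.freqItems(["c1", "c2"])
# Sort each array column so that the displayed order is deterministic.
result.select([sf.sort_array(c).alias(c) for c in result.columns]).show()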
# +------------+------------+
# |c1_freqItems|c2_freqItems|
# +------------+------------+
# |   [1, 3, 4]| [8, 10, 11]|
# +------------+------------+


Last updated on 2026-04-17