hash

Calculates the hash code of the given columns and returns the result as an int column. Supports Spark Connect.

For the corresponding Databricks SQL function, see hash function.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.hash(*cols)

Parameters

Parameter | Type | Description
cols | pyspark.sql.Column or str | One or more columns to compute on.

Returns

pyspark.sql.Column: The hash value as an int column.
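
Because the result is a signed 32-bit int, it can be negative (Example 1 returns -757602832). A common follow-up step is mapping a hash value to a fixed number of buckets. The plain-Python sketch below illustrates the arithmetic only; `bucket_for` is a hypothetical helper, not part of PySpark. Inside Spark SQL, the `pmod` function gives the same non-negative remainder.

```python
def bucket_for(hash_value: int, num_buckets: int) -> int:
    """Map a signed hash value to a bucket index in [0, num_buckets)."""
    # Python's % returns a non-negative result for a positive divisor,
    # so negative hash values still land in a valid bucket.
    return hash_value % num_buckets


# Hash values taken from Example 1 and Example 2 on this page.
print(bucket_for(-757602832, 10))  # 8
print(bucket_for(599895104, 10))   # 4
```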

Examples

Example 1: Computing hash of a single column

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.hash('c1')).show()
+---+---+----------+
| c1| c2|  hash(c1)|
+---+---+----------+
|ABC|DEF|-757602832|
+---+---+----------+

Example 2: Computing hash of multiple columns

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.hash('c1', df.c2)).show()
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+
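
The values above can be reproduced outside Spark. In open source Apache Spark, hash uses a 32-bit Murmur3 variant with seed 42 and folds the columns left to right, so each column's hash becomes the seed for the next. The sketch below assumes that Databricks follows the same scheme. It covers string values only, and the helper names (`spark_hash`, `spark_hash_str`) are hypothetical, not part of PySpark.

```python
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits
C1 = 0xCC9E2D51
C2 = 0x1B873593


def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK
    k1 = _rotl(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1, k1):
    h1 = _rotl(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    return h1 ^ (h1 >> 16)


def spark_hash_str(value, seed=42):
    """Hash one string the way open source Spark does (assumed here)."""
    data = value.encode("utf-8")
    aligned = len(data) - len(data) % 4
    h1 = seed & MASK  # a negative seed is reinterpreted as unsigned
    # Full 4-byte words are read as little-endian ints.
    for i in range(0, aligned, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    # Spark mixes each trailing byte separately as a sign-extended int,
    # which differs from the reference Murmur3 tail handling.
    for b in data[aligned:]:
        signed = b - 256 if b > 127 else b
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    h1 = _fmix(h1, len(data))
    # Convert to a signed 32-bit int, matching the int column type.
    return h1 - (1 << 32) if h1 >= (1 << 31) else h1


def spark_hash(*values):
    """Fold the values left to right; None leaves the running hash unchanged."""
    h = 42
    for v in values:
        if v is not None:
            h = spark_hash_str(v, h)
    return h


print(spark_hash("ABC"))         # -757602832, as in Example 1
print(spark_hash("ABC", "DEF"))  # 599895104, as in Example 2
```

Because the columns are chained through the seed, the argument order matters: hash('c1', 'c2') and hash('c2', 'c1') generally produce different values.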