Collection: Merges two given maps, key-wise, into a single map by applying a function to each key and its values from both maps. Supports Spark Connect.
For the corresponding Databricks SQL function, see map_zip_with function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.map_zip_with(col1=<col1>, col2=<col2>, f=<f>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col1 | pyspark.sql.Column or str | The name of the first column, or a column expression representing the first map. |
| col2 | pyspark.sql.Column or str | The name of the second column, or a column expression representing the second map. |
| f | function | A ternary function that defines how to merge the two maps. It receives the key and the corresponding values from both maps, and must return a column that is used as the value in the resulting map. |
Returns
pyspark.sql.Column: A new map column containing every key that appears in either input map, where each key is mapped to the result of applying the function to that key and its values from the two input maps. A value missing from one of the maps is passed to the function as null.
Examples
Example 1: Merging two maps with a simple function
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([
    (1, {"A": 1, "B": 2}, {"A": 3, "B": 4})],
    ("id", "map1", "map2"))
row = df.select(
    dbf.map_zip_with("map1", "map2", lambda _, v1, v2: v1 + v2).alias("updated_data")
).head()
sorted(row["updated_data"].items())
[('A', 4), ('B', 6)]
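The merge function also receives the map key as its first argument, which the example above ignores with _. As a minimal sketch of how the key can be used (assuming the standard pyspark.sql.functions helpers concat and lit are available alongside dbf.map_zip_with), the key can be folded into the merged value:
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as F  # assumed available for concat/lit
df = spark.createDataFrame([
    (1, {"A": 1, "B": 2}, {"A": 3, "B": 4})],
    ("id", "map1", "map2"))
# Build values of the form "<key>:<sum>" so the key argument is visible in the result.
row = df.select(
    dbf.map_zip_with(
        "map1", "map2",
        lambda k, v1, v2: F.concat(k, F.lit(":"), (v1 + v2).cast("string"))
    ).alias("keyed_data")
).head()
sorted(row["keyed_data"].items())
[('A', 'A:4'), ('B', 'B:6')]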
Example 2: Merging two maps with mismatched keys
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([
    (1, {"A": 1, "B": 2}, {"B": 3, "C": 4})],
    ("id", "map1", "map2"))
row = df.select(
    dbf.map_zip_with(
        "map1", "map2",
        lambda _, v1, v2: dbf.when(v2.isNull(), v1).otherwise(v1 + v2)
    ).alias("updated_data")
).head()
sorted(row["updated_data"].items())
[('A', 1), ('B', 5), ('C', None)]
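Because key C is missing from map1, v1 is null for that key and v1 + v2 also evaluates to null, which is why C maps to None above. As a minimal sketch of an alternative (assuming the standard pyspark.sql.functions helpers coalesce and lit are available alongside dbf.map_zip_with), a value missing from either map can be treated as 0 before adding:
from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as F  # assumed available for coalesce/lit
df = spark.createDataFrame([
    (1, {"A": 1, "B": 2}, {"B": 3, "C": 4})],
    ("id", "map1", "map2"))
# Substitute 0 for a value that is missing from either map before adding.
row = df.select(
    dbf.map_zip_with(
        "map1", "map2",
        lambda _, v1, v2: F.coalesce(v1, F.lit(0)) + F.coalesce(v2, F.lit(0))
    ).alias("updated_data")
).head()
sorted(row["updated_data"].items())
[('A', 1), ('B', 5), ('C', 4)]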