Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function. Supports Spark Connect.
For the corresponding Databricks SQL function, see zip_with function.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.zip_with(left=<left>, right=<right>, f=<f>)
Parameters
| Parameter | Type | Description |
|---|---|---|
left |
pyspark.sql.Column or str |
Name of the first column or expression. |
right |
pyspark.sql.Column or str |
Name of the second column or expression. |
f |
function |
A binary function. |
Returns
pyspark.sql.Column: array of calculated values derived by applying given function to each pair of arguments.
Examples
Example 1: Merging two arrays with a simple function
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1, [1, 3, 5, 8], [0, 2, 4, 6])], ("id", "xs", "ys"))
df.select(dbf.zip_with("xs", "ys", lambda x, y: x ** y).alias("powers")).show(truncate=False)
+---------------------------+
|powers |
+---------------------------+
|[1.0, 9.0, 625.0, 262144.0]|
+---------------------------+
Example 2: Merging arrays of different lengths
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1, ["foo", "bar"], [1, 2, 3])], ("id", "xs", "ys"))
df.select(dbf.zip_with("xs", "ys", lambda x, y: dbf.concat_ws("_", x, y)).alias("xs_ys")).show()
+-----------------+
| xs_ys|
+-----------------+
|[foo_1, bar_2, 3]|
+-----------------+