Collection function: Returns a new map column containing only the key-value pairs of the input map that satisfy a given predicate function. Supports Spark Connect.
For the corresponding Databricks SQL function, see map_filter function.
Syntax
```python
from pyspark.databricks.sql import functions as dbf

dbf.map_filter(col=<col>, f=<f>)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or `str` | The name of the column, or a column expression, representing the map to filter. |
| `f` | function | A binary function that takes the key and the value as `Column` arguments and returns a boolean `Column`; only the key-value pairs for which the predicate returns true are kept (see the sketch after this table). |
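Because `f` receives both the key and the value as `Column` arguments, a predicate can test them together. The following is a minimal sketch, not part of the original reference, assuming an active `spark` session and the same sample data as the examples below:

```python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))

# Keep pairs whose key starts with "b" and whose value exceeds 2.0.
# Column predicates combine with &, and each comparison needs parentheses.
row = df.select(
    dbf.map_filter("data", lambda k, v: k.startswith("b") & (v > 2.0)).alias("data_filtered")
).head()
sorted(row["data_filtered"].items())
# [('baz', 32.0)]
```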
Returns
pyspark.sql.Column: A new map column containing only the key-value pairs that satisfy the predicate.
Examples
Example 1: Filtering a map with a simple condition
```python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))
row = df.select(
    dbf.map_filter("data", lambda _, v: v > 30.0).alias("data_filtered")
).head()
sorted(row["data_filtered"].items())
# [('baz', 32.0), ('foo', 42.0)]
```
Example 2: Filtering a map with a condition on keys
```python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))
row = df.select(
    dbf.map_filter("data", lambda k, _: k.startswith("b")).alias("data_filtered")
).head()
sorted(row["data_filtered"].items())
# [('bar', 1.0), ('baz', 32.0)]
```
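For comparison, the same filter can also be expressed through the Databricks SQL map_filter function mentioned above, by way of the standard `pyspark.sql.functions.expr` helper and SQL lambda syntax. This is an illustrative sketch, not an additional API of this module:

```python
from pyspark.sql import functions as F

df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))

# SQL lambda syntax: (k, v) -> <boolean expression>
row = df.select(
    F.expr("map_filter(data, (k, v) -> v > 30.0)").alias("data_filtered")
).head()
sorted(row["data_filtered"].items())
# [('baz', 32.0), ('foo', 42.0)]
```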