Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Functionality for working with missing data in a DataFrame.
Supports Spark Connect
Syntax
DataFrame.na
Methods
| Method | Description |
|---|---|
drop(how, thresh, subset) |
Returns a new DataFrame omitting rows with null or NaN values. |
fill(value, subset) |
Returns a new DataFrame with null values replaced by the specified value. |
replace(to_replace, value, subset) |
Returns a new DataFrame replacing a value with another value. |
Examples
Drop rows with null values
from pyspark.sql import Row
df = spark.createDataFrame([
Row(age=10, height=80.0, name="Alice"),
Row(age=5, height=None, name="Bob"),
Row(age=None, height=None, name="Tom"),
])
df.na.drop().show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10| 80.0|Alice|
+---+------+-----+
Fill null values
df = spark.createDataFrame([
(10, 80.5, "Alice"),
(5, None, "Bob"),
(None, None, "Tom")],
schema=["age", "height", "name"])
df.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+
|age|height| name|
+---+------+-------+
| 10| 80.5| Alice|
| 5| NULL| Bob|
| 50| NULL|unknown|
+---+------+-------+
Replace values
df = spark.createDataFrame([
(10, 80, "Alice"),
(5, None, "Bob"),
(None, 10, "Tom")],
schema=["age", "height", "name"])
df.na.replace(['Alice', 'Bob'], ['A', 'B'], 'name').show()
+----+------+----+
| age|height|name|
+----+------+----+
| 10| 80| A|
| 5| NULL| B|
|NULL| 10| Tom|
+----+------+----+