Bilješka
Pristup ovoj stranici zahtijeva provjeru vjerodostojnosti. Možete pokušati da se prijavite ili promijenite direktorije.
Pristup ovoj stranici zahtijeva provjeru vjerodostojnosti. Možete pokušati promijeniti direktorije.
Functionality for working with missing data in a DataFrame.
Supports Spark Connect
Syntax
DataFrame.na
Methods
| Method | Description |
|---|---|
drop(how, thresh, subset) |
Returns a new DataFrame omitting rows with null or NaN values. |
fill(value, subset) |
Returns a new DataFrame with null values replaced by the specified value. |
replace(to_replace, value, subset) |
Returns a new DataFrame replacing a value with another value. |
Examples
Drop rows with null values
from pyspark.sql import Row
df = spark.createDataFrame([
Row(age=10, height=80.0, name="Alice"),
Row(age=5, height=None, name="Bob"),
Row(age=None, height=None, name="Tom"),
])
df.na.drop().show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10| 80.0|Alice|
+---+------+-----+
Fill null values
df = spark.createDataFrame([
(10, 80.5, "Alice"),
(5, None, "Bob"),
(None, None, "Tom")],
schema=["age", "height", "name"])
df.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+
|age|height| name|
+---+------+-------+
| 10| 80.5| Alice|
| 5| NULL| Bob|
| 50| NULL|unknown|
+---+------+-------+
Replace values
df = spark.createDataFrame([
(10, 80, "Alice"),
(5, None, "Bob"),
(None, 10, "Tom")],
schema=["age", "height", "name"])
df.na.replace(['Alice', 'Bob'], ['A', 'B'], 'name').show()
+----+------+----+
| age|height|name|
+----+------+----+
| 10| 80| A|
| 5| NULL| B|
|NULL| 10| Tom|
+----+------+----+