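`DataFrame.plot` is the accessor for DataFrame plotting functionality in PySpark. You can call it directly with a `kind` argument or use the dedicated method for each plot type.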
Syntax
# Call the accessor directly
df.plot(kind="line", ...)
# Use a dedicated method
df.plot.line(...)
Methods
| Method | Description |
|---|---|
| `area(x, y, **kwargs)` | Draws a stacked area plot. |
| `bar(x, y, **kwargs)` | Draws a vertical bar plot. |
| `barh(x, y, **kwargs)` | Draws a horizontal bar plot. |
| `box(column, **kwargs)` | Draws a box-and-whisker plot from DataFrame columns. |
| `hist(column, bins, **kwargs)` | Draws a histogram of the DataFrame columns. |
| `kde(bw_method, column, ind, **kwargs)` | Generates a kernel density estimate (KDE) plot using Gaussian kernels. |
| `line(x, y, **kwargs)` | Plots DataFrame columns as lines. |
| `pie(x, y, **kwargs)` | Generates a pie plot. |
| `scatter(x, y, **kwargs)` | Creates a scatter plot. |
Examples
Each example assumes an active SparkSession named `spark`.
Line plot
data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
columns = ["category", "int_val", "float_val"]
df = spark.createDataFrame(data, columns)
df.plot.line(x="category", y="int_val")
Bar plot
data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
columns = ["category", "int_val", "float_val"]
df = spark.createDataFrame(data, columns)
df.plot.bar(x="category", y="int_val")
Scatter plot
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
df.plot.scatter(x="length", y="width")
Area plot
from datetime import datetime
data = [
(3, 5, 20, datetime(2018, 1, 31)),
(2, 5, 42, datetime(2018, 2, 28)),
(3, 6, 28, datetime(2018, 3, 31)),
(9, 12, 62, datetime(2018, 4, 30)),
]
columns = ["sales", "signups", "visits", "date"]
df = spark.createDataFrame(data, columns)
df.plot.area(x="date", y=["sales", "signups", "visits"])
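In a stacked area plot, each series is drawn on top of the running total of the series before it. The plain-Python sketch below is not PySpark's implementation. It computes the top edge of each layer for the example's data and assumes the layers stack in the order given in `y`. The helper `stacked_boundaries` is hypothetical.

```python
# Conceptual sketch (not PySpark code): the top edge of layer k is the sum of
# series 0..k at each x value, assuming layers stack in the order given.

def stacked_boundaries(series):
    """Return the top edge of each stacked layer.

    `series` maps a series name to its y values (all the same length),
    in stacking order.
    """
    running = None
    tops = {}
    for name, values in series.items():
        if running is None:
            running = list(values)  # bottom layer sits on the x-axis
        else:
            # Each new layer starts where the previous layer's top edge ends.
            running = [acc + v for acc, v in zip(running, values)]
        tops[name] = running
    return tops


# Same values as the area plot example (one row per month, Jan to Apr 2018).
series = {
    "sales": [3, 2, 3, 9],
    "signups": [5, 5, 6, 12],
    "visits": [20, 42, 28, 62],
}
tops = stacked_boundaries(series)
for name, top in tops.items():
    print(name, top)
```

The height of each band is still the original value. Only the baseline moves, which is why the top layer's edge shows the total across all three columns.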
Box plot
data = [
("A", 50, 55), ("B", 55, 60), ("C", 60, 65),
("D", 65, 70), ("E", 70, 75), ("F", 10, 15),
]
columns = ["student", "math_score", "english_score"]
df = spark.createDataFrame(data, columns)
df.plot.box()
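A box-and-whisker plot summarizes a column by its quartiles, whiskers, and outliers. The plain-Python sketch below uses hypothetical helpers `percentile` and `box_stats`, which are not part of PySpark, to compute those statistics for the `math_score` column. It assumes exact linear-interpolated quartiles and whiskers at 1.5 * IQR. PySpark computes approximate percentiles on the cluster, so its box edges may differ on data this small.

```python
# Conceptual sketch (not PySpark code) of box plot statistics.
# Assumptions: exact quartiles via linear interpolation, Tukey fences at
# whis * IQR with whis = 1.5 (the `whis` name here is hypothetical).

def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] on sorted data."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def box_stats(values, whis=1.5):
    vals = sorted(values)
    q1 = percentile(vals, 0.25)
    median = percentile(vals, 0.5)
    q3 = percentile(vals, 0.75)
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points still inside the fences;
    # anything beyond the fences is reported as an outlier.
    inside = [v for v in vals if low_fence <= v <= high_fence]
    outliers = [v for v in vals if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# math_score column from the box plot example.
stats = box_stats([50, 55, 60, 65, 70, 10])
print(stats)
```

In this sketch, student F's score of 10 falls below the lower fence, so it is reported as an outlier rather than stretching the lower whisker.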
KDE plot
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
df.plot.kde(bw_method=0.3, ind=100)
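A Gaussian KDE estimates a smooth density as f(x) = 1 / (n * h) * sum_i phi((x - x_i) / h), where phi is the standard normal density and h is the bandwidth. The plain-Python sketch below uses hypothetical helpers `gaussian_kde` and `even_grid` and evaluates that formula for the `length` column. It makes two assumptions about how the example's parameters map onto the formula. First, `bw_method` is taken as h itself, as with the bandwidth of Spark MLlib's `KernelDensity`. SciPy-based tools instead treat a scalar as a scaling factor. Second, an integer `ind` is taken to mean that many evenly spaced evaluation points. Verify both against your PySpark version.

```python
import math

# Conceptual sketch (not PySpark code) of a Gaussian kernel density estimate.
# Assumption: bandwidth plays the role of bw_method, num_points that of ind.

def gaussian_kde(samples, bandwidth, points):
    """Evaluate a Gaussian KDE with a fixed bandwidth at each point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in points
    ]


def even_grid(samples, num_points):
    """Evenly spaced points extending half the data range past each end.

    The padding is an assumed convention so the density tails are visible.
    """
    lo, hi = min(samples), max(samples)
    pad = 0.5 * (hi - lo)
    start, stop = lo - pad, hi + pad
    step = (stop - start) / (num_points - 1)
    return [start + i * step for i in range(num_points)]


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]          # "length" column from the example
grid = even_grid(lengths, 100)               # like ind=100
density = gaussian_kde(lengths, 0.3, grid)   # like bw_method=0.3
peak_x = grid[density.index(max(density))]
print(f"{len(density)} points, peak near x = {peak_x:.2f}")
```

A smaller bandwidth produces a spikier curve with a bump per sample, and a larger one blends nearby samples. Here the two close values 4.9 and 5.1 merge into the tallest peak.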
Histogram
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
df.plot.hist(bins=4)
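The `bins` argument sets how many intervals the value range is split into. The plain-Python sketch below uses a hypothetical helper `histogram`, which is not PySpark's implementation, to show equal-width binning of the `length` column. It assumes `bins` equal-width bins spanning the column's minimum to maximum. Each bin is half-open, [left, right), except the last, which also includes the maximum. This follows NumPy's convention, as does widening a zero-width range by 0.5 on each side.

```python
# Conceptual sketch (not PySpark code) of equal-width histogram binning.
# Assumptions: NumPy-style edges; the last bin is closed on the right.

def histogram(values, bins):
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant data: widen the range so bins have nonzero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]  # "length" column from the histogram example
edges, counts = histogram(lengths, bins=4)
print([round(e, 3) for e in edges])
print(counts)
```

With four bins over 4.9 to 7.0, each bin is 0.525 wide, and the two smallest lengths share the first bin.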