Nota
Capaian ke halaman ini memerlukan kebenaran. Anda boleh cuba mendaftar masuk atau menukar direktori.
Capaian ke halaman ini memerlukan kebenaran. Anda boleh cuba menukar direktori.
Loads Parquet files and returns the result as a DataFrame.
Syntax
parquet(*paths, **options)
Parameters
| Parameter | Type | Description |
|---|---|---|
*paths |
str | One or more file paths to read the Parquet files from. |
Returns
DataFrame
Examples
Write a DataFrame into a Parquet file and read it back.
import tempfile
df = spark.createDataFrame(
[(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
with tempfile.TemporaryDirectory(prefix="parquet") as d:
df.write.mode("overwrite").format("parquet").save(d)
spark.read.parquet(d).orderBy("name").show()
# +---+-----+
# |age| name|
# +---+-----+
# | 10|Alice|
# | 15| Bob|
# | 20| Tom|
# +---+-----+
Read multiple Parquet files and merge schemas.
import tempfile
df = spark.createDataFrame(
[(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
df2 = spark.createDataFrame([(70, "Alice"), (80, "Bob")], schema=["height", "name"])
with tempfile.TemporaryDirectory(prefix="parquet1") as d1:
with tempfile.TemporaryDirectory(prefix="parquet2") as d2:
df.write.mode("overwrite").format("parquet").save(d1)
df2.write.mode("overwrite").format("parquet").save(d2)
spark.read.option(
"mergeSchema", "true"
).parquet(d1, d2).select(
"name", "age", "height"
).orderBy("name", "age").show()
# +-----+----+------+
# | name| age|height|
# +-----+----+------+
# |Alice|NULL| 70|
# |Alice| 10| NULL|
# | Bob|NULL| 80|
# | Bob| 15| NULL|
# | Tom| 20| NULL|
# +-----+----+------+