parquet (DataFrameReader)

加载 Parquet 文件并返回结果。DataFrame

Syntax

parquet(*paths, **options)

参数

参数 类型 说明
*paths str 从中读取 Parquet 文件的一个或多个文件路径。

退货

DataFrame

示例

将 DataFrame 写入 Parquet 文件,并将其读回。

import tempfile
df = spark.createDataFrame(
    [(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])

with tempfile.TemporaryDirectory(prefix="parquet") as d:
    df.write.mode("overwrite").format("parquet").save(d)
    spark.read.parquet(d).orderBy("name").show()
    # +---+-----+
    # |age| name|
    # +---+-----+
    # | 10|Alice|
    # | 15|  Bob|
    # | 20|  Tom|
    # +---+-----+

读取多个 Parquet 文件和合并架构。

import tempfile
df = spark.createDataFrame(
    [(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
df2 = spark.createDataFrame([(70, "Alice"), (80, "Bob")], schema=["height", "name"])

with tempfile.TemporaryDirectory(prefix="parquet1") as d1:
    with tempfile.TemporaryDirectory(prefix="parquet2") as d2:
        df.write.mode("overwrite").format("parquet").save(d1)
        df2.write.mode("overwrite").format("parquet").save(d2)

        spark.read.option(
            "mergeSchema", "true"
        ).parquet(d1, d2).select(
            "name", "age", "height"
        ).orderBy("name", "age").show()
        # +-----+----+------+
        # | name| age|height|
        # +-----+----+------+
        # |Alice|NULL|    70|
        # |Alice|  10|  NULL|
        # |  Bob|NULL|    80|
        # |  Bob|  15|  NULL|
        # |  Tom|  20|  NULL|
        # +-----+----+------+