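DataFrameReader class

Interface used to load a DataFrame from external storage systems (for example, file systems and key-value stores).

Supports Spark Connect

Syntax

Use SparkSession.read to access this interface.

Methods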

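Method	Description
`format(source)`	Specifies the input data source format.
`schema(schema)`	Specifies the input schema.
`option(key, value)`	Adds a single input option for the underlying data source.
`options(**options)`	Adds multiple input options for the underlying data source.
`load(path, format, schema, **options)`	Loads data from a data source and returns it as a DataFrame.
`json(path, schema, ...)`	Loads JSON files and returns the result as a DataFrame.
`table(tableName)`	Returns the specified table as a DataFrame.
`parquet(*paths, **options)`	Loads Parquet files and returns the result as a DataFrame.
`text(paths, wholetext, lineSep, ...)`	Loads text files and returns a DataFrame whose schema starts with a string column named "value".
`csv(path, schema, sep, encoding, ...)`	Loads CSV files and returns the result as a DataFrame.
`xml(path, rowTag, schema, ...)`	Loads XML files and returns the result as a DataFrame.
`excel(path, dataAddress, headerRows, ...)`	Loads Excel files and returns the result as a DataFrame.
`orc(path, mergeSchema, pathGlobFilter, ...)`	Loads ORC files and returns the result as a DataFrame.
`jdbc(url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)`	Constructs a DataFrame representing the database table named `table`, accessible through a JDBC URL and connection properties.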
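The `option` and `options` rows above differ only in how many settings they accept per call. The following is a minimal sketch of that relationship, not a required pattern. The helper name `configure_reader` and the option values are hypothetical and exist only for illustration; the helper takes any reader object such as `spark.read`.

```python
def configure_reader(reader, fmt, opts):
    """Return `reader` configured with a format and a dict of options.

    `reader` is expected to be a DataFrameReader, for example `spark.read`.
    This helper is hypothetical and not part of the PySpark API.
    """
    # options(**opts) applies every key/value pair in a single call; it has
    # the same effect as calling option(key, value) once per entry.
    return reader.format(fmt).options(**opts)


# Hypothetical CSV settings collected in one place.
csv_options = {"header": "true", "sep": ";"}

# Usage, assuming an active SparkSession named `spark`:
# df = configure_reader(spark.read, "csv", csv_options).load("path/to/file.csv")
```

Keeping the settings in a dictionary can make it easier to reuse the same configuration across several reads.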

Examples

Reading from different data sources

# Access DataFrameReader through SparkSession
spark.read

# Read JSON file
df = spark.read.json("path/to/file.json")

# Read CSV file with options
df = spark.read.option("header", "true").csv("path/to/file.csv")

# Read Parquet file
df = spark.read.parquet("path/to/file.parquet")

# Read from a table
df = spark.read.table("table_name")

Using format and load

# Specify format explicitly
df = spark.read.format("json").load("path/to/file.json")

# With options
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("path/to/file.csv")

Specifying a schema

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Read CSV with schema
df = spark.read.schema(schema).csv("path/to/file.csv")

# Read CSV with DDL-formatted string schema
df = spark.read.schema("name STRING, age INT").csv("path/to/file.csv")

Reading from JDBC

# Read from database table
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",
    table="users",
    properties={"user": "myuser", "password": "mypassword"}
)

# Read with partitioning for parallel loading
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",
    table="users",
    column="id",
    lowerBound=1,
    upperBound=1000,
    numPartitions=10,
    properties={"user": "myuser", "password": "mypassword"}
)
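In the partitioned read, `lowerBound` and `upperBound` only decide how the `column` range is split across `numPartitions`; they do not filter rows, so values outside the bounds still land in the first or last partition. The pure-Python sketch below approximates the per-partition predicates to make this visible. The function name `sketch_partition_predicates` is hypothetical, and the stride arithmetic is a simplification: Spark's internal computation may round boundaries differently.

```python
def sketch_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clauses used by a partitioned JDBC read.

    Simplified illustration only; this is not Spark's exact algorithm.
    """
    if num_partitions <= 1:
        # A single partition reads the whole table with no predicate.
        return []

    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    boundary = lower_bound
    for i in range(num_partitions):
        lower = f"{column} >= {boundary}" if i > 0 else None
        boundary += stride
        upper = f"{column} < {boundary}" if i < num_partitions - 1 else None
        if lower and upper:
            predicates.append(f"{lower} AND {upper}")
        elif upper:
            # First partition is open-ended below and also collects NULLs.
            predicates.append(f"{upper} OR {column} IS NULL")
        else:
            # Last partition is open-ended above.
            predicates.append(lower)
    return predicates


# Same values as the partitioned example above.
predicates = sketch_partition_predicates("id", 1, 1000, 10)
for p in predicates:
    print(p)
```

Because the first and last predicates are open-ended, bounds that poorly match the actual data range tend to produce skewed partitions rather than missing rows.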

Method chaining

# Chain multiple configuration methods
# (inferSchema is omitted because it has no effect when an explicit schema is supplied)
df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("delimiter", ",") \
    .schema("name STRING, age INT") \
    .load("path/to/file.csv")

Last updated on 2026-03-15
