DataFrameReader class

The interface used to load a DataFrame from external storage systems (for example, file systems and key-value stores).

Supports Spark Connect

Syntax

Use `SparkSession.read` to access this interface.

Methods

Method	Description
`format(source)`	Specifies the input data source format.
`schema(schema)`	Specifies the input schema.
`option(key, value)`	Adds a single input option for the underlying data source.
`options(**options)`	Adds multiple input options for the underlying data source.
`load(path, format, schema, **options)`	Loads data from a data source and returns it as a DataFrame.
`json(path, schema, ...)`	Loads JSON files and returns the result as a DataFrame.
`table(tableName)`	Returns the specified table as a DataFrame.
`parquet(*paths, **options)`	Loads Parquet files and returns the result as a DataFrame.
`text(paths, wholetext, lineSep, ...)`	Loads text files and returns a DataFrame whose schema starts with a string column named "value".
`csv(path, schema, sep, encoding, ...)`	Loads CSV files and returns the result as a DataFrame.
`xml(path, rowTag, schema, ...)`	Loads XML files and returns the result as a DataFrame.
`excel(path, dataAddress, headerRows, ...)`	Loads Excel files and returns the result as a DataFrame.
`orc(path, mergeSchema, pathGlobFilter, ...)`	Loads ORC files and returns the result as a DataFrame.
`jdbc(url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)`	Constructs a DataFrame representing the database table named `table`, accessible through the JDBC URL `url` and the connection `properties`.

Examples

Reading from different data sources

# Access DataFrameReader through SparkSession
spark.read

# Read JSON file
df = spark.read.json("path/to/file.json")

# Read CSV file with options
df = spark.read.option("header", "true").csv("path/to/file.csv")

# Read Parquet file
df = spark.read.parquet("path/to/file.parquet")

# Read from a table
df = spark.read.table("table_name")
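
Because `parquet` takes its paths as positional varargs, a list of paths has to be unpacked at the call site. The sketch below illustrates this; `read_parquet_many` is a hypothetical helper name, and `spark` is assumed to be an existing SparkSession passed in by the caller.

```python
def read_parquet_many(spark, paths, **options):
    """Hypothetical helper: read several Parquet paths into one DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    # parquet() is declared as parquet(*paths, **options), so the list is
    # unpacked with * and reader options are forwarded as keyword arguments.
    return spark.read.parquet(*paths, **options)
```

A call such as `read_parquet_many(spark, ["path/to/a.parquet", "path/to/b.parquet"], mergeSchema="true")` would forward `mergeSchema` as a Parquet reader option.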

Using format and load

# Specify format explicitly
df = spark.read.format("json").load("path/to/file.json")

# With options
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("path/to/file.csv")
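
The same read can also be written as a single `load` call, since `format` and `schema` are named parameters of `load` and any remaining keyword arguments are treated as reader options. This is a minimal sketch; `load_csv_with_keywords` is a hypothetical helper name, and `spark` is assumed to be an existing SparkSession.

```python
def load_csv_with_keywords(spark, path):
    """Hypothetical helper: the chained CSV read expressed as one load() call.

    `spark` is assumed to be an existing SparkSession.
    """
    # format is a named parameter of load(); header and inferSchema are not,
    # so they are collected into **options and passed to the data source.
    return spark.read.load(
        path,
        format="csv",
        header="true",
        inferSchema="true",
    )
```

Whether to chain `format(...).option(...)` or pass everything to `load` is largely a matter of style; the chained form tends to read better when there are many options.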

Specifying a schema

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Read CSV with schema
df = spark.read.schema(schema).csv("path/to/file.csv")

# Read CSV with DDL-formatted string schema
df = spark.read.schema("name STRING, age INT").csv("path/to/file.csv")
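
When column definitions come from configuration rather than source code, the DDL string can be assembled before it is handed to `schema`. The following is only a sketch: `ddl_from_columns` and `read_csv_with_columns` are hypothetical helpers, the type names are assumed to be valid Spark SQL types, and `spark` is assumed to be an existing SparkSession.

```python
def ddl_from_columns(columns):
    """Hypothetical helper: build a DDL schema string from (name, type) pairs."""
    # Each pair becomes "name TYPE"; the pairs are joined with ", ".
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def read_csv_with_columns(spark, path, columns):
    """Hypothetical helper: read a CSV file using a schema built from pairs."""
    return spark.read.schema(ddl_from_columns(columns)).csv(path)
```

For the schema used in this section, `ddl_from_columns([("name", "STRING"), ("age", "INT")])` produces the string `name STRING, age INT`.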

Reading from JDBC

# Read from database table
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",
    table="users",
    properties={"user": "myuser", "password": "mypassword"}
)

# Read with partitioning for parallel loading
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",
    table="users",
    column="id",
    lowerBound=1,
    upperBound=1000,
    numPartitions=10,
    properties={"user": "myuser", "password": "mypassword"}
)
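
The `predicates` parameter listed in the methods table offers another way to parallelize a JDBC read: each entry is a WHERE-clause expression, and each expression defines one partition of the resulting DataFrame. The sketch below assumes a date column named `created_at`, which is not part of the original example; `year_predicates` and `read_by_year` are hypothetical helpers, and `spark` is assumed to be an existing SparkSession.

```python
def year_predicates(column, years):
    """Hypothetical helper: one non-overlapping WHERE expression per year."""
    return [
        f"{column} >= '{year}-01-01' AND {column} < '{year + 1}-01-01'"
        for year in years
    ]


def read_by_year(spark, url, table, properties, years):
    """Hypothetical helper: JDBC read with one partition per year.

    The column name created_at is an assumption made for illustration.
    """
    return spark.read.jdbc(
        url=url,
        table=table,
        predicates=year_predicates("created_at", years),
        properties=properties,
    )
```

With this approach the predicates should be mutually exclusive and together cover every row of interest: rows that match no predicate are not read, and rows that match more than one appear more than once.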

Method chaining

# Chain multiple configuration methods.
# inferSchema is omitted because an explicit schema is supplied.
df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("delimiter", ",") \
    .schema("name STRING, age INT") \
    .load("path/to/file.csv")
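
When the options are kept in a dictionary, for instance loaded from a settings file, `options(**options)` can stand in for a series of `option` calls. This is a sketch under assumptions: `CSV_OPTIONS` and `read_csv_from_settings` are hypothetical names, and `spark` is assumed to be an existing SparkSession.

```python
# Hypothetical settings; in practice these might come from a config file.
CSV_OPTIONS = {"header": "true", "delimiter": ","}


def read_csv_from_settings(spark, path, schema, options=None):
    """Hypothetical helper: apply a dict of reader options in one call."""
    reader_options = CSV_OPTIONS if options is None else options
    return (
        spark.read
        .format("csv")
        .options(**reader_options)  # same effect as one option() call per key
        .schema(schema)
        .load(path)
    )
```

Wrapping the chain in parentheses also avoids the trailing backslashes used for line continuation.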


Last updated on 2026-04-17
