The text format reads each line of a text file as a row in a DataFrame with a single value column of type StringType. Azure Databricks users commonly use it for log parsing, ingesting raw data before further processing, or any workflow that requires line-by-line access to file content. Azure Databricks supports reading and writing text files with Apache Spark, including configurable compression for writes.
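As a rough mental model, the reader's line-to-row mapping can be sketched in plain Python. Each line of the file becomes one row, and `value` is the only column. This is a simplified local sketch, not Spark's implementation. The helper `text_to_rows` is hypothetical. It assumes `\n` or `\r\n` line endings and treats a trailing newline as the end of the last line rather than the start of an extra empty row.

```python
def text_to_rows(content: str) -> list[dict[str, str]]:
    """Hypothetical helper: map file content to rows with a single 'value' column."""
    # Normalize Windows line endings so every line splits cleanly on "\n".
    normalized = content.replace("\r\n", "\n")
    lines = normalized.split("\n")
    # A trailing newline terminates the last line; it does not start a new row.
    if lines and lines[-1] == "":
        lines.pop()
    # Every remaining line, including blank lines in the middle, becomes a row.
    return [{"value": line} for line in lines]


# Made-up log lines, the kind of input commonly parsed line by line.
rows = text_to_rows("GET /index.html 200\nGET /missing 404\n\nPOST /login 302\n")
for row in rows:
    print(row)
```

The blank line in the sample yields a row whose `value` is an empty string. This is why log-parsing pipelines often filter out empty rows before further processing.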
Prerequisites
Azure Databricks does not require additional configuration to use text files. To stream text files, use Auto Loader.
Options
Use the .option() and .options() methods of DataFrameReader and DataFrameWriter to configure text data sources. For a complete list of supported options, see DataFrameReader text options and DataFrameWriter text options.
Set text compression
When you write text files to cloud storage, compression reduces storage costs. Text-based formats usually compress well because they tend to contain repetitive structure and character sequences.
Configure compression using the compression write option. The default is none.
| Codec | Description |
|---|---|
| none | No compression. Default. |
| bzip2 | High compression ratio, slowest option. Best for archival use cases where storage cost is the priority. |
| deflate | Higher compression ratio than snappy at the cost of additional CPU time. |
| gzip | Good compression ratio with wider tooling support than snappy. |
| lz4 | Optimized for speed with lower compression ratio. |
| snappy | Optimized for speed with moderate compression. Good for interactive workloads. |
| zstd | Good balance of speed and compression ratio; faster than deflate at similar or better ratios. |
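To see why repetitive text compresses well, the following plain-Python sketch compresses a block of log-like text with standard-library implementations of three codecs from the table. It uses gzip, bz2 for bzip2, and zlib, which wraps the deflate algorithm. The sketch runs locally, not through Spark, and the sample data is made up. Real ratios depend on your data, so treat the printed numbers as illustrative only.

```python
import bz2
import gzip
import zlib

# Made-up, log-like sample: the repeated structure is typical of text data.
lines = [
    f"2024-01-01T00:00:{i % 60:02d}Z INFO request served path=/listing/{i % 50}"
    for i in range(5000)
]
raw = ("\n".join(lines) + "\n").encode("utf-8")

# Standard-library stand-ins for three codecs from the table.
codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "deflate (zlib)": (zlib.compress, zlib.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(raw)
    # Compression is lossless: decompressing must return the exact original bytes.
    assert decompress(packed) == raw
    results[name] = len(packed)
    ratio = len(raw) / len(packed)
    print(f"{name:15s} {len(raw):>8d} -> {len(packed):>7d} bytes ({ratio:.1f}x)")
```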
For example, write Wanderbricks review comments to review_comments_compressed using gzip compression.
Python
from pyspark.sql.functions import col
df = spark.read.table("samples.wanderbricks.reviews").select(col("comment").alias("value"))
df.write.format("text").option("compression", "gzip").save("/Volumes/<catalog>/<schema>/<volume>/review_comments_compressed")
Scala
import org.apache.spark.sql.functions.col
val df = spark.read.table("samples.wanderbricks.reviews").select(col("comment").alias("value"))
df.write.format("text").option("compression", "gzip").save("/Volumes/<catalog>/<schema>/<volume>/review_comments_compressed")
SQL
CREATE TABLE review_comments_compressed (value STRING)
USING TEXT
OPTIONS (compression 'gzip');
INSERT INTO review_comments_compressed
SELECT comment AS value FROM samples.wanderbricks.reviews;
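With gzip, each part file Spark writes is a standard gzip stream, so you can spot-check the output without Spark. The sketch below assumes the part files match `part-*.gz`. Spark typically names gzip text output `part-...txt.gz`, but verify this against your own directory listing. The helper `iter_gzip_lines` is hypothetical. The demo writes made-up part files to a temporary directory so the sketch runs anywhere.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterator


def iter_gzip_lines(directory: str) -> Iterator[str]:
    """Hypothetical helper: yield each line from the gzip part files in a directory."""
    # Only part files match the pattern, so marker files such as _SUCCESS are skipped.
    # Files are sorted for a repeatable order; this is not a Spark row-order guarantee.
    for part in sorted(Path(directory).glob("part-*.gz")):
        # Text mode decompresses, decodes UTF-8, and translates \r\n to \n.
        with gzip.open(part, "rt", encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


# Local demo: made-up part files stand in for a Spark output directory.
with tempfile.TemporaryDirectory() as demo_dir:
    samples = ["first comment\nsecond comment\n", "third comment\n"]
    for i, text in enumerate(samples):
        with gzip.open(Path(demo_dir) / f"part-{i:05d}.txt.gz", "wt", encoding="utf-8") as out:
            out.write(text)
    (Path(demo_dir) / "_SUCCESS").touch()
    demo_lines = list(iter_gzip_lines(demo_dir))

print(demo_lines)
```

On a cluster where volumes are mounted as local paths, the same helper could be pointed at `/Volumes/<catalog>/<schema>/<volume>/review_comments_compressed` to print a few of the written comments.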
Usage
The following examples use the Wanderbricks dataset to demonstrate reading and writing text files using the Spark DataFrame API and SQL.
Read text files using SQL
To query text files without registering a table, use read_files. Unity Catalog permissions on the volume or external location that contains the files apply automatically.
SELECT * FROM read_files(
'/Volumes/<catalog>/<schema>/<volume>/review_comments',
format => 'text'
)
Read and write text files
The text format requires a DataFrame with a single StringType column. The following examples write Wanderbricks review comments to a directory of text files, then read them back.
Python
from pyspark.sql.functions import col
# Write Wanderbricks review comments as text files (one comment per line)
df = spark.read.table("samples.wanderbricks.reviews").select(col("comment").alias("value"))
df.write.format("text").save("/Volumes/<catalog>/<schema>/<volume>/review_comments")
# Read a text file — each line becomes a row in the "value" column
df = spark.read.format("text").load("/Volumes/<catalog>/<schema>/<volume>/review_comments")
display(df)
Scala
import org.apache.spark.sql.functions.col
// Write Wanderbricks review comments as text files (one comment per line)
val df = spark.read.table("samples.wanderbricks.reviews").select(col("comment").alias("value"))
df.write.format("text").save("/Volumes/<catalog>/<schema>/<volume>/review_comments")
// Read a text file — each line becomes a row in the "value" column
val text = spark.read.format("text").load("/Volumes/<catalog>/<schema>/<volume>/review_comments")
text.show()
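One caveat for this round trip: by default the writer ends every value with a line separator, and the reader splits on line separators. A comment that itself contains a newline therefore comes back as more than one row. The plain-Python sketch below models both sides under that assumption. The helpers `write_text` and `read_text` are hypothetical, and the sample comments are made up. Replacing embedded newlines with a space before writing is shown as one possible mitigation, not the only one.

```python
def write_text(values: list[str]) -> str:
    """Hypothetical model of the writer: each value followed by a newline."""
    return "".join(value + "\n" for value in values)


def read_text(content: str) -> list[str]:
    """Hypothetical model of the reader: one row per line."""
    lines = content.split("\n")
    # The final newline terminates the last row rather than starting a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


# Made-up sample comments; the second one spans two lines.
comments = ["Great host!", "Clean room.\nWould stay again."]

# Written as-is, the multi-line comment splits into two rows on read.
naive = read_text(write_text(comments))
print(len(comments), "->", len(naive))  # 2 -> 3

# Flattening embedded newlines first keeps one row per comment.
flattened = [c.replace("\r\n", " ").replace("\n", " ") for c in comments]
safe = read_text(write_text(flattened))
print(len(comments), "->", len(safe))  # 2 -> 2
```

In the DataFrame API, `regexp_replace` from `pyspark.sql.functions` could apply similar flattening to the comment column before the write.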
Additional resources
- Read and write CSV files: If your text data is delimited or tabular, CSV provides structured parsing with schema inference, header support, and configurable delimiters.