Azure Databricks has built-in keyword bindings for all of the data formats natively supported by Apache Spark. Azure Databricks uses Delta Lake as the default protocol for reading and writing data and tables, whereas Apache Spark uses Parquet.
These articles provide an overview of many of the options and configurations available when you query data on Azure Databricks.
The following data formats have built-in keyword configurations in Apache Spark DataFrames and SQL:
Azure Databricks also provides a custom keyword for loading MLflow experiments.
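As a rough illustration of how a format keyword is used with the DataFrame reader, the sketch below wraps the standard `spark.read.format(...).options(...).load(...)` chain. The helper name `read_with_format` is hypothetical and not part of any Databricks or Spark API. It assumes `spark` is an active `SparkSession`, which is predefined in Databricks notebooks.

```python
def read_with_format(spark, fmt, path, **options):
    """Load `path` with the data source keyword `fmt` (for example "csv" or "json").

    `spark` is assumed to be an active SparkSession. This helper is a
    hypothetical convenience wrapper, not a Databricks or Spark API.
    """
    # Select the data source by its keyword.
    reader = spark.read.format(fmt)
    # Apply any reader options passed as keyword arguments, e.g. header="true".
    if options:
        reader = reader.options(**options)
    # Return the resulting DataFrame.
    return reader.load(path)
```

For example, `read_with_format(spark, "csv", "/path/to/data.csv", header="true")` would read a CSV file with a header row; the path here is a placeholder. The Databricks documentation for the MLflow data source describes its keyword as `mlflow-experiment`, loaded with an experiment ID in place of a path. Treat that detail as an assumption to verify against your workspace.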
Data formats with special considerations
Some data formats require additional configuration or special considerations for use:
- Databricks recommends loading images as `binary` data.
- Most formats support write compression via the `compression` option. See the compression section in each format's documentation for configuration details. Azure Databricks can also directly read pre-compressed files in many formats, and you can unzip compressed files on Azure Databricks if necessary.
  - Text-based (CSV, JSON, XML, text): `none` (default), `bzip2`, `gzip`, `lz4`, `snappy`, `deflate`, and `zstd`
  - Parquet: `snappy` (default), `gzip`, `lzo`, `brotli`, `lz4`, and `zstd`
  - ORC: `snappy`, `zlib`, and `lzo`
  - Avro: `snappy` (default), `deflate`, `bzip2`, `xz`, and `zstandard`
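
The compression codecs listed above can be captured in a small lookup table that is checked before writing. The following is a minimal sketch, not an official API: `SUPPORTED_CODECS` and `write_compressed` are hypothetical names, the codec sets are copied from the list above, and `df` is assumed to be a Spark DataFrame.

```python
# Codec names per format, copied from the list above (illustrative only).
_TEXT_CODECS = {"none", "bzip2", "gzip", "lz4", "snappy", "deflate", "zstd"}

SUPPORTED_CODECS = {
    "csv": _TEXT_CODECS,
    "json": _TEXT_CODECS,
    "xml": _TEXT_CODECS,
    "text": _TEXT_CODECS,
    "parquet": {"snappy", "gzip", "lzo", "brotli", "lz4", "zstd"},
    "orc": {"snappy", "zlib", "lzo"},
    "avro": {"snappy", "deflate", "bzip2", "xz", "zstandard"},
}


def write_compressed(df, fmt, path, codec):
    """Write `df` to `path` in format `fmt` using the `compression` option.

    Hypothetical helper: it validates `codec` against the table above and
    raises ValueError for an unknown format or an unlisted codec.
    """
    fmt_key = fmt.lower()
    codec_key = codec.lower()
    allowed = SUPPORTED_CODECS.get(fmt_key)
    if allowed is None:
        raise ValueError(f"No codec list recorded for format {fmt!r}")
    if codec_key not in allowed:
        raise ValueError(
            f"Codec {codec!r} is not listed for {fmt!r}; expected one of {sorted(allowed)}"
        )
    # Standard DataFrameWriter chain: format keyword, compression option, save.
    df.write.format(fmt_key).option("compression", codec_key).save(path)
```

Calling `write_compressed(df, "parquet", "/path/to/output", "zstd")` would write Zstandard-compressed Parquet files to the placeholder path. Note that, per the list above, Avro spells the same codec `zstandard` rather than `zstd`, so the helper rejects `zstd` for Avro.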
For more information about Apache Spark data sources, see Generic Load/Save Functions and Generic File Source Options.