This page provides an overview of the reference documentation available for PySpark, a Python API for Spark. For more information about PySpark, see PySpark on Azure Databricks.
| Reference | Description |
|---|---|
| Core Classes | Main classes for working with PySpark SQL, including SparkSession and DataFrame fundamentals. |
| Spark Session | The entry point for reading data and executing SQL queries in PySpark applications (see the first sketch after this table). |
| Configuration | Runtime configuration options for Spark SQL, including execution and optimizer settings. For information on configuration that is only available on Databricks, see Set Spark configuration properties on Azure Databricks. |
| DataFrame | Distributed collection of data organized into named columns, similar to a table in a relational database. |
| Input/Output | Methods for reading data from and writing data to various file formats and data sources. |
| Column | Operations for working with DataFrame columns, including transformations and expressions. |
| Data Types | Available data types in PySpark SQL, including primitive types, complex types, and user-defined types. |
| Row | Represents a row of data in a DataFrame, providing access to individual field values. |
| Functions | Built-in functions for data manipulation, transformation, and aggregation operations. |
| Window | Window functions for performing calculations across a set of table rows related to the current row (see the window sketch after this table). |
| Grouping | Methods for grouping data and performing aggregation operations on grouped DataFrames. |
| Catalog | Interface for managing databases, tables, functions, and other catalog metadata. |
| Avro | Support for reading and writing data in Apache Avro format. |
| Observation | Collects metrics and observes DataFrames during query execution for monitoring and debugging. |
| UDF | User-defined functions for applying custom Python logic to DataFrame columns (see the UDF sketch after this table). |
| UDTF | User-defined table functions that return multiple rows for each input row. |
| VariantVal | Handles semi-structured data with flexible schema, supporting dynamic types and nested structures. |
| ProtoBuf | Support for serializing and deserializing data using Protocol Buffers format. |
| Python DataSource | APIs for implementing custom data sources to read from external systems. For information about custom data sources, see PySpark custom data sources. |
| Stateful Processor | Manages state across streaming batches for complex stateful operations in structured streaming. |
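For orientation, here is a minimal sketch that ties several of these entries together (Spark Session, DataFrame, Column, Functions, and Grouping). It assumes a local PySpark installation; on Azure Databricks, notebooks provide a preconfigured `spark` session, so the builder call is unnecessary there. The data and app name are illustrative, not part of any API.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Entry point: build (or reuse) a SparkSession.
spark = SparkSession.builder.appName("pyspark-sql-demo").getOrCreate()

# DataFrame: a distributed collection of rows organized into named columns.
df = spark.createDataFrame(
    [("Alice", "HR", 34), ("Bob", "Eng", 45), ("Cathy", "Eng", 29)],
    schema=["name", "dept", "age"],
)

# Column expressions and built-in functions compose into transformations.
adults = df.filter(F.col("age") > 30).withColumn("name_upper", F.upper("name"))

# Grouping: aggregate over grouped data.
adults.groupBy("dept").agg(F.avg("age").alias("avg_age")).show()
```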
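Unlike `groupBy`, window functions compute a value per row without collapsing the input. A hedged sketch of the Window entry, using hypothetical sales data:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("east", "jan", 100), ("east", "feb", 150), ("west", "jan", 200)],
    schema=["region", "month", "amount"],
)

# Rank each row within its region by amount; every input row is kept.
w = Window.partitionBy("region").orderBy(F.desc("amount"))
sales.withColumn("rank", F.rank().over(w)).show()
```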
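User-defined functions apply custom Python logic per value when no built-in function fits; they are generally slower than built-ins because rows are serialized to a Python worker. A minimal sketch (the `greet` function is a hypothetical example, not part of the PySpark API):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Declare the return type so Spark can plan the query.
# `greet` is a hypothetical example function.
@F.udf(returnType=StringType())
def greet(name):
    return f"Hello, {name}!"

df = spark.createDataFrame([("Alice",), ("Bob",)], schema=["name"])
df.withColumn("greeting", greet(F.col("name"))).show()
```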