This page provides an overview of the reference documentation available for PySpark, a Python API for Spark. For more information about PySpark, see PySpark on Azure Databricks.
Data types
For a complete list of PySpark data types, see PySpark data types.
Classes
| Reference | Description |
|---|---|
| Avro | Support for reading and writing data in Apache Avro format. |
| Catalog | Interface for managing databases, tables, functions, and other catalog metadata. |
| Column | Operations for working with DataFrame columns, including transformations and expressions. |
| Data Types | Available data types in PySpark SQL, including primitive types, complex types, and user-defined types. |
| DataFrame | Distributed collection of data organized into named columns, similar to a table in a relational database. |
| DataFrameNaFunctions | Functionality for working with missing data in a DataFrame. |
| DataFrameReader | Interface used to load a DataFrame from external storage systems. |
| DataFrameStatFunctions | Statistical functions for working with a DataFrame. |
| DataFrameWriter | Interface used to write a DataFrame to external storage systems. |
| DataFrameWriterV2 | Interface used to write a DataFrame to external storage (version 2). |
| DataSource | APIs for implementing custom data sources to read from external systems. For information about custom data sources, see PySpark custom data sources. |
| DataSourceArrowWriter | A base class for data source writers that process data using PyArrow's RecordBatch. |
| DataSourceRegistration | A wrapper for data source registration. |
| DataSourceReader | A base class for data source readers. |
| DataSourceStreamArrowWriter | A base class for data stream writers that process data using PyArrow's RecordBatch. |
| DataSourceStreamReader | A base class for streaming data source readers. |
| DataSourceStreamWriter | A base class for data stream writers. |
| GroupedData | Methods for grouping data and performing aggregation operations on grouped DataFrames. |
| Observation | Collects metrics and observes DataFrames during query execution for monitoring and debugging. |
| PlotAccessor | Accessor for DataFrame plotting functionality in PySpark. |
| ProtoBuf | Support for serializing and deserializing data using Protocol Buffers format. |
| Row | Represents a row of data in a DataFrame, providing access to individual field values. |
| RuntimeConfig | Runtime configuration options for Spark SQL, including execution and optimizer settings. For information on configuration that is only available on Databricks, see Set Spark configuration properties on Azure Databricks. |
| SparkSession | The entry point for reading data and executing SQL queries in PySpark applications. |
| Stateful Processor | Manages state across streaming batches for complex stateful operations in structured streaming. |
| UserDefinedFunction (UDF) | User-defined functions for applying custom Python logic to DataFrame columns. |
| UDFRegistration | Wrapper for user-defined function registration. Access the instance through spark.udf. |
| UserDefinedTableFunction (UDTF) | User-defined table functions that return multiple rows for each input row. |
| UDTFRegistration | Wrapper for user-defined table function registration. Access the instance through spark.udtf. |
| VariantVal | Represents semi-structured data with flexible schema, which supports dynamic types and nested structures. |
| Window | Utility methods for defining windows, used to perform calculations across a set of table rows related to the current row. |
| WindowSpec | A window specification that defines the partitioning, ordering, and frame boundaries of a window. |
Functions
For a complete list of available built-in functions, see PySpark functions.