A column in a DataFrame.
Supports Spark Connect
Methods
| Method | Description |
|---|---|
| alias(*alias, **kwargs) | Returns this column aliased with a new name or names (for expressions that return more than one column, such as explode). |
| asc() | Returns a sort expression based on the ascending order of the column. |
| asc_nulls_first() | Returns a sort expression based on the ascending order of the column, with null values appearing before non-null values. |
| asc_nulls_last() | Returns a sort expression based on the ascending order of the column, with null values appearing after non-null values. |
| astype(dataType) | Alias for cast(). |
| between(lowerBound, upperBound) | Checks whether the column's values are between the given lower and upper bounds, inclusive. |
| bitwiseAND(other) | Computes the bitwise AND of this expression with another expression. |
| bitwiseOR(other) | Computes the bitwise OR of this expression with another expression. |
| bitwiseXOR(other) | Computes the bitwise XOR of this expression with another expression. |
| cast(dataType) | Casts the column to the type dataType. |
| contains(other) | Returns a boolean expression that is true if the column contains the other value. |
| desc() | Returns a sort expression based on the descending order of the column. |
| desc_nulls_first() | Returns a sort expression based on the descending order of the column, with null values appearing before non-null values. |
| desc_nulls_last() | Returns a sort expression based on the descending order of the column, with null values appearing after non-null values. |
| dropFields(*fieldNames) | An expression that drops fields in a StructType by name. |
| endswith(other) | Returns a boolean expression that is true if the string ends with other. |
| eqNullSafe(other) | Equality test that is safe for null values. |
| getField(name) | An expression that gets a field of a StructType by name. |
| getItem(key) | An expression that gets an item at an ordinal position out of a list, or an item by key out of a dict. |
| ilike(other) | SQL ILIKE expression (case-insensitive LIKE). |
| isNaN() | True if the current expression is NaN. |
| isNotNull() | True if the current expression is NOT null. |
| isNull() | True if the current expression is null. |
| isin(*cols) | A boolean expression that is true if the value of this expression is contained in the evaluated values of the arguments. |
| like(other) | SQL LIKE expression. |
| name(*alias, **kwargs) | Alias for alias(). |
| otherwise(value) | Evaluates a list of conditions and returns one of multiple possible result expressions. |
| over(window) | Defines a windowing column. |
| rlike(other) | SQL RLIKE expression (LIKE with a regex). |
| startswith(other) | Returns a boolean expression that is true if the string starts with other. |
| substr(startPos, length) | Returns a Column that is a substring of the column. |
| try_cast(dataType) | A variant of cast() that returns NULL instead of raising an error when the cast fails. |
| when(condition, value) | Evaluates a list of conditions and returns one of multiple possible result expressions. |
| withField(fieldName, col) | An expression that adds or replaces a field in a StructType by name. |
Operators
The Column class supports standard Python operators for arithmetic, comparison, and logical operations:
- Arithmetic: +, -, *, /, %, **
- Comparison: ==, !=, <, <=, >, >=
- Logical: & (AND), | (OR), ~ (NOT)
Examples
For additional simple examples that demonstrate column usage, see Column operations.
Create Column instances
Select a column from a DataFrame:
```python
df = spark.createDataFrame(
    [(2, "Alice"), (5, "Bob")], ["age", "name"])

# Access by attribute
df.name
# Column<'name'>

# Access by bracket notation
df["name"]
# Column<'name'>
```
Create a column from an expression:
```python
df.age + 1
# Column<...>

1 / df.age
# Column<...>
```
Basic column operations
```python
# Arithmetic operations
df.select(df.age + 10).show()

# Comparison operations
df.filter(df.age > 3).show()

# String operations
df.filter(df.name.startswith("A")).show()

# Null checking
df.filter(df.name.isNotNull()).show()
```
Conditional logic
```python
from pyspark.sql import functions as F

df.select(
    F.when(df.age < 3, "child")
    .when(df.age < 13, "kid")
    .otherwise("adult")
    .alias("age_group")
).show()
```
Sorting
```python
df.orderBy(df.age.desc()).show()
df.orderBy(df.age.asc_nulls_last()).show()
```