Returns a Column object for a scalar subquery. The subquery must produce exactly one row and one column.
Syntax
scalar()
Returns
Column: A Column object representing a SCALAR subquery.
Notes
The scalar() method turns a DataFrame that yields a single value, typically the result of an aggregation or another single-value computation, into a Column object representing that value. The returned Column can be used directly in select clauses or as part of a filter predicate on the outer DataFrame, so filters and calculations can depend on a value computed from the data itself.
Examples
from pyspark.sql import functions as sf

data = [
    (1, "Alice", 45000, 101), (2, "Bob", 54000, 101), (3, "Charlie", 29000, 102),
    (4, "David", 61000, 102), (5, "Eve", 48000, 101),
]
employees = spark.createDataFrame(data, ["id", "name", "salary", "department_id"])

# Uncorrelated scalar subquery: keep employees paid more than the overall average salary.
employees.where(
    sf.col("salary") > employees.select(sf.avg("salary")).scalar()
).select("name", "salary", "department_id").orderBy("name").show()
# +-----+------+-------------+
# | name|salary|department_id|
# +-----+------+-------------+
# |  Bob| 54000|          101|
# |David| 61000|          102|
# |  Eve| 48000|          101|
# +-----+------+-------------+
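
# Correlated scalar subquery: outer() marks e1.department_id as a reference to the
# outer DataFrame, so each employee is compared with the average of their own department.
employees.alias("e1").where(
    sf.col("salary")
    > employees.alias("e2").where(
        sf.col("e2.department_id") == sf.col("e1.department_id").outer()
    ).select(sf.avg("salary")).scalar()
).select("name", "salary", "department_id").orderBy("name").show()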
# +-----+------+-------------+
# | name|salary|department_id|
# +-----+------+-------------+
# |  Bob| 54000|          101|
# |David| 61000|          102|
# +-----+------+-------------+