A data team wants to run Spark SQL queries in a PySpark notebook. What do they need to add at the top of a cell to run SQL?
%spark
%%sql
%%pyspark
An analytics engineer needs to replace null values in a discount column with zero using PySpark. Which method should they use?
df.dropna(subset=["discount"])
df.fillna({"discount": 0})
df.filter(col("discount").isNotNull())
A team writes a nightly transformation that replaces all data in a gold-layer table with freshly processed results. Which write mode should they use?
append
overwrite
merge
What does a window function provide that a standard GROUP BY aggregation does not?
It calculates aggregated values while keeping the individual row detail.
It runs faster than GROUP BY on large datasets.
It supports more aggregation functions than GROUP BY.
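As a rough illustration of the difference this question is about, the sketch below runs the same total twice: once with GROUP BY and once with a window function. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, which is an assumption made only so the example runs without a Spark session; the `sales` table and its columns are hypothetical. The `SUM(...) OVER (PARTITION BY ...)` syntax is the same in Spark SQL.

```python
import sqlite3

# In-memory database with a small, hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100.0), ("east", 2, 50.0), ("west", 3, 70.0)],
)

# GROUP BY collapses each region into a single output row.
grouped = conn.execute(
    "SELECT region, SUM(amount) AS region_total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

# A window function computes the same total but keeps every input row.
windowed = conn.execute(
    "SELECT region, order_id, amount, "
    "SUM(amount) OVER (PARTITION BY region) AS region_total "
    "FROM sales ORDER BY order_id"
).fetchall()

print(grouped)   # one row per region
print(windowed)  # one row per order, each carrying its region total
```

The grouped query returns one row per region, while the windowed query returns one row per order with the region total repeated alongside it.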
A table has grown to contain many small Parquet files after weeks of incremental appends. Which command consolidates these files to improve query performance?
VACUUM
OPTIMIZE
ANALYZE TABLE