cache

Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER).

Syntax

cache()

Returns

DataFrame: The cached DataFrame. This is the same DataFrame the method was called on, so calls can be chained.

Notes

Changed in version 3.0: the default storage level is MEMORY_AND_DISK_DESER, matching the Scala API.

Cached data is shared across all Spark sessions on the cluster.

Examples

:::note Serverless compatibility

Databricks recommends moving away from DataFrame.cache() because it is not compatible with the Databricks serverless compute architecture. Instead, materialize intermediate results to a Delta table.

:::
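
The recommendation in the note can be sketched roughly as follows. This is a minimal sketch, not a prescribed pattern: it assumes the session can write Delta tables, and both the helper `materialize` and the table name in the usage comment are hypothetical names chosen for illustration.

```python
def materialize(spark, df, table_name):
    """Write df to a Delta table and return a DataFrame that reads it back.

    Hypothetical helper: the name and signature are illustrative and are
    not part of the PySpark API.
    """
    # Overwrite so that a rerun replaces the previous intermediate result.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    # Later steps read from the table instead of recomputing df.
    return spark.table(table_name)


# Usage (assumed table name, requires an active SparkSession named spark):
# staged = materialize(spark, df, "main.default.stage_results")
```

Unlike a cache, the table persists after the session ends, so dropping it when it is no longer needed is left to the caller.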

df = spark.range(1)
df.cache()
# DataFrame[id: bigint]

df.explain()
# == Physical Plan ==
# InMemoryTableScan ...
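
On compute where caching is supported, it can help to release a cache explicitly once it is no longer needed. The following is a minimal sketch that pairs `cache()` with `DataFrame.unpersist()`; the context manager `cached` is a hypothetical helper, not part of the PySpark API, and the usage lines are left as comments because they need an active SparkSession.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache df for the duration of a with-block, then release it.

    Hypothetical helper: illustrative only, not part of the PySpark API.
    """
    df.cache()  # lazy: the data is stored when the first action runs
    try:
        yield df
    finally:
        df.unpersist()  # release the cached data, even if the block raised


# Usage (requires an active SparkSession named spark):
# with cached(spark.range(1)) as df:
#     df.count()  # the first action populates the cache
#     df.count()  # later actions can read from the cache
```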