Compute Cluster cache

Shambhu Rai 1,406 Reputation points
2023-08-24T12:42:49.7933333+00:00

Hi Expert,

How do I configure the compute cluster cache in Databricks?

I tried the settings below, but they are not working as expected:

spark.databricks.io.cache.maxDiskUsage 50g
spark.databricks.io.cache.maxMetaDataCache 1g
spark.databricks.io.cache.compression.enabled false

Accepted answer
  Amira Bedhiafi 14,881 Reputation points
  2023-08-25T00:31:55.39+00:00

    I think you are missing some steps:

    It looks like you're trying to configure the disk cache for a compute cluster in Azure Databricks. The keys you listed are valid Databricks cache settings, but they have to be applied as Spark configuration properties before the cluster starts (for example, in the cluster's Spark config in the UI), which could be why setting them afterwards is not working as expected.

    If you're not already working with a cluster, you'll need to create one first.

    The Databricks Runtime has built-in support for caching, and you can configure it through Spark configurations. Here's an example snippet that sets the caching configurations using SparkConf:

       import org.apache.spark.SparkConf
       import org.apache.spark.sql.SparkSession

       // Note the capitalization: maxMetaDataCache, not maxMetadataCache.
       val conf = new SparkConf()
         .set("spark.databricks.io.cache.enabled", "true")
         .set("spark.databricks.io.cache.maxDiskUsage", "50g")
         .set("spark.databricks.io.cache.maxMetaDataCache", "1g")
         .set("spark.databricks.io.cache.compression.enabled", "false")
       val spark = SparkSession.builder().config(conf).getOrCreate()
    
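    On an already-running cluster, only the enable/disable flag can be changed at runtime; the sizing options above take effect only at cluster startup. A minimal sketch, assuming the `spark` session that Databricks notebooks provide:

    ```scala
    // Toggle the disk cache on a running cluster.
    // maxDiskUsage / maxMetaDataCache cannot be changed here; they are
    // read only when the cluster starts.
    spark.conf.set("spark.databricks.io.cache.enabled", "true")
    ```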

    You can also cache individual DataFrames in Databricks using the cache() method. Caching is lazy, so the data is only materialized the first time an action runs over the DataFrame:

       val df = spark.read.parquet("path/to/your/data")
       df.cache()   // lazy: marks the DataFrame for caching
       df.count()   // the first action materializes the cache
    

    You can use the Databricks UI to monitor and manage caching. This allows you to view the cache status, storage levels, and more.


0 additional answers
