Can I Install R libraries in databricks without reinstalling each time the cluster is turned on?

Marco117 80 Reputation points
2023-11-28T13:53:42+00:00

I need to install two R libraries (VIM and arrow). I have tried an init script, a global init script, and the option to install libraries directly on the cluster, but every time I start the cluster they are reinstalled. The problem is that these two R libraries alone take about 17 minutes to install:

Using global init scripts: [screenshot not included]

Using cluster libraries: [screenshot not included]

Installation time (17 min): [screenshot not included]

What can I do to reduce these installation times? Is there any way to install them permanently?

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 84,936 Reputation points Microsoft Employee
    2023-11-29T06:29:36.96+00:00

    @Marco117 - Thanks for the question and for using the Microsoft Q&A platform.

    This is expected behavior: the libraries are reinstalled every time the cluster is turned on because the cluster is ephemeral, meaning it is created and destroyed on demand. Any changes made to the cluster are therefore lost when it is destroyed.

    If you want to keep the packages installed on the runtime without the waiting time, you need to create a custom image with the required libraries pre-installed. The libraries will then be available every time a cluster is created from that custom image.

    For more details, refer to "Customize containers with Databricks Container Services". Your question has also already been answered on Stack Overflow: https://stackoverflow.com/questions/77560141/can-i-install-r-libraries-in-databricks-without-reinstalling-each-time-the-clust/77562106

    I hope this helps! Let me know if you have any further questions.
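As a rough illustration of the custom-image approach from the accepted answer, the sketch below builds a request body for the Databricks Clusters API (`POST /api/2.0/clusters/create`) that points the cluster at a custom container through the `docker_image` field. It assumes Databricks Container Services is enabled on the workspace and that an image with VIM and arrow already installed has been pushed to a registry. The image URL, cluster name, runtime version, and node type are hypothetical placeholders, not values from the original thread.

```python
import json


def build_cluster_spec(image_url, username=None, password=None):
    """Build a Clusters API request body that uses a custom container image.

    image_url is a hypothetical image that already has the R libraries
    (VIM, arrow) installed, so nothing is reinstalled at cluster start.
    """
    docker_image = {"url": image_url}
    # Registry credentials are only needed for private registries.
    if username and password:
        docker_image["basic_auth"] = {"username": username, "password": password}

    return {
        "cluster_name": "r-preinstalled-libs",  # hypothetical name
        "spark_version": "13.3.x-scala2.12",    # assumption: pick a runtime matching the image
        "node_type_id": "Standard_DS3_v2",      # assumption: any supported Azure node type
        "num_workers": 1,
        "docker_image": docker_image,
    }


# Hypothetical image URL; replace with your own registry and tag.
spec = build_cluster_spec("myregistry.azurecr.io/r-vim-arrow:latest")
print(json.dumps(spec, indent=2))
```

The resulting JSON would be sent to the workspace's Clusters API with an access token. The same image can also be selected in the cluster UI's Docker settings, so the script is optional.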

