This article provides clear, opinionated guidance for creating compute. By choosing the right compute types for your workload, you can improve performance and reduce costs.
Best practice: If you are new to Azure Databricks, start with general-purpose instance types.
Impact: Matching the instance type to the workload improves efficiency.
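The sketch below illustrates the first recommendation. It classifies Azure VM sizes by their series letter and picks the first general-purpose size from a list of candidates. The series-to-family mapping simplifies Azure's VM families (D for general purpose, E for memory optimized, F for compute optimized, L for storage optimized, N for GPU) and ignores other series and suffixes. The helper names and example sizes are hypothetical.

```python
# Hypothetical helper: map an Azure VM size name to its broad family.
# ASSUMPTION: simplified mapping by first series letter only; other series
# (and suffixes such as "s" or "d") are not distinguished.
AZURE_SERIES_CATEGORY = {
    "D": "general purpose",
    "E": "memory optimized",
    "F": "compute optimized",
    "L": "storage optimized",
    "N": "GPU",
}


def azure_vm_category(node_type_id: str) -> str:
    """Return the broad family for a size such as 'Standard_D8ds_v5'."""
    prefix = "Standard_"
    if not node_type_id.startswith(prefix):
        raise ValueError(f"unexpected node type id: {node_type_id!r}")
    series = node_type_id[len(prefix)]  # first letter after the prefix
    return AZURE_SERIES_CATEGORY.get(series, "other")


def starting_node_type(candidates: list[str]) -> str:
    """Pick the first general-purpose size from a list of candidates."""
    for node_type_id in candidates:
        if azure_vm_category(node_type_id) == "general purpose":
            return node_type_id
    raise ValueError("no general-purpose node type among candidates")


print(starting_node_type(["Standard_E8ds_v5", "Standard_F8s_v2", "Standard_D8ds_v5"]))
```

Starting from a general-purpose default and moving to a specialized family only after observing the workload keeps the first choice simple.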
Best practice: Run VACUUM on a cluster with autoscaling set to 1-4 workers, each with 8 cores. Select a driver with 8 to 32 cores, and increase the driver size if you get out-of-memory (OOM) errors.
Impact: VACUUM runs in two phases. First, the workers list files in parallel to identify those no longer referenced by the Delta transaction log. Second, the driver alone issues the file deletions. Because the second phase is driver-heavy, an undersized cluster or driver can slow the operation or cause it to fail.
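The VACUUM sizing guidance can be expressed as a cluster definition. The following is a minimal Python sketch, assuming you create compute through the Databricks Clusters API, whose spec uses fields such as `node_type_id`, `driver_node_type_id`, and `autoscale` with `min_workers` and `max_workers`. The Azure VM sizes, their core counts, the driver escalation ladder, the `spark_version` string, the cluster name, and the helpers `vacuum_cluster_spec` and `next_driver` are hypothetical placeholders; check which sizes your workspace offers.

```python
# Sketch of a Databricks Clusters API-style spec for running VACUUM.
# ASSUMPTIONS: node type ids, their vCPU counts, the driver ladder, and the
# spark_version string are illustrative placeholders, not recommendations.
NODE_CORES = {
    "Standard_D8ds_v5": 8,    # general purpose, 8 vCPUs (worker)
    "Standard_E8ds_v5": 8,    # memory optimized, 8 vCPUs (driver)
    "Standard_E16ds_v5": 16,  # memory optimized, 16 vCPUs (driver)
    "Standard_E32ds_v5": 32,  # memory optimized, 32 vCPUs (driver)
}

# Hypothetical escalation path for the driver, capped at 32 cores.
DRIVER_LADDER = ["Standard_E8ds_v5", "Standard_E16ds_v5", "Standard_E32ds_v5"]


def vacuum_cluster_spec(
    worker_type: str,
    driver_type: str,
    min_workers: int = 1,
    max_workers: int = 4,
    spark_version: str = "15.4.x-scala2.12",  # assumed runtime string
) -> dict:
    """Build a cluster spec that follows the 1-4 worker / 8-32 core guidance."""
    if not 1 <= min_workers <= max_workers <= 4:
        raise ValueError("autoscaling should stay within 1-4 workers")
    if NODE_CORES[worker_type] != 8:
        raise ValueError("each worker should have 8 cores")
    if not 8 <= NODE_CORES[driver_type] <= 32:
        raise ValueError("the driver should have between 8 and 32 cores")
    return {
        "cluster_name": "vacuum-maintenance",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": worker_type,
        "driver_node_type_id": driver_type,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


def next_driver(current: str) -> str:
    """Return the next larger driver size after an OOM error, up to 32 cores."""
    i = DRIVER_LADDER.index(current)
    if i + 1 == len(DRIVER_LADDER):
        raise ValueError("driver is already at the 32-core upper bound")
    return DRIVER_LADDER[i + 1]


spec = vacuum_cluster_spec("Standard_D8ds_v5", "Standard_E8ds_v5")
print(spec["autoscale"])
# After an OOM error during VACUUM, enlarge only the driver.
spec["driver_node_type_id"] = next_driver(spec["driver_node_type_id"])
print(spec["driver_node_type_id"])
```

The sketch keeps worker sizing fixed and enlarges only the driver after an OOM error, because the deletion phase of VACUUM runs on the driver alone. Using a memory-optimized family for the driver is an assumption, chosen because the guidance ties driver sizing to OOM errors.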