This article provides clear, opinionated guidance for creating compute. Using the right compute types for your workflow can improve performance and reduce costs.
| Best Practice | Impact | Docs |
|---|---|---|
| If you are new to Azure Databricks, start by using general all-purpose instance types | Selecting the appropriate instance type for the workload results in higher efficiency. | |
| Use standard access mode unless your required functionality isn't supported | Compute with standard access mode can be used by multiple users with data isolation among users. | |
| Use the latest generation instance types if there is enough availability | The latest generation of instance types provides the best performance and the latest features. | |
| Set your on-demand and spot-instance balance based on how quickly you need your workload to run | Spot instances save on cost but can affect the overall run time of an operation if the spot instances are reclaimed. | |
| Choose the size of your nodes and the number of workers based on the types of operations your workload performs | For example, if you expect a lot of shuffles, it can be more efficient to use a single large node instead of multiple smaller nodes. | |
| Run vacuum on a cluster with auto-scaling set for 1-4 workers, where each worker has 8 cores. Select a driver with between 8 and 32 cores. Increase the size of the driver if you get out-of-memory (OOM) errors. | Vacuum statements happen in two phases, the second of which is driver-heavy. If you don't use the right-sized cluster, the operation could cause a slowdown and might not succeed. | |
| Assess whether your batch workflow would benefit from Photon | Photon provides faster queries and reduces your total cost per workload. | |
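
The vacuum sizing guidance in the table can be sketched as a cluster specification. The following is a minimal illustration, not an official template. The field names (`autoscale`, `node_type_id`, `driver_node_type_id`, `azure_attributes`) follow the Databricks Clusters API as commonly documented and should be verified against the current API reference. The VM sizes, the runtime version string, the cluster name, and the helper `vacuum_cluster_spec` are all assumptions chosen for illustration.

```python
import json

# Hypothetical mapping from core count to Azure VM size. Confirm which sizes
# your workspace and region actually offer before relying on these names.
NODE_TYPES_BY_CORES = {
    8: "Standard_D8ds_v5",
    16: "Standard_D16ds_v5",
    32: "Standard_D32ds_v5",
}


def vacuum_cluster_spec(driver_cores: int = 8, first_on_demand: int = 1) -> dict:
    """Build a cluster spec sized for vacuum: 1-4 autoscaled 8-core workers
    and a driver with 8 to 32 cores."""
    if driver_cores not in NODE_TYPES_BY_CORES:
        raise ValueError(
            f"driver_cores must be one of {sorted(NODE_TYPES_BY_CORES)}"
        )
    return {
        "cluster_name": "vacuum-maintenance",   # hypothetical name
        "spark_version": "15.4.x-scala2.12",    # placeholder runtime version
        "node_type_id": NODE_TYPES_BY_CORES[8],  # each worker has 8 cores
        "driver_node_type_id": NODE_TYPES_BY_CORES[driver_cores],
        "autoscale": {"min_workers": 1, "max_workers": 4},
        "azure_attributes": {
            # Keep the first node(s), including the driver, on demand and
            # let the remaining workers use spot capacity with fallback.
            "first_on_demand": first_on_demand,
            "availability": "SPOT_WITH_FALLBACK_AZURE",
        },
    }


# Start with a small driver, then step up if vacuum fails with OOM errors.
spec = vacuum_cluster_spec(driver_cores=16)
print(json.dumps(spec, indent=2))
```

If vacuum still runs out of memory on the driver, calling the helper again with `driver_cores=32` changes only the driver size and leaves the worker pool unchanged, which matches the guidance that the second phase is driver-heavy.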