Manage and optimize Azure Machine Learning costs
Learn how to manage and optimize costs when training and deploying machine learning models to Azure Machine Learning.
Use the following tips to help you manage and optimize your compute resource costs.
- Configure your training clusters for autoscaling
- Set quotas on your subscription and workspaces
- Set termination policies on your training job
- Use low-priority virtual machines (VMs)
- Schedule compute instances to shut down and start up automatically
- Use an Azure Reserved VM Instance
- Train locally
- Parallelize training
- Set data retention and deletion policies
- Deploy resources to the same region
For information on planning and monitoring costs, see the plan to manage costs for Azure Machine Learning guide.
Use Azure Machine Learning compute cluster (AmlCompute)
With constantly changing data, you need fast and streamlined model training and retraining to maintain accurate models. However, continuous training comes at a cost, especially for deep learning models on GPUs.
Azure Machine Learning users can use the managed Azure Machine Learning compute cluster, also called AmlCompute. AmlCompute supports a variety of GPU and CPU options. It's hosted internally on behalf of your subscription by Azure Machine Learning and provides the same enterprise-grade security, compliance, and governance at Azure IaaS cloud scale.
Because these compute pools are inside of Azure's IaaS infrastructure, you can deploy, scale, and manage your training with the same security and compliance requirements as the rest of your infrastructure. These deployments occur in your subscription and obey your governance rules. Learn more about Azure Machine Learning compute.
Configure training clusters for autoscaling
Autoscaling clusters based on the requirements of your workload helps reduce your costs so you only use what you need.
AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can scale up to the maximum number of nodes you configure. As each job completes, the cluster releases nodes and scales down to your configured minimum node count.
To avoid charges when no jobs are running, set the minimum nodes to 0. This setting allows Azure Machine Learning to de-allocate the nodes when they aren't in use. Any value larger than 0 keeps that number of nodes running, even if they're not in use.
You can also configure the amount of time a node stays idle before it scales down. By default, idle time before scale-down is set to 120 seconds.
- If you perform less iterative experimentation, reduce this time to save costs.
- If you perform highly iterative dev/test experimentation, you might need to increase the time so you aren't paying for constant scaling up and down after each change to your training script or environment.
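As a rough sketch with the v1 Python SDK, these settings are passed at provisioning time. The cluster name, VM size, and node counts below are illustrative:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # assumes a local config.json for your workspace

# Illustrative values: scale to zero when idle, cap at four nodes,
# and release idle nodes after five minutes.
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    min_nodes=0,                        # de-allocate all nodes when no jobs run
    max_nodes=4,                        # upper bound for autoscaling
    idle_seconds_before_scaledown=300,  # default is 120 seconds
)

cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)
```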
Set quotas on resources
AmlCompute comes with a quota (or limit) configuration. This quota is by VM family (for example, Dv2 series, NCv3 series) and varies by region for each subscription. Subscriptions start with small defaults to get you going, but use this setting to control the amount of AmlCompute resources available to be spun up in your subscription.
Also configure workspace-level quotas by VM family for each workspace within a subscription. Doing so gives you more granular control over the costs each workspace might incur and lets you restrict certain VM families.
To set quotas at the workspace level, start in the Azure portal. Select any workspace in your subscription, and select Usages + quotas in the left pane. Then select the Configure quotas tab to view the quotas. You need privileges at the subscription scope to set the quota, since it's a setting that affects multiple workspaces.
Set job autotermination policies
In some cases, you should configure your training runs to limit their duration or terminate them early, for example, when you're using Azure Machine Learning's built-in hyperparameter tuning or automated machine learning.
Here are a few options that you have:
- Define a parameter called `max_run_duration_seconds` in your RunConfiguration to control the maximum duration a run can extend to on the compute you choose (either local or remote cloud compute).
- For hyperparameter tuning, define an early termination policy from a Bandit policy, a Median stopping policy, or a Truncation selection policy. To further control hyperparameter sweeps, use parameters such as `max_total_runs` or `max_duration_minutes`.
- For automated machine learning, set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `experiment_timeout_minutes` to control the maximum duration of a job or of the entire experiment.
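As a minimal sketch with the v1 Python SDK, assuming a training script named train.py that logs a metric named accuracy (both illustrative), you might cap run duration and attach a Bandit early termination policy like this:

```python
from azureml.core import ScriptRunConfig
from azureml.train.hyperdrive import (
    BanditPolicy,
    HyperDriveConfig,
    PrimaryMetricGoal,
    RandomParameterSampling,
    uniform,
)

# Cap any single run at one hour (applies to local or remote compute).
src = ScriptRunConfig(source_directory=".", script="train.py")  # illustrative script
src.run_config.max_run_duration_seconds = 3600

# Bandit policy: terminate runs whose primary metric falls outside the
# slack factor of the best run at each evaluation interval.
policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)

hyperdrive_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=RandomParameterSampling(
        {"learning_rate": uniform(0.01, 0.1)}  # illustrative search space
    ),
    policy=policy,
    primary_metric_name="accuracy",  # illustrative metric name
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,               # cap the number of runs in the sweep
    max_duration_minutes=60,         # cap the duration of the whole sweep
)
```

For automated machine learning, the analogous knobs are the `enable_early_stopping` flag and the `experiment_timeout_minutes` property on AutoMLConfig.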
Use low-priority VMs
Azure allows you to use excess, unutilized capacity as Low-Priority VMs across virtual machine scale sets, Batch, and the Machine Learning service. These allocations are preemptible but come at a reduced price compared to dedicated VMs. In general, we recommend using Low-Priority VMs for batch workloads. You should also use them where interruptions are recoverable either through resubmission (for batch inferencing) or through restarts (for deep learning training with checkpointing).
Low-Priority VMs have a single quota separate from the dedicated quota value, which is by VM family. Learn more about AmlCompute quotas.
Low-Priority VMs don't work for compute instances, because compute instances need to support interactive notebook experiences.
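Requesting low-priority capacity for an AmlCompute cluster is a one-line change at provisioning time. A sketch with the v1 Python SDK, with illustrative names and sizes:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# vm_priority="lowpriority" draws from the separate low-priority quota.
# Runs on these nodes can be pre-empted, so make sure your training job
# checkpoints and can resume after a restart.
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_NC6",    # illustrative GPU size
    vm_priority="lowpriority",
    min_nodes=0,
    max_nodes=4,
)

cluster = ComputeTarget.create(ws, "lowpri-cluster", compute_config)
```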
Schedule compute instances
When you create a compute instance, the VM stays on, so it's available for your work. Set up a schedule to automatically start and stop the compute instance (preview) to save costs when you aren't planning to use it.
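The schedule itself is configured in the studio or through an ARM template, but you can also stop and start a compute instance programmatically. A small sketch with the v1 Python SDK, with an illustrative instance name:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance

ws = Workspace.from_config()
instance = ComputeInstance(workspace=ws, name="my-compute-instance")  # illustrative name

# Stop the instance to pause compute billing; start it again when needed.
instance.stop(wait_for_completion=True, show_output=True)
# ... later ...
instance.start(wait_for_completion=True, show_output=True)
```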
Use reserved instances
Another way to save money on compute resources is to use Azure Reserved VM Instances. With this offering, you commit to a one-year or three-year term. These discounts range up to 72% off pay-as-you-go prices and are applied directly to your monthly Azure bill.
Azure Machine Learning Compute supports reserved instances inherently. If you purchase a one-year or three-year reserved instance, the discount is automatically applied against your Azure Machine Learning managed compute.
Train locally
When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally.
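For example, with the v1 Python SDK, where the script and experiment names are illustrative:

```python
from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()

# compute_target="local" runs the script on the machine submitting the job.
src = ScriptRunConfig(source_directory=".", script="train.py", compute_target="local")

run = Experiment(ws, "local-prototyping").submit(src)
run.wait_for_completion(show_output=True)
```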
Visual Studio Code provides a full-featured environment for developing your machine learning applications. Using the Azure Machine Learning Visual Studio Code extension and Docker, you can run and debug locally. For more information, see interactive debugging with Visual Studio Code.
Parallelize training
One of the key methods of optimizing cost and performance is to parallelize the workload using a parallel run step in Azure Machine Learning. This step allows you to use many smaller nodes to execute the task in parallel, which lets you scale horizontally. There's an overhead for parallelization, so depending on the workload and the degree of parallelism that can be achieved, this may or may not be an option. For further information, see the ParallelRunStep documentation.
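Below is a minimal sketch of wiring up a ParallelRunStep with the v1 Python SDK; the dataset, entry script, environment, and cluster names are illustrative assumptions:

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

ws = Workspace.from_config()
compute = ws.compute_targets["cpu-cluster"]         # illustrative AmlCompute cluster
input_ds = Dataset.get_by_name(ws, "scoring-data")  # illustrative registered dataset
env = Environment.get(ws, "my-batch-env")           # illustrative environment

parallel_run_config = ParallelRunConfig(
    source_directory=".",
    entry_script="batch_score.py",  # illustrative script implementing init() and run()
    mini_batch_size="10",
    error_threshold=10,
    output_action="append_row",
    environment=env,
    compute_target=compute,
    node_count=4,                   # many smaller nodes working in parallel
)

step = ParallelRunStep(
    name="parallel-scoring",
    parallel_run_config=parallel_run_config,
    inputs=[input_ds.as_named_input("input_data")],
    output=OutputFileDatasetConfig(name="scores"),
)

pipeline = Pipeline(workspace=ws, steps=[step])
```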
Set data retention & deletion policies
Every time a pipeline is executed, intermediate datasets are generated at each step. Over time, these intermediate datasets take up space in your storage account. Consider setting up policies to manage your data throughout its lifecycle to archive and delete your datasets. For more information, see optimize costs by automating Azure Blob Storage access tiers.
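As one possible sketch, assuming your intermediate data lands under an illustrative `azureml/intermediate` blob prefix, a lifecycle rule that deletes blobs 90 days after last modification could be created with the `azure-mgmt-storage` package. The subscription, resource group, account, and prefix below are assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Illustrative subscription, resource group, and storage account names.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.management_policies.create_or_update(
    "my-resource-group",
    "mystorageaccount",
    "default",  # lifecycle management policies always use the name "default"
    {
        "policy": {
            "rules": [
                {
                    "enabled": True,
                    "name": "expire-intermediate-datasets",
                    "type": "Lifecycle",
                    "definition": {
                        "filters": {
                            "blob_types": ["blockBlob"],
                            "prefix_match": ["azureml/intermediate"],  # illustrative prefix
                        },
                        "actions": {
                            "base_blob": {
                                "delete": {"days_after_modification_greater_than": 90}
                            }
                        },
                    },
                }
            ]
        }
    },
)
```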
Deploy resources to the same region
Compute resources located in different regions can experience network latency and increased data transfer costs. Azure network costs are incurred from outbound bandwidth from Azure data centers. To help reduce network costs, deploy all your resources in the same region. Provisioning your Azure Machine Learning workspace and dependent resources in the same region as your data can help lower costs and improve performance.
For hybrid cloud scenarios, like those using ExpressRoute, it can sometimes be more cost-effective to move all resources to Azure to optimize network costs and latency.