Azure Container for PyTorch (ACPT)
Azure Container for PyTorch is a lightweight, standalone environment that includes the components needed to run optimized training for large models on Azure Machine Learning. Azure Machine Learning curated environments are available in the user’s workspace by default and are backed by cached Docker images that use the latest version of the Azure Machine Learning SDK. This reduces preparation costs and shortens deployment time. ACPT can be used to quickly get started with various deep learning tasks with PyTorch on Azure.
Why should I use ACPT?
- Use as is with preinstalled packages or build on top of the curated environment.
- Optimized training framework to set up, develop, and accelerate PyTorch models on large workloads.
- Up-to-date stack with the latest compatible versions of Ubuntu, Python, PyTorch, CUDA/ROCm, etc.
- Ease of use: All components installed and validated against dozens of Microsoft workloads to reduce setup costs and accelerate time to value.
- Latest training optimization technologies: ONNX Runtime, DeepSpeed, MSCCL, and others.
- Integration with Azure Machine Learning: Track your PyTorch experiments on Azure Machine Learning studio or using the SDK.
- The image is also available as a Data Science Virtual Machine (DSVM). To learn more about Data Science Virtual Machines, see the DSVM overview documentation.
- Azure customer support reduces training and deployment latency.
- Improves training and deployment success rate.
- Avoid unnecessary image builds.
- Includes only the required dependencies and access rights in the image/container.
To view more information about curated environment packages and versions, visit the Environments tab in the Azure Machine Learning studio.
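As a minimal sketch of using the curated environment as is, the Python snippet below builds an environment reference and passes it to a command job through the Azure Machine Learning Python SDK v2 (`azure-ai-ml`). The environment name, the `./src` folder, the `train.py` script, and the compute name are placeholders rather than values from this article, and `ml_client` is assumed to be an already authenticated `MLClient`.

```python
from typing import Optional


def curated_env_ref(name: str, version: Optional[str] = None) -> str:
    """Build an environment reference string for a job.

    Without a version, "@latest" requests the newest registered version;
    with a version, the reference is pinned as "name:version".
    """
    return f"{name}:{version}" if version else f"{name}@latest"


def submit_training_job(ml_client, env_name: str, compute: str):
    """Submit a command job that runs train.py inside the given environment."""
    # Imported lazily so curated_env_ref works without the SDK installed.
    from azure.ai.ml import command

    job = command(
        code="./src",                  # assumption: local folder containing train.py
        command="python train.py",     # assumption: name of the training script
        environment=curated_env_ref(env_name),
        compute=compute,               # assumption: name of an existing GPU cluster
        display_name="acpt-pytorch-training",
    )
    return ml_client.jobs.create_or_update(job)
```

Calling `curated_env_ref("<environment-name>")` yields `<environment-name>@latest`; the actual environment name can be copied from the Environments tab in the studio. Pinning a version instead of using `@latest` may be preferable when reproducibility matters more than picking up the newest image.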
Supported configurations for Azure Container for PyTorch (ACPT)
Description: The Azure Curated Environment for PyTorch is our latest PyTorch curated environment. It's optimized for large, distributed deep learning workloads and comes prepackaged with the best of Microsoft technologies for accelerated training, for example, ONNX Runtime Training (ORT), DeepSpeed, and MSCCL.
The following configurations are supported:
| Environment Name | OS | GPU Version | Python Version | PyTorch Version | ORT-training Version | DeepSpeed Version | torch-ort Version | Nebula Version |
Other packages, such as fairscale, horovod, msccl, protobuf, pyspark, pytest, pytorch-lightning, tensorboard, NebulaML, torchvision, and torchmetrics, are included to support a wide range of training needs.
To learn more, see Create custom ACPT curated environments.
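One way to build on top of the curated environment is to layer extra packages over an ACPT base image in a small Docker build context. The sketch below is illustrative only: `render_dockerfile` is a hypothetical helper, the base image reference must be supplied by the reader (none is given in this article), and `build_custom_environment` assumes the `azure-ai-ml` SDK v2 package is installed.

```python
from typing import Sequence


def render_dockerfile(base_image: str, pip_packages: Sequence[str]) -> str:
    """Return Dockerfile text that layers extra pip packages on a base image."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # One RUN layer keeps the image small; --no-cache-dir avoids storing wheels.
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    return "\n".join(lines) + "\n"


def build_custom_environment(context_dir: str, name: str):
    """Describe an environment built from a Docker context directory.

    Assumption: context_dir already contains a file named "Dockerfile",
    for example one written from render_dockerfile().
    """
    # Imported lazily so render_dockerfile works without the SDK installed.
    from azure.ai.ml.entities import BuildContext, Environment

    return Environment(
        name=name,
        build=BuildContext(path=context_dir),
        description="Custom environment layered on an ACPT base image",
    )
```

The returned `Environment` object can then be registered with `ml_client.environments.create_or_update(...)`, where `ml_client` is an authenticated `MLClient`. Keeping the added layer small preserves most of the benefit of the cached curated image.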
Version updates for supported environments, including the base images they reference, are released every two weeks to address vulnerabilities no older than 30 days. Based on usage, some environments may be deprecated (hidden from the product but still usable) to support more common machine learning scenarios.