Deep learning

This article gives a brief introduction to using PyTorch, TensorFlow, and distributed training for developing and fine-tuning deep learning models on Azure Databricks. It also links to pages with example notebooks that illustrate how to use those tools.

For general guidelines on optimizing deep learning workflows on Azure Databricks, see Best practices for deep learning on Azure Databricks.
For information about working with large language models and generative AI on Azure Databricks, see:
- Build AI agents on Azure Databricks.
- Machine learning on Azure Databricks.

For information and guidance on using serverless GPU with AI Runtime for single-node and multi-node deep learning workloads, see AI Runtime.

PyTorch

PyTorch is included in Databricks Runtime ML and provides GPU-accelerated tensor computation and high-level functionality for building deep learning networks. You can perform single-node training or distributed training with PyTorch on Databricks. See PyTorch. For an end-to-end tutorial notebook using PyTorch and MLflow, see MLflow 3 deep learning workflow.
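As a rough illustration of single-node training, the following sketch fits a one-feature linear model on synthetic data and uses a GPU if one is visible. The function name `train_linear_model`, the synthetic data, and the hyperparameters are assumptions chosen for illustration; they are not taken from the Databricks example notebooks.

```python
import torch
from torch import nn


def train_linear_model(epochs=200, lr=0.1, seed=0):
    """Fit y = 3x + 0.5 with full-batch gradient descent (illustrative only)."""
    torch.manual_seed(seed)

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Synthetic, noise-free training data (an assumption for this sketch).
    x = torch.linspace(-1, 1, 64, device=device).unsqueeze(1)
    y = 3.0 * x + 0.5

    model = nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    return model, loss.item()
```

Calling `train_linear_model()` returns the trained module and the final training loss. A real workload would replace the synthetic tensors with a `DataLoader` over the actual dataset.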

TensorFlow

Databricks Runtime ML includes TensorFlow and TensorBoard, so you can use these libraries without installing any packages. TensorFlow supports deep learning and general numerical computations on CPUs, GPUs, and clusters of GPUs. TensorBoard provides visualization tools to help you debug and optimize machine learning and deep learning workflows. See TensorFlow for single-node and distributed training examples.

Distributed training

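Because deep learning models are data- and computation-intensive, training on a single machine can be too slow or can exceed the memory of a single device. Distributed training spreads the work across multiple GPUs or nodes. For examples of distributed deep learning using integrations with Ray, TorchDistributor, and DeepSpeed, see Distributed training.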

Track deep learning model development

Deep learning development is iterative, so recording the parameters, metrics, and artifacts of each training run makes it possible to compare and reproduce experiments. Databricks uses MLflow to track deep learning training runs and model development. See Track model development using MLflow.

Last updated on 2026-06-02