Large language models (LLMs)

Important

AI Runtime for single-node tasks is in Public Preview. The distributed training API for multi-GPU workloads remains in Beta.

This page provides notebook examples for fine-tuning large language models (LLMs) using AI Runtime. These examples demonstrate various approaches to fine-tuning including parameter-efficient methods like Low-Rank Adaptation (LoRA) and full supervised fine-tuning.

| Tutorial | Description |
| --- | --- |
| Fine-tune Qwen2-0.5B model | Efficiently fine-tune the Qwen2-0.5B model using Transformer Reinforcement Learning (TRL), Liger Kernels for memory-efficient training, and LoRA for parameter-efficient fine-tuning. |
| Fine-tune Llama-3.2-3B with Unsloth | Fine-tune Llama-3.2-3B using the Unsloth library. |
| Supervised fine-tuning using DeepSpeed and TRL | Use the Serverless GPU Python API to run supervised fine-tuning (SFT) using the TRL library with DeepSpeed ZeRO Stage 3 optimization. |
| LoRA fine-tuning using Axolotl | Use the Serverless GPU Python API to fine-tune an Olmo3 7B model with LoRA using the Axolotl library. |
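
To give a rough sense of why LoRA counts as parameter-efficient compared with full fine-tuning, the sketch below counts trainable parameters for a single weight matrix under each approach. It is not taken from any of the notebooks: the function names, the layer size, and the rank are hypothetical and chosen only for illustration. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in is augmented by two trainable low-rank matrices, B (d_out x rank) and A (rank x d_in).

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 16

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"LoRA trains {lora / full:.2%} of the layer's weights")
```

The actual savings in a given notebook depend on which layers receive adapters and on the rank configured there, so treat these numbers as an order-of-magnitude illustration only.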

Video demo

This video walks through the Fine-tune Llama-3.2-3B with Unsloth example notebook in detail (12 minutes).