Important
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
This page provides notebook examples for running LLM batch inference on serverless GPU compute with Ray Data, a scalable data processing library for AI workloads.
| Tutorial | Description |
|---|---|
| Batch inference using vLLM with Ray Data | This notebook demonstrates how to run LLM inference at scale using Ray Data and vLLM on serverless GPU. It uses the distributed serverless GPU API to automatically provision and manage multi-node A10 GPUs for distributed inference. |
| Batch inference using SGLang with Ray Data | SGLang is a high-performance serving framework for LLMs. This notebook demonstrates how to run LLM batch inference using SGLang and Ray Data on Databricks serverless GPU. |
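The Ray Data + vLLM pattern used in the first notebook can be sketched as follows. This is a minimal illustration, not the notebook's code: the model name, dataset, batch size, and resource settings are all assumptions you would replace with your own values, and it assumes the `ray` and `vllm` packages plus GPU workers are available.

```python
# Hypothetical sketch of batch inference with Ray Data and vLLM.
# Model, dataset, and resource settings below are illustrative assumptions.
import ray
from vllm import LLM, SamplingParams


class VLLMPredictor:
    """Loads a vLLM engine once per worker, then generates per batch."""

    def __init__(self):
        # Assumed model name; substitute the model you intend to serve.
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
        self.sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

    def __call__(self, batch):
        # batch is a dict of NumPy arrays; "prompt" is an assumed column name.
        outputs = self.llm.generate(list(batch["prompt"]), self.sampling_params)
        batch["generated_text"] = [o.outputs[0].text for o in outputs]
        return batch


# Toy in-memory dataset; in practice you would read from a table or files.
ds = ray.data.from_items([{"prompt": f"Summarize item {i}."} for i in range(100)])

# Passing a callable class to map_batches keeps one engine per replica,
# so the model is loaded once per GPU worker rather than once per batch.
results = ds.map_batches(
    VLLMPredictor,
    concurrency=2,  # number of engine replicas (assumption)
    num_gpus=1,     # GPUs per replica (assumption)
    batch_size=32,
)
results.write_parquet("/tmp/llm_batch_output")
```

The key design point the notebook relies on is that `map_batches` with a callable class amortizes model loading across batches, while Ray schedules the replicas onto the GPUs that serverless compute provisions.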