Deployment on Azure ML with Tensor Parallelism Fails – IndexError & Initialization Issues

Sebastian Buzdugan 40 Reputation points
2024-12-16T14:04:47.18+00:00

Hello,

I am facing deployment issues with my fine-tuned model on Azure ML using DeepSpeed and vLLM. Despite configuring tensor parallelism, the deployment fails when calling the endpoint.

Deployment Details

  • Model: auto-deployed through Azure ML (MLflow format), fine-tuned Phi-3.5-MoE
  • Inference engine: vLLM (auto-configured)
  • VM SKU: Standard_NC24ads_A100_v4
  • Instance count: 3 for GPU parallelism

Errors in Logs

  1. Main error:
File "/azureml-envs/default/lib/python3.10/site-packages/llm/optimized/inference/replica_manager.py", line 215, in get_replica
    replica = self.engine_replicas[self._replica_index]
IndexError: list index out of range
  2. Replica manager initialization:
2024-12-16 13:21:00,587 [replica_manager] initialize 136: INFO Lock acquired by worker with pid: 7. Loading model. Using tensor parallel of 2 GPUs per replica.
2024-12-16 13:21:00,974 [replica_manager] initialize 168: INFO 0
Initialized 0 replicas.
  3. Warnings:

• async_io requires the dev libaio .so object and headers but these were not found.

• sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3.

• using untested triton version (2.3.1), only 1.0.0 is known to be compatible.

Questions

  1. Why does the replica manager fail to initialize replicas (Initialized 0 replicas)?
  2. How can I resolve the list index out of range issue in the inference server?
  3. Are the warnings about missing libaio headers, the Torch version mismatch, or Triton compatibility causing this failure?
  4. Since the deployment was auto-configured, is there a way to adjust configuration files (DeepSpeed, vLLM) without needing custom Python modifications?
Azure Machine Learning

Accepted answer
  Azar 29,520 Reputation points MVP Volunteer Moderator
    2024-12-16T14:20:45.5266667+00:00

    Hi there Sebastian Buzdugan,

    Thanks for using the Q&A platform.

    This may be due to mismatched tensor parallelism settings or missing dependencies. The list index out of range error is a direct consequence of the replica manager initializing 0 replicas: get_replica then indexes an empty replica list. The log line "Using tensor parallel of 2 GPUs per replica" points at the likely cause, since Standard_NC24ads_A100_v4 exposes a single A100 GPU per instance, so a replica that needs 2 GPUs can never be assembled. The compatibility warnings about PyTorch, Triton, and missing libaio come from DeepSpeed's dependency checks and are less likely to be the root cause, though vLLM and DeepSpeed do require specific versions for some optional ops.
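    The zero-replica count and the IndexError are two sides of the same failure. A minimal sketch of the arithmetic, using the numbers from the logs (one A100 per Standard_NC24ads_A100_v4 instance; the list name mirrors the traceback):

```python
# Sketch: why "Initialized 0 replicas" leads to the IndexError in get_replica.
gpus_visible = 1          # Standard_NC24ads_A100_v4 has a single A100 GPU
tensor_parallel_size = 2  # from the log: "tensor parallel of 2 GPUs per replica"

# A replica needs tensor_parallel_size GPUs, so integer division gives the count.
num_replicas = gpus_visible // tensor_parallel_size
print(f"Initialized {num_replicas} replicas.")  # Initialized 0 replicas.

engine_replicas = ["replica"] * num_replicas  # empty when num_replicas == 0
try:
    replica = engine_replicas[0]  # what get_replica effectively attempts
except IndexError as e:
    print(f"IndexError: {e}")  # IndexError: list index out of range
```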

    Try updating the configuration (e.g., the DeepSpeed and vLLM settings) so the tensor parallel size matches the GPUs actually available per instance, which is one for this SKU, and keep the 3 instances as independent replicas. Install libaio (e.g., the libaio-dev package) to clear the async_io warning; if you need sparse_attn, it requires a PyTorch 1.x build (e.g., 1.13) and Triton 1.0.0. You can extract and customize these settings from the deployment logs without custom Python modifications.
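    As a sketch, an adjusted configuration could look like the following. tensor_parallel_size is vLLM's actual engine argument, but the surrounding structure and key names are illustrative assumptions, since the exact override surface of the auto-generated Azure ML deployment may differ:

```python
# Hypothetical deployment overrides -- the dict keys are illustrative, not the
# actual Azure ML configuration schema; tensor_parallel_size is vLLM's real
# engine argument.
gpus_per_instance = 1  # Standard_NC24ads_A100_v4: one A100 per instance

overrides = {
    "tensor_parallel_size": 1,  # must not exceed GPUs visible to one instance
    "instance_count": 3,        # scale out across instances instead of TP
}

# Sanity check before deploying: tensor parallelism cannot span instances.
assert overrides["tensor_parallel_size"] <= gpus_per_instance
print(overrides)
```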

    If this helps, kindly accept the answer. Thanks much.

