Why can I only deploy Falcon-7b on VM with minimum 440GB RAM?

Casey 65 Reputation points
2023-07-19T02:40:23.47+00:00

According to the Hugging Face model card, Falcon-7b requires at least 16GB of RAM, suggesting that it can run on 16GB (or close to it).
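As a rough sanity check on that figure (a sketch only: it assumes roughly 7 billion parameters at 2 bytes each for fp16/bf16 weights, and ignores activations, KV cache, and framework overhead):

```python
# Back-of-the-envelope memory estimate for serving Falcon-7B.
# Assumptions: ~7e9 parameters, 2 bytes per parameter (fp16/bf16 weights);
# activations, KV cache, and runtime overhead are ignored.
params = 7_000_000_000
bytes_per_param = 2  # fp16 / bf16
weight_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weight_gb:.0f} GB")  # ~14 GB, consistent with the 16 GB figure
```

So the model card's 16 GB number is about the model weights themselves, not about any particular VM's host RAM.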

But when I try to deploy an endpoint for it in ML studio, the smallest available VM has 440 GB of RAM, which seems excessive and unnecessarily expensive. I also don't have quota for a VM of this size, so it's preventing me from deploying a model that, in theory, I should be able to run on a much smaller VM.

How come these are the only available VM sizes?

Azure Machine Learning
An Azure machine learning service for building and deploying models.

1 answer

  1. romungi-MSFT 48,906 Reputation points Microsoft Employee Moderator
    2023-07-19T15:43:23.1333333+00:00

    @Casey Thanks for adding the screenshot. I think in this case it is because of the A100 GPU requirement to run the model. As you might have observed, the Tech Community blog post mentions the same.

    These models need Nvidia A100 GPUs to run. You will need quota for one of the following Azure VM instance types that have the A100 GPU: "Standard_NC48ads_A100_v4", "Standard_NC96ads_A100_v4", "Standard_ND96asr_v4" or "Standard_ND96amsr_A100_v4".

    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And if you have any further queries, do let us know.
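    To make the size floor concrete, here is a sketch comparing the four A100 SKUs listed above. The GPU counts and memory figures are assumptions taken from Azure's published VM size specs and may drift; RAM is in GiB, GPU memory in GB.

```python
# Approximate specs (assumed from Azure VM size documentation) for the
# A100-equipped SKUs that Azure ML offers for this model.
skus = {
    "Standard_NC48ads_A100_v4":  {"gpus": 2, "gpu_mem_gb": 80, "ram_gib": 440},
    "Standard_NC96ads_A100_v4":  {"gpus": 4, "gpu_mem_gb": 80, "ram_gib": 880},
    "Standard_ND96asr_v4":       {"gpus": 8, "gpu_mem_gb": 40, "ram_gib": 900},
    "Standard_ND96amsr_A100_v4": {"gpus": 8, "gpu_mem_gb": 80, "ram_gib": 1924},
}

# The studio surfaces the smallest A100-equipped SKU as the minimum
# deployable size -- hence the 440 GB floor the question describes.
smallest = min(skus, key=lambda s: skus[s]["ram_gib"])
print(smallest, skus[smallest]["ram_gib"])  # Standard_NC48ads_A100_v4 440
```

    In other words, the 440 GB is not a requirement of Falcon-7B itself; it is simply the host RAM bundled with the smallest VM in Azure's A100 families, because the GPU requirement rules out every smaller SKU.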

