Llama 3.1 serverless deployments limited to a 4096-token context window

myat.aung 5 Reputation points
2024-11-12T13:25:27.7133333+00:00

Hi,

We've been testing and using Llama 3.1 on serverless deployments for the past few months. However, it seems that the models no longer support context windows larger than 4096 tokens. I can confirm that this limit applies even when sending a raw HTTPS request, or when using the azure.ai.inference SDK in Python.
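For reference, this is roughly the kind of request that now fails. The endpoint URL and key below are placeholders, and the word-to-token ratio is only a rough heuristic; the point is simply that the prompt is comfortably larger than 4096 tokens:

```python
import json

# Placeholder values -- substitute your own serverless deployment details.
ENDPOINT = "https://<deployment-name>.<region>.models.ai.azure.com/chat/completions"
API_KEY = "<your-key>"

# Build a prompt well over 4096 tokens. Rough heuristic: ~0.75 words per
# token, so ~7500 words lands around 10k tokens.
long_prompt = "Summarise the following notes. " + ("lorem ipsum dolor sit amet " * 1500)

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": long_prompt},
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)
approx_tokens = int(len(long_prompt.split()) / 0.75)
print(f"request body: {len(body)} bytes, prompt ~{approx_tokens} tokens")

# POSTing `body` to ENDPOINT with headers
# {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
# now returns an error once the prompt exceeds 4096 tokens, where the same
# request used to succeed.
```

The equivalent call through the azure.ai.inference `ChatCompletionsClient` fails the same way.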

Previously we had no issues getting completions with larger context windows. Can you please confirm whether there has been a change to the serverless deployments? If so, is there a way to work with larger context windows?

Cheers

Azure AI services
