Llama 3.1 serverless deployments limited to 4096-token context window
myat.aung
Hi,
We've been testing and using Llama 3.1 on serverless deployments for the past few months. However, it seems that the models no longer support context windows larger than 4096 tokens. I can confirm that this limit exists even when sending a raw HTTPS request or when using the azure.ai.inference SDK in Python.
Previously we had no issues getting completions with larger context windows. Could you please confirm whether there has been a change to the serverless deployments? If so, is there a way to work with larger context windows?
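For reference, here is roughly what our raw-HTTPS repro looks like (a minimal sketch; the endpoint URL, key, and environment variable names are placeholders, and the `build_payload` helper is just for illustration):

```python
# Sketch of the raw HTTPS reproduction; endpoint and key come from env vars (placeholders).
import json
import os
import urllib.request


def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    # Standard chat-completions request body accepted by serverless endpoints.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_request(endpoint: str, key: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{endpoint}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Roughly a 6000-token prompt; this used to complete fine,
    # but now the endpoint rejects anything past 4096 tokens.
    long_prompt = " ".join(["token"] * 6000)
    result = send_request(
        os.environ["AZURE_INFERENCE_ENDPOINT"],
        os.environ["AZURE_INFERENCE_KEY"],
        build_payload(long_prompt),
    )
    print(result["choices"][0]["message"]["content"])
```

The same request through the azure.ai.inference SDK hits the identical limit, so it doesn't look like a client-side issue.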
Cheers
Azure AI services