Token Rate Limit Exceeded

Data Juggler 181 Reputation points
2024-08-11T15:02:10.3266667+00:00

I wrote a quick demo program to test out Azure Chat Assistant.

To have several rounds of conversation, such as writing a story that goes through edits, the examples show adding each message to the conversation so that context carries over between prompts. This quickly adds to the size.

After 4 or 5 edits, I ran into a Token Limit Exceeded. Sorry I didn't capture the exact message, but it was 3 AM so I just went to bed.

I also gave pretty detailed instructions to the assistant at startup for how the radio drama was supposed to be created. First an outline of the scenes was to be created and edited, and then, once the storyboard was approved, the dialog for each character and/or narrator, along with related image prompts and sound effect descriptions, was to be written. Before we finished the outline phase, I ran into the Token Limit Exceeded error with a link to request more. The first time it said to wait 40 seconds and try again. I waited 5 minutes and it still failed, this time saying to wait 50 seconds, so I gave up.

My question: is there a more efficient way to have a continuing conversation than sending back the entire message chain? I am one user and I exceeded my rate limit in 4 or 5 rounds. How would this ever scale to a website with lots of users?

Is there any way to measure the tokens used in each session?

A short code snippet is listed below.

Thanks for any guidance, as Tokens are very mysterious. To me, conversations should be measured in innings, not number of characters used, or your rate limits are not useable for anything.

// Get the endpoint and key
string endpoint = EnvironmentVariableHelper.GetEnvironmentVariableValue("AZURE_OPENAI_ENDPOINT", EnvironmentVariableTarget.User);
string key = EnvironmentVariableHelper.GetEnvironmentVariableValue("AZURE_OPENAI_KEY", EnvironmentVariableTarget.User);

// Create a new instance of an 'OpenAIClient' object.
var client = new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(key));

if (!HasMessages)
{
    ResultsTextBox.Text = "New Session" + Environment.NewLine;

    Messages = new List<AssistantChatMessage>()
    {
        new AssistantChatMessage("You are an AI Assistant Designed To Create Old Style Radio Dramas,  although some may have a modern settings. \r\nThe stories have two main characters, Gene Armstrong, private detective, former police detective turned private investigator because the money is better, sometimes. The other character is a female, her name is Lauren Adams. Lauren is about 10 years younger than Gene, and although she is 100 percent professional at work, she secretly has a crush on Gene. Gene may like her back, but can't risk the temptation of a romantic interlude, due to being her boss. Gene is smart, and as ex Marine, quite capable of defending herself. Lauren has the computer skills that Gene lacks, which she uses in todays digital world. Investigations require both of their skills. Some of their case are insurance and fraud like many detective agencies. Other cases involve corporate espionage of the most sensitive nature while some cases are classified for the government. Before writing the script, we need you to create an outline of the plot and let us tweak the story outline. Once the outline is approved, then the script can be written. When you write the story, each scene needs to list the characters in the scene at the top. Also, an image prompt is needed to describe the prompt, and any sound effects needed."),
        new AssistantChatMessage(PromptTextBox.Text)
    };
}
else
{
    // Create a message
    AssistantChatMessage message = new AssistantChatMessage(PromptTextBox.Text) { ParticipantName = "Mark" };

    messages.Add(message);
}

ClientResult<ChatCompletion> result = await client.GetChatClient(deploymentName).CompleteChatAsync(messages);

// Get the result
result = await client.GetChatClient(deploymentName).CompleteChatAsync(messages);

Messages.Add(new AssistantChatMessage(result));

ResultsTextBox.Text = "";

if (ListHelper.HasOneOrMoreItems(messages))
{
    AssistantChatMessage message = messages[messages.Count - 1];

    string role = message.ParticipantName;
    string text = role + ": " + message.Content[0].Text + Environment.NewLine + Environment.NewLine;

    // Display
    ResultsTextBox.Text += text;
}
Azure OpenAI Service

1 answer

  1. navba-MSFT 27,550 Reputation points Microsoft Employee Moderator
    2024-08-12T06:16:19.73+00:00

    @Data Juggler Welcome to Microsoft Q&A Forum, Thank you for posting your query here!

    There are a few adjustments you can make to manage token usage more efficiently and avoid hitting the token limit. Here are some suggestions:

    1. Summarize Previous Messages

    Instead of sending the entire message history, you can summarize previous messages to reduce the token count. This helps maintain context without exceeding limits.

    2. Use System Messages Wisely

    Place detailed instructions in a system message at the start of the conversation. This way, you don’t need to repeat them in every user message.

    3. Trim Unnecessary Details

    Remove any redundant or less critical information from the conversation history.

    4. Token Counting

    You can check the average number of tokens being used in the Azure Portal metrics, or in Azure AI Studio.

    Metrics:
    User's image

    Azure Open AI Studio:

    User's image
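
For a per-session count in code, one option is to total the token counts that the service reports with every completion. The REST response carries a `usage` section with `prompt_tokens`, `completion_tokens`, and `total_tokens`; the .NET client surfaces the same numbers on the completion object, though the property names have varied between SDK versions, so check the one you have installed. The sketch below is only an illustration: `SessionTokenTally` and `TokenEstimator` are hypothetical helpers, not part of any SDK, the sample numbers are made up, and the four-characters-per-token figure is a rough rule of thumb for English text rather than a real tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical usage: after each round, record the counts the service reported.
var tally = new SessionTokenTally();
tally.Record(promptTokens: 420, completionTokens: 650);   // illustrative numbers only
tally.Record(promptTokens: 1100, completionTokens: 700);  // history was resent, so the prompt grew
Console.WriteLine($"Rounds: {tally.Rounds}, total tokens: {tally.TotalTokens}");

// Before sending, a rough estimate can warn you that a prompt is getting large.
Console.WriteLine($"Estimated tokens: {TokenEstimator.Estimate("Write an outline for episode one.")}");

/// <summary>Rough estimate only: assumes about four characters per token.</summary>
public static class TokenEstimator
{
    public static int Estimate(string text) =>
        string.IsNullOrEmpty(text) ? 0 : (text.Length + 3) / 4; // round up
}

/// <summary>Accumulates the per-round token counts for one chat session.</summary>
public sealed class SessionTokenTally
{
    private readonly List<(int Prompt, int Completion)> rounds = new();

    public void Record(int promptTokens, int completionTokens)
    {
        if (promptTokens < 0 || completionTokens < 0)
            throw new ArgumentOutOfRangeException(nameof(promptTokens), "Token counts cannot be negative.");
        rounds.Add((promptTokens, completionTokens));
    }

    public int Rounds => rounds.Count;
    public int PromptTokens => rounds.Sum(r => r.Prompt);
    public int CompletionTokens => rounds.Sum(r => r.Completion);
    public int TotalTokens => PromptTokens + CompletionTokens;
}
```

Because every round resends the whole history, the prompt count grows each round even when the new prompt is short, which is why a handful of edits can use up a tokens-per-minute quota.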

    Here are some adjustments to your code to help manage token usage:

    // Get the endpoint and key
    string endpoint = "ENDPOINT HERE";
    string key = "KEY HERE";

    // Create a new instance of an 'AzureOpenAIClient' object.
    var client = new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(key));

    if (!HasMessages)
    {
        ResultsTextBox.Text = "New Session" + Environment.NewLine;

        Messages = new List<AssistantChatMessage>()
        {
            new AssistantChatMessage("You are an AI Assistant Designed To Create Old Style Radio Dramas,  although some may have a modern settings. \r\nThe stories have two main characters, Gene Armstrong, private detective, former police detective turned private investigator because the money is better, sometimes. The other character is a female, her name is Lauren Adams. Lauren is about 10 years younger than Gene, and although she is 100 percent professional at work, she secretly has a crush on Gene. Gene may like her back, but can't risk the temptation of a romantic interlude, due to being her boss. Gene is smart, and as ex Marine, quite capable of defending herself. Lauren has the computer skills that Gene lacks, which she uses in todays digital world. Investigations require both of their skills. Some of their case are insurance and fraud like many detective agencies. Other cases involve corporate espionage of the most sensitive nature while some cases are classified for the government. Before writing the script, we need you to create an outline of the plot and let us tweak the story outline. Once the outline is approved, then the script can be written. When you write the story, each scene needs to list the characters in the scene at the top. Also, an image prompt is needed to describe the prompt, and any sound effects needed."),
            new AssistantChatMessage(PromptTextBox.Text)
        };
    }
    else
    {
        // Summarize previous messages to reduce token count
        string summary = SummarizeMessages(Messages);

        // Replace the history with the summary plus the new prompt
        Messages = new List<AssistantChatMessage>()
        {
            new AssistantChatMessage(summary),
            new AssistantChatMessage(PromptTextBox.Text) { ParticipantName = "Mark" }
        };
    }

    ClientResult<ChatCompletion> result = await client.GetChatClient(deploymentName).CompleteChatAsync(Messages);

    // Append the assistant's reply to the history
    Messages.Add(new AssistantChatMessage(result));

    ResultsTextBox.Text = "";

    if (ListHelper.HasOneOrMoreItems(Messages))
    {
        AssistantChatMessage message = Messages[Messages.Count - 1];

        string role = message.ParticipantName;
        string text = role + ": " + message.Content[0].Text + Environment.NewLine + Environment.NewLine;

        // Display
        ResultsTextBox.Text += text;
    }

    // Function to summarize previous messages
    string SummarizeMessages(List<AssistantChatMessage> messages)
    {
        // Implement your summarization logic here
        // For example, you can concatenate the last few messages or create a brief summary
        return string.Join(" ", messages.Select(m => m.Content[0].Text).TakeLast(3));
    }
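
The `SummarizeMessages` placeholder joins the text of the last three messages, so it still needs real logic before it shrinks anything. One simple alternative to summarizing is a sliding window: always keep the first message (the instructions) and then keep only as many of the most recent turns as fit within a token budget. The sketch below shows that idea in plain C# so it runs without the SDK. `Turn`, `HistoryTrimmer`, and the 800-token budget are hypothetical names and values chosen for illustration, and the estimate assumes roughly four characters per token rather than using a real tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical conversation: instructions first, then alternating turns.
var history = new List<Turn>
{
    new("system", "You write old-style radio dramas."),
    new("user", "Outline episode one."),
    new("assistant", new string('a', 4000)), // stand-in for a long outline
    new("user", "Shorten scene two."),
    new("assistant", new string('b', 2000)), // stand-in for a revised outline
    new("user", "Add a car chase to scene three."),
};

var trimmed = HistoryTrimmer.Trim(history, maxTokens: 800);
foreach (var turn in trimmed)
    Console.WriteLine($"{turn.Role}: ~{HistoryTrimmer.Estimate(turn.Text)} tokens");

public sealed record Turn(string Role, string Text);

public static class HistoryTrimmer
{
    // Rough estimate only: assumes about four characters per token.
    public static int Estimate(string text) => (text.Length + 3) / 4;

    // Keeps the first message, then the newest turns that fit in the budget.
    public static List<Turn> Trim(IReadOnlyList<Turn> history, int maxTokens)
    {
        if (history.Count == 0) return new List<Turn>();

        var first = history[0];
        int used = Estimate(first.Text);
        var kept = new List<Turn>();

        // Walk backwards from the newest turn and stop once the budget is full.
        for (int i = history.Count - 1; i >= 1; i--)
        {
            int cost = Estimate(history[i].Text);
            // The newest turn is always kept, even if it alone exceeds the budget.
            if (used + cost > maxTokens && kept.Count > 0) break;
            used += cost;
            kept.Add(history[i]);
        }

        kept.Reverse();        // restore chronological order
        kept.Insert(0, first); // instructions stay at the front
        return kept;
    }
}
```

With this sample history the long first outline is dropped while the instructions and the three most recent turns are kept. The trade-off is that the model no longer sees the dropped text, so a summary of the removed turns may still be worth adding if earlier details matter.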
    

    On a side note:

    GPT-4o model version 2024-08-06 has all the capabilities of the previous version, as well as:

    • An enhanced ability to support complex structured outputs.
    • Max output tokens have been increased from 4,096 to 16,384.
    • Supported regions: East US, East US 2, Sweden Central, West US, and West US 3.

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
