Relation between max_prompt_token and tokens per minute quota

Mohamed Hussein 300 Reputation points
2024-11-25T17:31:07.83+00:00

I'm using Azure OpenAI assistants and often get this max_prompt_tokens error during file generation; the reason isn't quite clear.

I've requested an increase to the tokens-per-minute quota. The current quota is 900K per minute.

The user prompt was:

Can you export the 164 messages we have exchanged


  1. Is it doable to ask the assistant to export conversation messages to a JSONL or CSV file?
  2. Is there anywhere to get a closer look at the error details?

{
    "uri": "https://huswabagpt.openai.azure.com/openai/threads/thread_7j2hL1FVVVLQfi9plDS714to/messages?api-version=2024-05-01-preview",
    "method": "POST",
    "headers": {
        "api-key": "xx",
        "Content-Type": "application/json"
    },
    "body": {
        "role": "user",
        "content": "Can you export the 165 messages we have exchanged at .csv file?"
    }
}

{
  "id": "run_8YFlC57xljQtlEgqvINjNLoY",
  "object": "thread.run",
  "created_at": 1732552380,
  "assistant_id": "asst_3rY50WWj2cvOXU5jjealB7qy",
  "thread_id": "thread_7j2hL1FVVVLQfi9plDS714to",
  "status": "incomplete",
  "started_at": 1732552381,
  "expires_at": null,
  "cancelled_at": null,
  "failed_at": null,
  "completed_at": 1732552689,
  "required_action": null,
  "last_error": null,
  "model": "gpt-4o",
  "instructions": "You are an AI assistant",
  "tools": [
    {
      "type": "code_interpreter"
    },
    {
      "type": "file_search",
      "file_search": {
        "ranking_options": {
          "ranker": "default_2024_08_21",
          "score_threshold": 0
        }
      }
    }
  ],
  "tool_resources": {},
  "metadata": {},
  "temperature": 1,
  "top_p": 0.94,
  "max_completion_tokens": null,
  "max_prompt_tokens": null,
  "truncation_strategy": {
    "type": "auto",
    "last_messages": null
  },
  "incomplete_details": {
    "reason": "max_prompt_tokens"
  },
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 5000,
    "total_tokens": 5000
  },
  "response_format": "auto",
  "tool_choice": "auto",
  "parallel_tool_calls": true
}


Accepted answer
Gabriel Santana 90 Reputation points
2024-11-25T19:43:57.7166667+00:00

    1. Is it doable to ask the assistant to export conversation messages to JSONL or CSV?

    Yes, it’s doable, but there’s a catch: tokens. A token isn’t the same as a word or a character; it’s a chunk of text, and even simple words can split into multiple tokens depending on the language or formatting. Models like GPT-4 have a maximum token limit per request (e.g., 8,192 or 32,768 tokens), which includes everything: the system setup, your messages, and the assistant’s responses.
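
    To see how many tokens a piece of text actually uses before you send it, you can count them locally. Here’s a minimal sketch using the open-source tiktoken library (an assumption on my side: a recent tiktoken release that knows gpt-4o; older releases can use the o200k_base encoding directly):

        import tiktoken

        # gpt-4o uses the o200k_base encoding; fall back if the model is unknown
        try:
            enc = tiktoken.encoding_for_model("gpt-4o")
        except KeyError:
            enc = tiktoken.get_encoding("o200k_base")

        prompt = "Can you export the 164 messages we have exchanged?"
        print(len(enc.encode(prompt)))  # token count for this prompt alone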

    So, exporting all 164 messages at once might hit that limit. To handle this:

    • Break it down: Ask for smaller chunks (like 50 messages at a time). Then, combine them into a JSONL or CSV later using a script or tool.
        import json
        import requests

        # Azure OpenAI API details
        AZURE_OPENAI_ENDPOINT = "https://your-endpoint.openai.azure.com/"
        API_VERSION = "2024-05-01-preview"
        API_KEY = "your-api-key"
        THREAD_ID = "thread_id_here"  # Replace with the specific thread ID

        # Configuration
        OUTPUT_FILE = "conversation.jsonl"  # Output file name
        LAST_N_MESSAGES = 50  # Number of last messages to export

        def fetch_messages(thread_id, last_n):
            """Fetch the last N messages from a thread using the Azure OpenAI API."""
            url = f"{AZURE_OPENAI_ENDPOINT}openai/threads/{thread_id}/messages"
            headers = {
                "api-key": API_KEY,
                "Content-Type": "application/json"
            }
            params = {
                "api-version": API_VERSION,
                "limit": last_n  # the API returns the newest messages first
            }

            response = requests.get(url, headers=headers, params=params)
            if response.status_code != 200:
                print(f"Error fetching messages: {response.status_code} - {response.text}")
                return None

            # The list endpoint wraps the messages in a "data" array (newest first);
            # reverse so the export reads in chronological order
            return list(reversed(response.json().get("data", [])))

        def export_to_jsonl(messages, output_file):
            """Save messages to a JSONL file, one JSON object per line."""
            with open(output_file, "w") as f:
                for message in messages:
                    json.dump(message, f)
                    f.write("\n")
            print(f"Exported {len(messages)} messages to {output_file}")

        def main():
            # Fetch the last N messages
            messages = fetch_messages(THREAD_ID, LAST_N_MESSAGES)
            if not messages:
                print("No messages to export.")
                return

            # Export messages to JSONL
            export_to_jsonl(messages, OUTPUT_FILE)

        if __name__ == "__main__":
            main()
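
      Note: the list endpoint caps limit at 100, so for longer threads you’d page through with the after cursor, passing the ID of the last message from the previous page.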
        
      
    • Use a simpler format: If you just grab the raw data and format it yourself (in your script), you’ll save a lot of tokens, especially in the JSONL format; a sketch converting that JSONL export to CSV follows below.
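
    Here’s a minimal sketch of that conversion, turning the conversation.jsonl produced by the script above into a CSV (it assumes each message follows the Assistants API shape, where content is a list of parts and the text lives in part["text"]["value"]):

        import csv
        import json

        INPUT_FILE = "conversation.jsonl"   # produced by the export script above
        OUTPUT_FILE = "conversation.csv"

        with open(INPUT_FILE) as src, open(OUTPUT_FILE, "w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(["created_at", "role", "text"])
            for line in src:
                msg = json.loads(line)
                # Keep only the text parts (assumption: the thread has no image parts)
                text = " ".join(
                    part["text"]["value"]
                    for part in msg.get("content", [])
                    if part.get("type") == "text"
                )
                writer.writerow([msg.get("created_at"), msg.get("role"), text])

        print(f"Wrote {OUTPUT_FILE}")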
    2. Is there anywhere to get a closer look at the error details?

    The API gives you lots of clues when something goes wrong:

    • incomplete_details.reason will tell you why the run stopped; in your case, max_prompt_tokens.
    • usage.prompt_tokens shows how many tokens your input used.
    • total_tokens is the total used (input + output). If it’s above the limit, the request won’t work.

    You can check these values to adjust your request so it fits within the limits. Tokens might sound technical, but think of it this way: they’re just how the system counts chunks of text, not letters, words, or characters. Feel free to ask for further breakdowns if you need help! 🫡
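
    To pull those fields programmatically instead of digging through the portal, here’s a minimal sketch that retrieves a run and prints its diagnostics (THREAD_ID and RUN_ID are placeholders you’d replace with your own values):

        import requests

        AZURE_OPENAI_ENDPOINT = "https://your-endpoint.openai.azure.com/"
        API_VERSION = "2024-05-01-preview"
        API_KEY = "your-api-key"
        THREAD_ID = "thread_id_here"
        RUN_ID = "run_id_here"

        url = f"{AZURE_OPENAI_ENDPOINT}openai/threads/{THREAD_ID}/runs/{RUN_ID}"
        run = requests.get(
            url,
            headers={"api-key": API_KEY},
            params={"api-version": API_VERSION},
        ).json()

        # The same fields you can see in the raw run object
        print("status:", run.get("status"))
        print("reason:", (run.get("incomplete_details") or {}).get("reason"))
        print("last_error:", run.get("last_error"))
        print("usage:", run.get("usage"))

    If the reason is max_prompt_tokens, you can also try setting truncation_strategy to {"type": "last_messages", "last_messages": N} when creating the run, so only the most recent messages are sent as context instead of the whole thread.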

    1 person found this answer helpful.
