Azure OpenAI Service gpt-35-turbo model returns over 4096 tokens.

Yuta Kuroda 10 Reputation points

I am using the gpt-35-turbo model with Azure OpenAI Service. The documentation (1) states that the maximum number of request tokens is 4096, but the API seems to be returning more than 4096 tokens. Can anyone explain why this is happening?


The API response is below. The finish_reason is "stop"; I expected it to be "length".
I have removed the id and the content.

{
  "id": "{it's deleted}",
  "object": "chat.completion",
  "created": 1682590906,
  "model": "gpt-35-turbo",
  "usage": {
    "prompt_tokens": 4378,
    "completion_tokens": 2359,
    "total_tokens": 6737
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "{it's so long... deleted}"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. John Sanders 171 Reputation points Microsoft Employee
    • We always recommend staying within the documented token limit.
    • While we don't intend to change the behavior of version 0301 of the model, all future versions will only support 4k tokens.
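
As a rough illustration of staying within the documented limit, the sketch below inspects the `usage` block of a chat completion response and reports whether it exceeded 4096 tokens. It is only a sketch: the function name `check_usage` and the constant `DOCUMENTED_LIMIT` are hypothetical, the limit value is the one quoted in the question, and the sample response is the redacted one posted above. It uses only the Python standard library and does not call the service.

```python
import json

# Assumption: 4096 is the limit quoted in the question; adjust it for your model version.
DOCUMENTED_LIMIT = 4096


def check_usage(response: dict, limit: int = DOCUMENTED_LIMIT) -> dict:
    """Compare the token counts in a chat completion response against a limit."""
    usage = response["usage"]
    return {
        "prompt_over_limit": usage["prompt_tokens"] > limit,
        "total_over_limit": usage["total_tokens"] > limit,
        "tokens_over": max(0, usage["total_tokens"] - limit),
        "finish_reasons": [c.get("finish_reason") for c in response.get("choices", [])],
    }


# The response from the question, with the id and content still redacted.
response = json.loads("""
{
  "id": "{it's deleted}",
  "object": "chat.completion",
  "created": 1682590906,
  "model": "gpt-35-turbo",
  "usage": {"prompt_tokens": 4378, "completion_tokens": 2359, "total_tokens": 6737},
  "choices": [
    {
      "message": {"role": "assistant", "content": "{it's so long... deleted}"},
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
""")

report = check_usage(response)
print(report)
```

A check like this could run after each call to flag requests that only succeed because version 0301 tolerates them, so they can be shortened before moving to a newer model version.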