low response tokens despite having a high output token limit

S-A 25 Reputation points
2024-01-08T08:21:10.9533333+00:00

I have integrated two Azure services: Azure OpenAI and Azure AI Search. The flow of usage is as follows:

  1. The user enters a prompt related to information stored in the index.
  2. The prompt is used as a query in the Search AI Service.
  3. The Search Service retrieves information from the index.
  4. The retrieved information is used in a prompt in the OpenAI Service.
  5. A result is generated to the user.
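
In code, the flow looks roughly like this (a sketch only; the endpoint, keys, index name, `content` field, base prompt, and deployment name are placeholders, not my actual values):

```python
# Sketch of the retrieve-then-generate flow; all names and keys are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="<index-name>",
    credential=AzureKeyCredential("<search-api-key>"),
)

openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<aoai-api-key>",
    api_version="2024-02-01",
)

def answer(user_prompt: str) -> str:
    # Steps 2-3: use the user's prompt as the search query and collect results.
    results = search_client.search(search_text=user_prompt, top=5)
    context = "\n".join(doc["content"] for doc in results)  # assumes a "content" field

    # Steps 4-5: pass the retrieved information to the model and return the answer.
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo",  # deployment name
        messages=[
            {
                # Base prompt is illustrative, not my actual prompt.
                "role": "system",
                "content": f"Answer using only the information below.\n\n{context}",
            },
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=2000,
    )
    return response.choices[0].message.content
```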

The results I'm getting are fine, and the model doesn't hallucinate much. However, the responses aren't as descriptive as I want. I have tried a lot of prompt engineering, changing API versions, and adjusting temperature and top_p values, but not much has changed. I have set max_tokens to 2000, but I never get more than about 600 tokens back. Why is it behaving this way, and how can I improve its responses?

Note: I am using the gpt-4-turbo model, so I have a context window of 128k and an output token limit of 4,096.

Azure AI Search
Azure OpenAI Service

1 answer

  1. Pramod Valavala 20,656 Reputation points Microsoft Employee Moderator
    2024-01-08T18:24:20.1+00:00

    @S-A There are a couple of things you could try, but their success will depend on your base prompt, context, and user query.

    Considering your results aren't hallucinations in most cases, you presumably have a prompt that emphasizes sticking to the provided context, and that is pushing the model toward short, grounded answers.

    If your responses, while brief, still include all the context you are giving the model, that itself is a win; getting longer (more descriptive) responses might just be a matter of explicitly asking it to elaborate.

    It could be as simple as ending the base prompt with: "Provide a detailed and comprehensive answer to the question using the information below."

    Another thing you could try is increasing top_p and max_tokens, though I can't really give you specific numbers to try here.
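
    As a rough sketch only (the specific values are illustrative, not recommendations; it uses the same openai_client, context, and user_prompt names as the sketch in the question), both suggestions together might look like this:

    ```python
    # Illustrative values only; the right settings depend on your prompt and data.
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo",  # deployment name
        messages=[
            {
                "role": "system",
                "content": (
                    "Provide a detailed and comprehensive answer to the "
                    f"question using the information below.\n\n{context}"
                ),
            },
            {"role": "user", "content": user_prompt},
        ],
        top_p=0.95,       # nudged up; illustrative value
        max_tokens=3500,  # closer to the 4,096 output limit; illustrative value
    )
    print(response.choices[0].message.content)
    ```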

