Details on how Azure OpenAI REST API sends data to the OpenAI servers

johananmahendran 115 Reputation points
2023-10-27T10:43:19.6933333+00:00

Hi,

I am implementing a Retrieval Augmented Generation pattern in which large amounts of text are retrieved and fed into a GPT model to reason over the data.

I would like some details on how the OpenAI REST API sends data to the OpenAI servers.

In particular:

  1. How is the prompt fed into the GPT model?

Sometimes the retrieved documents are large, and fetching multiple documents exceeds the token limit of my model.
Does the API automatically break large prompts into chunks to fit within the model's token limit, or is this something I should manage myself? So far it hasn't given me any issues, but I want to be sure.

  2. The documentation for the stream option says that it streams back partial progress:

https://learn.microsoft.com/en-us/azure/ai-services/openai/reference

When testing it out, I can see that it sends the session's output back as a stream. However, does it also stream text to the OpenAI server in a similar way, or does it send everything all at once?

I may have more questions in the future, but these are all I have for now.

Thanks,

Johanan

Azure OpenAI Service
An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.

1 answer

  1. Pramod Valavala 20,626 Reputation points Microsoft Employee
    2023-10-27T10:56:32.0566667+00:00

    @johananmahendran Here are the answers to your queries

    How is the prompt fed into the GPT model?

    It is fed as-is, exactly as you pass it in your request. You are responsible for ensuring the prompt fits within the model's token limit, or for using a model with a higher limit. Most models default to a 4k-token context, but there are 8k and 16k variants you can experiment with as well.
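
    For example, here is a minimal sketch of budgeting tokens yourself before building the prompt, using the tiktoken package; the limit values and helper names are illustrative placeholders and should be adjusted to your deployment:

    ```python
    import tiktoken

    # Illustrative budget values; adjust to your deployment's context window.
    CONTEXT_LIMIT = 4096       # total tokens the model accepts (prompt + completion)
    COMPLETION_BUDGET = 512    # tokens reserved for the generated answer

    # cl100k_base is the encoding used by the GPT-3.5/GPT-4 family.
    ENCODING = tiktoken.get_encoding("cl100k_base")

    def count_tokens(text: str) -> int:
        """Count tokens the same way the model will."""
        return len(ENCODING.encode(text))

    def select_documents(docs: list[str], question: str) -> list[str]:
        """Keep adding retrieved documents until the prompt budget is used up."""
        budget = CONTEXT_LIMIT - COMPLETION_BUDGET - count_tokens(question)
        selected, used = [], 0
        for doc in docs:
            cost = count_tokens(doc)
            if used + cost > budget:
                break  # or split/summarize the remaining documents instead
            selected.append(doc)
            used += cost
        return selected
    ```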

    However, does it also stream text to the OpenAI server in a similar way? Or does it send everything all at once?

    The model requires the complete prompt before it can start the "completion", so the prompt must be sent in full for the response to begin.
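
    As an illustration, here is a minimal streaming sketch with the openai Python package (0.x-style Azure configuration); the endpoint, key, and deployment name are placeholders. The messages, including all retrieved documents, go up in a single request, while only the completion comes back incrementally:

    ```python
    import openai

    # Placeholder resource values; substitute your own Azure OpenAI settings.
    openai.api_type = "azure"
    openai.api_base = "https://<your-resource>.openai.azure.com/"
    openai.api_version = "2023-05-15"
    openai.api_key = "<your-api-key>"

    # The full prompt (system message, question, retrieved documents) is sent
    # in this one request ...
    response = openai.ChatCompletion.create(
        engine="<your-deployment-name>",
        messages=[
            {"role": "system", "content": "Answer using only the provided documents."},
            {"role": "user", "content": "<question plus retrieved documents>"},
        ],
        stream=True,  # ... but the completion streams back in small deltas
    )

    # Print each partial piece of the answer as it arrives.
    for chunk in response:
        if chunk["choices"]:  # Azure may emit an initial chunk with no choices
            delta = chunk["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)
    ```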

