Details on how Azure OpenAI REST API sends data to the OpenAI servers

johananmahendran 60 Reputation points
2023-10-27T10:43:19.6933333+00:00

Hi,

I am implementing a Retrieval Augmented Generation pattern in which large amount of text is retrieved and fed into a GPT model to reason about the data.

I would like some details on how the OpenAI REST API sends data to the OpenAI servers.

In particular:

  1. How is the prompt fed into the GPT model?

Sometimes the documents retrieved are large, and fetching multiple documents exceeds the token limit of my model.
Does the API automatically break large prompts into chunks to fit within the model's token limit, or is this something I should manage myself? So far it hasn't caused any issues, but I want to be sure.

  2. Regarding the stream option: the documentation states that it streams back partial progress.

https://learn.microsoft.com/en-us/azure/ai-services/openai/reference

When testing it out, I can see that it sends the session's output back as a stream. However, does it also stream text to the OpenAI server in a similar way, or does it send everything all at once?

I may have more questions in the future, but these are all I have for now.

Thanks,

Johanan

Azure OpenAI Service
An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.

1 answer

  1. Pramod Valavala 19,356 Reputation points Microsoft Employee
    2023-10-27T10:56:32.0566667+00:00

    @johananmahendran Here are the answers to your queries:

    How is the prompt fed into the GPT model?

    It is sent as-is, exactly as you pass it in your request. You are responsible for ensuring the prompt fits within the model's token limit, or for using a model with a higher limit. Most models default to a 4k-token context, but there are 8k and 16k variants you can experiment with as well.
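Since chunking is managed client-side, a minimal sketch of one approach is below. The ~4-characters-per-token ratio is a rough heuristic for English text (not an official figure); for exact counts you would use a real tokenizer such as tiktoken. The function name and parameters are illustrative, not part of the Azure OpenAI API.

```python
def split_into_chunks(text: str, max_tokens: int = 4000, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that roughly fit a model's token limit.

    Uses a rough ~4-characters-per-token heuristic; swap in a real
    tokenizer (e.g. tiktoken) for exact token counts.
    """
    max_chars = max_tokens * chars_per_token
    chunks = []
    while text:
        if len(text) <= max_chars:
            chunks.append(text)
            break
        # Prefer to break at a paragraph boundary, then a sentence boundary,
        # and only fall back to a hard cut if neither is found.
        cut = text.rfind("\n", 0, max_chars)
        if cut <= 0:
            cut = text.rfind(". ", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    return chunks
```

Each chunk can then be sent as a separate request (or summarized and combined), since the service itself will simply reject a prompt over the limit rather than split it for you.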

    However, does it also stream text to the OpenAI server in a similar way? Or does it send everything all at once?

    The model requires the complete prompt before it can start the completion, so the prompt "must" be sent in full (in a single request) for the response to begin. Streaming applies only to the response coming back, not to the prompt going up.
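In other words, the full prompt goes up in one HTTP POST, and with `stream=true` only the completion comes back incrementally as server-sent events. A hedged sketch of parsing such a stream on the client side (the sample payload in the usage below is illustrative, not captured from a real call):

```python
import json

def parse_sse_stream(lines):
    """Yield content deltas from server-sent-event lines as produced by a
    chat completions request with stream=true. The full prompt was already
    sent in the request body; only the completion arrives incrementally."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                yield delta
```

Usage with a mocked-up stream: `"".join(parse_sse_stream(lines))` reassembles the text the same way a non-streaming call would return it, which is why streaming changes latency but not what must be sent to the server.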
