Hi GenixPRO,
Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!
I understand that you are facing an issue with the gpt-4o-realtime API where system messages exceeding a certain word count (~6500 words) fail to initiate a conversation, and that you are also exploring whether the API can handle video.
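The gpt-4o-realtime API has a token limit (8,192 or 32,768 tokens, depending on which limit applies to your deployment) that covers the system message, user inputs, and model responses combined. If your system message pushes the total over this limit, the API may fail silently. By the common rule of thumb of about 0.75 English words per token, a ~6500-word system message is roughly 8,000 to 9,000 tokens, so it would already exceed an 8,192-token limit on its own and leave less room for the conversation under a 32,768-token limit. Consider shortening the system message or splitting the conversation into smaller parts. Use a tool such as OpenAI's Tokenizer to measure actual token usage and confirm you are within the limit.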
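As a quick sanity check before sending a long system message, the estimate above can be sketched in a few lines of Python. This is only a rough heuristic: the 0.75 words-per-token ratio is an assumption for English text, the function names are made up for illustration, and an actual tokenizer will give different (and more reliable) counts.

```python
import math

# Assumption: roughly 0.75 English words per token (a rule of thumb,
# not an exact tokenizer count).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_limit(text: str, token_limit: int, reserved_tokens: int = 0) -> bool:
    """Return True if the estimated tokens plus a reserved budget
    (for user inputs and model responses) stay within token_limit."""
    return estimate_tokens(text) + reserved_tokens <= token_limit


# Stand-in for a 6500-word system message.
system_message = "word " * 6500
estimated = estimate_tokens(system_message)
```

With this stand-in text, `fits_limit(system_message, 8192)` is false and `fits_limit(system_message, 32768)` is true, which matches the behaviour you would expect if the smaller limit is the one in effect.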
Regarding video, the API does not support direct video processing. However, you can use Azure services such as Video Indexer to extract metadata and transcripts, or Speech Services for audio-to-text conversion, and then pass the resulting text to the API.
For intermittent audio issues, log the errors you receive, check whether you are hitting rate limits, and test in different Azure regions to narrow down whether the cause is latency or service degradation.
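For the intermittent failures, one way to capture errors while you investigate is to wrap the call in a small retry helper that logs every failed attempt. The sketch below is generic and makes no assumptions about the realtime SDK: `call_with_retry` and the `operation` callable are hypothetical names, and you would pass in whatever function performs your actual request.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("realtime-client")  # hypothetical logger name


def call_with_retry(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on an exception, log it and retry with
    exponential backoff. Re-raises after the final failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The logged attempts give you timestamps and error messages to compare across regions, and the backoff helps avoid making a rate-limit problem worse while you test.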
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".