Hello Saurabh M,
Your JSON clearly requests text-only modalities, yet occasional audio markers (<|…|>) appear because the session configuration still includes audio-related settings.
Workaround:
Even though you set "modalities": ["text"], your session.update includes both input_audio_format and output_audio_format. The Realtime API treats the presence of output_audio_format as an implicit request for audio capabilities, causing the service to insert audio tokens.
To enforce text-only behavior, remove the audio-related parameters from your session configuration:
{
  "type": "session.update",
  "session": {
    "modalities": ["text"],
    // Remove these two fields entirely:
    // "input_audio_format": "pcm16",
    // "output_audio_format": "pcm16",
    "voice": null, // Optional: clear voice setting
    …
  }
}
As you noted, don't use the output_modalities parameter. The new output_modalities field is not yet supported in this preview and might return an error. Continue using modalities only.
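If you build the session payload in code, the two points above can be combined into a small helper. This is only a minimal sketch: buildTextOnlySessionUpdate and FIELDS_TO_DROP are hypothetical names, not part of any SDK, and the "instructions" value is placeholder data. It assumes you already have a Realtime WebSocket connection to send the resulting event on.

```typescript
// Hypothetical helper (not part of any SDK): builds a text-only session.update event.
type SessionConfig = Record<string, unknown>;

// Fields this answer suggests removing (audio formats) or avoiding (output_modalities).
const FIELDS_TO_DROP: string[] = [
  "input_audio_format",
  "output_audio_format",
  "output_modalities",
];

function buildTextOnlySessionUpdate(
  session: SessionConfig
): { type: string; session: SessionConfig } {
  const cleaned: SessionConfig = {};
  for (const key of Object.keys(session)) {
    // Copy every field except the ones listed above.
    if (FIELDS_TO_DROP.indexOf(key) === -1) {
      cleaned[key] = session[key];
    }
  }
  // Always request text only, whatever the caller passed in.
  cleaned["modalities"] = ["text"];
  return { type: "session.update", session: cleaned };
}

// Example input that still carries the audio settings (placeholder values).
const event = buildTextOnlySessionUpdate({
  modalities: ["text"],
  input_audio_format: "pcm16",
  output_audio_format: "pcm16",
  instructions: "Reply in plain text.",
});

// Send it on your existing connection, e.g. ws.send(JSON.stringify(event)).
console.log(JSON.stringify(event, null, 2));
```

Filtering the fields in one place means a stray audio setting added elsewhere in your code does not end up in the session.update you send.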
Please let us know if this helps. If yes, kindly "Accept the answer" and/or upvote, so it will be beneficial to others in the community as well. 😊