To support real-time scenarios, like Virtual Agent and Agent Assist in call centers, an integration with the call center's telephony system is required.
Typically, the integration with Microsoft Speech Services is handled by a telephony client connected to the customer's SIP/RTP processor, for example, a Session Border Controller (SBC).
Usually, the telephony client receives the incoming audio stream from the SIP/RTP processor, converts it to PCM, and connects the streams to the Speech service using continuous recognition. It also routes the results for further processing: for example, it analyzes speech transcripts for Agent Assist, or connects to a dialog processing engine (for example, Azure Bot Framework or Power Virtual Agents) for Virtual Agent.
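The conversion step can be illustrated with a minimal sketch. It assumes the RTP payload is G.711 mu-law (PCMU), which the text above does not specify, and it decodes each 8-bit code into a 16-bit little-endian PCM sample using only the standard library. The function names are hypothetical; stripping RTP headers and handing the PCM to the Speech SDK are not shown.

```python
import struct


def _decode_mulaw_byte(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit linear sample."""
    code = ~code & 0xFF                  # mu-law codes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # 0x84 is the bias added by the encoder; remove it after scaling.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 possible codes once, so per-packet work is a table lookup.
_MULAW_TABLE = [_decode_mulaw_byte(code) for code in range(256)]


def mulaw_to_pcm16(payload: bytes) -> bytes:
    """Convert a mu-law payload to 16-bit little-endian mono PCM."""
    samples = [_MULAW_TABLE[code] for code in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 20 ms PCMU packet at 8 kHz carries 160 payload bytes, so `mulaw_to_pcm16` would return 320 bytes of PCM for it.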
For easier integration, the Speech Service also supports “ALAW in WAV container” and “MULAW in WAV container” for audio streaming.
To build this integration, we recommend using the Speech SDK.
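To make "in WAV container" concrete, the following sketch wraps a finite buffer of raw G.711 bytes in a RIFF/WAVE header using the standard format tags for A-law (6) and mu-law (7). The function name and the defaults (8 kHz, mono) are assumptions for illustration, and the exact header variants the service accepts for open-ended streams are not covered here.

```python
import struct

WAVE_FORMAT_ALAW = 0x0006    # standard WAVE format tag for G.711 A-law
WAVE_FORMAT_MULAW = 0x0007   # standard WAVE format tag for G.711 mu-law


def wrap_g711_in_wav(payload: bytes,
                     format_tag: int = WAVE_FORMAT_MULAW,
                     sample_rate: int = 8000,
                     channels: int = 1) -> bytes:
    """Prefix raw A-law or mu-law bytes with a RIFF/WAVE header."""
    block_align = channels               # one byte per sample per channel
    byte_rate = sample_rate * block_align
    # Non-PCM formats use an 18-byte fmt chunk that ends with cbSize = 0.
    fmt_chunk = struct.pack("<4sIHHIIHHH", b"fmt ", 18, format_tag, channels,
                            sample_rate, byte_rate, block_align, 8, 0)
    # The fact chunk records the number of samples per channel.
    fact_chunk = struct.pack("<4sII", b"fact", 4, len(payload) // block_align)
    # RIFF chunks are padded to an even number of bytes.
    padding = b"\x00" if len(payload) % 2 else b""
    data_chunk = struct.pack("<4sI", b"data", len(payload)) + payload + padding
    body = b"WAVE" + fmt_chunk + fact_chunk + data_chunk
    return struct.pack("<4sI", b"RIFF", len(body)) + body
```

For a live call, the total length is unknown when streaming starts, so a real client would have to decide how to fill the size fields; this sketch only handles a buffer whose length is already known.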
For guidance on reducing Text to Speech latency, see the How to lower speech synthesis latency guide.
In addition, consider implementing a Text to Speech cache that stores all synthesized audio, and play back from the cache whenever a string has already been synthesized.
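One way such a cache could look is sketched below. It is an in-memory illustration only: the class name, the `synthesize` callable, and the choice to key entries by voice plus text are assumptions, not part of the Speech SDK. In practice the callable would wrap whatever synthesis call the telephony client already makes.

```python
import hashlib
from typing import Callable, Dict


class SynthesisCache:
    """Return stored audio for a (text, voice) pair, synthesizing only on a miss."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        # `synthesize` is a hypothetical callable: (text, voice) -> audio bytes.
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def _key(text: str, voice: str) -> str:
        # Include the voice in the key so the same text in another voice
        # is not served the wrong audio.
        return hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()

    def get_audio(self, text: str, voice: str) -> bytes:
        key = self._key(text, voice)
        audio = self._store.get(key)
        if audio is None:
            audio = self._synthesize(text, voice)   # cache miss: synthesize once
            self._store[key] = audio
        return audio
```

A production cache would likely also need an eviction policy and persistent storage, and would have to account for any other settings that change the audio, such as the output format.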