Inconsistent Latency Discrepancy in Text to Speech Synthesizer Between Local and Production Environments

Question

Darshan Gupta 0

I'm seeing a performance discrepancy in the Text to Speech synthesizer. When I call speakTextAsync locally, the 'first byte latency' and 'finish latency' are consistently under 200 ms for texts of varying lengths. However, when the same function is invoked in the production environment with the same text, I observe latencies of 600-800 ms. Could you please help me understand the reasons behind this difference?

Code:

// Imports needed at the top of the module:
const fs = require('fs');
const sdk = require('microsoft-cognitiveservices-speech-sdk');

async getTextToSpeech(ctx, next){
        const requestBody = ctx.request.body;
        const speechKey = process.env.SPEECH_KEY;
        const speechRegion = process.env.SPEECH_REGION;

        let response;
        response = new RESPONSE_MESSAGE.GenericSuccessMessage();
        if (!speechKey || !speechRegion) {
            // Fail this request rather than calling process.exit(1), which would stop the whole server
            throw new Error('Please set the environment variables SPEECH_KEY and SPEECH_REGION');
        }

        const text = requestBody.speechText;

        const speechConfig = sdk.SpeechConfig.fromSubscription(speechKey, speechRegion);
        speechConfig.speechSynthesisVoiceName = 'hi-IN-MadhurNeural';
        speechConfig.speechSynthesisLanguage = 'hi-IN';
        speechConfig.speechSynthesisOutputFormat = sdk.SpeechSynthesisOutputFormat.Riff8Khz16BitMonoPcm;

        const pullStream = sdk.AudioOutputStream.createPullStream();
        const audioConfig = sdk.AudioConfig.fromStreamOutput(pullStream);
        const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

        const outputFilePath = 'tts_output.wav';
        const outputFileStream = fs.createWriteStream(outputFilePath);
        outputFileStream.on("error", err => console.log(err));

        await new Promise((resolve, reject) => {
            synthesizer.speakTextAsync(text, (result) => {
                // Release the synthesizer once synthesis has ended
                synthesizer.close();
                if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
                    // end() flushes the audio and closes the file before resolving
                    outputFileStream.end(Buffer.from(result.audioData), () => {
                        console.log(`TTS audio saved to: ${outputFilePath}`);
                        resolve();
                    });
                } else {
                    console.log("Error");
                    outputFileStream.end();
                    reject(new Error(`Speech synthesis failed: ${result.errorDetails}`));
                }
            }, (err) => {
                // Called when the request itself fails, for example on a network error
                synthesizer.close();
                outputFileStream.end();
                reject(new Error(`Speech synthesis failed: ${err}`));
            });
        });
        return outputFilePath;
    }

navba-MSFT 27,540 Reputation points Microsoft Employee Moderator

2024-02-05T04:46:40.66+00:00

@Darshan Gupta Just following up to check if my suggestion helped. Please let me know if you have any further queries. I would be happy to help.

Answer 1

@Darshan Gupta Welcome to the Microsoft Q&A forum. Thank you for posting your query here!

Several factors could contribute to the higher latency in production:

Network latency: The network latency from your production environment may be higher than from your local environment, either because of the physical distance between the server and the Azure region or because of network congestion. Please check where your production application is hosted. Can you host it in the same region as the Speech resource?
Concurrency: The Azure AI Speech service can autoscale, but scaling out takes time. If concurrency rises sharply within a short period, the client may see longer latency or even receive a 429 error code (too many requests). We therefore recommend increasing concurrency step by step in your load test. See this article for more details, especially this example of workload patterns.
Recommendations: The recommendations for lowering latency are described in this article. Please follow them.
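
To illustrate what increasing concurrency step by step could look like in a load test, here is a minimal JavaScript sketch. The helper names (rampSchedule, runStage, rampUp), the linear ramp, and the pause between stages are all assumptions made for illustration; they are not part of the Speech SDK or values taken from the service documentation. The task argument stands in for whatever function issues one synthesis request in your application.

```javascript
// Hypothetical helper: concurrency levels that grow linearly up to `target`
// over `steps` stages, e.g. rampSchedule(20, 4) gives [5, 10, 15, 20].
function rampSchedule(target, steps) {
  const levels = [];
  for (let i = 1; i <= steps; i++) {
    levels.push(Math.max(1, Math.round((target * i) / steps)));
  }
  return levels;
}

// Hypothetical helper: run `task` (an async function) `count` times in parallel
// and report how many calls failed and how long the stage took.
async function runStage(task, count) {
  const started = Date.now();
  const results = await Promise.allSettled(
    Array.from({ length: count }, () => task())
  );
  const failed = results.filter((r) => r.status === 'rejected').length;
  return { count, failed, elapsedMs: Date.now() - started };
}

// Run the stages one after another, pausing between them so the service
// has some time to scale out before the next, larger stage starts.
async function rampUp(task, target, steps, pauseMs) {
  const report = [];
  for (const count of rampSchedule(target, steps)) {
    report.push(await runStage(task, count));
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return report;
}
```

A stage whose failed count rises, for example because requests were rejected with 429, suggests that the ramp is too steep for the current capacity and that smaller steps or longer pauses may be needed.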

On a side note:

You can also check the latency metrics to identify which operation takes the most time, and compare the results between your production and local development environments.
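
If you also want a client-side view to set against those metrics, one option is to time each synthesis call yourself in both environments. The sketch below is only an illustration: measureSynthesis is a hypothetical helper, and it assumes the object passed in exposes a synthesizing event callback and speakTextAsync(text, onResult, onError), as the Speech SDK's SpeechSynthesizer does. The numbers it returns are wall-clock times seen by your application, so they include network time and are not the same as the latency values reported by the service.

```javascript
// Hypothetical helper: measure time to the first audio chunk and total time
// for one synthesis call. `now` is injectable so a fake clock can be used.
function measureSynthesis(synthesizer, text, now = () => performance.now()) {
  return new Promise((resolve, reject) => {
    const start = now();
    let firstChunkMs = null;

    // The synthesizing event fires for each audio chunk; keep only the first.
    synthesizer.synthesizing = () => {
      if (firstChunkMs === null) {
        firstChunkMs = now() - start;
      }
    };

    synthesizer.speakTextAsync(
      text,
      (result) => resolve({ firstChunkMs, totalMs: now() - start, result }),
      (err) => reject(new Error(String(err)))
    );
  });
}
```

Logging firstChunkMs and totalMs for the same text locally and in production should show whether the extra time appears before the first chunk arrives (which points toward connection setup or network distance) or while the rest of the audio is streaming.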

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
