Transcribe in real time during a Twilio phone call?

Rakesh Indla 5

Hello,

I can place a call from Twilio. Once the call ends, I pass the .wav file to Azure Speech to Text, but the transcription feels slow. Is there any way to transcribe during the phone call itself, or another approach that speeds up transcription? Please suggest a better method and share reference or sample code.

YutongTie-MSFT 53,971 Reputation points Moderator

2023-10-14T23:01:49+00:00

@Rakesh Indla Thanks for reaching out to us again. Have you had a chance to check my answer? Did it help? Please let me know if you need anything else.

Regards,

Yutong
Gobillion YC S21 0 Reputation points

2024-04-11T09:17:55.42+00:00

require('colors'); // Adds color helpers such as .yellow to strings

const speechSdk = require('microsoft-cognitiveservices-speech-sdk');
const EventEmitter = require('events');

class TranscriptionService extends EventEmitter {
  constructor() {
    super();

    const subscriptionKey = process.env.AZURE_SUBSCRIPTION_KEY;
    const serviceRegion = process.env.AZURE_REGION;

    const speechConfig = speechSdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
    speechConfig.speechRecognitionLanguage = 'en-US';

    // 8 kHz, 8-bit, mono. Note: Twilio Media Streams deliver mu-law audio,
    // so the payload may need to be decoded to PCM (or the stream declared
    // as mu-law) before recognition is accurate.
    const audioFormat = speechSdk.AudioStreamFormat.getWaveFormatPCM(8000, 8, 1);

    // Create the push stream we'll use to send audio.
    this.pushStream = speechSdk.AudioInputStream.createPushStream(audioFormat);
    const audioConfig = speechSdk.AudioConfig.fromStreamInput(this.pushStream);
    this.recognizer = new speechSdk.SpeechRecognizer(speechConfig, audioConfig);

    // Attach the handlers before starting so that no early event is missed.
    this.setupEventHandlers();
    this.start();
  }

  setupEventHandlers() {
    // Fires once per finalized phrase.
    this.recognizer.recognized = (s, e) => {
      if (e.result.reason === speechSdk.ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`.yellow);
        this.emit('transcription', e.result.text.trim());
      }
    };

    this.recognizer.sessionStopped = (s, e) => {
      console.log('Azure Speech session stopped.');
      this.emit('close'); // Custom event to handle closure
    };

    // The original assigned this handler twice; only the later assignment
    // took effect, so the two are merged here.
    this.recognizer.canceled = (s, e) => {
      console.error(`CANCELED: ${e.reason}`.yellow);
      if (e.reason === speechSdk.CancellationReason.Error) {
        console.error(`CANCELED: ErrorCode=${e.errorCode}`);
        console.error(`CANCELED: ErrorDetails=${e.errorDetails}`);
        // Emit an error event. EventEmitter throws if nothing listens for 'error'.
        this.emit('error', e.errorDetails);
      }
    };
  }

  /**
   * Send the payload to Azure Speech
   * @param {String} payload A base64 encoded audio stream
   */
  send(payload) {
    // Convert the base64 string to a Buffer and write it to the push stream.
    const buffer = Buffer.from(payload, 'base64');
    this.pushStream.write(buffer);
  }

  start() {
    this.recognizer.startContinuousRecognitionAsync();
  }

  stop() {
    this.recognizer.stopContinuousRecognitionAsync();
    this.pushStream.close();
  }
}

module.exports = { TranscriptionService };
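
The class above still needs something to feed it audio while the call is live. A minimal sketch of that glue follows, assuming the call is connected to a Twilio Media Streams WebSocket, which sends JSON messages with an `event` field and, for `media` events, a base64 audio chunk in `media.payload`. The helper name `routeTwilioMessage` is hypothetical and is not part of the Twilio or Azure SDKs. It only needs an object with `send` and `stop` methods, such as the `TranscriptionService` above.

```javascript
// Hypothetical helper (not part of any SDK): forwards Twilio Media Streams
// messages to a transcriber object that exposes send(base64) and stop().
// Returns a short label describing what was done with the message.
function routeTwilioMessage(rawMessage, transcriber) {
  let msg;
  try {
    msg = JSON.parse(rawMessage);
  } catch (err) {
    return 'ignored'; // Not valid JSON, nothing to forward.
  }
  if (!msg || typeof msg !== 'object') {
    return 'ignored';
  }

  switch (msg.event) {
    case 'media':
      // Each media message carries one small base64 audio chunk.
      if (msg.media && typeof msg.media.payload === 'string') {
        transcriber.send(msg.media.payload);
        return 'media';
      }
      return 'ignored';
    case 'stop':
      // The call or stream ended, so shut the recognizer down.
      transcriber.stop();
      return 'stop';
    default:
      // 'connected', 'start', 'mark', and anything else need no audio handling.
      return 'ignored';
  }
}

// Possible wiring with the `ws` package (assumed, not run here):
//   wss.on('connection', (socket) => {
//     const transcriber = new TranscriptionService();
//     transcriber.on('transcription', (text) => console.log(text));
//     transcriber.on('error', (details) => console.error(details));
//     socket.on('message', (data) => routeTwilioMessage(data.toString(), transcriber));
//   });
```

Keeping the routing in a plain function, separate from the WebSocket server, makes it easy to test with a stub transcriber before any Azure credentials are involved.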

Answer 1

YutongTie-MSFT 53,971 Moderator

Hello @Rakesh Indla

Thanks for reaching out to us. From the Azure Speech service side, you can do this with the Real-time diarization (Preview) feature.

You can run an application for speech to text transcription with real-time diarization. Here, diarization is distinguishing between the different speakers participating in the conversation. The Speech service provides information about which speaker was speaking a particular part of transcribed speech.

Please take a look at the document - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=linux&pivots=programming-language-python

I hope this helps, please let me know if you need further assistance.

Regards,

Yutong

-Please kindly accept the answer and vote 'Yes' if you feel helpful to support the community, thanks a lot.

Rakesh Indla 5 Reputation points

2023-10-16T05:50:06.5566667+00:00

Hello @YutongTie-MSFT

I know about real-time diarization, but we don't need to differentiate the speakers here. I'm worried about performance. For example, after I end the phone call from Twilio, I have to send the .wav file, which takes around a minute to transcribe. I want transcription to be faster. Is there a way to capture the phone call conversation in real time and transcribe it?
