Make a call and transcribe in real time

Rakesh Indla 5 Reputation points


My use case is: make a call from a browser to a phone number, and once the call is connected and the conversation starts, it should detect and transcribe both sides of the conversation in real time. Is that possible? If yes, please share code/references.

Azure AI Speech — An Azure service that integrates speech processing into apps and services.
Azure Communication Services — An Azure communication platform for deploying applications across devices and platforms.
Azure AI services — A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. YutongTie-MSFT 46,996 Reputation points

    @Rakesh Indla

Thanks for reaching out to us. From the Azure Speech service side, you can do this with the real-time diarization (preview) feature.

You can run an application for speech-to-text transcription with real-time diarization. Here, diarization means distinguishing between the different speakers participating in the conversation. The Speech service tells you which speaker was speaking during each part of the transcribed speech.

Please take a look at the documentation for the feature.

    To make a call from a web browser to a phone number and transcribe the conversation in real-time, you can use Azure Communication Services and Azure Cognitive Services. Here's a high-level outline of the steps and technologies involved:

    Azure Communication Services:

    Use Azure Communication Services to initiate the call from your web application to a phone number. Azure Communication Services provides capabilities for voice calling.

    Voice Calling from Browser:

    Implement the voice calling functionality in your web application using the Azure Communication Services Web SDK. You'll need to configure the SDK to make outbound calls.

    Azure Cognitive Services - Speech Service:

    Set up Azure Cognitive Services, specifically the Speech Service, for real-time transcription.

    Use the Speech SDK to transcribe the audio from the ongoing call. The Speech Service can convert spoken language into written text.

    Real-Time Transcription:

    As the call progresses, capture the audio and send it to the Speech Service for real-time transcription.

Under the hood, the Speech SDK maintains a WebSocket connection to the Speech service, so you get low-latency streaming transcription without managing the protocol yourself.

    Display Transcription:

    Display the transcribed text in your web application in real-time so that users can see the conversation as it's transcribed.

    Monitoring and Error Handling:

    Implement monitoring and error handling to ensure that the transcription process is reliable and to address any issues that may arise during the call.

    Here's a simplified example of using the Azure Communication Services Web SDK and Azure Cognitive Services Speech SDK in JavaScript to initiate a call and perform real-time transcription:

// --- Server side: mint a user and a VoIP access token with the identity SDK.
// Never expose your ACS connection string in the browser.
import { CommunicationIdentityClient } from "@azure/communication-identity";
const identityClient = new CommunicationIdentityClient("<your-acs-connection-string>");
const { token } = await identityClient.createUserAndToken(["voip"]);

// --- Browser side: place the PSTN call with the Calling SDK.
import { CallClient } from "@azure/communication-calling";
import { AzureCommunicationTokenCredential } from "@azure/communication-common";
const callClient = new CallClient();
const callAgent = await callClient.createCallAgent(new AzureCommunicationTokenCredential(token));
const call = callAgent.startCall(
  [{ phoneNumber: "<phone-number-to-call>" }],
  { alternateCallerId: { phoneNumber: "<your-acs-phone-number>" } } // caller ID, required for PSTN
);

// --- Real-time transcription with the Speech SDK (here: the local microphone leg).
import { SpeechConfig, AudioConfig, SpeechRecognizer } from "microsoft-cognitiveservices-speech-sdk";
const speechConfig = SpeechConfig.fromSubscription("<your-speech-service-subscription-key>", "<your-speech-service-region>");
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Interim hypotheses while the speaker is still talking.
recognizer.recognizing = (s, e) => {
  console.log(`Transcribing: ${e.result.text}`);
};
// Finalized results.
recognizer.recognized = (s, e) => {
  console.log(`Transcribed: ${e.result.text}`);
};

// Start continuous speech recognition.
recognizer.startContinuousRecognitionAsync();

// Handle call events and user interface in your application, e.g.:
call.on("stateChanged", () => console.log(`Call state: ${call.state}`));

Please note that this is a simplified example; you'll need to integrate it into your web application and add appropriate error handling, a user interface, and call-management features. Also, ensure that you have the necessary Azure subscriptions and configurations in place for both Azure Communication Services and Azure Cognitive Services.

    I hope this helps.



Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.
