Azure Speech-to-Text service does not work for some audio files

Jigar Shah
2023-03-17T09:29:18.32+00:00

The Azure Speech-to-Text service does not correctly separate the speakers in a multi-speaker conversation (tested with two speakers).

It merges statements from both speakers into the same phrase.

It also duplicates each phrase, once for each speaker, as shown below:

Phrase: Hello, Amy, I'm doctor Jones. How are you doing today?
Speaker: 0
Sentiment: neutral

Phrase: Hello, Amy, I'm doctor Jones. How are you doing today?
Speaker: 1
Sentiment: neutral

Phrase: I'm OK, but it hurts when I go to the bathroom when I pee.
Speaker: 0
Sentiment: negative

Phrase: I'm OK, but it hurts when I go to the bathroom but I pee.
Speaker: 1
Sentiment: negative

Phrase: That's called this year and it's pretty common. When did this start? Two days ago.
Speaker: 0
Sentiment: neutral

Phrase: That's called this year and it's pretty common. When did this start? Two days ago.
Speaker: 1
Sentiment: neutral

Phrase: Have you had this before? Yes. I had this several years ago, before you were my doctor.
Speaker: 0
Sentiment: neutral

Phrase: Have you had this before? Yes. I had this several years ago, before you were my doctor.
Speaker: 1
Sentiment: neutral

...

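So it joins statements from both speakers into a single phrase instead of attributing each statement to the correct speaker, and then reports that phrase once for Speaker 0 and again for Speaker 1.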
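The duplication pattern can be checked mechanically. The following is a minimal Python sketch that assumes the plain-text layout shown in the output (a `Phrase:` line followed by a `Speaker:` line), not the actual structure of `summary.json`, which is not shown in this post. The names `parse_phrases` and `cross_speaker_duplicates` and the 0.9 similarity threshold are hypothetical choices for illustration. The sketch flags consecutive phrases that are attributed to different speakers but have nearly identical text, so it catches both exact repeats and near-repeats such as the "when I pee" / "but I pee" pair.

```python
from difflib import SequenceMatcher

# Excerpt of the output above (plain-text layout, not the summary.json schema).
SAMPLE = """\
Phrase: Hello, Amy, I'm doctor Jones. How are you doing today?
Speaker: 0
Sentiment: neutral

Phrase: Hello, Amy, I'm doctor Jones. How are you doing today?
Speaker: 1
Sentiment: neutral

Phrase: I'm OK, but it hurts when I go to the bathroom when I pee.
Speaker: 0
Sentiment: negative

Phrase: I'm OK, but it hurts when I go to the bathroom but I pee.
Speaker: 1
Sentiment: negative
"""


def parse_phrases(text: str) -> list[tuple[str, str]]:
    """Collect (phrase, speaker) pairs; blank lines and other fields are ignored."""
    pairs = []
    phrase = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Phrase:"):
            phrase = line[len("Phrase:"):].strip()
        elif line.startswith("Speaker:") and phrase is not None:
            pairs.append((phrase, line[len("Speaker:"):].strip()))
            phrase = None
    return pairs


def cross_speaker_duplicates(pairs, threshold=0.9):
    """Return consecutive pairs with different speakers but near-identical text."""
    hits = []
    for (text_a, spk_a), (text_b, spk_b) in zip(pairs, pairs[1:]):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        if spk_a != spk_b and similarity >= threshold:
            hits.append((spk_a, spk_b, round(similarity, 3), text_a))
    return hits


if __name__ == "__main__":
    for spk_a, spk_b, sim, text in cross_speaker_duplicates(parse_phrases(SAMPLE)):
        print(f"speakers {spk_a}/{spk_b} similarity {sim}: {text}")
```

Comparing only consecutive phrases keeps the check linear and matches the observed pattern, where each duplicate immediately follows its original. A transcript where duplicates are interleaved differently would need an all-pairs comparison.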

I ran the sample with the following command-line arguments:

--languageKey <<LanguageKey>> --languageEndpoint <<languageEndPoint>> --speechKey <<Speechkey>> --speechRegion eastus --input <<Audio File path>> --stereo --output summary.json
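One possible cause is an assumption and is not confirmed in this thread. With `--stereo`, the sample may treat each audio channel as a separate speaker instead of running diarization. If both channels of the recording carry the same mixed conversation, each phrase would then be transcribed once per channel, which matches the Speaker 0 / Speaker 1 pairs above. The following is a minimal Python sketch for testing that hypothesis on a local copy of the input. It assumes a 16-bit PCM stereo WAV file. `channel_difference` and `write_stereo` are hypothetical helpers that use only the standard-library `wave` and `array` modules, not anything from the Azure SDK.

```python
import os
import sys
import tempfile
import wave
from array import array


def channel_difference(path: str) -> float:
    """Mean |left - right| divided by mean |left| for a 16-bit PCM stereo WAV.

    0.0 means the channels are sample-for-sample identical; values close to 0
    mean one channel is essentially a copy of the other.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 2-channel, 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)          # interleaved: L, R, L, R, ...
    if sys.byteorder == "big":         # WAV samples are stored little-endian
        samples.byteswap()
    left, right = samples[0::2], samples[1::2]
    diff = sum(abs(l - r) for l, r in zip(left, right))
    level = sum(abs(l) for l in left)
    # Silent left channel: fall back to the raw difference.
    return diff / level if level else float(diff)


def write_stereo(path: str, left, right, rate: int = 16000) -> None:
    """Hypothetical demo helper: write two sample lists as a 16-bit stereo WAV."""
    interleaved = array("h")
    for l, r in zip(left, right):
        interleaved.extend((l, r))
    if sys.byteorder == "big":
        interleaved.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(interleaved.tobytes())


if __name__ == "__main__":
    # Synthetic check: a file whose right channel is a copy of the left one.
    tone = [int(8000 * ((i % 40) - 20) / 20) for i in range(1600)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "copied_channels.wav")
        write_stereo(path, tone, tone)
        print(channel_difference(path))
    # For the real input (hypothetical file name): channel_difference("conversation.wav")
```

A result near 0.0 for the real input file would suggest that the two channels are copies, or near-copies, of each other. In that case, it may be worth testing a run without `--stereo` so that the audio is treated as mono. The small differences between the Speaker 0 and Speaker 1 versions of the "bathroom" phrase suggest that the channels may not be bit-identical, which is why the sketch reports a ratio rather than a strict equality check.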
Azure AI Speech