How to transcribe an interview with two speakers from a single audio file, similar to Word 365, using the spx recognize CLI

Nikolay Bogoychev
2023-02-08T07:48:57.5733333+00:00

Hello Everyone,

I have a series of interviews recorded as MP3 files, and I would like to use the Azure Speech CLI to transcribe them in a format similar to the one produced by the built-in Word 365 transcription feature, which is:

[Screenshot of the Word 365 transcript format]

I would like to use the Azure Speech service because its word error rate (WER) is much lower. I tried using:

spx recognize --file audio.mp3 --format mp3 --language en-US --output all text --output all file output.tsv
However, the output includes neither timestamps nor speaker labels. I see a lot of options in the help, but I couldn't figure out which ones I would need to produce the information I require.

The output doesn't need to be plain text, since I can post-process it into the format I need, but I would like to get something that contains TIME - SPEAKER_ID - UTTERANCE.
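For the post-processing step, here is a minimal Python sketch, assuming the recognizer output has already been parsed into records that each carry a start offset in 100-nanosecond ticks (the unit the Speech SDK uses for result offsets), a speaker label, and the utterance text. The `Segment` type, the helper names, and the speaker labels are hypothetical; how to get speaker labels out of `spx` in the first place is the open question here, so the parsing step is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: offsets are in 100-nanosecond ticks, so 10,000,000 ticks = 1 second.
TICKS_PER_SECOND = 10_000_000


@dataclass
class Segment:
    """Hypothetical parsed utterance; not a type exposed by spx or the SDK."""
    offset_ticks: int  # start of the utterance, in 100-ns ticks
    speaker_id: str    # speaker label (illustrative values only)
    text: str          # recognized text


def format_timestamp(ticks: int) -> str:
    """Convert a tick offset to HH:MM:SS, truncating fractional seconds."""
    total_seconds = ticks // TICKS_PER_SECOND
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def merge_turns(segments: Iterable[Segment]) -> List[Segment]:
    """Join consecutive utterances by the same speaker into one turn.

    Each turn keeps the start offset of its first utterance. New Segment
    objects are created, so the input records are never modified.
    """
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.offset_ticks):
        if turns and turns[-1].speaker_id == seg.speaker_id:
            turns[-1].text += " " + seg.text
        else:
            turns.append(Segment(seg.offset_ticks, seg.speaker_id, seg.text))
    return turns


def to_lines(segments: Iterable[Segment]) -> List[str]:
    """Render one 'TIME - SPEAKER_ID - UTTERANCE' line per speaker turn."""
    return [
        f"{format_timestamp(t.offset_ticks)} - {t.speaker_id} - {t.text}"
        for t in merge_turns(segments)
    ]


if __name__ == "__main__":
    demo = [
        Segment(0, "Speaker-1", "Thanks for joining us."),
        Segment(25_000_000, "Speaker-1", "Let's start."),
        Segment(41_000_000, "Speaker-2", "Happy to be here."),
    ]
    for line in to_lines(demo):
        print(line)
```

Merging consecutive utterances from the same speaker gives one line per speaker turn; to keep one line per recognized utterance instead, format the segments directly without calling `merge_turns`.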

Thanks,

Nick

Azure AI Speech