How to transcribe an interview with two speakers from a single audio file, similar to Word 365, using the spx recognize CLI

Nikolay Bogoychev 46

Hello Everyone,

I have a series of interviews recorded as MP3 files, and I would like to use the Azure Speech CLI to transcribe them in a format similar to that of the integrated Word 365 transcription feature, which looks like this:

[Image: 329649100_862563028179303_4014168117363499350_n, showing the Word 365 transcript format]

I would like to use the Azure Speech service, because the WER is much lower. I tried using:

spx recognize --file audio.mp3 --format mp3 --language en-US --output all text --output all file output.tsv

But the output doesn't include timestamps or speakers. I see a lot of options in the help, but I couldn't figure out which ones I need to produce the information I require.

The output doesn't need to be plain text; I can post-process it to get the format I need, but I would like something that contains TIME - SPEAKER_ID - UTTERANCE.
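
To make the target layout concrete, here is a minimal post-processing sketch in Python. It is not tied to any particular spx output: it assumes each utterance has already been extracted as a start offset, a speaker label, and the recognized text. The `Utterance` class, the function names, and the sample speaker labels are hypothetical, and the offsets are assumed to be in 100-nanosecond ticks, the unit the Speech service uses for result offsets.

```python
from dataclasses import dataclass

# Assumption: offsets arrive in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000


@dataclass
class Utterance:
    """Hypothetical record for one recognized utterance."""
    offset_ticks: int  # start of the utterance, in 100 ns ticks
    speaker_id: str    # illustrative label such as "Speaker 1"
    text: str          # recognized text


def format_timestamp(offset_ticks: int) -> str:
    """Convert a tick offset to HH:MM:SS, dropping fractional seconds."""
    total_seconds = offset_ticks // TICKS_PER_SECOND
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def format_transcript(utterances) -> str:
    """Return one 'TIME - SPEAKER_ID - UTTERANCE' line per utterance, in time order."""
    ordered = sorted(utterances, key=lambda u: u.offset_ticks)
    return "\n".join(
        f"{format_timestamp(u.offset_ticks)} - {u.speaker_id} - {u.text}"
        for u in ordered
    )


if __name__ == "__main__":
    # Made-up sample data, only to show the layout.
    sample = [
        Utterance(50_000_000, "Speaker 2", "Thanks for having me."),
        Utterance(0, "Speaker 1", "Welcome to the interview."),
    ]
    print(format_transcript(sample))
```

Sorting by offset keeps the lines in speaking order even if the records arrive out of order.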

Thanks,

Nick

Ramr-msft 17,606 Reputation points

2023-02-09T03:25:06.0666667+00:00
@Nikolay Bogoychev To transcribe an interview with two speakers from a single audio file, you can use the Azure Speech CLI command spx recognize with the --conversation option. This option enables conversation transcription, which allows you to transcribe multiple speakers in a single audio file.

Here is an example command that you can use:

spx recognize --file audio.mp3 --format mp3 --language en-US --conversation --output all text --output all file output.tsv

This command will transcribe the audio file and output the transcription results in the specified format. The output will include timestamps and speaker IDs for each utterance.

You can also specify additional options, such as the punctuation mode and word-level timestamps, to customize the transcription results. For more information, you can refer to the Azure Speech CLI documentation.

Nikolay Bogoychev 46 Reputation points

2023-02-09T14:08:22.9+00:00

Sorry, double post and I am not sure how to delete it...

Nikolay Bogoychev 46

Thank you for the answer, but this didn't work:

 % spx recognize --file audio.mp3 --format mp3 --language en-US --conversation --output all text --output all file output.tsv

SPX - Azure Speech CLI, Version 1.25.0
Copyright (c) 2022 Microsoft Corporation. All Rights Reserved.

ERROR: Parsing command line!!

  audio.input.file=audio.mp3
  audio.input.format=mp3
  audio.input.type=file
  diagnostics.config.log.file=log-{run.time}.log
  output.all.audio.input.id=true
  output.all.recognizer.recognized.result.text=true
  output.all.recognizer.session.started.sessionid=true
  service.config.key= 9cac****************************
  service.config.region=uksouth
  source.language.config=en-US
  x.command=recognize
  x.input.path=@none

  ERROR: Invalid command line argument(s) at "--conversation --output all text --output all file output.tsv".

    SEE: spx help recognize

I also tried diarization and diarizationEnabled, but those didn't work either. Any ideas?

Ramr-msft 17,606 Reputation points

2023-02-27T07:12:59.28+00:00

@Nikolay Bogoychev Here is the real-time conversation transcription quickstart, which could help:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-conversation-transcription?pivots=programming-language-csharp