Azure Cognitive Speech to Text Duplicate Sentences returned on Channel 0 and Channel 1

Question

We are developing a solution using Azure Cognitive Speech to Text service and have an issue with duplicate sentences being returned.

Some of our dual-channel recordings appear to transcribe correctly, with each speaker on their own channel. The input file contains stereo audio.

I suspect the issue is the type of audio, but are there any suggestions as to why we may be seeing the same transcription returned for both channels?

The audio is a call recording between two people.
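
One way to test the audio-type suspicion is to check whether the two channels of the file actually carry different signals. Below is a minimal Python sketch, assuming the recording is an uncompressed PCM WAV file. The function name `channels_identical` is made up for illustration, and an exact byte match is a strict test that will not flag channels that are only near-identical mixes.

```python
import wave


def channels_identical(path: str) -> bool:
    """Return True if every left sample equals the matching right sample.

    Assumes an uncompressed PCM WAV file with exactly two channels.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a stereo (2-channel) WAV file")
        width = wav.getsampwidth()  # bytes per sample, per channel
        frames = wav.readframes(wav.getnframes())

    frame_size = 2 * width  # one interleaved frame = left sample + right sample
    for offset in range(0, len(frames) - frame_size + 1, frame_size):
        left = frames[offset:offset + width]
        right = frames[offset + width:offset + frame_size]
        if left != right:
            return False
    return True
```

If this returns True for a recording, both channels hold the same mixed audio, which would be consistent with identical text coming back on channel 0 and channel 1.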

See example:

Example API request

{
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": false,
        "displayFormWordLevelTimestampsEnabled": false,
        "duration": null,
        "channels": null,
        "destinationContainerUrl": null,
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked",
        "timeToLive": "P0Y0M30DT0H0M0S",
        "diarization": null,
        "languageIdentification": null,
        "email": null,
        "error": null
    },
    "contentUrls": [
        "xxxxxxx"
    ],
    "locale": "en-GB",
    "displayName": "Transcription Data",
    "customProperties": {}
}

Example API response transcription

{
  "source": "xxxxxxx",
  "timestamp": "2023-06-05T14:11:57Z",
  "durationInTicks": 12957400000,
  "duration": "PT21M35.74S",
  "combinedRecognizedPhrases": [
    {
      "channel": 0,
      "lexical": "hi victoria nice to see you today"
    },
    {
      "channel": 1,
      "lexical": "hi victoria nice to see you today"
    }
  ]
}
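
To see how widespread the duplication is across results, the output can be scanned for text that appears on more than one channel. Here is a rough Python sketch, assuming the result JSON has already been parsed into a dict with the `combinedRecognizedPhrases`, `channel`, and `lexical` fields shown above. The function name is hypothetical.

```python
def texts_on_multiple_channels(result: dict) -> list:
    """Return lexical texts that occur on more than one channel."""
    channels_by_text = {}
    for phrase in result.get("combinedRecognizedPhrases", []):
        # Collapse whitespace so trailing spaces do not hide a match.
        text = " ".join(phrase.get("lexical", "").split())
        channels_by_text.setdefault(text, set()).add(phrase.get("channel"))
    return [
        text
        for text, channels in channels_by_text.items()
        if text and len(channels) > 1
    ]
```

A non-empty return value means at least one transcript was reported verbatim on two or more channels.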

Answer

I am also facing the same issue. Please advise.

We are following this GitHub sample (batch transcription approach, REST API):

https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/182fabecee81a4af948e6a049db84da60cb09181/samples/batch/csharp
