False positive results in Diarization service based on batch transcription

IMExperts Admin 0 Reputation points
2023-09-21T16:36:50.8566667+00:00

We are getting false positive results from speaker separation (diarization) in batch transcription when many speakers talk simultaneously within short time intervals. We are using client SDK version 1.30:

<groupId>com.microsoft.cognitiveservices.speech</groupId>
<artifactId>client-sdk</artifactId>
<version>1.30.0</version>

We use the code snippet below. Do you have any recommendations for minimizing these false positives and improving diarization accuracy?

//-------------- 1. Create Transcription Job
            JsonArray jaChannels = new JsonArray();
            jaChannels.add(0);
            
            JsonArray jaLocales = new JsonArray();
            jaLocales.add("en-US");jaLocales.add("pt-BR");jaLocales.add("es-ES");
            
            JsonArray jaContentURL = new JsonArray();
            jaContentURL.add(this.audioURL);
            
            JsonObject joSpeakers = new JsonObject();
            joSpeakers.addProperty("minCount", 1);
            joSpeakers.addProperty("maxCount", 10);

            JsonObject joDiarization = new JsonObject();
            joDiarization.add("speakers", joSpeakers);

            JsonObject joLanguageIdentification = new JsonObject();
            joLanguageIdentification.add("candidateLocales", jaLocales);
            joLanguageIdentification.add("speechModelMapping", new JsonObject());
            

            JsonObject joProperties = new JsonObject();
            joProperties.addProperty("diarizationEnabled", true);
            joProperties.addProperty("wordLevelTimestampsEnabled", false);
            joProperties.addProperty("displayFormWordLevelTimestampsEnabled", false);
            joProperties.add("channels", jaChannels);
            joProperties.add("diarization", joDiarization);
            joProperties.add("languageIdentification", joLanguageIdentification);
         
  
            JsonObject jsonPayload = new JsonObject();
            jsonPayload.addProperty("locale", this.locale);
            jsonPayload.addProperty("displayName", this.media.getIdMid()+"_"+processStartTime);
            jsonPayload.addProperty("description", this.media.getIdMid()+"_"+processStartTime);
            jsonPayload.add("customProperties", new JsonObject());
            jsonPayload.add("contentUrls", jaContentURL);
            jsonPayload.add("properties", joProperties);



            String request = new Gson().toJson(jsonPayload);

            log.info("\n\t\t 1. Creating Transcription... uri: {}", this.speechTranscriptionAPI);
            log.info(request);
            RestResult result = RestHelper.sendPost(this.speechTranscriptionAPI, request, this.speechSubscriptionKey, new int[] { HttpURLConnection.HTTP_CREATED });
            String transcriptionUri_1 = result.getJson().get("self").getAsString();
            String[] transcriptionUri_2 = transcriptionUri_1.split("/");
            String transcriptionId = transcriptionUri_2[transcriptionUri_2.length - 1];
            try {
                UUID uuid = UUID.fromString(transcriptionId);  // Verify the transcription ID is a valid GUID.
            } catch (IllegalArgumentException exception) {
                throw new Exception(String.format("Unable to parse response from Create Transcription API:%s%s", System.lineSeparator(), result.getText()));
            } 

            log.info("\n\t\tTranscription ID: {}", transcriptionId);

            //-------------- 2. Get Transcription Status
            String transcriptionStatusUri = this.speechTranscriptionAPI + "/" + transcriptionId;
            log.info("\n\n\t\t 2. Getting Transcription Status... uri: {}", transcriptionStatusUri);
            boolean done = false;
            while (!done)
            {
                log.info("\n\t\tWaiting {} seconds for transcription to complete.", waitSeconds);
                Thread.sleep(waitSeconds * 1000);
                // Get Transcription response
                result = RestHelper.sendGet(transcriptionStatusUri, this.speechSubscriptionKey, new int[] { HttpURLConnection.HTTP_OK });
                String status = result.getJson().get("status").getAsString().toLowerCase();
                log.info(String.format("\t\tTranscription Status: %s", status));
                if (status.equals("failed")){
                    throw new Exception(String.format("Unable to transcribe audio input. Response:%s%s", System.lineSeparator(), result.getText()));
                }
                done = status.equals("succeeded");
            }

            //-------------- 3. Get Transcription Files
            String transcriptionFilesUri = transcriptionStatusUri + "/files";
            log.info("\n\t\t 3. Getting Transcription Files... uri: {}", transcriptionFilesUri);
            RestResult transcriptionFiles = RestHelper.sendGet(transcriptionFilesUri, speechSubscriptionKey, new int[] { HttpURLConnection.HTTP_OK });      
            Optional<String> contentUri = Optional.empty();
            Iterator<JsonElement> iterator = transcriptionFiles.getJson().getAsJsonArray("values").iterator();
            while (iterator.hasNext()) {
                JsonObject value = iterator.next().getAsJsonObject();
                if (value.get("kind").getAsString().toLowerCase().equals("transcription")){
                    contentUri = Optional.of(value.getAsJsonObject("links").get("contentUrl").getAsString());
                    break;
                }
            }
            if (!contentUri.isPresent()) {
                throw new Exception (String.format("Unable to parse response from Get Transcription Files API:%s%s", System.lineSeparator(), transcriptionFiles.getText()));
            }
            final String transcriptionUri = contentUri.get();

            //-------------- 4. Get Transcription Content
            log.info("\n\t\t 4. Getting Transcription Content... uri: {}", transcriptionUri);
            RestResult transcriptionResult = RestHelper.sendGet(transcriptionUri, "", new int[] { HttpURLConnection.HTTP_OK });
            JsonObject transcriptionJson = transcriptionResult.getJson();
            //log.info(transcriptionJson);

            MediaServer mediaServer = new MediaServer();
            List<SpeakerIdResult> speakersList = new ArrayList<SpeakerIdResult>();
            mediaServer.setSpeakerIdResults(speakersList);
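To quantify the false positives, it can help to summarize how much speech each speaker label receives in the transcription result. The sketch below assumes the phrases have already been parsed (for example with Gson) from the `recognizedPhrases` array, each carrying a `speaker` number and `offsetInTicks`/`durationInTicks` values in 100 ns ticks. The `Phrase` record, the `SpeakerSummary` class and the 1-second threshold are hypothetical illustrations, not part of the Speech API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified view of one entry in the "recognizedPhrases" array
// of a batch transcription result. Offsets and durations are in ticks (100 ns).
record Phrase(int speaker, long offsetInTicks, long durationInTicks) {}

class SpeakerSummary {
    private static final double TICKS_PER_SECOND = 10_000_000.0;

    // Sums the speaking time of each speaker label, in seconds, sorted by label.
    static Map<Integer, Double> secondsPerSpeaker(List<Phrase> phrases) {
        Map<Integer, Double> totals = new TreeMap<>();
        for (Phrase p : phrases) {
            totals.merge(p.speaker(), p.durationInTicks() / TICKS_PER_SECOND, Double::sum);
        }
        return totals;
    }

    // Returns the speaker labels with less total speech than minSeconds. Such
    // labels are candidates for spurious speakers and worth checking by ear.
    static List<Integer> shortSpeakers(List<Phrase> phrases, double minSeconds) {
        return secondsPerSpeaker(phrases).entrySet().stream()
                .filter(e -> e.getValue() < minSeconds)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<Phrase> phrases = List.of(
                new Phrase(1, 0L, 50_000_000L),           // 5.0 s
                new Phrase(2, 50_000_000L, 5_000_000L),   // 0.5 s
                new Phrase(1, 55_000_000L, 30_000_000L)); // 3.0 s
        System.out.println(secondsPerSpeaker(phrases)); // {1=8.0, 2=0.5}
        System.out.println(shortSpeakers(phrases, 1.0)); // [2]
    }
}
```

A label that owns only a fraction of a second of speech is a likely false positive; the right threshold depends on the audio.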

1 answer

  1. YutongTie-MSFT 51,701 Reputation points
    2023-09-21T21:51:17.8533333+00:00

    Hello @IMExperts Admin

    Thanks for reaching out to us. How did you configure diarization? For a multi-speaker case, it helps to set at least a minCount parameter, and also a maxCount, for example:

    "diarization": {
      "speakers": {
        "minCount": 3,
        "maxCount": 5
      }
    }
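    A minimal sketch of producing that fragment from Java, assuming the values are known up front. The hypothetical `diarizationProperties` helper emits `diarizationEnabled` alongside `diarization`, as the question's payload does, and rejects ranges where `minCount` exceeds `maxCount` (a sanity check assumed here, not a documented service rule). With Gson, as in the question, the same structure comes from nested `JsonObject`s.

    ```java
    class DiarizationConfig {
        // Hypothetical helper: builds the diarization-related part of the
        // transcription "properties" object as a JSON string.
        static String diarizationProperties(int minCount, int maxCount) {
            // Basic sanity check on the speaker range (assumed rule, see above).
            if (minCount < 1 || maxCount < minCount) {
                throw new IllegalArgumentException(
                        "Expected 1 <= minCount <= maxCount, got " + minCount + ".." + maxCount);
            }
            return "{\"diarizationEnabled\":true,\"diarization\":{\"speakers\":{\"minCount\":"
                    + minCount + ",\"maxCount\":" + maxCount + "}}}";
        }

        public static void main(String[] args) {
            // Same values as the JSON example above.
            System.out.println(diarizationProperties(3, 5));
        }
    }
    ```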
    

    Also, the SDK does not seem to be the first choice for batch transcription. Could you please try the latest version of the REST API and see how it performs? In my experience, the latest version may work better.
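    When the request is built by hand, as in the question, switching API versions mostly means changing the version segment of the endpoint. The sketch below assumes the `https://{region}.api.cognitive.microsoft.com/speechtotext/{version}/transcriptions` pattern; the region and version strings are placeholders to verify against the documentation, and `TranscriptionEndpoint` is a hypothetical helper.

    ```java
    import java.net.URI;

    class TranscriptionEndpoint {
        // Builds the transcriptions collection URI for a region and API version
        // (for example "v3.1"), assuming the URL pattern described above.
        static URI transcriptionsUri(String region, String apiVersion) {
            return URI.create("https://" + region + ".api.cognitive.microsoft.com/speechtotext/"
                    + apiVersion + "/transcriptions");
        }

        // Status URI of one transcription, built the same way as transcriptionStatusUri
        // in the question: collection URI + "/" + transcription ID.
        static URI statusUri(URI transcriptions, String transcriptionId) {
            return URI.create(transcriptions + "/" + transcriptionId);
        }

        public static void main(String[] args) {
            URI base = transcriptionsUri("eastus", "v3.1"); // placeholder region and version
            System.out.println(base);
            System.out.println(statusUri(base, "00000000-0000-0000-0000-000000000000"));
        }
    }
    ```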

    If the results still do not meet your requirements, I suggest trying a custom model or the Whisper model.

    Please refer to the documentation: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api

    See also the preview version specification: https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/Speech/SpeechToText/preview/v3.2-preview.1
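    To try a custom model, the transcription request can reference the model by its URI. The sketch below assumes a top-level `model` property of the form `{"self": "<model URI>"}` in the request body; the model URI is a placeholder, and whether a Whisper base model is selected the same way should be confirmed in the documents above. `ModelReference` is a hypothetical helper.

    ```java
    import java.net.URI;

    class ModelReference {
        // Builds the assumed "model" fragment of a create-transcription request.
        // The URI is parsed first so malformed or relative values fail early.
        static String modelProperty(String modelSelfUri) {
            URI uri = URI.create(modelSelfUri);
            if (!"https".equals(uri.getScheme())) {
                throw new IllegalArgumentException("Expected an absolute https URI: " + modelSelfUri);
            }
            return "\"model\":{\"self\":\"" + uri + "\"}";
        }

        public static void main(String[] args) {
            // Placeholder URI: replace the region and model ID with real values.
            String uri = "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/"
                    + "00000000-0000-0000-0000-000000000000";
            System.out.println(modelProperty(uri));
        }
    }
    ```

    In the question's Gson code, the equivalent would be a `JsonObject` holding `self`, added to `jsonPayload` under `model`.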

    If you want to continue using the SDK, I suggest contacting the SDK team through the link below to ask whether there is a way to reduce the false positive rate:

    https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues?q=is%3Aissue+is%3Aopen

    I hope this helps.

    Regards,

    Yutong

