Azure pronunciation assessment SDK not processing audio pieces larger than 1 minute

Question

Tomas Moyano 0

I'm currently using the Azure pronunciation assessment example from the Cognitive Services Speech SDK with the following code:

import * as fs from 'fs';
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

const speechConfig = sdk.SpeechConfig.fromSubscription(API_KEY, REGION);
speechConfig.speechRecognitionLanguage = 'es-ES';
const referenceText =
  'Internet es una enorme red de dispositivos interconectados mundialmente gracias a millones de kilómetros de cables de fibra óptica que pasan por debajo de los océanos. Para poder viajar, los archivos deben convertirse a un lenguaje que entiendan las computadoras. Ese es el LENGUAJE DE MÁQUINA y es de tipo BINARIO, porque solo tiene dos valores: 1 y 0. Combinando esos dos valores se generan todos los tipos de contenidos que ves y escuchás en cualquier dispositivo. ¡Increíble! ¿No?. Además, para llegar más velozmente (y que no te aburras esperando), los archivos se dividen en pequeñas partes, llamadas PAQUETES DE DATOS, que toman el mejor camino disponible hasta a tu dispositivo. Para lograrlo, los paquetes viajan por el mundo a través de la inmensa red de CABLES DE FIBRA ÓPTICA SUBMARINOS, redirigiéndose por varios ROUTERS. En tu dispositivo, que puede estar conectado a internet a través de un CABLE o INALÁMBRICAMENTE (gracias a tu MÓDEM), los pequeños paquetes se unen formando el archivo que habías solicitado y aparecen de forma completa.';
const pronunciationAssessmentConfig = new sdk.PronunciationAssessmentConfig(
  referenceText,
  sdk.PronunciationAssessmentGradingSystem.HundredMark,
  sdk.PronunciationAssessmentGranularity.Phoneme,
  true,
);
const audioConfig = sdk.AudioConfig.fromWavFileInput(
  fs.readFileSync('test.wav'),
);
const reco = new sdk.SpeechRecognizer(speechConfig, audioConfig);
pronunciationAssessmentConfig.applyTo(reco);
function onRecognizedResult(result: sdk.SpeechRecognitionResult) {
  console.log('pronunciation assessment for:', result.text);
  const pronunciation_result =
    sdk.PronunciationAssessmentResult.fromResult(result);
  console.log(
    ' Accuracy score:',
    pronunciation_result.accuracyScore,
    '\n',
    'pronunciation score:',
    pronunciation_result.pronunciationScore,
    '\n',
    'completeness score :',
    pronunciation_result.completenessScore,
    '\n',
    'fluency score:',
    pronunciation_result.fluencyScore,
  );
  console.log('  Word-level details:');
  for (const [
    idx,
    word,
  ] of pronunciation_result.detailResult.Words.entries()) {
    console.log(
      '    ',
      idx + 1,
      ': word:',
      word.Word,
      '\taccuracy score:',
      word.PronunciationAssessment.AccuracyScore,
      '\terror type:',
      word.PronunciationAssessment.ErrorType,
      ';',
    );
  }
  reco.close();
}
reco.recognizeOnceAsync(function (successfulResult: sdk.SpeechRecognitionResult) {
  onRecognizedResult(successfulResult);
});

The problem occurs when the audio I record is longer than 1 minute. The assessment only runs on the first minute and reports an omission for every word spoken after that point. How can I process the entire length of the audio?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-07-28T06:14:39.25+00:00

@Tomas Moyano Did the above help to answer your query?

Answer 1

@Tomas Moyano I see you are using recognizeOnceAsync() with an audio file. Is there a pause in the audio being passed? recognizeOnceAsync() recognizes a single utterance, so if the audio contains a pause, the method treats it as the end of input and stops recognition. In this case it is advisable to use startContinuousRecognitionAsync() and stopContinuousRecognitionAsync() instead. Please see this sample, which uses stream input to read from a file. The file details must be updated in settings.js along with your subscription details.
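
As a rough illustration of the continuous-recognition approach described above, the TypeScript sketch below collects word-level details from every recognized event rather than only the first one. The helper names assessWholeAudio and meanAccuracy, the ContinuousRecognizerLike interface, and the extractWords callback are hypothetical and not part of the SDK. Only the recognizer members (recognized, canceled, sessionStopped, startContinuousRecognitionAsync, stopContinuousRecognitionAsync) and the Word / PronunciationAssessment field names come from the SDK and the question's code. Averaging per-word accuracy is just one assumed way to summarize the combined result, and the sketch has not been run against the live service.

```typescript
// Shape of one word entry, using the field names printed in the question's code.
interface WordDetail {
  Word: string;
  PronunciationAssessment: { AccuracyScore: number; ErrorType: string };
}

// Only the recognizer members this sketch touches. The member names match
// sdk.SpeechRecognizer; the loose `any` parameters are a simplification.
interface ContinuousRecognizerLike {
  recognized: (sender: any, event: any) => void;
  canceled: (sender: any, event: any) => void;
  sessionStopped: (sender: any, event: any) => void;
  startContinuousRecognitionAsync(cb?: () => void, err?: (e: string) => void): void;
  stopContinuousRecognitionAsync(cb?: () => void, err?: (e: string) => void): void;
}

// Hypothetical helper: runs continuous recognition and passes every word
// collected across all utterances to onDone once the session has stopped.
function assessWholeAudio(
  reco: ContinuousRecognizerLike,
  extractWords: (result: any) => WordDetail[],
  onDone: (words: WordDetail[]) => void,
): void {
  const allWords: WordDetail[] = [];
  let finished = false;

  // Fires once per recognized utterance, not once per file.
  reco.recognized = (_sender, event) => {
    allWords.push(...extractWords(event.result));
  };

  // A cancellation (for example, end of the input stream) only requests a stop.
  reco.canceled = () => {
    reco.stopContinuousRecognitionAsync();
  };

  // The session-stopped event reports the collected words; the flag
  // guards against reporting twice if the event is raised more than once.
  reco.sessionStopped = () => {
    if (finished) return;
    finished = true;
    reco.stopContinuousRecognitionAsync();
    onDone(allWords);
  };

  reco.startContinuousRecognitionAsync();
}

// Mean of the per-word accuracy scores; returns 0 for an empty list.
function meanAccuracy(words: WordDetail[]): number {
  if (words.length === 0) return 0;
  const total = words.reduce(
    (sum, w) => sum + w.PronunciationAssessment.AccuracyScore,
    0,
  );
  return total / words.length;
}
```

With the real SDK, the reco object from the question could be passed in after pronunciationAssessmentConfig.applyTo(reco), and extractWords could be something along the lines of checking result.reason against sdk.ResultReason.RecognizedSpeech and then returning sdk.PronunciationAssessmentResult.fromResult(result).detailResult.Words, with an empty array otherwise. Overall fluency and completeness scores are not covered by this sketch.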

You can also try the same assessment in Speech Studio, using either microphone input or a file. This can help you confirm whether the assessment is working as expected. Thanks!

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.
