
Is there information regarding how the alignment algorithm in 'PronunciationAssessment' mode works when there are insertions caused by the repetition of whole phrases?

Alex Del Giudice 35 Reputation points
2024-02-20T16:29:25.9266667+00:00

I'm trying to understand the alignment behavior in pronunciation assessment.

When the target text is something like:

He wanted to repeat what he'd heard but couldn't remember it all.

And the voice reading says something like:

He wanted to rep what he'd learned reep what he'd repeat what he'd lear heard but couldn't remember it all.

I'm seeing some inconsistent behavior and am wondering whether it's defaulting to a 'longest correct ordered sequence' or whether it's looking at the pronunciation accuracy to find the correctly ordered sequence to label 'correct' and calling everything else an insertion.

Azure Speech in Foundry Tools

1 answer

  1. VasaviLankipalle-MSFT 18,736 Reputation points Moderator
    2024-02-28T02:12:29.7266667+00:00

    Hello @Alex Del Giudice ,

    In pronunciation assessment, there is usually a reference (target) text. After recognition, a sequence-matching step based on edit distance, along with other steps, is applied to align the recognized words with the reference text and compute the insertion and deletion errors.

    Only the words that the sequence-matching algorithm tags as matched are then assigned a "Mispronunciation" or "None" tag.
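
To make the edit-distance idea concrete, here is a minimal sketch of a word-level alignment between the reference text and the recognized text. It is not the service's actual implementation, which is not public in this thread. The function name `align`, the tag strings, and the "Substitution" label are assumptions chosen for illustration, and the real service also uses steps that this sketch ignores.

```python
def align(reference, spoken):
    """Word-level edit-distance alignment (illustrative sketch only).

    Returns a list of (tag, reference_word, spoken_word) tuples, where tag is
    "Match", "Substitution", "Insertion" or "Omission". These tag names are
    hypothetical and only loosely mirror the service's error types.
    """
    # Normalize: lowercase and drop sentence punctuation at word edges.
    ref = [w.strip(".,!?") for w in reference.lower().split()]
    hyp = [w.strip(".,!?") for w in spoken.lower().split()]
    n, m = len(ref), len(hyp)

    # cost[i][j] = minimum number of edits to turn ref[:i] into hyp[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover one optimal alignment.
    # When several alignments have the same cost, the order of these checks
    # decides which one is reported.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            tag = "Match" if ref[i - 1] == hyp[j - 1] else "Substitution"
            ops.append((tag, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and cost[i][j] == cost[i][j - 1] + 1:
            ops.append(("Insertion", None, hyp[j - 1]))
            j -= 1
        else:
            ops.append(("Omission", ref[i - 1], None))
            i -= 1
    ops.reverse()
    return ops


reference = "He wanted to repeat what he'd heard but couldn't remember it all."
spoken = ("He wanted to rep what he'd learned reep what he'd repeat "
          "what he'd lear heard but couldn't remember it all.")
for tag, ref_word, spoken_word in align(reference, spoken):
    print(f"{tag:12} {ref_word or '-':10} {spoken_word or '-'}")
```

A pure edit-distance alignment like this one looks only at word identity, not at pronunciation accuracy. With repeated phrases, several alignments can have the same cost, and the tie-breaking order in the backtrace determines which repetition is labeled as matched, which is one plausible source of labels that look inconsistent. Whether the service breaks ties this way is not stated in the answer above.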

    I hope this helps. We appreciate your time and patience throughout this issue. Regards,

    Vasavi

