In Pronunciation Assessment, it looks like the 'insertion' category of errors will only 'insert' words that appear in the reference text, and won't actually transcribe the actual word spoken/inserted. Is this expected behavior?

Question

Alex Del Giudice 25

When using Pronunciation Assessment, if the speaker inserts a word that is not in the reference text, the word transcribed in its place (with the 'insertion' error type) is always a similar-sounding word from the reference text rather than the word that was actually spoken. If we are interested in recording reading errors (categorizing them, for example, as semantic or phonological errors), this is not very useful (although it appears that we can examine the phonemic errors). So I'm wondering if there's a way to run pronunciation assessment but have the transcription of insertions be based on the audio input rather than the reference text.
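
To make the behavior concrete, here is a minimal sketch of pulling the flagged words out of a result. It assumes the JSON layout the Speech service uses for pronunciation assessment with miscue detection (`NBest[0].Words[]`, each word carrying `PronunciationAssessment.ErrorType`); the sample data and the helper name `words_by_error` are made up for illustration and are not real output.

```python
import json

# Made-up sample, shaped like a pronunciation assessment result.
# The words and error types are illustrative only, not real service output.
SAMPLE_RESULT = json.loads("""
{
  "NBest": [{
    "Words": [
      {"Word": "the",  "PronunciationAssessment": {"ErrorType": "None"}},
      {"Word": "cage", "PronunciationAssessment": {"ErrorType": "Insertion"}},
      {"Word": "bird", "PronunciationAssessment": {"ErrorType": "None"}},
      {"Word": "sang", "PronunciationAssessment": {"ErrorType": "Omission"}}
    ]
  }]
}
""")


def words_by_error(result, error_type):
    """Return the words in the top hypothesis flagged with the given error type."""
    words = result["NBest"][0]["Words"]
    return [
        w["Word"]
        for w in words
        if w["PronunciationAssessment"]["ErrorType"] == error_type
    ]


# In this sample the word reported as an insertion is 'cage', a reference-text
# word, which is the pattern described above.
insertions = words_by_error(SAMPLE_RESULT, "Insertion")
print(insertions)
```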

VasaviLankipalle-MSFT 18,681 Reputation points Moderator

2024-02-21T18:44:36.9866667+00:00

Hello @Alex Del Giudice, thanks for using the Microsoft Q&A Platform.

Please allow some time; we will check internally and get back to you on this.
VasaviLankipalle-MSFT 18,681 Reputation points Moderator

2024-02-22T22:07:34.9266667+00:00

@Alex Del Giudice, if possible, please share a sample output response so that we can reproduce this on our end.

Accepted answer

Answer 1

VasaviLankipalle-MSFT 18,681 Moderator

Hello @Alex Del Giudice , thank you for your time and patience throughout this issue.

The product team confirmed that this behavior is by design.

Also, without a reference text, it would be difficult to determine which words are insertions or omissions. The Pronunciation Assessment tool relies on the reference text to compare the transcript with the expected transcript.

I hope this helps.

Regards,

Vasavi

Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.

Alex Del Giudice 25 Reputation points

2024-02-28T13:25:06.1366667+00:00

I understand. But I wasn't suggesting that we wouldn't use a reference text. The reference text is obviously necessary for pronunciation assessment.
However, when a child inserts a word, it can be useful to understand aspects of that word in order to analyze errors better. If an inserted word is significantly different from the next word in the text, it could be useful to know what the actual pronounced word is, to recognize whether this was an error related to syntactic parsing, for example.
It is similarly useful to recognize when a word that is 'mispronounced' is actually a phonologically similar word (what we might refer to as a 'substitution'). Suppose a child is trying to read 'cage' but produces a different word. If that word is 'cape', then we know it's a phonologically similar word with a similar part of speech, which suggests that the error might be the result of a whole-word reading strategy. If instead the pronounced word is "Caghee", then we see it's not a real word, and that this error is more likely related to orthographic/phonological errors.
If I could suggest an improvement for some future version of pronunciation assessment, it would be to include in the analysis not just the list of potential phonemes and the probabilities for any given word, but also a 'best guess' at what the mispronounced or inserted word might be (if it is strongly similar to a real word's pronunciation).
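
In the meantime, one possible workaround (an assumption, not something the product team confirmed in this thread) is to run ordinary speech-to-text on the same audio and align that free transcript against the reference text yourself. The sketch below uses Python's standard `difflib` for the alignment; the function name `align_errors` and the example sentences are hypothetical, and the `spoken` string stands in for whatever a separate transcription pass returns.

```python
from difflib import SequenceMatcher


def align_errors(reference, spoken):
    """Align a free transcript against the reference text, word by word.

    Returns (kind, reference_words, spoken_words) tuples, where kind is
    'insertion', 'omission', or 'substitution'.
    """
    ref = reference.lower().split()
    hyp = spoken.lower().split()
    # autojunk=False keeps difflib from discarding frequent words on long texts.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            errors.append(("insertion", None, " ".join(hyp[j1:j2])))
        elif tag == "delete":
            errors.append(("omission", " ".join(ref[i1:i2]), None))
        elif tag == "replace":
            errors.append(
                ("substitution", " ".join(ref[i1:i2]), " ".join(hyp[j1:j2]))
            )
    return errors


# Hypothetical example: 'little' is inserted and 'cage' is read as 'cape'.
for error in align_errors(
    "the bird sat in the cage",
    "the little bird sat in the cape",
):
    print(error)
```

A plain transcript tends to favor real words, so this would likely surface 'cape' for 'cage' but not a non-word such as "Caghee"; for those cases the phoneme-level output of the assessment is probably still the better signal.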
