Hello Darsh!
Thank you for posting on Microsoft Learn.
Azure Pronunciation Assessment in the Speech Service is generally robust and supports a wide range of non-native English accents, East Asian accents such as those from Hong Kong, Japan, Korea, and mainland China.
While Azure handles common non-native accents well, strong deviations in intonation, consonant sounds (such as /l/ and /r/ for Japanese speakers), and syllable timing may result in lower-than-expected scores.
It may not recognize certain locally accepted pronunciations or loanwords common in Hong Kong English, and speakers who struggle with consonant clusters or who insert extra vowels may receive low pronunciation scores even when their meaning is perfectly clear.
Use reference-text mode with full sentences to improve context-based scoring, and set "EnableMiscue": true to account for insertions, omissions, and word-order changes, which are common among non-native speakers.
You can also adjust the GradingSystem (HundredMark vs. FivePoint) depending on how sensitive you want the scoring to be.
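To make the settings above concrete, here is a minimal sketch of the assessment parameters serialized the way the Speech service's REST API expects them: a JSON object, base64-encoded into the Pronunciation-Assessment request header. The reference sentence is just an illustrative placeholder; substitute your own.

```python
import base64
import json

# Pronunciation-assessment parameters: a full reference sentence for
# context-based scoring, HundredMark grading, phoneme-level granularity,
# and miscue detection enabled to tolerate insertions/omissions.
assessment_params = {
    "ReferenceText": "Good morning, how are you today?",  # placeholder sentence
    "GradingSystem": "HundredMark",   # or "FivePoint" for coarser scoring
    "Granularity": "Phoneme",         # word- and phoneme-level detail
    "EnableMiscue": True,             # allow insertions, omissions, reordering
}

# The REST API takes this JSON, base64-encoded, in the
# Pronunciation-Assessment header of the recognition request.
header_value = base64.b64encode(
    json.dumps(assessment_params).encode("utf-8")
).decode("ascii")

print(header_value)
```

If you are using the Speech SDK instead of the raw REST API, the same options are exposed through its PronunciationAssessmentConfig object, so you rarely need to build the header by hand.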
What I highly recommend is that you build a calibration dataset from recordings of real users in your target demographic.
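One way such a calibration set can be used is to derive a pass threshold from the score distribution of your own users rather than from an arbitrary cutoff. The sketch below assumes you have already run the assessment over the recordings; the scores are fabricated for illustration only.

```python
import statistics

# Hypothetical HundredMark scores from a calibration set of recordings
# by real users in the target demographic (made-up values).
calibration_scores = [62, 71, 68, 55, 74, 66, 59, 70, 64, 73]

# One simple heuristic: set the pass threshold one standard deviation
# below the cohort mean, so typical speakers from the target demographic
# are not unfairly penalized for accent-related score deflation.
mean = statistics.mean(calibration_scores)
stdev = statistics.stdev(calibration_scores)
pass_threshold = mean - stdev

print(f"mean={mean:.1f} stdev={stdev:.1f} threshold={pass_threshold:.1f}")
```

You would recompute this threshold per locale (e.g. separately for Hong Kong, Japanese, and Korean speakers) once you have enough recordings from each group.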