Is this not what you're looking for? Here is a sample response showing the phonemes of the word "Thank":
{
  "Duration": 4700000,
  "Offset": 11500000,
  "Phonemes": [
    {
      "Duration": 2100000,
      "Offset": 11500000,
      "Phoneme": "th",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    },
    {
      "Duration": 900000,
      "Offset": 13700000,
      "Phoneme": "ae",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    },
    {
      "Duration": 700000,
      "Offset": 14700000,
      "Phoneme": "ng",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    },
    {
      "Duration": 700000,
      "Offset": 15500000,
      "Phoneme": "k",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    }
  ],
  "PronunciationAssessment": {
    "AccuracyScore": 100.0,
    "ErrorType": "None"
  },
  "Word": "Thank"
}
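The Speech service reports `Offset` and `Duration` in 100-nanosecond ticks (10,000,000 ticks = 1 second), so each phoneme's start and end time can be read straight off this response. The sketch below is a minimal illustration, not an official SDK helper: `phoneme_boundaries` is a hypothetical function name, and `SAMPLE` is a trimmed copy of the "Thank" result above keeping only the fields used.

```python
import json

# Offset and Duration in the Speech service JSON are in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000

# Trimmed copy of the "Thank" word result (only the fields used here).
SAMPLE = """
{"Word": "Thank", "Offset": 11500000, "Duration": 4700000,
 "Phonemes": [
  {"Phoneme": "th", "Offset": 11500000, "Duration": 2100000},
  {"Phoneme": "ae", "Offset": 13700000, "Duration": 900000},
  {"Phoneme": "ng", "Offset": 14700000, "Duration": 700000},
  {"Phoneme": "k",  "Offset": 15500000, "Duration": 700000}
 ]}
"""


def phoneme_boundaries(word: dict) -> list[tuple[str, float, float]]:
    """Return (phoneme, start_seconds, end_seconds) for each phoneme of a word result."""
    bounds = []
    for p in word["Phonemes"]:
        start = p["Offset"] / TICKS_PER_SECOND
        end = (p["Offset"] + p["Duration"]) / TICKS_PER_SECOND
        bounds.append((p["Phoneme"], start, end))
    return bounds


if __name__ == "__main__":
    # Print each phoneme with its start and end time in seconds.
    for phoneme, start, end in phoneme_boundaries(json.loads(SAMPLE)):
        print(f"{phoneme:>3}: {start:.2f}s - {end:.2f}s")
```

In this particular sample, "Thank" spans 1.15 s to 1.62 s, and consecutive phonemes are separated by a 100,000-tick (10 ms) gap, so the end of one phoneme and the start of the next are not exactly the same instant.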
Hi Yutong, thanks for the reply. I can't go into great detail, but basically I would like to identify the boundaries of phonemes and syllables within WAV files. I hope that clarifies things.
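To cut a WAV file at phoneme boundaries like the ones in the sample response, a minimal sketch using only Python's standard `wave` module might look like this. It assumes the file is uncompressed PCM and that the tick offsets are measured from the start of the same WAV file that was sent to the service; `ticks_to_frame` and `extract_segment` are hypothetical helper names. The sample response only shows word and phoneme timings, so syllable boundaries are not covered here.

```python
import wave

TICKS_PER_SECOND = 10_000_000  # 100-ns ticks, as in the Speech service JSON


def ticks_to_frame(ticks: int, sample_rate: int) -> int:
    """Convert a tick offset to the nearest audio frame index."""
    return round(ticks * sample_rate / TICKS_PER_SECOND)


def extract_segment(wav_path: str, out_path: str, offset: int, duration: int) -> int:
    """Copy the frames covering [offset, offset + duration) ticks into a new WAV file.

    Assumes offsets are relative to the start of wav_path. Returns the number
    of frames written (clamped to the length of the source file).
    """
    with wave.open(wav_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        start = min(ticks_to_frame(offset, rate), total)
        end = min(ticks_to_frame(offset + duration, rate), total)
        src.setpos(start)
        frames = src.readframes(end - start)
        params = src.getparams()
    with wave.open(out_path, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end - start
```

For example, `extract_segment("input.wav", "th.wav", 11500000, 2100000)` would write the "th" phoneme from the sample (1.15 s to 1.36 s) to its own file; at 16 kHz that is 3,360 frames.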