@Virtro Dev I suspect that some of the apostrophes in your transcript are not plain ASCII apostrophes but Latin-1 or Unicode look-alikes (for example, the curly right single quotation mark, U+2019) that crept in. The guidance for such characters is to replace them with the appropriate ASCII substitution, and the documentation covers this. Since some of your words come through processing without issues, the easiest fix is to replace the incorrect apostrophes with the one that goes through. Thanks!
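If it helps, the substitution described above could be scripted rather than done by hand. The following is only a minimal Python sketch, not an official tool: the function names `normalize_apostrophes` and `fix_transcript` are made up for illustration, and the list of look-alike characters is an assumption that may not cover every character in your file.

```python
# Characters that are often typed or pasted in place of the ASCII apostrophe.
# This list is an assumption for illustration and is not exhaustive.
APOSTROPHE_LIKE = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (curly apostrophe)
    "\u02bc": "'",  # modifier letter apostrophe
    "\u00b4": "'",  # acute accent (present in Latin-1)
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with the ASCII apostrophe (U+0027)."""
    return text.translate(_TABLE)


def fix_transcript(src_path: str, dst_path: str) -> None:
    """Rewrite a transcript file with ASCII apostrophes only.

    'utf-8-sig' strips the BOM on read and writes it back on output, so the
    result stays UTF-8 with BOM. newline="" keeps line endings unchanged.
    """
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(normalize_apostrophes(text))
```

For example, `fix_transcript("trans.txt", "trans_fixed.txt")` (both file names hypothetical) would write a cleaned copy that you can upload in place of the original. Tabs and all other characters are left untouched, so the file name and transcript columns stay as they were.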
Missing apostrophes when uploading human-labeled transcript for custom speech
Virtro Dev 21 Reputation points
Hi, I am currently trying to create a custom speech-to-text (STT) model using the Custom Speech service. After uploading my audio + human-labeled transcript data (a .txt file, separated by \t, UTF-8 with BOM), many of the apostrophes are missing in the normalized human-labeled transcription. For example, don't becomes don t and can't becomes can t. There are some exceptions, however: there's and it's are labeled correctly. Because of the incorrect labels, I can't train and test my custom model correctly. Please help!
Accepted answer
romungi-MSFT 48,541 Reputation points Microsoft Employee
2021-01-19T05:27:12.337+00:00