A fix has been rolled out and the issue should now be resolved; let us know if it is not. Sorry for the inconvenience. Thanks.
Speech to text - Norwegian: Capitalization
We're using Microsoft.CognitiveServices.Speech for transcription/subtitling of video clips, mostly Norwegian material. We have noticed that, e.g., the spelling of proper nouns is impressively correct, but that capitalization has been missing. As of this weekend, however, something seems to have changed: there is suddenly TOO MUCH capitalization. For example, all occurrences of the word "nok" (enough) are written in all caps, which makes them look like the abbreviation for the Norwegian currency (NOK). The same thing happens for certain other words, like "FRA" and "ET". Seemingly random words in the middle of sentences are also capitalized. Is this a bug MS is aware of, so that we can expect a fix soon?
C#
Azure AI services
-
GiftA-MSFT 11,166 Reputation points
2021-03-15T18:51:56.353+00:00 Hi, are you using online transcription or batch transcription?
-
Gunnar Sylthe 6 Reputation points
2021-03-15T20:31:15.697+00:00 Online, using Microsoft.CognitiveServices.Speech.SpeechRecognizer from C#.
-
GiftA-MSFT 11,166 Reputation points
2021-03-16T02:33:02.4+00:00 Okay, thanks. We're reviewing your feedback and will get back to you soon.
-
Rune Skaar 1 Reputation point
2021-03-19T07:46:20.657+00:00 Is there any update on this issue?
We are currently running a user test with a POC solution that uses these transcription services, and we have received a lot of critical feedback from test users in recent days.
-
Gunnar Sylthe 6 Reputation points
2021-03-19T08:24:04.267+00:00 Sorry, but any news on this? We're getting lots of complaints from our customers, as this error severely degrades the quality of our (albeit experimental) automatic subtitling.
-
GiftA-MSFT 11,166 Reputation points
2021-03-19T15:36:04.467+00:00 Hi, we are still investigating this issue and will get back to you shortly. Thanks.
-
GiftA-MSFT 11,166 Reputation points
2021-03-19T20:59:06.443+00:00
-
Gunnar Sylthe 6 Reputation points
2021-03-22T07:21:04.87+00:00 Thanks for the swift response. It's perhaps a little better now. However, I'm sorry to say we're still seeing many of the same problems. A few concrete examples: every occurrence of certain words is in all caps, such as JA, ET, FRA... Other words are consistently capitalized incorrectly, such as Jo, Hva, Kan, Når, Det, Med, Hun, Jeg, Men, Selv, Vi... The list goes on. So if you could have another go at fixing this, it would be greatly appreciated.
But let me also say that there has been a great improvement in capitalizing words that SHOULD be capitalized! Proper nouns such as Mallorca, Gran Canaria, Spania etc. etc. are now capitalized correctly. Kudos where kudos is due! :-)
-
GiftA-MSFT 11,166 Reputation points
2021-03-22T15:01:32.957+00:00 Thanks for your feedback, we'll get back to you soon!
-
GiftA-MSFT 11,166 Reputation points
2021-03-23T15:54:47.623+00:00 Hi, we've made some changes, can you confirm whether you observe improvements?
-
Gunnar Sylthe 6 Reputation points
2021-03-23T16:20:39.747+00:00 Hi again! Yes, we noticed considerable improvements this morning. However, it seems to vary...? Could it be that you're still rolling out the changes, and that our results would depend on which server we're connecting to when starting a session...? Because with some sessions, the results are about the same as yesterday, whereas others give MUCH better results!
Thanks for being so responsive, we really appreciate it.
-
Rune Skaar 1 Reputation point
2021-03-23T16:41:34.477+00:00 There is definitely an improvement compared to how it was a few days ago, but there are still some short Norwegian words that are consistently written in capital letters no matter where they are in the sentence.
For example:
- JA - yes
- FOR - for
- NOK - enough (while NOK in capital letters means Norwegian kroner)
- ET - a (ET spørsmål - a question)
- FRA - from
- OPP - up
We had never seen these errors until they suddenly appeared last week, but now they occur consistently all the time.
-
GiftA-MSFT 11,166 Reputation points
2021-03-23T18:33:45.347+00:00 Thanks for your feedback. We rolled out changes today. Please feel free to share updates by tomorrow and let us know your observations.
-
Gunnar Sylthe 6 Reputation points
2021-03-24T07:17:20.563+00:00 Good morning! Unfortunately, we're still seeing much of the same. The results may still depend on which server we get connected to (?), but so far today my trials have been disappointing, I'm sorry to say. I'll try to attach a screenshot of a short example from this morning, where I've marked incorrect capitalizations in red.
-
Gunnar Sylthe 6 Reputation points
2021-03-24T07:24:11.343+00:00 A colleague just reported much better results, so the results are still variable, obviously. We're using the northeurope region with the API, by the way.
-
Rune Skaar 1 Reputation point
2021-03-24T12:17:30.377+00:00 Even though the quality of the transcription has improved a bit, it is still a big problem that some Norwegian words (FOR, ET, NOK, FRA, OPP, ...) are consistently transcribed in capital letters.
This is actually such a big problem that we had to shut down a proof-of-concept solution that we have had up and running for a while, where we offer automatically generated Norwegian subtitles to end users.
Therefore, I hope you can prioritize finding a solution to these problems.
-
GiftA-MSFT 11,166 Reputation points
2021-03-25T14:32:34.667+00:00 Hi all, thanks for your feedback, we are still investigating this issue. Our assumption is that there might be some delay in deployment to production. Will share updates as soon as possible. Thanks.
-
Gunnar Sylthe 6 Reputation points
2021-03-26T09:55:10.637+00:00 Looking much, much better now, thank you! Still SOME incorrectly capitalized words, but a huge improvement. Thank you for being so responsive, it's appreciated!
-
Rune Skaar 1 Reputation point
2021-03-26T10:12:06.513+00:00 Can also confirm that it looks much better today, the annoying errors with capital letters in words like ET, FRA, UT, NOK, OPP,… now seem to be completely gone. Thanks for the help :-)
-
GiftA-MSFT 11,166 Reputation points
2021-03-29T18:44:57.52+00:00 Glad to be of help!
-
Gunnar Sylthe 6 Reputation points
2021-04-07T06:22:08.3+00:00 I thought I saw a comment from you asking for examples of words that are still incorrectly capitalized. Late answer due to Easter holidays, but as far as I can tell, there are now only a very few problematic words remaining. Predominantly "Skal" (shall/will); this word seems to always be transcribed with a capital S. Other words, such as "Det" (it), "Dette" (this) and "Etter" (after) are sometimes transcribed with a capital letter even in the middle of sentences, and sometimes correctly.
-
GiftA-MSFT 11,166 Reputation points
2021-04-08T15:01:45.547+00:00 Hi, thanks for the details. We will review and share updates soon.
-
GiftA-MSFT 11,166 Reputation points
2021-04-08T18:18:48.623+00:00 Quick follow-up: is it possible that you are observing the capitalization when certain context words appear?
-
Rune Skaar 1 Reputation point
2021-04-08T19:57:09.68+00:00 > Is it possible that you are observing the capitalization when certain context words appear?
No, these errors are not related to the context; they seem to be constant for certain words.
See example:
-
Rune Skaar 1 Reputation point
2021-04-08T20:29:46.073+00:00 Also adding another example in the attached PDF:
The capitalization errors mentioned above, on words in the middle of sentences, are marked in red.
These errors seem to be fairly consistent for words like:
- Hvis (If)
- Det (That)
- Dette (This)
- Hvor (Where)
- ...
Some other examples of punctuation and capitalization errors are marked in blue.
-
GiftA-MSFT 11,166 Reputation points
2021-04-09T15:27:25.877+00:00 Thanks for the information, will share updates soon.
-
GiftA-MSFT 11,166 Reputation points
2021-04-09T16:58:24.057+00:00 Following up again :). Because of some other priority tasks, we've added this item to our backlog. An ETA has been set for 4/23 to resolve this issue; please let us know if you have any concerns. We will continue to keep you updated. Thanks!
-
Rune Skaar 1 Reputation point
2021-04-12T08:21:54.753+00:00 Thanks for the feedback. If you need more input or examples, please let me know.
-
Gunnar Sylthe 6 Reputation points
2021-04-26T06:20:29.433+00:00 Looking a lot better now, thank you. However, capitalization of proper nouns now seems to be missing again...?
-
GiftA-MSFT 11,166 Reputation points
2021-04-27T14:34:55.833+00:00 Hi, thanks for the updates. Some changes that were made reintroduced the issue of missing capitalization for proper nouns. We are currently building another version of the cap dictionary, which will add more proper-noun capitalization entries back. The ETA is three weeks from now.
-
Rune Skaar 1 Reputation point
2021-06-07T08:38:20.297+00:00 Is there any update on this issue? Have you published a new version of the cap dictionary, or any other fixes?
(We do not experience any significant improvements when it comes to capital letters, names and places)