Translator gives unexpected word alignment for Japanese to English translations

I'm using azure-translator to translate sentences from Japanese to English with word alignment. However, the word alignment appears to be incorrect. I don't get any errors, and I get the expected result when I translate in the other direction, from English to Japanese.
I have followed this example:
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/word-alignment
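For anyone trying to reproduce this, the request in that example comes down to a POST to the v3 `translate` endpoint with `includeAlignment=true`. Below is a minimal sketch that uses only the Python standard library. The endpoint is the public global one. `KEY` and `REGION` are placeholders, and `build_alignment_request` / `get_alignment` are hypothetical helper names, not part of any Azure SDK. The raw REST response uses lowercase keys (`translations`, `alignment`, `proj`). The capitalized keys quoted later in this thread presumably come from a client library's object model, though that is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Global Translator v3 endpoint (assumption: no custom/regional endpoint in use).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_alignment_request(text, source, target, key, region):
    """Build (but do not send) a v3 translate request with word alignment enabled."""
    params = urllib.parse.urlencode({
        "api-version": "3.0",
        "from": source,
        "to": target,
        "includeAlignment": "true",
    })
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Region header: may be required for regional or multi-service resources.
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(f"{ENDPOINT}?{params}", data=body,
                                  headers=headers, method="POST")


def get_alignment(request):
    """Send the request and return the 'proj' string of the first translation."""
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result[0]["translations"][0]["alignment"]["proj"]
```

With a real key and network access, `get_alignment(build_alignment_request("明日あなたの車を運転できますか?", "ja", "en", KEY, REGION))` should return a `proj` string like the ones quoted in this thread. Building the request separately from sending it makes it easy to inspect the URL and body offline.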
When translating "Can I drive your car tomorrow?" from English to Japanese, the alignment I get is "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15", which decodes to:
('Can', 'できますか') ('drive', '運転') ('your', 'あなたの') ('your', 'を') ('car', '車') ('tomorrow?', '明日') ('tomorrow?', '。')
If I ignore the second pairing of each repeated source word, namely ('your', 'を') and ('tomorrow?', '。'), all of these are correct.
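To check alignment strings like these by hand, the `proj` value can be decoded mechanically. Each space-separated token has the form `srcStart:srcEnd-tgtStart:tgtEnd`, with inclusive character offsets into the source text and the translation. A minimal Python sketch follows. `parse_alignment` is a hypothetical helper. It assumes the offsets line up with Python string indices, which holds for the strings in this thread but may not hold for characters outside the Basic Multilingual Plane, such as emoji. The Japanese output string in the usage part is reconstructed from the pairs above.

```python
def parse_alignment(proj, source, target):
    """Decode a Translator 'proj' alignment string into (source word, target word) pairs.

    Each token has the form "srcStart:srcEnd-tgtStart:tgtEnd"; the end offsets
    are inclusive, so one is added when slicing.
    """
    pairs = []
    for token in proj.split():
        src_span, tgt_span = token.split("-")
        s0, s1 = (int(i) for i in src_span.split(":"))
        t0, t1 = (int(i) for i in tgt_span.split(":"))
        pairs.append((source[s0:s1 + 1], target[t0:t1 + 1]))
    return pairs


if __name__ == "__main__":
    en = "Can I drive your car tomorrow?"
    ja = "明日あなたの車を運転できますか。"
    proj = "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15"
    for pair in parse_alignment(proj, en, ja):
        print(pair)  # e.g. ('Can', 'できますか')
```

Running it prints the same seven pairs listed above, which confirms that the English-to-Japanese string is being read correctly.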
However, when translating "明日あなたの車を運転できますか?" from Japanese to English, I get "0:1-0:2 2:5-4:4 6:6-6:10 7:7-12:15 8:9-17:19 15:15-21:29", which decodes to:
('明日', 'Can') ('あなたの', 'I') ('車', 'drive') ('を', 'your') ('運転', 'car') ('?', 'tomorrow?')
None of these are correct.
Is Japanese to English word alignment expected to be correct, and does it work in the same way as for other languages? Thanks!
Hey, I have a clue about what is happening. When translating from Japanese to English, it appears that the alignment indices have been sorted separately for each language before being paired. This shouldn't happen: the spans should be paired first and then sorted by source position only.
Notice how, in the Japanese-to-English pairs, the words appear in their original sentence order on both the Japanese and the English side.
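The hypothesis can be reproduced offline. The Python sketch below uses hypothetical function names, and the "independent sort" is this post's guess at the mechanism, not confirmed service internals. It takes the English-to-Japanese alignment from the question, which is correct, and applies both orderings. Sorting by source keeps every pair intact. Sorting the two sides independently re-pairs spans by rank and yields `0:2-0:1 6:10-2:5 12:15-6:6 ...`, that is ('Can', '明日'), ('drive', 'あなたの'), ('your', '車'). This is the same kind of in-order-on-both-sides scrambling seen in the Japanese-to-English output.

```python
def parse_spans(proj):
    """Split a 'proj' string into ((srcStart, srcEnd), (tgtStart, tgtEnd)) tuples."""
    spans = []
    for token in proj.split():
        src, tgt = token.split("-")
        spans.append((tuple(int(i) for i in src.split(":")),
                      tuple(int(i) for i in tgt.split(":"))))
    return spans


def format_spans(spans):
    """Serialize span tuples back into the 'proj' string format."""
    return " ".join(f"{s0}:{s1}-{t0}:{t1}" for (s0, s1), (t0, t1) in spans)


def sort_sides_independently(proj):
    """Suspected bug: sort source and target spans separately, then re-pair them by rank."""
    spans = parse_spans(proj)
    sources = sorted(src for src, _ in spans)
    targets = sorted(tgt for _, tgt in spans)
    return format_spans(zip(sources, targets))


def sort_by_source(proj):
    """Expected behaviour: keep each pair intact and order the pairs by source position."""
    return format_spans(sorted(parse_spans(proj)))


if __name__ == "__main__":
    en_ja = "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15"
    print(sort_by_source(en_ja))            # identical to the input: pairs unchanged
    print(sort_sides_independently(en_ja))  # 'Can' (0:2) now lands on '明日' (0:1), and so on
```

Sorting whole pairs (tuples of tuples) orders them by source span first, so ties such as the two 'your' spans keep a stable, target-ordered sequence without breaking any pair.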
Is this something I can expect to be fixed, or would I have to wait until api-version 3.1? Thanks!
@Edward Coventry We can confirm this is a bug, and our team is currently working on it. However, we do not have an ETA for its rollout. We will update this thread with more details as soon as the fix is available.
That's great, thank you very much.
@romungi-MSFT please note that Spanish has the same problem.
Any update on this? It's a shame, as it was a nice feature.
I'm writing this as an answer because of the 1000-character limit on comments.
@romungi-MSFT I can confirm this is also a problem now in Spanish.
This is the alignment information returned for "Sí, sabes que ya llevo un rato mirándote":
"Translations": [
{
"Text": "Yes, you know I've been looking at you for a while.",
"To": "en",
"Alignment": {
"Proj": "0:2-0:3 0:2-5:7 4:8-9:12 10:12-14:17 14:15-19:22 17:21-24:30 17:21-32:33 23:24-35:37 23:24-39:41 31:39-43:43 31:39-45:50"
('Sí,', 'Yes,')
('Sí,', 'you')
('sabes', 'know')
('que', "I've")
('ya', 'been')
('llevo', 'looking')
('llevo', 'at')
('un', 'you')
('un', 'for')
('mirándote', 'a')
('mirándote', 'while.')
This is now completely wrong.
Someone, somewhere in the innards of Microsoft has broken it. When I made the same call last year (on 2020-04-23 09:06:23.700Z), it was working:
"Translations": [
{
"Text": "Yes, you know I've been looking at you for a while.",
"To": "en",
"Alignment": {
"Proj": "0:2-0:3 4:8-9:12 4:8-5:7 17:21-14:17 17:21-19:22 23:24-43:43 26:29-45:50 26:29-39:41 31:39-35:37 31:39-32:33"
('Sí,', 'Yes,')
('sabes', 'know')
('sabes', 'you')
('llevo', "I've")
('llevo', 'been')
('un', 'a')
('rato', 'while.')
('rato', 'for')
('mirándote', 'you')
('mirándote', 'at')
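The same symptom shows up in both Spanish responses. In the broken output, the source spans and the target spans are each in ascending order. In the working 2020 output, the target spans jump around (9:12 comes before 5:7, and so on). A small Python check along these lines can flag it. `both_sides_sorted` is a hypothetical helper, not something the service provides, and it is only a heuristic: for a language pair with nearly identical word order, a genuinely correct alignment can also be sorted on both sides.

```python
def parse_spans(proj):
    """Parse a 'proj' string into a list of ((srcStart, srcEnd), (tgtStart, tgtEnd))."""
    result = []
    for token in proj.split():
        src, tgt = token.split("-")
        s0, s1 = map(int, src.split(":"))
        t0, t1 = map(int, tgt.split(":"))
        result.append(((s0, s1), (t0, t1)))
    return result


def both_sides_sorted(proj):
    """Return True if the source spans and the target spans are each in ascending order.

    When the two languages reorder words, a correct alignment listed by source
    position usually has out-of-order target spans, so True here is a warning
    sign of the suspected independent sort rather than proof of it.
    """
    spans = parse_spans(proj)
    sources = [src for src, _ in spans]
    targets = [tgt for _, tgt in spans]
    return sources == sorted(sources) and targets == sorted(targets)


if __name__ == "__main__":
    broken_2021 = ("0:2-0:3 0:2-5:7 4:8-9:12 10:12-14:17 14:15-19:22 17:21-24:30 "
                   "17:21-32:33 23:24-35:37 23:24-39:41 31:39-43:43 31:39-45:50")
    working_2020 = ("0:2-0:3 4:8-9:12 4:8-5:7 17:21-14:17 17:21-19:22 23:24-43:43 "
                    "26:29-45:50 26:29-39:41 31:39-35:37 31:39-32:33")
    print(both_sides_sorted(broken_2021))   # True
    print(both_sides_sorted(working_2020))  # False
```

The same check returns True for the Japanese-to-English output in the question and False for the English-to-Japanese one.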
Thanks!