Translator gives unexpected word alignment for Japanese to English translations

Edward Coventry 1 Reputation point
2020-11-24T13:49:15.31+00:00

I'm using Azure Translator to translate sentences from Japanese to English with word alignment. However, the word alignment seems to be incorrect. I don't get any errors, and I get the expected result when I translate from English to Japanese instead.

I have followed this example:
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/word-alignment
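
For reference, a request of this kind might look roughly like the Python sketch below. It assumes the Translator v3 REST endpoint with the `includeAlignment=true` query parameter; the helper name `build_alignment_request` and the key and region placeholders are illustrative only, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Global Translator v3 endpoint (assumption: a regional or custom endpoint may differ).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_alignment_request(text, source_lang, target_lang, key, region):
    """Build (but do not send) a translate request that asks for word alignment."""
    query = urllib.parse.urlencode({
        "api-version": "3.0",
        "from": source_lang,
        "to": target_lang,
        "includeAlignment": "true",
    })
    # The body is a JSON array with one object per text to translate.
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )


request = build_alignment_request(
    "明日あなたの車を運転できますか?", "ja", "en", "<your-key>", "<your-region>"
)
print(request.full_url)
# To actually send it: json.load(urllib.request.urlopen(request))
```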

When translating "Can I drive your car tomorrow?" from English to Japanese the alignment I get is "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15"

('Can', 'できますか') ('drive', '運転') ('your', 'あなたの') ('your', 'を') ('car', '車') ('tomorrow?', '明日') ('tomorrow?', '。')

If I exclude the second item of each duplicate, these are all correct.

However, when translating "明日あなたの車を運転できますか?" from Japanese to English I get "0:1-0:2 2:5-4:4 6:6-6:10 7:7-12:15 8:9-17:19 15:15-21:29"

('明日', 'Can') ('あなたの', 'I') ('車', 'drive') ('を', 'your') ('運転', 'car') ('?', 'tomorrow?')

None of these are correct.
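
The pairs above can be reproduced from the alignment string with a small decoder. This is a minimal sketch in Python, assuming each item has the form `srcStart:srcEnd-tgtStart:tgtEnd` with inclusive end indices counted in characters, and assuming the English output was "Can I drive your car tomorrow?" (which is what the listed pairs imply). The function name `decode_alignment` is made up for illustration.

```python
def decode_alignment(proj, source, target):
    """Turn an alignment string into (source_text, target_text) pairs."""
    pairs = []
    for item in proj.split():
        src_range, tgt_range = item.split("-")
        src_start, src_end = (int(n) for n in src_range.split(":"))
        tgt_start, tgt_end = (int(n) for n in tgt_range.split(":"))
        # End indices are inclusive, hence the +1 when slicing.
        pairs.append((source[src_start:src_end + 1], target[tgt_start:tgt_end + 1]))
    return pairs


source = "明日あなたの車を運転できますか?"
target = "Can I drive your car tomorrow?"  # assumed output, inferred from the pairs
proj = "0:1-0:2 2:5-4:4 6:6-6:10 7:7-12:15 8:9-17:19 15:15-21:29"

for pair in decode_alignment(proj, source, target):
    print(pair)
```

Python indexes strings by code point, so this matches the service's offsets only as long as both count characters the same way; text containing characters outside the Basic Multilingual Plane may need extra care.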

Is Japanese to English word alignment expected to be correct, and does it work in the same way as for other languages? Thanks!

Azure Translator
An Azure service to easily conduct machine translation with a simple REST API call.

1 answer

  1. Karl O'Meara 1 Reputation point
    2021-01-23T14:10:28.727+00:00

    Posting this as an answer because of the 1000-character limit on comments.

    @romungi-MSFT I can confirm this is also a problem now in Spanish.

    This is the alignment information for "Sí, sabes que ya llevo un rato mirándote"

    "Translations": [
    {
    "Text": "Yes, you know I've been looking at you for a while.",
    "To": "en",
    "Alignment": {
    "Proj": "0:2-0:3 0:2-5:7 4:8-9:12 10:12-14:17 14:15-19:22 17:21-24:30 17:21-32:33 23:24-35:37 23:24-39:41 31:39-43:43 31:39-45:50"

    Sí, Yes,
    Sí, you
    sabes know
    que I've
    ya been
    llevo looking
    llevo at
    un you
    un for
    mirándote a
    mirándote while.

    This is now completely wrong.

    Someone, somewhere in the innards of Microsoft, has broken it. When I made the same call last year (on 2020-04-23 09:06:23.700Z), it was working:

    "Translations": [
    {
    "Text": "Yes, you know I've been looking at you for a while.",
    "To": "en",
    "Alignment": {
    "Proj": "0:2-0:3 4:8-9:12 4:8-5:7 17:21-14:17 17:21-19:22 23:24-43:43 26:29-45:50 26:29-39:41 31:39-35:37 31:39-32:33"

    Sí, Yes,
    sabes know
    sabes you
    llevo I've
    llevo been
    un a
    rato while.
    rato for
    mirándote you
    mirándote at
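
To see how far the two responses diverge, the two "Proj" strings quoted above can be compared item by item. This is a rough Python sketch that treats each `src-tgt` item as an opaque token; the helper name `alignment_items` is hypothetical.

```python
def alignment_items(proj):
    """Split an alignment string into a set of 'src-tgt' range items."""
    return set(proj.split())


# Current response, as quoted above.
new_proj = (
    "0:2-0:3 0:2-5:7 4:8-9:12 10:12-14:17 14:15-19:22 17:21-24:30 "
    "17:21-32:33 23:24-35:37 23:24-39:41 31:39-43:43 31:39-45:50"
)
# Response recorded on 2020-04-23, as quoted above.
old_proj = (
    "0:2-0:3 4:8-9:12 4:8-5:7 17:21-14:17 17:21-19:22 "
    "23:24-43:43 26:29-45:50 26:29-39:41 31:39-35:37 31:39-32:33"
)

shared = alignment_items(new_proj) & alignment_items(old_proj)
print(sorted(shared))  # → ['0:2-0:3', '4:8-9:12']
```

Only the items for "Sí, / Yes," and "sabes / know" appear in both responses, even though the translated text is identical.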

    Thanks!
