Significant Decrease in Translation Quality in Azure AI Translator (English → Swedish)

Anton Olivestam 0 Reputation points
2026-03-16T17:31:17.1+00:00

Hi,

I’ve noticed a substantial drop in translation quality when using Azure AI Translator for English → Swedish translations.

For example:

  • Source: “The account deletion link was only valid for 30 minutes and has expired.”
  • Current translation: “Länken till kontoborttagning var bara geldig i 30 minuter och har gått ut.”

The translated sentence reads unnaturally in Swedish, and the word “geldig” is not a Swedish word at all. Until recently, Azure Translator produced much more natural and accurate translations for similar sentences.

What has happened to the translation models that is causing this decrease in translation quality?

Azure Translator in Foundry Tools

1 answer

  1. Q&A Assist
    2026-03-16T17:31:45.7233333+00:00
    AI-generated content. This answer was created with AI from Q&A Assist.

    Azure AI Translator is a neural machine translation system whose quality can vary over time and across language pairs. The documentation describes several important characteristics and limitations that explain issues like the one observed, but it does not list specific recent model changes for English → Swedish.

    Key points relevant to the scenario:

    1. Model behavior and limitations
      Translator is trained on previously translated documents and operates mainly at the sentence level without broader real‑world or document‑level context. This can lead to:
      • Literal or awkward translations that do not reflect natural usage or nuance in the target language.
      • Errors where tone, style, or idiomatic correctness are not preserved.
      The documentation explicitly notes that automatic translations can be too literal and that the system may not correctly capture implication, nuance, or culture‑specific usage.
    2. Quality varies by language pair and scenario
      Translation quality is not uniform across all language pairs. The service documentation states that:
      • Quality “differs by language pair,” and suitability must be evaluated per scenario.
      • Quality should be measured on a representative test set for the specific use case.
      This means that an English → Swedish model can perform differently from other pairs and that changes in training data or model tuning can affect perceived quality, positively or negatively, for certain constructions.
    3. Ongoing model evaluation and updates
      Translator quality is continuously measured using automatic metrics (such as BLEU and COMET) and human evaluation. The service is under ongoing improvement, which can involve model updates. However, the documentation does not enumerate or timestamp specific model changes for particular language pairs, nor does it distinguish between “temporary regression” and “intentional update” at the level of individual examples. From the available information, it can only be said that:
      • Models are periodically improved and evaluated using multiple techniques.
      • Human evaluation is used to guide quality, but individual regressions in specific phrases or domains can still occur.
    4. What can be done in response to a perceived regression
      Based on the guidance for evaluating and integrating Translator:
      • Evaluate on a representative test set:
        Build a small but representative English → Swedish test set from the application domain and systematically measure output quality over time. This helps document regressions and provides concrete evidence when engaging support.
      • Use human-in-the-loop review where quality is critical:
        For user‑facing or high‑impact text, keep human oversight in the workflow so that mistranslations or unnatural phrasing can be corrected before reaching end users.
      • Provide feedback and maintain a feedback loop:
        The service guidance recommends having a feedback channel and monitoring Translator in production. When specific problematic outputs like the “geldig” example are identified, they can be reported through Azure support or product feedback channels so they can be considered in future model tuning.
      • Consider customizations if applicable:
        For domains where terminology or phrasing must be very precise, Custom Translator with domain‑specific training data and (where appropriate) dictionaries can help steer translations toward preferred outputs. The documentation cautions that dictionaries should be used sparingly and mainly for compound nouns, but domain training data can significantly improve quality.
    5. What cannot be confirmed from the documentation
      The provided documentation does not specify:
      • The exact change that would have introduced the specific “geldig” error in English → Swedish.
      • Whether this is a known temporary regression or a side effect of an intentional update.
      • Any scheduled fix or rollback for this particular behavior.
      For authoritative information on a specific regression and any planned fix, the appropriate next step is to open an Azure support request with concrete examples and timestamps so the product team can investigate the underlying model version and training data involved.
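    The evaluation advice in point 4 can be sketched in code. The snippet below scores a candidate translation against a reference with a simplified sentence-level BLEU-style metric (geometric mean of smoothed n-gram precisions with a brevity penalty). It is a minimal stand-in for a real metric library such as sacreBLEU, and the "correct" Swedish reference used here is an illustrative assumption, not an official translation; tracking such scores on a fixed test set over time is what documents a regression.

    ```python
    import math
    from collections import Counter

    def ngrams(tokens, n):
        """Count all n-grams of length n in a token list."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def simple_bleu(reference, candidate, max_n=4):
        """Simplified BLEU-style score: geometric mean of add-one-smoothed
        modified n-gram precisions, times a brevity penalty. A rough proxy
        for library metrics such as sacreBLEU, for illustration only."""
        ref, cand = reference.split(), candidate.split()
        if not cand:
            return 0.0
        log_prec = 0.0
        for n in range(1, max_n + 1):
            cand_grams = ngrams(cand, n)
            ref_grams = ngrams(ref, n)
            overlap = sum(min(c, ref_grams[g]) for g, c in cand_grams.items())
            total = max(sum(cand_grams.values()), 1)
            # Add-one smoothing so short sentences don't zero out the score.
            log_prec += math.log((overlap + 1) / (total + 1))
        brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
        return brevity * math.exp(log_prec / max_n)

    # Hypothetical reference translation (this sketch's own choice of wording).
    ref_sv = "Länken för kontoradering var endast giltig i 30 minuter och har löpt ut."
    regressed = "Länken till kontoborttagning var bara geldig i 30 minuter och har gått ut."

    print(f"exact match score: {simple_bleu(ref_sv, ref_sv):.3f}")
    print(f"regressed output score: {simple_bleu(ref_sv, regressed):.3f}")
    ```

    Re-running the same test set after a suspected model change and comparing scores gives concrete, quantitative evidence to attach to a support request.
    
    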
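    Collecting the timestamped examples mentioned above can also be scripted. The sketch below builds a request against the Translator v3 REST endpoint; the endpoint URL, query parameters, and `Ocp-Apim-*` headers follow the publicly documented API, while the helper function and the environment-variable names (`TRANSLATOR_KEY`, `TRANSLATOR_REGION`) are this sketch's own assumptions.

    ```python
    import json
    import os
    import urllib.request

    # Translator v3 REST endpoint (public API; key and region come from
    # environment variables whose names are chosen for this sketch).
    ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

    def build_request(text, source="en", target="sv"):
        """Build (but do not send) a Translator v3 translate request."""
        url = f"{ENDPOINT}?api-version=3.0&from={source}&to={target}"
        body = json.dumps([{"Text": text}]).encode("utf-8")
        headers = {
            "Ocp-Apim-Subscription-Key": os.environ.get("TRANSLATOR_KEY", ""),
            "Ocp-Apim-Subscription-Region": os.environ.get("TRANSLATOR_REGION", ""),
            "Content-Type": "application/json",
        }
        return urllib.request.Request(url, data=body, headers=headers, method="POST")

    req = build_request(
        "The account deletion link was only valid for 30 minutes and has expired."
    )
    print(req.get_method(), req.full_url)
    # With valid credentials set, urllib.request.urlopen(req) would return the
    # JSON translation; logging each response with a timestamp builds the
    # evidence trail a support request needs.
    ```
    
    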

