Get started with Document Translation

In this article, you'll learn to use Document Translation with HTTP REST API methods. Document Translation is a cloud-based feature of the Azure Translator service. The Document Translation API enables the translation of whole documents while preserving source document structure and text formatting.

Prerequisites

Note

  • Typically, when you create a Cognitive Service resource in the Azure portal, you have the option to create a multi-service key or a single-service key. However, Document Translation is currently supported in the Translator (single-service) resource only, and is not included in the Cognitive Services (multi-service) resource.

  • Document Translation is only supported in the S1 Standard Service Plan (Pay-as-you-go) or in the D3 Volume Discount Plan. See Cognitive Services pricing—Translator.

To get started, you'll need:

  • An active Azure account. If you don't have one, you can create a free account.

  • An Azure blob storage account. You'll create containers to store and organize your blob data within your storage account.

  • A single-service Translator resource (not a multi-service Cognitive Services resource):

    Complete the Translator project and instance details fields as follows:

    1. Subscription. Select one of your available Azure subscriptions.

    2. Resource Group. You can create a new resource group or add your resource to a pre-existing resource group that shares the same lifecycle, permissions, and policies.

    3. Resource Region. Choose Global unless your business or application requires a specific region. If you're planning on using a system-assigned managed identity for authentication, choose a non-global region.

    4. Name. Enter the name you have chosen for your resource. The name you choose must be unique within Azure.

      Note

      Document Translation requires a custom domain endpoint. The value that you enter in the Name field will be the custom domain name parameter for your endpoint.

    5. Pricing tier. Document Translation isn't supported in the free tier. Select Standard S1 to try the service.

    6. Select Review + Create.

    7. Review the service terms and select Create to deploy your resource.

    8. After your resource has successfully deployed, select Go to resource.

Your custom domain name and key

Important

  • All API requests to the Document Translation service require a custom domain endpoint.
  • You won't use the endpoint found on your Azure portal resource Keys and Endpoint page nor the global translator endpoint—api.cognitive.microsofttranslator.com—to make HTTP requests to Document Translation.

What is the custom domain endpoint?

The custom domain endpoint is a URL formatted with your resource name, hostname, and Translator subdirectories:

https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0

Find your custom domain name

The NAME-OF-YOUR-RESOURCE (also called custom domain name) parameter is the value that you entered in the Name field when you created your Translator resource.

Image of the Azure portal, create resource, instant details, name field.

Get your key

Requests to the Translator service require a read-only key for authenticating access.

  1. If you've created a new resource, after it deploys, select Go to resource. If you have an existing Document Translation resource, navigate directly to your resource page.
  2. In the left rail, under Resource Management, select Keys and Endpoint.
  3. Copy and paste your key in a convenient location, such as Microsoft Notepad.
  4. You'll paste it into the code below to authenticate your request to the Document Translation service.

Image of the get your key field in Azure portal.

Create Azure blob storage containers

You'll need to create containers in your Azure blob storage account for source and target files.

  • Source container. This container is where you upload your files for translation (required).
  • Target container. This container is where your translated files will be stored (required).

Note

Document Translation supports glossaries as blobs in target containers (not separate glossary containers). If want to include a custom glossary, add it to the target container and include the glossaryUrl with the request. If the translation language pair is not present in the glossary, it will not be applied. See Translate documents using a custom glossary

Create SAS access tokens for Document Translation

The sourceUrl , targetUrl , and optional glossaryUrl must include a Shared Access Signature (SAS) token, appended as a query string. The token can be assigned to your container or specific blobs. See Create SAS tokens for Document Translation process.

  • Your source container or blob must have designated read and list access.
  • Your target container or blob must have designated write and list access.
  • Your glossary blob must have designated read and list access.

Tip

  • If you're translating multiple files (blobs) in an operation, delegate SAS access at the container level.
  • If you're translating a single file (blob) in an operation, delegate SAS access at the blob level.
  • As an alternative to SAS tokens, you can use a system-assigned managed identity for authentication.

HTTP requests

A batch Document Translation request is submitted to your Translator service endpoint via a POST request. If successful, the POST method returns a 202 Accepted response code and the batch request is created by the service. The translated documents will be listed in your target container.

HTTP headers

The following headers are included with each Document Translation API request:

HTTP header Description
Ocp-Apim-Subscription-Key Required: The value is the Azure key for your Translator or Cognitive Services resource.
Content-Type Required: Specifies the content type of the payload. Accepted values are application/json or charset=UTF-8.

POST request body properties

  • The POST request URL is POST https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0/batches
  • The POST request body is a JSON object named inputs.
  • The inputs object contains both sourceURL and targetURL container addresses for your source and target language pairs.
  • The prefix and suffix are case-sensitive strings to filter documents in the source path for translation. The prefix field is often used to delineate subfolders for translation. The suffix field is most often used for file extensions.
  • A value for the glossaries field (optional) is applied when the document is being translated.
  • The targetUrl for each target language must be unique.

Note

If a file with the same name already exists in the destination, the job will fail.

Translate all documents in a container

{
    "inputs": [
        {
            "source": {
                "sourceUrl": "https://my.blob.core.windows.net/source-en?sv=2019-12-12&st=2021-03-05T17%3A45%3A25Z&se=2021-03-13T17%3A45%3A00Z&sr=c&sp=rl&sig=SDRPMjE4nfrH3csmKLILkT%2Fv3e0Q6SWpssuuQl1NmfM%3D"
            },
            "targets": [
                {
                    "targetUrl": "https://my.blob.core.windows.net/target-fr?sv=2019-12-12&st=2021-03-05T17%3A49%3A02Z&se=2021-03-13T17%3A49%3A00Z&sr=c&sp=wdl&sig=Sq%2BYdNbhgbq4hLT0o1UUOsTnQJFU590sWYo4BOhhQhs%3D",
                    "language": "fr"
                }
            ]
        }
    ]
}

Translate a specific document in a container

  • Specify "storageType": "File"
  • If you aren't using a system-assigned managed identity for authentication, make sure you've created source URL & SAS token for the specific blob/document (not for the container)
  • Ensure you've specified the target filename as part of the target URL – though the SAS token is still for the container.
  • Sample request below shows a single document getting translated into two target languages
{
    "inputs": [
        {
            "storageType": "File",
            "source": {
                "sourceUrl": "https://my.blob.core.windows.net/source-en/source-english.docx?sv=2019-12-12&st=2021-01-26T18%3A30%3A20Z&se=2021-02-05T18%3A30%3A00Z&sr=c&sp=rl&sig=d7PZKyQsIeE6xb%2B1M4Yb56I%2FEEKoNIF65D%2Fs0IFsYcE%3D"
            },
            "targets": [
                {
                    "targetUrl": "https://my.blob.core.windows.net/target/try/Target-Spanish.docx?sv=2019-12-12&st=2021-01-26T18%3A31%3A11Z&se=2021-02-05T18%3A31%3A00Z&sr=c&sp=wl&sig=AgddSzXLXwHKpGHr7wALt2DGQJHCzNFF%2F3L94JHAWZM%3D",
                    "language": "es"
                },
                {
                    "targetUrl": "https://my.blob.core.windows.net/target/try/Target-German.docx?sv=2019-12-12&st=2021-01-26T18%3A31%3A11Z&se=2021-02-05T18%3A31%3A00Z&sr=c&sp=wl&sig=AgddSzXLXwHKpGHr7wALt2DGQJHCzNFF%2F3L94JHAWZM%3D",
                    "language": "de"
                }
            ]
        }
    ]
}

Translate documents using a custom glossary

{
    "inputs": [
        {
            "source": {
                "sourceUrl": "https://myblob.blob.core.windows.net/source"
             },
            "targets": [
                {
                    "targetUrl": "https://myblob.blob.core.windows.net/target",
                    "language": "es",
                    "glossaries": [
                        {
                            "glossaryUrl": "https:// myblob.blob.core.windows.net/glossary/en-es.xlf",
                            "format": "xliff"
                        }
                    ]
                }
            ]
        }
    ]
}

Use code to submit Document Translation requests

Set up your coding Platform

  • Create a new project.
  • Replace Program.cs with the C# code shown below.
  • Set your endpoint, key, and container URL values in Program.cs.
  • To process JSON data, add Newtonsoft.Json package using .NET CLI.
  • Run the program from the project directory.

Important

For the code samples below, you'll hard-code your Shared Access Signature (SAS) URL where indicated. Remember to remove the SAS URL from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Managed Identity. See the Azure Storage security article for more information.

You may need to update the following fields, depending upon the operation:

  • endpoint
  • key
  • sourceURL
  • targetURL
  • glossaryURL
  • id (job ID)

Locating the id value

  • You'll find the job id in the POST method response Header Operation-Location URL value. The last parameter of the URL is the operation's job id:
Response header Result URL
Operation-Location https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0/batches/9dce0aa9-78dc-41ba-8cae-2e2f3c2ff8ec
  • You can also use a GET Jobs request to retrieve a Document Translation job id .

Translate documents


    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Text;


    class Program
    {

        static readonly string route = "/batches";

        private static readonly string endpoint = "https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0";

        private static readonly string key = "<YOUR-KEY>";

        static readonly string json = ("{\"inputs\": [{\"source\": {\"sourceUrl\": \"https://YOUR-SOURCE-URL-WITH-READ-LIST-ACCESS-SAS\",\"storageSource\": \"AzureBlob\",\"language\": \"en\"}, \"targets\": [{\"targetUrl\": \"https://YOUR-TARGET-URL-WITH-WRITE-LIST-ACCESS-SAS\",\"storageSource\": \"AzureBlob\",\"category\": \"general\",\"language\": \"es\"}]}]}");

        static async Task Main(string[] args)
        {
            using HttpClient client = new HttpClient();
            using HttpRequestMessage request = new HttpRequestMessage();
            {

                StringContent content = new StringContent(json, Encoding.UTF8, "application/json");

                request.Method = HttpMethod.Post;
                request.RequestUri = new Uri(endpoint + route);
                request.Headers.Add("Ocp-Apim-Subscription-Key", key);
                request.Content = content;

                HttpResponseMessage  response = await client.SendAsync(request);
                string result = response.Content.ReadAsStringAsync().Result;
                if (response.IsSuccessStatusCode)
                {
                    Console.WriteLine($"Status code: {response.StatusCode}");
                    Console.WriteLine();
                    Console.WriteLine($"Response Headers:");
                    Console.WriteLine(response.Headers);
                }
                else
                    Console.Write("Error");

            }

        }

    }

Get file formats

Retrieve a list of supported file formats. If successful, this method returns a 200 OK response code.


using System;
using System.Net.Http;
using System.Threading.Tasks;


class Program
{


    private static readonly string endpoint = "https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0";

    static readonly string route = "/documents/formats";

    private static readonly string key = "<YOUR-KEY>";

    static async Task Main(string[] args)
    {

        HttpClient client = new HttpClient();
            using HttpRequestMessage request = new HttpRequestMessage();
            {
                request.Method = HttpMethod.Get;
                request.RequestUri = new Uri(endpoint + route);
                request.Headers.Add("Ocp-Apim-Subscription-Key", key);


                HttpResponseMessage response = await client.SendAsync(request);
                string result = response.Content.ReadAsStringAsync().Result;

                Console.WriteLine($"Status code: {response.StatusCode}");
                Console.WriteLine($"Response Headers: {response.Headers}");
                Console.WriteLine();
                Console.WriteLine(result);
            }
}

Get job status

Get the current status for a single job and a summary of all jobs in a Document Translation request. If successful, this method returns a 200 OK response code.


using System;
using System.Net.Http;
using System.Threading.Tasks;


class Program
{


    private static readonly string endpoint = "https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0";

    static readonly string route = "/batches/{id}";

    private static readonly string key = "<YOUR-KEY>";

    static async Task Main(string[] args)
    {

        HttpClient client = new HttpClient();
            using HttpRequestMessage request = new HttpRequestMessage();
            {
                request.Method = HttpMethod.Get;
                request.RequestUri = new Uri(endpoint + route);
                request.Headers.Add("Ocp-Apim-Subscription-Key", key);


                HttpResponseMessage response = await client.SendAsync(request);
                string result = response.Content.ReadAsStringAsync().Result;

                Console.WriteLine($"Status code: {response.StatusCode}");
                Console.WriteLine($"Response Headers: {response.Headers}");
                Console.WriteLine();
                Console.WriteLine(result);
            }
}

Get document status

Brief overview

Retrieve the status of a specific document in a Document Translation request. If successful, this method returns a 200 OK response code.


using System;
using System.Net.Http;
using System.Threading.Tasks;


class Program
{


    private static readonly string endpoint = "https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0";

    static readonly string route = "/{id}/document/{documentId}";

    private static readonly string key = "<YOUR-KEY>";

    static async Task Main(string[] args)
    {

        HttpClient client = new HttpClient();
            using HttpRequestMessage request = new HttpRequestMessage();
            {
                request.Method = HttpMethod.Get;
                request.RequestUri = new Uri(endpoint + route);
                request.Headers.Add("Ocp-Apim-Subscription-Key", key);


                HttpResponseMessage response = await client.SendAsync(request);
                string result = response.Content.ReadAsStringAsync().Result;

                Console.WriteLine($"Status code: {response.StatusCode}");
                Console.WriteLine($"Response Headers: {response.Headers}");
                Console.WriteLine();
                Console.WriteLine(result);
            }
}

Delete job

Brief overview

Cancel currently processing or queued job. Only documents for which translation hasn't started will be canceled.


using System;
using System.Net.Http;
using System.Threading.Tasks;


class Program
{


    private static readonly string endpoint = "https://<NAME-OF-YOUR-RESOURCE>.cognitiveservices.azure.com/translator/text/batch/v1.0";

    static readonly string route = "/batches/{id}";

    private static readonly string key = "<YOUR-KEY>";

    static async Task Main(string[] args)
    {

        HttpClient client = new HttpClient();
            using HttpRequestMessage request = new HttpRequestMessage();
            {
                request.Method = HttpMethod.Delete;
                request.RequestUri = new Uri(endpoint + route);
                request.Headers.Add("Ocp-Apim-Subscription-Key", key);


                HttpResponseMessage response = await client.SendAsync(request);
                string result = response.Content.ReadAsStringAsync().Result;

                Console.WriteLine($"Status code: {response.StatusCode}");
                Console.WriteLine($"Response Headers: {response.Headers}");
                Console.WriteLine();
                Console.WriteLine(result);
            }
}

Content limits

The table below lists the limits for data that you send to Document Translation.

Attribute Limit
Document size ≤ 40 MB
Total number of files. ≤ 1000
Total content size in a batch ≤ 250 MB
Number of target languages in a batch ≤ 10
Size of Translation memory file ≤ 10 MB

Document Translation can't be used to translate secured documents such as those with an encrypted password or with restricted access to copy content.

Troubleshooting

Common HTTP status codes

HTTP status code Description Possible reason
200 OK The request was successful.
400 Bad Request A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long.
401 Unauthorized The request isn't authorized. Check to make sure your key or token is valid and in the correct region. When managing your subscription on the Azure portal, make sure you're using the Translator single-service resource not the Cognitive Services multi-service resource.
429 Too Many Requests You've exceeded the quota or rate of requests allowed for your subscription.
502 Bad Gateway Network or server-side issue. May also indicate invalid headers.

Learn more

Next steps