Azure Document Translation cannot translate PDF files

Creid Lee
2021-08-27T10:33:12.62+00:00

I am developing a web app that translates PDF files using Azure Document Translation, and I am stuck on a blocking issue.

Every translation request I submit for a PDF file ends with the translation status Failed, and the document status consistently reports an invalid document due to corruption or an unsupported type/extension. Below are an example request and the corresponding failed response.

request:
{
  "inputs": [
    {
      "storageType": "File",
      "source": {
        "sourceUrl": "https://my.blob.core.windows.net/public-file/a95e8311b924453ea18fd735cdb9535c.pdf?sv=2020-04-08&st=2021-08-27T09%3A13%3A18Z&se=2021-08-28T09%3A13%3A18Z&sr=b&sp=r&sig=cTztb3rwYzFreC9fh81IfVFefwAJFGfi438fCVomcSM%3D"
      },
      "targets": [
        {
          "targetUrl": "https://my.blob.core.windows.net/public-file/a95e8311b924453ea18fd735cdb9535c_translated.pdf?sp=wl&st=2021-08-27T09:14:41Z&se=2021-08-28T09:14:41Z&sv=2020-08-04&sr=c&sig=D4RrSiL0bGdy%2BXNCUYEm94h0CHzNCv1%2FN%2B7nsDxRcY0%3D",
          "language": "zh-Hans"
        }
      ]
    }
  ]
}
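For reference, here is a minimal standard-library Python sketch of how a request like the one above can be built and submitted. It assumes the v1.0 batch endpoint path `/translator/text/batch/v1.0/batches` and key authentication through the `Ocp-Apim-Subscription-Key` header. `TRANSLATOR_ENDPOINT`, `TRANSLATOR_KEY`, `build_batch_body` and `submit_batch` are hypothetical placeholders, not names taken from the question. A successful submission returns 202 Accepted, with the job status URL in the `Operation-Location` header.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own Translator resource endpoint and key.
TRANSLATOR_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
TRANSLATOR_KEY = "<your-key>"


def build_batch_body(source_sas_url: str, target_sas_url: str, language: str) -> dict:
    """Build a single-file ("storageType": "File") batch body like the request above."""
    return {
        "inputs": [
            {
                "storageType": "File",
                "source": {"sourceUrl": source_sas_url},
                "targets": [{"targetUrl": target_sas_url, "language": language}],
            }
        ]
    }


def submit_batch(body: dict) -> str:
    """POST the batch and return the Operation-Location URL used to poll job status."""
    req = urllib.request.Request(
        TRANSLATOR_ENDPOINT.rstrip("/") + "/translator/text/batch/v1.0/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": TRANSLATOR_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        # On 202 Accepted, the job status URL is returned in this header.
        return resp.headers.get("Operation-Location", "")


if __name__ == "__main__":
    # Only prints the body; no network call is made here.
    body = build_batch_body(
        "https://<account>.blob.core.windows.net/<container>/doc.pdf?<sas>",
        "https://<account>.blob.core.windows.net/<container>/doc_translated.pdf?<sas>",
        "zh-Hans",
    )
    print(json.dumps(body, indent=2))
```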

response:
{
  "value": [
    {
      "sourcePath": "https://my.blob.core.windows.net/public-file/a95e8311b924453ea18fd735cdb9535c.pdf",
      "createdDateTimeUtc": "2021-08-27T09:56:58.5368104Z",
      "lastActionDateTimeUtc": "2021-08-27T09:57:07.3741074Z",
      "status": "Failed",
      "to": "zh-Hans",
      "error": {
        "code": "InvalidRequest",
        "message": "Document failed during checking validity. This may be caused by corruption or unsupported type/extension.",
        "target": "Document",
        "innerError": {
          "code": "InvalidDocument",
          "message": "Document failed during checking validity. This may be caused by corruption or unsupported type/extension."
        }
      },
      "progress": 0,
      "id": "273622bd-835c-4946-9798-fd8f19f6bbf2",
      "characterCharged": 0
    }
  ]
}
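The response above has the shape returned by the v1.0 documents-status call (GET `.../batches/{id}/documents`). When many PDFs are submitted at once, a small helper that lists each failed document together with its error and inner error codes makes triage easier. The sketch below assumes only the fields visible in the response above, and `failed_documents` is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def failed_documents(status_json: str) -> List[Tuple[str, str, str]]:
    """Return (sourcePath, error code, innerError code) for every failed document."""
    data = json.loads(status_json)
    failures = []
    for doc in data.get("value", []):
        if doc.get("status") != "Failed":
            continue  # ignore documents that succeeded or are still running
        error = doc.get("error") or {}
        inner = error.get("innerError") or {}
        failures.append(
            (doc.get("sourcePath", ""), error.get("code", ""), inner.get("code", ""))
        )
    return failures


if __name__ == "__main__":
    # Trimmed copy of the response above.
    sample = """{"value": [{
        "sourcePath": "https://my.blob.core.windows.net/public-file/a95e8311b924453ea18fd735cdb9535c.pdf",
        "status": "Failed", "to": "zh-Hans",
        "error": {"code": "InvalidRequest", "innerError": {"code": "InvalidDocument"}}}]}"""
    for path, code, inner in failed_documents(sample):
        print(path, code, inner)
```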

I checked every PDF file uploaded to Azure Blob Storage: each one opens, and none is corrupt. I also tried other formats, such as text and Word files, and those translate successfully, so PDF seems to be the only format that fails. Is the service temporarily unable to translate PDF files, or is my request missing a parameter that PDF translation requires?
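A file that opens locally can still differ from what the service actually downloads. For example, the upload path may have written different bytes, or the SAS URL may return an error page instead of the file. The hedged sketch below fetches the first bytes through the same SAS URL, using an HTTP `Range` header, and checks for the `%PDF-` header. It is a heuristic only: it does not detect password-protected or scanned PDFs. `fetch_head`, `looks_like_pdf` and `check_source` are hypothetical helper names.

```python
import urllib.error
import urllib.request

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(head: bytes) -> bool:
    """Lenient check: PDF readers accept the %PDF- header within the first 1024 bytes."""
    return PDF_MAGIC in head[:1024]


def fetch_head(sas_url: str, n: int = 1024) -> bytes:
    """Download only the first n bytes of the blob through the given SAS URL."""
    req = urllib.request.Request(sas_url, headers={"Range": f"bytes=0-{n - 1}"})
    with urllib.request.urlopen(req) as resp:  # 206 Partial Content counts as success
        return resp.read(n)


def check_source(sas_url: str) -> str:
    """Report whether the SAS URL is readable and serves something that looks like a PDF."""
    try:
        head = fetch_head(sas_url)
    except urllib.error.HTTPError as e:
        # A 403 here would point at the SAS token (signature, permissions or expiry).
        return f"HTTP {e.code}: the blob could not be read through this SAS URL"
    if looks_like_pdf(head):
        return "looks like a PDF"
    return "no PDF header, first bytes: " + repr(head[:16])
```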

Azure AI Translator

1 answer

  1. romungi-MSFT (Microsoft Employee, Moderator)
    2021-08-27T11:49:53.06+00:00

    @Creid Lee On the service side, I tested with one of my own PDF documents, and it was translated correctly.


    I suspect the issue is with the shared access signature (SAS). The sourcePath in the response shows only the file path, which suggests the signature was not picked up correctly. Looking at your source URL, the following part seems incorrect:

    ?sv=2020-04-08&st=2021-08-27T09%3A13%3A18Z&se=2021-08-28T09%3A13%3A18Z  
    

    Shouldn't it be the following:

    ?sv=2020-04-08&st=2021-08-27T09:13:18Z&se=2021-08-28T09:13:18Z  
    

    It looks like ':' has been URL-encoded (percent-encoded) as %3A.

    The target URL, however, seems to have it right.
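To compare the two forms, the short sketch below splits a SAS URL and percent-decodes its query parameters with Python's standard `urllib.parse`. Decoding turns `%3A` back into `:`, so the output shows what each token actually carries: `st`/`se` (start and expiry), `sr` (resource type, `b` for blob or `c` for container) and `sp` (permissions). One caveat: `parse_qs` treats a literal `+` as a space, so a signature containing an unencoded `+` would be mangled here, just as it can be when sent to a service. `sas_params` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def sas_params(url: str) -> dict:
    """Return the SAS query parameters of a blob URL, percent-decoded (%3A -> ':')."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; SAS parameters appear once, so keep the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    source = ("https://my.blob.core.windows.net/public-file/a95e8311b924453ea18fd735cdb9535c.pdf"
              "?sv=2020-04-08&st=2021-08-27T09%3A13%3A18Z&se=2021-08-28T09%3A13%3A18Z"
              "&sr=b&sp=r&sig=cTztb3rwYzFreC9fh81IfVFefwAJFGfi438fCVomcSM%3D")
    target = ("https://my.blob.core.windows.net/public-file/a95e8311b924453ea18fd735cdb9535c_translated.pdf"
              "?sp=wl&st=2021-08-27T09:14:41Z&se=2021-08-28T09:14:41Z&sv=2020-08-04&sr=c"
              "&sig=D4RrSiL0bGdy%2BXNCUYEm94h0CHzNCv1%2FN%2B7nsDxRcY0%3D")
    for name, url in (("source", source), ("target", target)):
        p = sas_params(url)
        print(name, {k: p.get(k) for k in ("sv", "st", "se", "sr", "sp")})
```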

    Could you please re-check the signature and try the scenario again?

