FormRecognizer REST API urlSource formatting issue

Sachin S 31 Reputation points
2022-05-20T18:57:59.997+00:00

Hello, I'm trying to run the Form Recognizer 3.0 prebuilt model on an invoice PDF stored in Blob Storage. The file name in the blob container, as shown in the Azure portal and as downloaded, is listed below the error message.

I need to pass the file's URL to the Form Recognizer API in the body of the REST API call, but it fails with the error message below:
{
"errorCode": "2108",
"message": "{\"error\":{\"code\":\"InvalidRequest\",\"message\":\"Invalid request.\",\"innererror\":{\"code\":\"InvalidContent\",\"message\":\"The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats.\"}}}",
"failureType": "UserError",
"target": "Form Recognizer - POST",
"details": []
}

Azure Portal: E PF Pas (Cleaning Services) Pty Ltd 4844 tkt 1115212 JAM.pdf
Downloaded from blob store via Azure portal: E%20PF%20Pas%20(Cleaning%20Services)%20Pty%20Ltd%204844%20tkt%201115212%20JAM.pdf

I tried the file paths below, but all of them fail with the above error message. When I renamed the file to a name containing only alphanumeric characters and no spaces, it worked fine.
Does anyone know how to correctly format the above file name so it can be passed to Form Recognizer? I'm calling Form Recognizer from Azure Data Factory.

{"urlSource": "https://\
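One thing worth trying for a file name like the one above is percent-encoding the blob name before placing it in `urlSource`. The sketch below is only an illustration using the Python standard library: the storage account and container names are hypothetical placeholders, and nothing in this thread confirms that encoding alone resolves the error. Note that `quote` also encodes the parentheses (`%28`, `%29`), which the name downloaded from the portal left as-is.

```python
import json
from urllib.parse import quote

# Hypothetical placeholders: substitute your own storage account and container.
ACCOUNT = "myaccount"
CONTAINER = "invoices"

# Blob name exactly as shown in the Azure portal (from the question).
BLOB_NAME = "E PF Pas (Cleaning Services) Pty Ltd 4844 tkt 1115212 JAM.pdf"


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build a blob URL with the blob name percent-encoded.

    "/" is kept unencoded so that names containing virtual folders
    still produce a valid path.
    """
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


url = blob_url(ACCOUNT, CONTAINER, BLOB_NAME)

# Serialise the request body with a JSON library instead of concatenating
# strings, so quotes and backslashes are escaped correctly.
body = json.dumps({"urlSource": url})
print(body)
```

If the container is private, a SAS token or a role assignment for the service is still needed on top of the encoding, as the answers below discuss.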

Azure AI Document Intelligence

7 answers

  1. Victorien FB 0 Reputation points
    2024-07-08T14:46:02.86+00:00

    Hi,

    I have the same problem with a URL plus SAS token:

    HttpResponseError: (InvalidRequest) Invalid request.
    Code: InvalidRequest
    Message: Invalid request.
    Inner error: {
        "code": "InvalidContent",
        "message": "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."
    }
    

    It works without the SAS token (but then my blobs are accessible to the whole internet...).

    Here's the format of the URL with the SAS token:

    https://myazurestorage.blob.core.windows.net/mycontainer/0b735a3e-74a2-4758-a058-f874a276bdaf-myfile.pdf?st=2024-07-08T14%3A27%3A39Z&se=2024-07-09T14%3A27%3A39Z&sp=r&sv=2024-05-04&sr=c&sig=f5OFrzNFY/vfxF9VukGjfPGuLngiC2Zx6fj%2BZUugPhk%3D
    

    What's wrong? Does anyone have a solution?

    Thanks,
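When a SAS URL like the one in this answer is rejected, it can help to decode its query parameters and check the validity window and permissions before looking elsewhere. The sketch below uses only the Python standard library; the helper names `describe_sas` and `is_within_window` are made up for illustration, and it is a diagnostic aid, not a confirmed fix for this error. In a SAS token, `sp=r` means read permission and `sr=c` means the token is scoped to the container.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# SAS URL copied from the answer above.
SAS_URL = (
    "https://myazurestorage.blob.core.windows.net/mycontainer/"
    "0b735a3e-74a2-4758-a058-f874a276bdaf-myfile.pdf"
    "?st=2024-07-08T14%3A27%3A39Z&se=2024-07-09T14%3A27%3A39Z"
    "&sp=r&sv=2024-05-04&sr=c"
    "&sig=f5OFrzNFY/vfxF9VukGjfPGuLngiC2Zx6fj%2BZUugPhk%3D"
)

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def describe_sas(url: str) -> dict:
    """Return the decoded SAS fields that matter for troubleshooting."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values, so %3A becomes ":".
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "path": parts.path,
        "start": params.get("st"),
        "expiry": params.get("se"),
        "permissions": params.get("sp"),
        "resource": params.get("sr"),
        "version": params.get("sv"),
    }


def is_within_window(info: dict, now: datetime) -> bool:
    """Check whether `now` (timezone-aware) falls inside the SAS validity window."""
    start = datetime.strptime(info["start"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    expiry = datetime.strptime(info["expiry"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    return start <= now <= expiry


info = describe_sas(SAS_URL)
print(info)
print("currently valid:", is_within_window(info, datetime.now(timezone.utc)))
```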


  2. ericOnline 21 Reputation points
    2024-09-26T16:03:27.2766667+00:00

    I think this issue is because the DocIntel Service is responding with an incorrect Status Code...

    This error is incorrectly returning an HttpResponseError:

    Response status: 400
    Response headers:
        'Content-Length': '221'
        'Content-Type': 'application/json; charset=utf-8'
        'ms-azure-ai-errorcode': 'REDACTED'
        'x-ms-error-code': 'InvalidRequest'
        'x-envoy-upstream-service-time': 'REDACTED'
        'apim-request-id': 'REDACTED'
        'Strict-Transport-Security': 'REDACTED'
        'x-content-type-options': 'REDACTED'
        'x-ms-region': 'REDACTED'
        'Date': 'Thu, 26 Sep 2024 15:29:16 GMT'

    Result: Failure
    Exception: HttpResponseError: (InvalidRequest) Invalid request.
    Code: InvalidRequest
    Message: Invalid request.
    Inner error: {
        "code": "InvalidContent",
        "message": "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."
    }

    According to Mozilla's documentation, status 400 indicates a malformed client request.

    But it's not a client request issue... The DocIntel API should respond with status 403, because the service is not permitted to read from the Storage Account.

    The issue was that the DocIntel service needed the Storage Blob Data Reader RBAC role on the Storage Account. As soon as I granted this role, the error went away.

    Please address this issue.

    Thank you

