FormRecognizer REST API urlSource formatting issue

Sachin S 26 Reputation points
2022-05-20T18:57:59.997+00:00

Hello, I'm trying to run the Form Recognizer 3.0 prebuilt model on an invoice PDF stored in Blob Storage. The file name in the blob container contains spaces and parentheses (the exact name is shown below).

I need to pass the URL of the file to the Form Recognizer API in the body of the REST call, but it fails with the error message below:
{
"errorCode": "2108",
"message": "{\"error\":{\"code\":\"InvalidRequest\",\"message\":\"Invalid request.\",\"innererror\":{\"code\":\"InvalidContent\",\"message\":\"The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats.\"}}}",
"failureType": "UserError",
"target": "Form Recognizer - POST",
"details": []
}

File name shown in the Azure portal: E PF Pas (Cleaning Services) Pty Ltd 4844 tkt 1115212 JAM.pdf
File name when downloaded from Blob Storage via the Azure portal: E%20PF%20Pas%20(Cleaning%20Services)%20Pty%20Ltd%204844%20tkt%201115212%20JAM.pdf

I tried the file paths below, but all of them fail with the error above. When I rename the file so that it has no spaces or non-alphanumeric characters, it works fine.
Does anyone know how to correctly format the above file name so it can be passed to Form Recognizer? I'm calling Form Recognizer from Azure Data Factory.

{"urlSource": "https://<accountname>.blob.core.windows.net/<containername>/E%20PF%20Pas%20%28Cleaning%20Services%29%20Pty%20Ltd%204844%20tkt%201115212%20JAM.pdf" }

{"urlSource": "https://<accountname>.blob.core.windows.net/<containername>/E%20PF%20Pas%20(Cleaning%20Services)%20Pty%20Ltd%204844%20tkt%201115212%20JAM.pdf" }

{"urlSource": "https://<accountname>.blob.core.windows.net/<containername>/E PF Pas (Cleaning Services) Pty Ltd 4844 tkt 1115212 JAM.pdf" }
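For reference, standard percent-encoding of the blob name can be sketched in Python as below. The helper name `build_url_source` and the `myaccount` / `invoices` values are placeholders, not taken from the post. The result is identical to the first variant above, so encoding alone may not explain the error; the sketch only helps rule out a malformed URL.

```python
import json
from urllib.parse import quote


def build_url_source(account: str, container: str, blob_name: str) -> str:
    """Build a blob URL with the blob name percent-encoded.

    "/" is kept as-is so that virtual folders in the blob name survive;
    spaces become %20 and parentheses become %28 / %29.
    """
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Placeholder account and container names (assumptions for illustration).
blob_name = "E PF Pas (Cleaning Services) Pty Ltd 4844 tkt 1115212 JAM.pdf"
request_body = json.dumps(
    {"urlSource": build_url_source("myaccount", "invoices", blob_name)}
)
print(request_body)
```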

Thank you!
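
The renaming workaround mentioned in the question could be automated along these lines. This is only a minimal sketch: `sanitize_blob_name` is a hypothetical helper, and replacing each run of unsafe characters with an underscore is an assumed convention (the post only says the file was renamed to avoid spaces and special characters).

```python
import re


def sanitize_blob_name(name: str) -> str:
    """Return a file name limited to letters, digits, "_" and "-".

    The extension (text after the last ".") is preserved unchanged.
    """
    stem, dot, extension = name.rpartition(".")
    if not dot:
        # No "." in the name: treat the whole name as the stem.
        stem, extension = name, ""
    # Collapse each run of disallowed characters into a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return f"{cleaned}.{extension}" if extension else cleaned


print(sanitize_blob_name("E PF Pas (Cleaning Services) Pty Ltd 4844 tkt 1115212 JAM.pdf"))
```

The blob would have to be copied or renamed to the sanitized name before the URL is passed to the service.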

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

5 answers

  1. Nwankwoh, Dozie 1 Reputation point
    2022-06-01T16:43:58.137+00:00

    I currently have this issue. Have you determined a solution @Sachin S ?

    Has anyone else determined a solution?

    Thank you

  2. Sachin S 26 Reputation points
    2022-06-03T13:14:20.51+00:00

    No. I had to remove spaces from the file name before invoking the form recognizer API.

  3. Anonymous
    2023-08-02T16:04:46.26+00:00

    Did you find a solution to this problem?

  4. Anand Sridharan 0 Reputation points
    2023-09-12T01:37:20.8966667+00:00

    Hi Sachin,

    I had a similar issue. According to the Azure documentation, this error can occur when the URL is invalid or inaccessible. In my case it was inaccessible, and I had to change the storage container's access level from "Private (no anonymous access)" to anonymous read access. Refer to containeraccess.jpg.

    Thanks,

    Anand.

  5. Keith 0 Reputation points
    2023-10-03T02:10:00.18+00:00

    I encountered a similar problem, but mine had an unusual twist. While some PDFs functioned correctly from a storage account container, others did not.

    To resolve the problem, I found a workaround by placing the seemingly "corrupt" PDFs into the same storage account container connected to my custom model training. Everything appears to be functioning smoothly now.

    I attempted renaming and adjusting public access settings, but neither of these solutions proved effective for me.

    To clarify: put the broken PDFs in the same container as the ".pdf.labels.json" and ".pdf.ocr.json" files, which are created when you train a custom model.

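
Since one of the answers traces the error to an inaccessible URL, a quick way to test that theory is to request the blob URL without any credentials. The sketch below assumes the `urlSource` carries no SAS token and no managed identity is involved; `anonymous_status` is a hypothetical helper name, and the URL passed to it must already be percent-encoded.

```python
import urllib.error
import urllib.request


def anonymous_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of an unauthenticated HEAD request to `url`.

    Network-level failures (DNS errors, timeouts) are not handled here
    and propagate to the caller as exceptions.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; surface the status code instead.
        return err.code


# Example call (placeholder URL, not executed here):
# anonymous_status("https://<accountname>.blob.core.windows.net/<containername>/invoice.pdf")
```

A 200 response suggests the blob is anonymously readable. Any other status suggests the service may be unable to fetch the file either, in which case the container access level or a URL with a SAS token is worth checking.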