404 Pdf analyze result does not exist when trying to receive Searchable PDF on large pdfs

Eamon Miller 20 Reputation points
2024-09-29T19:52:12.8766667+00:00

I keep getting a 404 error when I request the searchable PDF after running an analysis with prebuilt-read. It only happens with larger PDFs, roughly 50 pages and up. A large PDF occasionally works but usually returns a 404, whereas a small PDF works every time. I also tried adding a retry that requests the PDF again after 30 seconds, in case the PDF was not ready yet, but that did not help.
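
A retry of the kind described above can be sketched as follows. This is only an illustration: `retryOn404`, `fetchFn`, `attempts`, and `delayMs` are hypothetical names that do not appear in the original code, and the sketch assumes the request function rejects with an axios-style error that carries `response.status`.

```javascript
// Hypothetical helper: retry an async operation while it fails with HTTP 404.
// Names and defaults are illustrative assumptions, not the asker's actual code.
async function retryOn404(fetchFn, { attempts = 3, delayMs = 30000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fetchFn();
    } catch (err) {
      // axios attaches the HTTP response to the error it throws for non-2xx statuses.
      const status = err.response && err.response.status;
      if (status !== 404) throw err; // only a 404 is treated as "not ready yet"
      lastError = err;
      if (attempt < attempts) {
        // Wait before the next attempt (30 seconds by default, as in the question).
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt returned 404
}
```

As the rest of the thread shows, waiting longer does not help when the 404 is caused by the input PDF itself, so a bounded number of attempts keeps the failure visible.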

Parsed error: { code: 'NotFound', message: 'Pdf analyze result does not exist.' }
Response status: 404
Response headers: Object [AxiosHeaders] {
  'content-length': '66',
  'content-type': 'application/json; charset=utf-8',
  'x-envoy-upstream-service-time': '3137',
  'apim-request-id': '6d044955-e6c2-4285-97ed-07989a557489',
  'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
  'x-content-type-options': 'nosniff',
  'x-ms-region': 'East US',
  date: 'Sun, 29 Sep 2024 19:38:50 GMT'
}
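
Because the request uses `responseType: 'arraybuffer'`, an error body like the one logged above arrives as raw bytes and has to be decoded before it can be read. A minimal sketch of that decoding step, assuming Node.js; `parseErrorBody` is a hypothetical helper name and not part of the original code:

```javascript
// Hypothetical helper: decode a raw response body into a JSON error object.
// Returns the parsed JSON value, or null if the body is a PDF or is not valid JSON.
function parseErrorBody(data) {
  // Accept a Buffer, ArrayBuffer, or Uint8Array.
  const buf = Buffer.isBuffer(data) ? data : Buffer.from(data);

  // A real PDF begins with the "%PDF-" signature, so it is not an error body.
  if (buf.subarray(0, 5).toString('latin1') === '%PDF-') return null;

  try {
    return JSON.parse(buf.toString('utf8'));
  } catch {
    return null; // neither a PDF nor JSON
  }
}
```

In a `catch` block around the axios call, this could be used as `parseErrorBody(err.response.data)` whenever `err.response` is present.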

I don't think the problem is in my code, because this only started happening recently and I have not changed the code.

    const axios = require('axios');
    const fs = require('fs/promises');

    // Runs inside an async function; azureEndpoint, azureKey, resultId and
    // outputPdfPath are defined elsewhere.
    const pdfUrl = `${azureEndpoint}/documentintelligence/documentModels/prebuilt-read/analyzeResults/${resultId}/pdf?api-version=2024-07-31-preview`;
    console.log(`Requesting searchable PDF from Azure DI. Full URL: ${pdfUrl}`);

    // Note: axios rejects on non-2xx statuses by default, so a 404 is thrown here.
    const pdfResponse = await axios.get(pdfUrl, {
      headers: {
        'Ocp-Apim-Subscription-Key': azureKey,
        'Accept': 'application/pdf'
      },
      responseType: 'arraybuffer'
    });

    // With responseType 'arraybuffer', the body is raw bytes.
    const searchablePdfBuffer = Buffer.from(pdfResponse.data);

    // A body that starts with '{' is a JSON error message rather than a PDF.
    if (searchablePdfBuffer.toString('utf8', 0, 1) === '{') {
      const errorMessage = JSON.parse(searchablePdfBuffer.toString('utf8'));
      console.error('Error retrieving searchable PDF:', errorMessage);
      throw new Error(`Failed to retrieve searchable PDF: ${errorMessage.message}`);
    }

    await fs.writeFile(outputPdfPath, searchablePdfBuffer);
    console.log(`Searchable PDF saved at ${outputPdfPath}`);
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 8,775 Reputation points Microsoft Vendor
    2024-09-30T09:49:24.2733333+00:00

    Hi @Eamon Miller,

    I'm glad to hear that your issue has been resolved, and thanks for sharing the solution, which may help other community members reading this thread. Because the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'm reposting your response as an answer in case you'd like to accept it. This will help other users with a similar query find the solution more easily.

    Question: 404 Pdf analyze result does not exist when trying to receive Searchable PDF on large pdfs.

    Solution: It turns out the problem was the formatting of my PDFs. The GET request that downloads the searchable PDF seems to return a 404 error when the uploaded PDF is in certain formats. Using pdf-lib to create a new PDF and copying all of the pages into it before uploading it for analysis solved the issue.

    If you have any further questions or concerns, please don't hesitate to ask. We're always here to help.


    Please click Accept Answer and select Yes for "Was this answer helpful".


1 additional answer

  1. Eamon Miller 20 Reputation points
    2024-09-30T01:43:45.37+00:00

    It turns out the problem was the formatting of my PDFs. The GET request that downloads the searchable PDF seems to return a 404 error when the uploaded PDF is in certain formats. Using pdf-lib to create a new PDF and copying all of the pages into it before uploading it for analysis solved the issue.

