How to fix Vision Layout not returning the correct results for nearly identical PDFs

JB 1 Reputation point
2025-06-03T20:36:19.74+00:00

I am submitting two PDFs to the Vision Layout endpoint so I can extract the check-marked utilities from this section of the PDF.
User's image

I first split the file and remove the first page. The other two pages are sent separately through Vision. The first of these pages works flawlessly and returns Water as 'Selected'.

User's image

"content": "Water , St. lights Storm\n:selected:\n:unselected: :unselected:"

The second page comes through, but the output doesn't capture the check marks correctly. We have figured out that the proximity of the text above the checkboxes is what throws it off.

Page 1
User's image

Versus Page 2

User's image

I'm sending the results to a function that returns the label of any checked values, but how do I get Vision to process the file correctly?

I think I have a workaround in this instance: redacting that text area before sending the page to Vision. But that won't help when I get to processing other PDFs.

Here is the file I am trying to process.
184 King St - 20251915589.pdf

Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.

2 answers

  1. Jerald Felix 1,305 Reputation points
    2025-06-04T02:30:42.58+00:00

    Hello JB,

    I understand you're encountering issues with the Vision layout not capturing all the correct elements. While the original Microsoft Q&A thread you referenced isn't accessible, I can provide some general guidance that might help resolve your problem.

    1. Check Layout Settings: Ensure that the layout settings for your components are correctly configured. Misconfigured settings can lead to elements not appearing as expected.
    2. Verify Element Visibility: Sometimes, elements might be hidden due to visibility settings or layering issues. Double-check that all elements are set to be visible and are not obscured by other components.
    3. Inspect Anchoring and Docking: Improper anchoring or docking can cause elements to be positioned incorrectly or not appear at all. Make sure that each element is anchored or docked appropriately within the layout.
    4. Update or Refresh the Layout: If elements are added dynamically, the layout might not update automatically. Consider forcing a layout refresh or update after adding new elements to ensure they are rendered correctly.
    5. Review Error Logs: Check for any error messages or logs that might indicate issues with rendering or loading elements within the layout.

    Best Regards,

    Jerald Felix


  2. Ravada Shivaprasad 460 Reputation points Microsoft External Staff Moderator
    2025-06-05T00:05:17.67+00:00

    Hi JB

    The issue you're encountering, where Azure's Vision Layout service misinterprets the check-marked utilities on the second page of your PDF, stems from layout interference: unrelated text sits very close above the target section. The first page is processed correctly, but on the second page this proximity disrupts the model's ability to associate the checkboxes with their corresponding labels.

    To address this, the first step is to identify the problematic area on the second page where the interference occurs. This typically involves analyzing the layout output from Vision to pinpoint where the text blocks are too close or overlapping. Once identified, a practical workaround is to redact or remove the interfering text before submitting the page to the Vision Layout service. This can be done programmatically with a PDF processing library such as PyMuPDF. pdfplumber can help locate the text, but it only reads PDFs and cannot write the redacted file.
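
    As a rough sketch of the two steps above (find the interfering text, then redact it), something like the following could work. It assumes the layout result has already been reduced to plain dicts with a `content` string and a flat `polygon` list, which is how words and selection marks appear in the Layout JSON as far as I know. The function names, the `max_gap` threshold and the `known_labels` parameter are hypothetical. It also assumes the page `unit` is inches, which is usual for PDF input, so check that field in your own result. The redaction step uses PyMuPDF and needs `pip install pymupdf`.

```python
from typing import Dict, FrozenSet, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def polygon_to_box(polygon: List[float]) -> Box:
    """Collapse a flat [x1, y1, x2, y2, ...] polygon into its bounding box."""
    xs, ys = polygon[0::2], polygon[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def find_interfering_words(
    words: List[Dict],
    marks: List[Dict],
    max_gap: float,
    known_labels: FrozenSet[str] = frozenset(),
) -> List[Dict]:
    """Return words sitting directly above a selection mark, closer than max_gap.

    A word is flagged when its bottom edge is at most max_gap above the mark's
    top edge and the two boxes overlap horizontally. Words whose text is in
    known_labels (the checkbox labels themselves) are never flagged.
    """
    hits = []
    for word in words:
        if word["content"] in known_labels:
            continue
        wx0, _, wx1, wy1 = polygon_to_box(word["polygon"])
        for mark in marks:
            mx0, my0, mx1, _ = polygon_to_box(mark["polygon"])
            gap = my0 - wy1
            if 0 <= gap <= max_gap and wx0 < mx1 and mx0 < wx1:
                hits.append(word)
                break
    return hits


def inches_to_points(box: Box) -> Box:
    """Convert inches (assumed Layout unit for PDFs) to PDF points, 72 per inch."""
    return tuple(v * 72.0 for v in box)


def redact_boxes(src_path: str, dst_path: str, page_index: int, boxes_pt: List[Box]) -> None:
    """Paint white redactions over the given boxes (in points) and save a copy."""
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it

    doc = fitz.open(src_path)
    page = doc[page_index]
    for box in boxes_pt:
        page.add_redact_annot(fitz.Rect(*box), fill=(1, 1, 1))
    page.apply_redactions()
    doc.save(dst_path)
    doc.close()
```

    Treat the flagged words as candidates to review rather than a final answer. If the checkbox labels are printed above the boxes on your form, pass them in `known_labels` so they are not redacted. Pages with rotation or an unusual crop box may need extra coordinate handling that this sketch does not cover.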

    Optimizing the document layout is also crucial. This includes removing unnecessary elements, ensuring consistent spacing, and possibly reformatting the document to isolate form fields more clearly. After redaction and optimization, re-submit the page to the Vision Layout service and verify whether the check-marked utilities are now correctly extracted.

    While redaction is a viable short-term fix, for broader scalability across varied documents, consider implementing a preprocessing pipeline that dynamically detects and isolates form sections based on layout heuristics or visual zoning. This approach can help maintain accuracy even when document structures vary.
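
    One possible shape for such a zoning heuristic is sketched below. It is not a built-in feature of the service. It assumes the form has a heading word (here "Utilities", a guess based on your description) that can serve as an anchor, builds a zone of fixed height below it, and then pairs each selected mark inside the zone with the nearest word by center distance instead of relying on the order of the `content` string. All function names, the zone height and the dict shapes are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def polygon_to_box(polygon: List[float]) -> Box:
    """Collapse a flat [x1, y1, x2, y2, ...] polygon into its bounding box."""
    xs, ys = polygon[0::2], polygon[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def in_zone(box: Box, zone: Box) -> bool:
    """True when the center of box lies inside zone."""
    cx, cy = center(box)
    return zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]


def find_zone(words: List[Dict], anchor_text: str, height: float, page_width: float) -> Optional[Box]:
    """Full-width zone starting just below the first word matching anchor_text."""
    for word in words:
        if word["content"].strip().lower() == anchor_text.lower():
            bottom = polygon_to_box(word["polygon"])[3]
            return (0.0, bottom, page_width, bottom + height)
    return None


def selected_labels(words: List[Dict], marks: List[Dict], zone: Box) -> List[str]:
    """For each selected mark in the zone, return the nearest in-zone word."""
    zone_words = [w for w in words if in_zone(polygon_to_box(w["polygon"]), zone)]
    labels = []
    for mark in marks:
        mark_box = polygon_to_box(mark["polygon"])
        if mark["state"] != "selected" or not in_zone(mark_box, zone) or not zone_words:
            continue
        nearest = min(
            zone_words,
            key=lambda w: math.dist(center(polygon_to_box(w["polygon"])), center(mark_box)),
        )
        labels.append(nearest["content"])
    return labels
```

    Because text outside the zone is ignored, nearby unrelated text above the section no longer competes for the match. Note that this works at word level, so a multi-word label such as "St. lights" would come back as a single word; grouping neighboring words or matching against known labels would be a reasonable next step.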

    For more on Azure's Vision Layout capabilities and best practices, you can refer to the official documentation: Azure AI Vision Documentation Hub

    Thanks

