Make Document Intelligence ignore redundant info

Thibault Verlinde 120 Reputation points
2024-03-06T08:58:49.2166667+00:00

Hiya

I'm currently working on a C# program that indexes documents and puts them in a search service index.
These documents are PDFs and Word files containing text, with screenshots that illustrate the text. All of them go through a Document Analysis instance, which interprets their content.

I was wondering whether it is possible to ignore only the screenshots, since the instance also picks up words from them. Those words are almost entirely redundant.

Thanks in advance!


Accepted answer
  1. Monalla-MSFT 13,071 Reputation points Moderator
    2024-03-06T14:16:39.8366667+00:00

    @Thibault Verlinde - Thanks for reaching out to us.

    It is possible to ignore screenshots when analyzing documents with a Document Analysis instance. One way is to run an object detection model first to locate the images within each document. Once the image regions are known, they can be excluded from the analysis.

    Another approach is to use a layout analysis model to extract text, tables, and other elements while ignoring images. Azure’s Document Intelligence (formerly Form Recognizer) offers layout analysis capabilities that can be used for this purpose.

    Here is the documentation for further reference: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-layout?view=doc-intel-4.0.0

    https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python

    Hope this helps, and please feel free to reach out if you have any further questions.
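
The layout-based approach in the answer above can be sketched as a post-processing step. This is only a minimal illustration, not the SDK's own API: it assumes the layout result gives you the full text as one `content` string and, for each detected figure, one or more spans expressed as an (offset, length) pair into that string. The helper name `strip_spans` and the sample text are hypothetical, and no Document Intelligence call is made here.

```python
from typing import Iterable, List, Tuple

# Assumed shape of a span: (offset, length) into the full content string.
Span = Tuple[int, int]


def strip_spans(content: str, spans: Iterable[Span]) -> str:
    """Return `content` with every (offset, length) range removed.

    Overlapping, nested, or out-of-range spans are tolerated.
    """
    kept: List[str] = []
    cursor = 0  # first character not yet copied or skipped
    for offset, length in sorted(spans):
        start = max(offset, cursor)
        end = min(offset + length, len(content))
        if start > cursor:
            # Keep the text between the previous figure and this one.
            kept.append(content[cursor:start])
        cursor = max(cursor, end)
    kept.append(content[cursor:])  # text after the last figure
    return "".join(kept)


# Hypothetical document: the middle line stands for text read from a screenshot.
sample = "Step 1: open the app.\nFile Edit View\nStep 2: click Save."
screenshot_text = "File Edit View\n"
figure_spans = [(sample.index(screenshot_text), len(screenshot_text))]

cleaned = strip_spans(sample, figure_spans)
print(cleaned)
```

The cleaned string is what would then be sent to the search index. The same loop ports directly to C#. Offsets should be checked against a known document first, since they are counted in whichever string unit the service or SDK was configured to use.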


    If the above response was helpful, please feel free to "Accept as Answer" and click "Yes" so it can be beneficial to the community.

