Make Document Intelligence ignore redundant info

ThibaultVerlinde-4578
2024-03-06T08:58:49.2166667+00:00

Hiya

I'm currently working on a program in C# that indexes documents and puts them in a search service index.
These documents are PDFs and Word files containing text, along with screenshots that illustrate the text. All of these documents go through a Document Analysis instance, which interprets their content.

I was wondering: is it possible to ignore only the screenshots? The instance picks up words from these screenshots, and those words are almost entirely redundant.

Thanks in advance!

Tags: C#, Azure AI Document Intelligence, Azure AI services

Accepted answer
  1. Monalla-MSFT
    2024-03-06T14:16:39.8366667+00:00

    @Thibault Verlinde - Thanks for reaching out to us.

    Yes, it is possible to ignore screenshots when analyzing documents with a Document Analysis instance. One way is to use an object detection model to locate and classify the images on each page. Once the images are identified, any text that falls inside them can be excluded from the analysis results.
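    A minimal C# sketch of the object detection approach, assuming you already have image bounding boxes (in the same page units as the analysis result) from some separate detector. The `Word` and `ImageRegion` records and the `ScreenshotFilter` class are hypothetical simplifications, not Azure SDK types. The sketch drops every recognized word whose center falls inside a detected image on the same page.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical, simplified stand-ins for the analysis output and for a separate
    // object detection model; these are NOT types from the Azure SDK.
    // Polygon holds page coordinates as x1, y1, x2, y2, ... (the flat layout the
    // Document Intelligence REST API uses for its "polygon" arrays).
    public record Word(string Content, int PageNumber, float[] Polygon);
    public record ImageRegion(int PageNumber, float Left, float Top, float Right, float Bottom);

    public static class ScreenshotFilter
    {
        // Keeps only the words whose polygon center lies outside every image
        // region detected on the same page.
        public static List<Word> DropWordsInsideImages(
            IEnumerable<Word> words, IEnumerable<ImageRegion> regions)
        {
            var regionList = regions.ToList();
            return words.Where(w => !IsInsideAnyRegion(w, regionList)).ToList();
        }

        private static bool IsInsideAnyRegion(Word word, List<ImageRegion> regions)
        {
            // A word without a usable polygon cannot be located, so it is kept.
            if (word.Polygon.Length < 2) return false;

            var (cx, cy) = Center(word.Polygon);
            return regions.Any(r => r.PageNumber == word.PageNumber
                && cx >= r.Left && cx <= r.Right
                && cy >= r.Top && cy <= r.Bottom);
        }

        // Averages the vertex coordinates: even indices are x, odd indices are y.
        private static (float X, float Y) Center(float[] polygon)
        {
            int points = polygon.Length / 2;
            float sumX = 0, sumY = 0;
            for (int i = 0; i < points; i++)
            {
                sumX += polygon[2 * i];
                sumY += polygon[2 * i + 1];
            }
            return (sumX / points, sumY / points);
        }
    }
    ```

    Testing the word center rather than requiring the whole word box to be inside the image tolerates small misalignments between the detector's boxes and the analysis polygons.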

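    Another approach is to use a layout analysis model to extract text, tables, and other elements while leaving out the images. Azure’s Document Intelligence (formerly Form Recognizer) offers layout analysis capabilities that can be used for this purpose. The layout model still extracts text found inside images, but in version 4.0 it also reports detected figures with their bounding regions and content spans, so text belonging to those figures can be excluded before indexing.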
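    A minimal C# sketch of the layout-based filtering, assuming the layout result exposes detected figures with content spans (offset and length into the extracted content), as described for the v4.0 layout model. The `ContentSpan`, `Paragraph` and `Figure` records and the `FigureTextFilter` class are hypothetical simplifications of the response; map them from whichever SDK types your version returns. Paragraphs whose spans fall entirely inside a figure's spans are treated as screenshot text and skipped.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical, simplified shapes of the layout response; NOT Azure SDK types.
    // A span is a range [Offset, Offset + Length) in the extracted content string.
    public record ContentSpan(int Offset, int Length)
    {
        public int End => Offset + Length;
    }
    public record Paragraph(string Content, IReadOnlyList<ContentSpan> Spans);
    public record Figure(IReadOnlyList<ContentSpan> Spans);

    public static class FigureTextFilter
    {
        // Returns the text of every paragraph that is not fully covered by a figure,
        // i.e. the text worth sending to the search index.
        public static List<string> TextOutsideFigures(
            IEnumerable<Paragraph> paragraphs, IEnumerable<Figure> figures)
        {
            var figureSpans = figures.SelectMany(f => f.Spans).ToList();
            return paragraphs
                .Where(p => !IsInsideFigure(p, figureSpans))
                .Select(p => p.Content)
                .ToList();
        }

        // True when the paragraph has spans and each one is contained in some figure span.
        private static bool IsInsideFigure(Paragraph paragraph, List<ContentSpan> figureSpans) =>
            paragraph.Spans.Count > 0 &&
            paragraph.Spans.All(s =>
                figureSpans.Any(f => s.Offset >= f.Offset && s.End <= f.End));
    }
    ```

    Containment rather than overlap is used so that body text which merely touches a figure boundary is kept; how well figures line up with screenshots depends on the documents, so it is worth checking the output on a few samples.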

    For more details, see the layout model documentation: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-layout?view=doc-intel-4.0.0

    DocumentAnalysisClient API reference (Python): https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python

    Hope this helps, and please feel free to reach out if you have any further questions.



