How can I extract text from a PDF file without printed/image text results in Azure Document Intelligence?

ekkarat adsawinnawanawa 20 Reputation points
2024-01-09T09:02:33.8933333+00:00

How can I extract text from a PDF file without printed/image text results?

I am extracting text from PDF files with the Document Intelligence prebuilt-layout model. My PDF files contain text and background images. The results are very good, except for files whose background image contains printed text: the model also extracts that text from the background, and I don't need it.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 15,330 Reputation points Microsoft External Staff Moderator
    2024-01-09T14:51:13.01+00:00

    Hi @ekkarat adsawinnawanawa,

    Thank you for reaching out to Microsoft Q&A forum!

    You can use a custom model to extract text from a PDF file without the printed/image text from the background. Custom models are trained to extract distinct data from forms and documents, specific to your use case.

    Please see the official documentation for custom models:

    Custom model overview & Build and train a custom model
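
As a rough illustration of the custom-model suggestion, here is a minimal Python sketch. It assumes the `azure-ai-formrecognizer` package and a custom model that has already been trained and labeled. The endpoint, key, model ID, and file path are placeholders, and `collect_fields` and `analyze_with_custom_model` are hypothetical helper names, not part of the SDK. The sketch reads only the labeled fields of the result, since the full `content` of a result may still include every piece of recognized text, background included.

```python
def collect_fields(documents, min_confidence=0.0):
    """Return {field_name: content} for the labeled fields of analyzed documents."""
    extracted = {}
    for doc in documents:
        for name, field in doc.fields.items():
            # Skip fields the model did not find in this document.
            if field is None or field.content is None:
                continue
            # Skip fields below the requested confidence (None counts as 0.0).
            if (field.confidence or 0.0) < min_confidence:
                continue
            extracted[name] = field.content
    return extracted


def analyze_with_custom_model(endpoint, key, model_id, pdf_path):
    """Analyze one PDF with a custom model and return its labeled fields."""
    # Imported here so collect_fields can be used without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as f:
        poller = client.begin_analyze_document(model_id, document=f)
    result = poller.result()
    # result.documents can be empty or None if nothing matched the model.
    return collect_fields(result.documents or [])
```

A call would look like `analyze_with_custom_model("<your-endpoint>", "<your-key>", "<your-model-id>", "sample.pdf")`. Which field names come back depends entirely on how the model was labeled during training.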

    Hope this helps. Thank you.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.
