Copy-protected PDF files

mmmm 20 Reputation points
2023-04-13T11:08:52.1433333+00:00

Hello,

I have copy-protected PDF files. I can open them, but when I copy and paste text from them I get obfuscated text.

Is there a way to transform them into normal PDFs and then input them into Form Recognizer Studio? Or do I have to convert them to images first, using pdf2image or a similar package, and only then use Form Recognizer Studio?

Any insights or suggestions on this matter would be greatly appreciated. Thank you in advance! This is the first time I've come across protected PDFs.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2023-04-13T22:31:30.11+00:00

    Hi @mmmm, thanks for using the Microsoft Q&A platform.

    I haven't worked on a protected PDF file. If the PDF files are copy-protected, I believe you may not be able to extract text from them directly.

    I recommend trying a PDF tool such as Adobe Acrobat Pro or ABBYY FineReader PDF to remove the copy protection from the PDFs, and then using Form Recognizer Studio to extract text from them.

    You can also use a PDF converter to convert the PDFs to an image format such as JPEG/JPG, PNG, BMP, or TIFF, and then use Form Recognizer Studio to extract text from the images.
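
    If you prefer to script the image route rather than use a desktop converter, the following is a minimal sketch, not a tested solution for your files. It assumes the pdf2image package (which needs the poppler utilities installed) and the azure-ai-formrecognizer SDK with the prebuilt-read model as a programmatic stand-in for Form Recognizer Studio. The function names, the page naming scheme, and the endpoint and key parameters are hypothetical placeholders.

```python
from pathlib import Path

# Image formats listed in the answer above (JPEG/JPG, PNG, BMP, TIFF).
SUPPORTED_IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def page_image_path(pdf_path, page_number, out_dir, suffix=".png"):
    """Build the output path for one rendered page (hypothetical naming scheme)."""
    if suffix.lower() not in SUPPORTED_IMAGE_SUFFIXES:
        raise ValueError(f"unsupported image format: {suffix}")
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page{page_number:03d}{suffix.lower()}"


def pdf_to_images(pdf_path, out_dir, dpi=300):
    """Render every page of a PDF to a PNG file and return the written paths."""
    # Imported lazily so the helper above works without pdf2image installed.
    from pdf2image import convert_from_path  # requires the poppler utilities

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, out_dir)
        page.save(target, "PNG")
        written.append(target)
    return written


def extract_text(image_path, endpoint, key):
    """Send one page image to the prebuilt-read model and return its text."""
    # Assumption: azure-ai-formrecognizer is installed; endpoint and key
    # come from your own Azure resource.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(image_path, "rb") as handle:
        poller = client.begin_analyze_document("prebuilt-read", document=handle)
        result = poller.result()
    return result.content
```

    A typical run would call pdf_to_images("protected.pdf", "pages") and then extract_text on each returned path. Rendering pages to images sidesteps the obfuscated text layer because the service then reads the text by OCR, at the cost of larger uploads and one request per page.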

    You can refer to this document to know more about Form Recognizer input file requirements: https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-general-document?view=form-recog-3.0.0#input-requirements

    I hope this helps.

    Regards,

    Vasavi

