How to use Azure Cognitive Services to create different PDFs based on types from a single PDF?

Question

Snehal Shah 0

Hi experts,

I have several multi-page PDFs from multiple companies, each containing bank statements, purchase invoices from various suppliers, and sales invoices. I would like to separate each type into its own PDF using Azure Cognitive Services. Could you please suggest the logic and services required (such as Form Recognizer) to achieve this?

Thanks

Rishad M 21 Reputation points

2024-02-15T09:15:19.4533333+00:00

@Snehal Shah, I am looking at a similar use case. In my situation, I receive a consolidated payslip in PDF format from a company and need to split it into individual payslips. Have you been able to accomplish this using Azure Document Intelligence?

Answer 1

Azar 29,520 MVP Volunteer Moderator

Hi @Snehal Shah

Follow these steps to segregate each document type into a separate PDF using Azure Cognitive Services:

1. Use Azure Form Recognizer to extract text and key-value pairs from the PDFs.
2. Implement document classification logic based on the extracted data.
3. Split the PDFs into separate files for each document type.
4. Save the segregated PDFs into separate folders.
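
To make the steps above more concrete, here is a minimal sketch of the classify-and-split logic in Python. It assumes the text of each page has already been extracted (the first step). The keyword rules in `KEYWORDS`, the function names, and the use of the third-party `pypdf` package to write the output files are assumptions made for illustration; they are not part of Form Recognizer, and real documents would probably need a trained classifier rather than keyword matching.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical keyword rules; tune these or swap in a trained classifier.
KEYWORDS = {
    "bank_statement": ("account statement", "closing balance", "opening balance"),
    "purchase_invoice": ("bill from", "supplier invoice"),
    "sales_invoice": ("bill to", "tax invoice"),
}


def classify_page(text: str) -> str:
    """Return the first document type whose keyword appears in the page text."""
    lowered = text.lower()
    for doc_type, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return doc_type
    return "unknown"


def group_pages_by_type(page_texts: list[str]) -> dict[str, list[int]]:
    """Map each document type to the zero-based page indexes classified as it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for index, text in enumerate(page_texts):
        groups[classify_page(text)].append(index)
    return dict(groups)


def write_split_pdfs(source_pdf: str, groups: dict[str, list[int]], out_dir: str) -> None:
    """Write one PDF per document type, each in its own folder (needs pypdf)."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily

    reader = PdfReader(source_pdf)
    for doc_type, page_indexes in groups.items():
        writer = PdfWriter()
        for index in page_indexes:
            writer.add_page(reader.pages[index])
        target = Path(out_dir) / doc_type
        target.mkdir(parents=True, exist_ok=True)
        with open(target / f"{doc_type}.pdf", "wb") as handle:
            writer.write(handle)
```

You would pass the per-page text from the extraction step to `group_pages_by_type`, then hand the resulting groups to `write_split_pdfs` along with the source file and an output folder of your choice.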

Documentation link for Azure Form Recognizer:

Azure Form Recognizer Documentation

If this helps, kindly accept the answer. For further assistance, ping here. Thanks much.

Snehal Shah 0 Reputation points

2023-07-26T04:46:18.1733333+00:00

For points 3 and 4, do I need to use another service, or are they part of the Form Recognizer service?

Snehal Shah 0 Reputation points

2023-07-26T05:08:44.27+00:00

Are points 3 and 4 also part of the Form Recognizer service, or do I need some other service to split and generate the PDFs?

Azar 29,520 Reputation points MVP Volunteer Moderator

2023-07-28T09:21:50.1466667+00:00

You don't need to use a different service, Snehal.

Answer 2

Hi @Snehal Shah ,

Adding to Azar's response - I want to point out that Azure Form Recognizer is now called "Azure AI Document Intelligence".

For your scenario, consider using the Document Intelligence layout model.

The Document Intelligence layout model is an advanced machine-learning-based document analysis API available in the Document Intelligence cloud. It takes documents in various formats and returns structured data representations of them. It combines an enhanced version of our powerful Optical Character Recognition (OCR) capabilities with deep learning models to extract text, tables, selection marks, and document structure.
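
As a rough sketch of how the layout output could feed a splitting step: the layout result lists the pages of the document, each with a page number and its recognized lines. The helper below pulls the plain text of each page out of a result that has already been loaded as a dictionary, assuming the REST-style JSON shape (`analyzeResult.pages[].lines[].content`). The function name and the tiny hand-written `sample` are assumptions for illustration only; a real response carries many more fields.

```python
def page_texts_from_layout(response: dict) -> dict[int, str]:
    """Return {page number: text} built from the recognized lines of each page."""
    pages = response.get("analyzeResult", {}).get("pages", [])
    return {
        page["pageNumber"]: "\n".join(
            line["content"] for line in page.get("lines", [])
        )
        for page in pages
    }


# Hand-written, heavily abbreviated stand-in for a layout response.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "pages": [
            {
                "pageNumber": 1,
                "lines": [{"content": "Tax Invoice"}, {"content": "Bill to: ACME"}],
            },
            {"pageNumber": 2, "lines": [{"content": "Closing balance 250"}]},
        ]
    },
}

texts = page_texts_from_layout(sample)
print(texts[1])
```

The per-page text can then be used to decide which document type each page belongs to before the PDF is split.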

Hope that helps

Best,

Grace

Answer 3

Kalyan Chakravarthi Bondala 0

Hi @Azar Azar
I have a use case where I have a PDF with 20 purchase orders. Each purchase order has a table of line items (description, externalItemid, quantity), and each also has fields such as ship-to address, PO number, etc.
How do I extract those orders?

Can I split the PDF into individual orders and then train?
Is there a way to return a list of orders as JSON?

I have researched this but have not found a solution yet.
