PDF with Multiple invoices

Question

MN, Yogesh 0

I have multiple invoices in a single PDF. Is there a way to split the PDF by invoice, or to identify whether a PDF contains multiple invoices or a single invoice? (Language: Python)

navba-MSFT 27,540 Reputation points Microsoft Employee Moderator

2024-04-17T07:52:32.1033333+00:00
@MN, Yogesh Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

Could you please elaborate on your question?

Please share your requirements and use case details.

This will help us assist you better. Awaiting your reply.
MN, Yogesh 0 Reputation points

2024-04-17T09:54:40.3433333+00:00

@navba-MSFT I have a PDF file that contains 10 scanned invoices. I want to split it into 10 PDF files, one per invoice, or at least identify that the PDF contains multiple invoices.
navba-MSFT 27,540 Reputation points Microsoft Employee Moderator

2024-04-17T10:12:22.11+00:00

@MN, Yogesh Thanks for getting back and clarifying. Are you using the Python SDK or calling the REST API directly?
navba-MSFT 27,540 Reputation points Microsoft Employee Moderator

2024-04-19T07:13:27.69+00:00

@MN, Yogesh Just following up to check if my suggestion helped. Please let me know if you have any further queries. I would be happy to help.

Answer 1

@MN, Yogesh To split a multi-page PDF into single-page files, each containing one independent invoice, you can use Azure's data processing capabilities, for example a function app built on Azure Functions as in the architecture linked further down. After splitting, send the location of each single-page PDF to Azure AI Document Intelligence for processing. Note that this approach assumes each invoice occupies exactly one page.
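As a rough illustration of the splitting step, the sketch below writes each page of a PDF to its own file. It assumes the third-party pypdf package (not mentioned in this thread) and that every invoice fits on exactly one page; `page_output_names` and `split_pdf_into_pages` are hypothetical helper names chosen for this example.

```python
from pathlib import Path
from typing import List


def page_output_names(stem: str, page_count: int) -> List[str]:
    """Build one zero-padded file name per page, e.g. invoices_page_001.pdf."""
    return [f"{stem}_page_{number:03d}.pdf" for number in range(1, page_count + 1)]


def split_pdf_into_pages(pdf_path: str, out_dir: str) -> List[Path]:
    """Write every page of pdf_path to its own single-page PDF in out_dir."""
    # Imported lazily so the naming helper works without pypdf installed.
    from pypdf import PdfReader, PdfWriter  # assumption: pip install pypdf

    reader = PdfReader(pdf_path)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    names = page_output_names(Path(pdf_path).stem, len(reader.pages))

    written = []
    for page, name in zip(reader.pages, names):
        writer = PdfWriter()  # a fresh writer per page gives single-page files
        writer.add_page(page)
        out_path = target / name
        with open(out_path, "wb") as handle:
            writer.write(handle)
        written.append(out_path)
    return written
```

Calling `split_pdf_into_pages("invoices.pdf", "pages")` would produce files such as `pages/invoices_page_001.pdf`. Invoices that span several pages would need their page ranges kept together instead of being split page by page.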

The Document Intelligence invoice model can extract key information such as customer name, billing address, due date, and amount due, and returns the results as structured JSON. Comparing the extracted fields (for example, the invoice ID) across pages can help you identify whether a PDF file contains multiple invoices.
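One possible way to act on this from Python is sketched below, using the `azure-ai-formrecognizer` package and the `prebuilt-invoice` model. The helper names (`analyze_invoices`, `pages_per_invoice`, `contains_multiple_invoices`) are hypothetical, and the endpoint and key are placeholders. Whether the service returns one analyzed document per invoice for a particular scan is an assumption you should verify against your own files.

```python
from typing import Any, List, Optional, Sequence, Tuple


def pages_per_invoice(documents: Sequence[Any]) -> List[Tuple[Optional[str], List[int]]]:
    """Return (invoice id, sorted page numbers) for each analyzed document."""
    summary = []
    for doc in documents:
        field = doc.fields.get("InvoiceId")
        invoice_id = field.value if field is not None else None
        # bounding_regions may be missing, so fall back to an empty list.
        regions = doc.bounding_regions or []
        pages = sorted({region.page_number for region in regions})
        summary.append((invoice_id, pages))
    return summary


def contains_multiple_invoices(documents: Sequence[Any]) -> bool:
    """True when the analysis result holds more than one invoice document."""
    return len(list(documents)) > 1


def analyze_invoices(pdf_path: str, endpoint: str, key: str) -> Sequence[Any]:
    """Run the prebuilt invoice model on a local PDF and return its documents."""
    # Imported lazily so the helpers above work without the Azure SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as handle:
        poller = client.begin_analyze_document("prebuilt-invoice", document=handle)
    return poller.result().documents
```

With a real resource, `pages_per_invoice(analyze_invoices("invoices.pdf", endpoint, key))` would list which pages each detected invoice covers, and those page numbers could then drive the split.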

More information about the Document Intelligence invoice model: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-invoice?view=doc-intel-4.0.0

For development, Document Intelligence supports various tools, applications, and libraries such as Document Intelligence Studio, REST API, and SDKs for C#, Python, Java, and JavaScript.

Python sample using pre-built invoice model: https://learn.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python#using-prebuilt-models

Please note that the features and processes may change based on user feedback as Document Intelligence is in active development. For best results, provide one clear photo or high-quality scan per document.

Automate PDF processing: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/automate-pdf-forms-processing

This sample demonstrates how to use GPT-4 Vision to extract structured JSON data from PDF documents, such as invoices, using the Azure OpenAI Service.

I hope this helps! If you have any more questions, feel free to ask.
