@MN, Yogesh Regarding your question, to split a multi-page PDF file into single pages, each containing one independent invoice, you can use Azure’s data processing capabilities. After splitting the file, you can send the location of the single-page PDF file to AI Document Intelligence for processing.
The Document Intelligence invoice model can extract key information such as customer name, billing address, due date, and amount due, and returns a structured JSON data representation. This can help you identify if a PDF file contains multiple invoices.
More info about DI invoice model: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-invoice?view=doc-intel-4.0.0
For development, Document Intelligence supports various tools, applications, and libraries such as Document Intelligence Studio, REST API, and SDKs for C#, Python, Java, and JavaScript.
Python sample using pre-built invoice model: https://learn.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python#using-prebuilt-models
Please note that the features and processes may change based on user feedback as Document Intelligence is in active development. For best results, provide one clear photo or high-quality scan per document.
Automate PDF processing: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/automate-pdf-forms-processing
This sample demonstrates how to use GPT-4 Vision to extract structured JSON data from PDF documents, such as invoices, using the Azure OpenAI Service.
I hope this helps! If you have any more questions, feel free to ask