Azure Document Analysis - Pre-Built Invoice model

Question

Maddu, Murali P. (TR Technology)

Hi,

We are planning to use the Azure Document Intelligence prebuilt invoice model. In one of our use cases, a single document contains several different invoices on different pages, each with its own invoice ID. However, the analysis output extracts only the invoice ID from the last page, not the invoice IDs from the earlier pages. What I'm looking for is a way to treat each page as a separate document and return the results for each page of a multi-page invoice document. Is that possible?

Saideep Anchuri 9,425 Reputation points Microsoft External Staff Moderator

2025-03-12T03:04:17.6533333+00:00

Hi Maddu, Murali P. (TR Technology)

Following up to see if the above answer was helpful.

Thank You.
Maddu, Murali P. (TR Technology) 0 Reputation points

2025-03-12T10:50:00.58+00:00

That's good to know, @Saideep Anchuri, and thanks for the suggestion. But why do we need to train a model? Can't we use the existing prebuilt model if we supply single-page invoices?
Saideep Anchuri 9,425 Reputation points Microsoft External Staff Moderator

2025-03-12T12:57:20.7566667+00:00

Hi Maddu, Murali P. (TR Technology)

You may not necessarily need to train a custom model. Azure AI Document Intelligence (formerly Azure Form Recognizer) offers prebuilt models, such as the prebuilt invoice model (model ID prebuilt-invoice), which are designed for extracting data from documents like invoices, receipts, and forms. These models are ready to use and don't require additional training.

The key benefit of prebuilt models is that they're optimized for common document types. If your invoices follow standard formats, the prebuilt invoice model could be sufficient: you submit each single-page invoice as its own document, and the model extracts fields such as the invoice number, dates, and totals for that invoice.
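Once each page has been analyzed as its own document, the per-page results still have to be collected into one view. A minimal sketch, assuming each response has already been parsed from JSON into a dict that follows the REST analyzeResult layout (`documents[].fields.InvoiceId.valueString`); the function name `invoice_ids_by_page` and the sample data are hypothetical, and the Python SDK returns typed objects instead of dicts, so the lookups would differ there.

```python
from typing import Dict, Optional


def invoice_ids_by_page(results_by_page: Dict[int, dict]) -> Dict[int, Optional[str]]:
    """Map each page number to the InvoiceId extracted from that page's result.

    `results_by_page` maps a 1-based page number to the parsed
    analyzeResult dict returned for that single-page file (assumed layout).
    """
    ids: Dict[int, Optional[str]] = {}
    for page_number, analyze_result in sorted(results_by_page.items()):
        invoice_id = None
        # prebuilt-invoice reports each recognized invoice as an entry in `documents`
        for document in analyze_result.get("documents", []):
            field = document.get("fields", {}).get("InvoiceId")
            if field:
                # Prefer the typed value; fall back to the raw text content
                invoice_id = field.get("valueString") or field.get("content")
                break
        ids[page_number] = invoice_id
    return ids


# Hypothetical, heavily trimmed responses for a two-page file split into pages
sample = {
    1: {"documents": [{"fields": {"InvoiceId": {"valueString": "INV-001"}}}]},
    2: {"documents": [{"fields": {"InvoiceId": {"valueString": "INV-002"}}}]},
}
print(invoice_ids_by_page(sample))  # {1: 'INV-001', 2: 'INV-002'}
```

Pages where nothing was extracted map to None, which makes them easy to flag for manual review.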

Thank You.
Saideep Anchuri 9,425 Reputation points Microsoft External Staff Moderator

2025-03-13T03:56:14.8333333+00:00

Hi Maddu, Murali P. (TR Technology)

We haven't heard back from you since the last response and wanted to check whether you have a resolution yet.

Thank You.

Answer 1

Hi Maddu, Murali P. (TR Technology)

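The Azure Document Intelligence prebuilt invoice model processes an invoice and extracts its key information, but it is designed to handle one invoice per document (a single invoice may span several pages), so a file that combines several invoices is not returned as separate invoices.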
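One way to confirm this on a combined file is to compare the pages in the response with the pages the extracted InvoiceId actually comes from. A minimal sketch, assuming the parsed REST analyzeResult layout, where `pages[].pageNumber` lists every analyzed page and each field carries `boundingRegions[].pageNumber`; the function name `pages_without_invoice_id` and the sample data are hypothetical.

```python
from typing import List


def pages_without_invoice_id(analyze_result: dict) -> List[int]:
    """Return the page numbers on which no InvoiceId was extracted."""
    # Every analyzed page appears in `pages`, numbered from 1
    all_pages = {page["pageNumber"] for page in analyze_result.get("pages", [])}

    # Collect the pages that any extracted InvoiceId field points to
    covered = set()
    for document in analyze_result.get("documents", []):
        field = document.get("fields", {}).get("InvoiceId") or {}
        for region in field.get("boundingRegions", []):
            covered.add(region["pageNumber"])

    return sorted(all_pages - covered)


# Hypothetical trimmed response for a 3-page file holding three invoices,
# where only the InvoiceId on the last page was extracted
sample = {
    "pages": [{"pageNumber": 1}, {"pageNumber": 2}, {"pageNumber": 3}],
    "documents": [
        {"fields": {"InvoiceId": {"valueString": "INV-003",
                                  "boundingRegions": [{"pageNumber": 3}]}}}
    ],
}
print(pages_without_invoice_id(sample))  # [1, 2]
```

Any page this returns is a candidate for being analyzed on its own.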

Here are some steps:

1. Split the PDF into single-page files using a Python library.
2. Train a custom neural model on the resulting single-page documents.

import os

# pypdf is the maintained successor of PyPDF2; PdfFileReader, PdfFileWriter
# and their camelCase methods raise an error in PyPDF2 3.0 and later
from pypdf import PdfReader, PdfWriter


def split_pdf(input_pdf_path, output_folder):
    # Create the output folder if it does not exist yet
    os.makedirs(output_folder, exist_ok=True)

    # Read the input PDF file
    reader = PdfReader(input_pdf_path)

    # Loop through all the pages in the PDF (page numbers start at 1)
    for page_number, page in enumerate(reader.pages, start=1):
        # Create a new PDF writer and add only the current page to it
        writer = PdfWriter()
        writer.add_page(page)

        # Create the output PDF file path
        output_pdf_path = os.path.join(output_folder, f'page_{page_number}.pdf')

        # Write the single-page PDF to the output file
        with open(output_pdf_path, 'wb') as output_pdf_file:
            writer.write(output_pdf_file)

        print(f'Saved: {output_pdf_path}')


# Usage example
input_pdf_path = 'C:/Users/v-manmohanty/Downloads/sample-pdf-files-sample3.pdf'
output_folder = 'C:/Users/v-manmohanty/Documents/Doctest'
split_pdf(input_pdf_path, output_folder)
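As an alternative to writing page files to disk, the analyze request accepts a `pages` query parameter (for example `1` or `1-3,5`), so the same file can be submitted once per page. A minimal sketch that only builds the request URLs; it assumes the v4.0 REST route `{endpoint}/documentintelligence/documentModels/prebuilt-invoice:analyze` with `api-version=2024-11-30` (older Form Recognizer API versions use a `/formrecognizer/` route instead), and the helper name `per_page_analyze_urls` and the endpoint are hypothetical placeholders. Each URL would then be POSTed with the document in the request body and polled through the returned `Operation-Location` header.

```python
from typing import List
from urllib.parse import urlencode


def per_page_analyze_urls(endpoint: str, page_count: int,
                          model_id: str = "prebuilt-invoice",
                          api_version: str = "2024-11-30") -> List[str]:
    """Build one analyze URL per page so each page is analyzed on its own."""
    # Assumed v4.0 route; check it against the API version you actually call
    base = f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"
    urls = []
    for page_number in range(1, page_count + 1):
        # `pages` restricts the analysis to the given 1-based page number
        query = urlencode({"api-version": api_version, "pages": str(page_number)})
        urls.append(f"{base}?{query}")
    return urls


# Hypothetical endpoint; replace it with your resource's endpoint
for url in per_page_analyze_urls("https://<your-resource>.cognitiveservices.azure.com/", 2):
    print(url)
```

This avoids managing temporary files, at the cost of one request per page.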

Thank You.
