Azure Document Analysis - Pre-Built Invoice model

Maddu, Murali P. (TR Technology) 0 Reputation points
2025-03-11T09:32:27.9633333+00:00

Hi,

We are considering the Azure Document Intelligence prebuilt invoice model. One of our use cases involves a single document that contains multiple invoices, one per page, each with a different invoice ID. The analysis output extracts only the invoice ID from the last page, not the IDs from the earlier pages. I would like each page to be treated as a separate document, with results returned per page for the multi-page file. Is that possible?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Saideep Anchuri 9,425 Reputation points Microsoft External Staff Moderator
    2025-03-11T15:22:48.1466667+00:00

    Hi Maddu, Murali P. (TR Technology)

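    The Azure Document Intelligence prebuilt invoice model processes invoices and extracts key fields, but it is designed to handle a single invoice per analyzed document. When one file combines several invoices on different pages, the pages are not returned as separate invoice results.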

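    Here are some steps you can try:

    1. Split the PDF into single-page files using a Python library, then analyze (or train on) each page separately.
    2. If you train your own model for these documents, use a custom neural model.

    The following script splits a PDF into one file per page:
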
    import os

    # PyPDF2 is deprecated and its old PdfFileReader/PdfFileWriter classes were
    # removed in 3.0; pypdf is the maintained successor (pip install pypdf).
    from pypdf import PdfReader, PdfWriter


    def split_pdf(input_pdf_path, output_folder):
        """Write each page of the input PDF to its own single-page PDF file."""
        # Create the output folder if it does not exist yet
        os.makedirs(output_folder, exist_ok=True)

        # Read the input PDF
        reader = PdfReader(input_pdf_path)

        # Loop through all the pages in the PDF (numbered from 1)
        for page_number, page in enumerate(reader.pages, start=1):
            # Create a new PDF writer holding only the current page
            writer = PdfWriter()
            writer.add_page(page)

            # Create the output PDF file path
            output_pdf_path = os.path.join(output_folder, f'page_{page_number}.pdf')

            # Write the single-page PDF to the output file
            with open(output_pdf_path, 'wb') as output_pdf_file:
                writer.write(output_pdf_file)

            print(f'Saved: {output_pdf_path}')


    # Usage example
    input_pdf_path = 'C:/Users/v-manmohanty/Downloads/sample-pdf-files-sample3.pdf'
    output_folder = 'C:/Users/v-manmohanty/Documents/Doctest'
    split_pdf(input_pdf_path, output_folder)
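
Once the pages are split, each single-page file can be sent to the prebuilt invoice model on its own so that every page yields its own invoice ID. The following is only a minimal sketch of that idea, not an official sample. It assumes the `azure-ai-formrecognizer` Python package, files named `page_N.pdf` as written by `split_pdf`, and your own resource endpoint and key. The helper names `build_client`, `page_number`, and `analyze_split_pages` are hypothetical and chosen for illustration.

```python
from pathlib import Path


def build_client(endpoint: str, key: str):
    """Create a client for the service (assumes `pip install azure-ai-formrecognizer`)."""
    # Imported lazily so the rest of this sketch can be exercised without the SDK.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentAnalysisClient(endpoint, AzureKeyCredential(key))


def page_number(path: Path) -> int:
    """Extract N from a file named page_N.pdf (the naming used by split_pdf)."""
    return int(path.stem.split("_")[-1])


def analyze_split_pages(client, folder: str) -> list[dict]:
    """Run the prebuilt invoice model on every single-page PDF in `folder`."""
    results = []
    # Sort numerically so page_10.pdf comes after page_2.pdf.
    for pdf_path in sorted(Path(folder).glob("page_*.pdf"), key=page_number):
        with open(pdf_path, "rb") as pdf_file:
            poller = client.begin_analyze_document("prebuilt-invoice", document=pdf_file)
            result = poller.result()
        # Collect the InvoiceId field (if any) from each invoice found in this file.
        for invoice in result.documents:
            invoice_id = invoice.fields.get("InvoiceId")
            results.append({
                "file": pdf_path.name,
                "invoice_id": invoice_id.value if invoice_id else None,
                "confidence": invoice_id.confidence if invoice_id else None,
            })
    return results
```

To use it, create the client with `build_client(endpoint, key)` using the values from your own Document Intelligence resource, then call `analyze_split_pages(client, output_folder)` on the folder that `split_pdf` wrote to. If splitting the file is not convenient, `begin_analyze_document` also accepts a `pages` keyword (for example `pages="1"`), so analyzing one page per request against the original file may work as an alternative.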
    
    

    Thank You.
