Hi Maddu, Murali P. (TR Technology)
The Azure Document Intelligence prebuilt invoice model processes invoices and extracts key information, but it is designed to handle one invoice per page.
Here are some steps you can try:
- Split the PDFs into single pages using a Python library, then train the model on those pages.
- Use a custom neural model for training.
import os
import PyPDF2

def split_pdf(input_pdf_path, output_folder):
    # Make sure the output folder exists before writing into it
    os.makedirs(output_folder, exist_ok=True)
    # Open the input PDF file
    with open(input_pdf_path, 'rb') as input_pdf_file:
        # Read the PDF file (PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.x)
        reader = PyPDF2.PdfReader(input_pdf_file)
        # Loop through all the pages in the PDF
        for page_number in range(len(reader.pages)):
            # Create a new PDF writer
            writer = PyPDF2.PdfWriter()
            # Add the current page to the writer
            writer.add_page(reader.pages[page_number])
            # Create the output PDF file path
            output_pdf_path = f'{output_folder}/page_{page_number + 1}.pdf'
            # Write the single-page PDF to the output file
            with open(output_pdf_path, 'wb') as output_pdf_file:
                writer.write(output_pdf_file)
            print(f'Saved: {output_pdf_path}')

# Usage example
input_pdf_path = 'C:/Users/v-manmohanty/Downloads/sample-pdf-files-sample3.pdf'
output_folder = 'C:/Users/v-manmohanty/Documents/Doctest'
split_pdf(input_pdf_path, output_folder)
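For the second step, here is a rough sketch of how a custom neural build could be started from Python once the single-page PDFs have been uploaded to a blob container and labeled (for example in Document Intelligence Studio). It assumes the azure-ai-formrecognizer package (3.2 or later); the helper names make_admin_client and build_neural_model, the container SAS URL, and the model ID are placeholders I made up for illustration, so please adjust them to your resource.

```python
def make_admin_client(endpoint, key):
    """Create an administration client (needs azure-ai-formrecognizer installed)."""
    # Imported lazily so the rest of this sketch loads without the SDK present.
    from azure.ai.formrecognizer import DocumentModelAdministrationClient
    from azure.core.credentials import AzureKeyCredential
    return DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))


def build_neural_model(admin_client, container_sas_url, model_id):
    """Start a custom neural build and wait for it to finish."""
    poller = admin_client.begin_build_document_model(
        "neural",  # same value as ModelBuildMode.NEURAL
        blob_container_url=container_sas_url,  # container holding the labeled pages
        model_id=model_id,  # hypothetical ID, pick your own
    )
    # Blocks until the service reports the build as complete
    return poller.result()
```

You would call it roughly as build_neural_model(make_admin_client(endpoint, key), container_sas_url, 'invoice-pages-v1'), where endpoint, key, and container_sas_url come from your own Azure resource. Neural builds can take a while, so expect the call to block.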
Thank You.