Azure Document Intelligence custom classification model behaves differently with extra pages

Sarah Cummings 45 Reputation points
2023-10-12T19:15:08.22+00:00

I've trained a custom classification model in Azure Document Intelligence that recognizes whether each page in a PDF is one of three form types. I've found that if a user submits a file with additional pages at the end of the PDF, the predictions for the form pages change.

If the user submits only the form page, we positively identify that page as our form type with 80% confidence. If the user submits a 23-page document in which one page is one of the three forms and all the other pages are junk, we no longer confidently predict the form page.

Do I need to train my model differently to possibly identify this junk? Or should I create an intermediary step where I split documents and post each page to the classification model?

I'm worried that calling the model one page at a time would take much longer. I'm also wondering whether cost is computed differently for 23 one-page requests vs. one 23-page request.

Azure AI Document Intelligence

1 answer

  1. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2023-10-12T22:11:44.2366667+00:00

    Hello @Sarah Cummings, thanks for using the Microsoft Q&A Platform.

    Custom classification models in Azure Document Intelligence are designed to process each page of the input file separately and make a prediction for each page based on its content and layout.

    This is known behavior. The model classifies each page of the input document into one of the classes in the labeled dataset, and additional pages may introduce noise or irrelevant information that can affect the model's predictions.

    In this scenario, one possible workaround is to retrain the model with additional data that includes the extra pages, which may improve the model's accuracy. Alternatively, you can split the documents and post only the required pages to the classification model.

    Also, please note that training custom models is always free with Document Intelligence. You are charged only when a model is used to analyze a document, and billing is based on the number of pages analyzed. For pricing information, see: https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/

    Here are the service limit details for custom model usage: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/service-limits?view=doc-intel-3.1.0#custom-model-usage

    I hope this helps.

    Regards,
    Vasavi

    - Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.
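
For anyone weighing the per-page workaround above against the latency concern in the question, here is a minimal sketch of posting pages concurrently rather than one after another. It is an illustration under stated assumptions, not the service's API: `classify_page` is a hypothetical callable that you would write to send one single-page document to your classifier and return a `(doc_type, confidence)` pair, and splitting the PDF into single-page documents (which needs a PDF library) is assumed to have happened already. `max_workers` and the confidence threshold are placeholder values, and concurrency should stay within the service limits linked above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical result shape for one page: (predicted document type, confidence).
Prediction = Tuple[str, float]


def classify_pages(
    pages: List[bytes],
    classify_page: Callable[[bytes], Prediction],
    max_workers: int = 4,
) -> List[Prediction]:
    """Classify single-page documents concurrently, preserving page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(classify_page, pages))


def confident_pages(
    predictions: List[Prediction], threshold: float
) -> List[Tuple[int, str, float]]:
    """Return (1-based page number, type, confidence) for pages at or above threshold."""
    return [
        (number, doc_type, confidence)
        for number, (doc_type, confidence) in enumerate(predictions, start=1)
        if confidence >= threshold
    ]
```

With a real `classify_page`, something like `confident_pages(classify_pages(pages, classify_page), threshold=0.7)` would list only the pages the classifier is reasonably sure about (0.7 is an arbitrary example value). On cost: since billing is by pages analyzed, as noted in the answer above, posting all 23 pages individually analyzes the same 23 pages as one 23-page request, while posting only the pages you need would analyze fewer.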

