Custom Form Recognizer Form model issue with empty fields

Rui Lopes 21 Reputation points
2023-01-30T16:10:50.0533333+00:00

Hi,

We are building a template model for a custom form and are encountering an issue with empty fields. Whenever a field is empty, the model tends to retrieve text from other parts of the form: sometimes from just before or after the field, other times from completely unrelated locations far from the field.
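
To make the two cases concrete, here is a minimal sketch of how an extracted value could be compared against the region where the field was labeled. It is only an illustration under stated assumptions: boxes are plain `(x_min, y_min, x_max, y_max)` tuples in the same units, the names `box_gap` and `classify_extraction` are hypothetical helpers (not part of the Form Recognizer SDK), and the `near` threshold is an arbitrary value that would need tuning per form.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), all in the same units.
Box = Tuple[float, float, float, float]


def box_gap(a: Box, b: Box) -> float:
    """Shortest distance between two axis-aligned boxes (0.0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap, 0 if x-ranges overlap
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap, 0 if y-ranges overlap
    return (dx * dx + dy * dy) ** 0.5


def classify_extraction(labeled: Box, extracted: Box, near: float = 0.5) -> str:
    """Hypothetical check: where did the value come from, relative to the labeled region?"""
    gap = box_gap(labeled, extracted)
    if gap == 0.0:
        return "in_region"   # value overlaps the labeled field region
    if gap <= near:
        return "adjacent"    # text just before or after the field
    return "unrelated"       # text from far away on the page


labeled_region = (1.0, 1.0, 3.0, 1.5)
print(classify_extraction(labeled_region, (1.2, 1.1, 2.0, 1.4)))
print(classify_extraction(labeled_region, (3.2, 1.0, 4.0, 1.5)))
print(classify_extraction(labeled_region, (6.0, 8.0, 7.0, 8.5)))
```

A check like this does not fix the model, but it could be used to count how often each case occurs, or to blank out values flagged as "unrelated" before they reach downstream systems.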

We don't have variations of the form; the layout is always the same. We do, however, have some forms where one or two of the fields are empty. We increased the number of training samples to about 50 files, including some samples where the fields are empty. We also tried training with the empty field annotated with a region in those samples, but it did not make a difference.

Can someone shed some light on what is going on here?

Thanks,

RL

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Rui Lopes 21 Reputation points
    2023-01-31T16:30:09.83+00:00

    Hi @romungi-MSFT

    Thanks for the prompt reply. We tried your suggestion, but we end up with many misclassifications of forms with the filled field (they are classified as documents with the unfilled field), which is worse than having a few unfilled fields with garbage. I guess this is to be expected, since the "two" types of document are almost identical. And we actually have more samples with the filled field than without...

    Anyhow, in a template model I can understand how the absence of content in a field may lead to the extraction of adjacent text, but not to the extraction of text from unrelated locations in the document. We will submit feedback in the Studio as suggested, but we would appreciate it if this could be escalated to the responsible team.

    Cheers,

    RL