Azure Form Recognizer duplicating text extracted from PDF
When extracting values with Azure Form Recognizer, many values come back duplicated.
I have trained a custom model, labelling the appropriate key values. I find that the OCR duplicates the boxes, so when I am labelling with the sample labeling tool I often get one box inside another. I have to pick one and deselect the other to avoid the value showing up twice.
When I run the model on a new PDF, many keys also come back with duplicated values.
Furthermore, on inspecting the result JSON I can see that many Lines have nested or overlapping bounding boxes. Normally you would expect a Line with a bounding box and associated text, which in turn contains "Words" whose bounding boxes sit inside the bounding box of the Line.
To be clear, what I am seeing in the JSON is Lines whose bounding boxes overlap or are nested inside those of other Lines, so their text is repeated.
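To show what I mean by nested Lines, here is a minimal sketch of how the overlaps could be listed from the result JSON. It assumes the v2.x result layout, where each page under analyzeResult.readResults has a "lines" array and each line has "text" and an 8-number "boundingBox". The function names and the 0.5 threshold are my own hypothetical choices for illustration and are not part of any SDK.

```python
from itertools import combinations


def to_rect(bounding_box):
    """Convert an 8-number polygon [x1, y1, ..., x4, y4] to (left, top, right, bottom)."""
    xs = bounding_box[0::2]
    ys = bounding_box[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller rectangle (0.0 to 1.0)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # the rectangles do not intersect
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return (width * height) / smaller if smaller > 0 else 0.0


def find_overlapping_lines(result, threshold=0.5):
    """Return (page, text_a, text_b, ratio) for every pair of lines that overlap."""
    hits = []
    # Assumed layout: analyzeResult.readResults[].lines[] (v2.x style result).
    pages = result.get("analyzeResult", {}).get("readResults", [])
    for page in pages:
        for first, second in combinations(page.get("lines", []), 2):
            ratio = overlap_ratio(
                to_rect(first["boundingBox"]), to_rect(second["boundingBox"])
            )
            if ratio >= threshold:
                hits.append((page.get("page"), first["text"], second["text"], ratio))
    return hits
```

Loading the saved result with json.load and passing it to find_overlapping_lines gives the pairs of Lines on each page that overlap. A ratio close to 1.0 means one Line's box lies almost entirely inside the other's.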
Any idea why this happens?