Complex Document to Parse - Looking for Ideas

Salik Rafiq 1 Reputation point
2021-07-16T10:31:12.03+00:00

I have been tasked with parsing data from filing information documents, which have a very odd layout.

I attempted to create my own layout and model using the editor but didn't have success.

115330-10045407-sh01-2021-07-15.pdf

If you look at the attachment, it is a sample of what I would like to parse. I thought I'd try Form Recognizer, but it could not handle the repetitive part as a table. The training confidence was very low, at around 35%. I did try some samples, but nothing was extracted, as expected.

Does anyone have any suggestions? Perhaps Form Recognizer is the tool to use here?

Any help appreciated.

Azure Document Intelligence

1 answer

  1. Ramr-msft 14,596 Reputation points
    2021-07-19T03:23:25.167+00:00

    @Salik Rafiq Thanks for the question. Can you please add more details about what was extracted by the custom Form Recognizer model?
    As a workaround until then, you can use the Form Recognizer train-with-labels feature and label these tables as key-value pairs, labeling each cell of the table as a value. Please note that you will need to label and train with 5 samples that contain the maximum number of rows in the tables. Let me know if this helps.
    Please follow the documentation article "Train a custom model using the sample labeling tool".
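If each table cell is labeled as its own key-value pair, the model returns a flat set of fields, and the rows have to be rebuilt afterwards. The following is only a minimal sketch of that regrouping step, not part of Form Recognizer itself. It assumes a hypothetical label naming convention of `<table>_r<row>_<column>` (for example `shares_r1_class`) and assumes the extracted fields have already been copied into a plain `{label: value}` dictionary. The function name `fields_to_rows` is also made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical naming convention for per-cell labels: "<table>_r<row>_<column>".
# Nothing in Form Recognizer requires this; it is an assumption of this sketch.
LABEL_PATTERN = re.compile(r"^(?P<table>\w+?)_r(?P<row>\d+)_(?P<column>\w+)$")


def fields_to_rows(fields):
    """Group flat {label: value} fields into {table: [row_dict, ...]}.

    Labels that do not follow the naming convention are ignored, and cells
    with a value of None (rows not present in the document) are skipped.
    """
    tables = defaultdict(lambda: defaultdict(dict))
    for label, value in fields.items():
        match = LABEL_PATTERN.match(label)
        if match is None or value is None:
            continue
        row_index = int(match["row"])
        tables[match["table"]][row_index][match["column"]] = value
    # Emit the rows of each table in ascending row order.
    return {
        table: [rows[index] for index in sorted(rows)]
        for table, rows in tables.items()
    }
```

Labeling up to the maximum number of rows, as suggested above, means shorter documents leave some row labels empty. The sketch drops those empty cells so that only rows actually present in the document come back.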
