Tips for labeling a handwritten scorecard Custom Extraction Model

Question

Ercel Concepcion 0

Hi! I am training a model to extract scores from a batch of handwritten golf scorecards. After running Layout, I noticed that the bottom 2-3 rows don't get recognized as individual cells, but the first 2 rows are fine. This happens on most of the 30+ scorecards I labeled.

Below is the result after running the analysis with the model I created. The 4th and 5th rows were not recognized, although they look the same as the first 2 rows.

Do you have any suggestions on how to improve this result? Should I use more training data? Is it better to use a region when labeling a table? Would other formats like PDF or JPG work better? I am currently using PNG files. The files are photos of the cards, not scans. The image quality is good (about 2 MB per file), but could the fact that they are photos affect the outcome? Thank you in advance!

Ramr-msft 17,826 Reputation points

2024-02-16T11:59:11.2566667+00:00

Ercel Concepcion Thanks for the question. The bottom rows are not clear in the images; providing good-quality images in which the bottom rows are clearly visible will help them get recognized. You can also use the Layout model in Document Intelligence Studio to recognize the tables.
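To see how many rows Layout actually found on each card, you can check the table output instead of eyeballing it in Studio. A minimal sketch, assuming you saved the REST response as JSON and pass in its `analyzeResult` object (v3 field names `tables`, `rowCount`, `cells`, `rowIndex`, `content`); `row_coverage` and `expected_rows` are hypothetical names for illustration, and `expected_rows` is simply the number of rows printed on your scorecard:

```python
from typing import Any


def row_coverage(analyze_result: dict[str, Any], expected_rows: int) -> list[dict[str, int]]:
    """Compare the rows found in each table with the rows the card should have.

    `analyze_result` is assumed to be the "analyzeResult" object of a saved
    REST response; `expected_rows` is supplied by you (hypothetical parameter).
    """
    report = []
    for t_idx, table in enumerate(analyze_result.get("tables", [])):
        # Row indices that contain at least one cell with recognized text.
        rows_with_text = {
            cell["rowIndex"]
            for cell in table.get("cells", [])
            if cell.get("content", "").strip()
        }
        detected = table.get("rowCount", 0)
        report.append({
            "table": t_idx,
            "detected_rows": detected,
            "rows_with_text": len(rows_with_text),
            # Rows printed on the card that did not show up as table rows at all.
            "missing_rows": max(expected_rows - detected, 0),
        })
    return report
```

Running this over all 30+ cards makes it easy to tell whether the missing rows are a consistent pattern (for example, always the last rows of the photo) or specific to a few images.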
Ercel Concepcion 0 Reputation points

2024-02-19T01:06:34.06+00:00

Hi @Ramr-msft Thank you for your response. I used Layout and it seems to give better results. However, in some rows, several cells are merged into one. Is there any way to prevent this?
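Merging cannot be switched off from the result side, but merged cells are at least easy to detect: in the v3 JSON a cell that covers several rows or columns carries a `rowSpan` or `columnSpan` greater than 1 (both default to 1 when omitted). A minimal sketch, assuming the same saved `analyzeResult` object; `merged_cells` is a hypothetical helper name:

```python
from typing import Any


def merged_cells(analyze_result: dict[str, Any]) -> list[tuple[int, int, int, int, int, str]]:
    """List cells that span more than one row or column.

    Returns (table index, rowIndex, columnIndex, rowSpan, columnSpan, content).
    Spans are assumed to default to 1 when the field is absent.
    """
    merged = []
    for t_idx, table in enumerate(analyze_result.get("tables", [])):
        for cell in table.get("cells", []):
            row_span = cell.get("rowSpan", 1)
            col_span = cell.get("columnSpan", 1)
            if row_span > 1 or col_span > 1:
                merged.append((
                    t_idx,
                    cell["rowIndex"],
                    cell["columnIndex"],
                    row_span,
                    col_span,
                    cell.get("content", ""),
                ))
    return merged
```

Flagged cards can then be re-photographed or reviewed by hand; for a merged score row, the cell `content` often still holds all the scores in reading order, which may be enough to split manually.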
Ramr-msft 17,826 Reputation points

2024-03-04T04:37:22.5233333+00:00

Thanks for the details. You can train a custom extraction model with high-quality images; custom extraction models are designed to extract distinct data from forms and documents specific to your business and use cases. You can also reach out to the Document Intelligence team at formrecog_contact@microsoft.com. If possible, share a sample document with the issue highlighted.
Ercel Concepcion 0 Reputation points

2024-03-09T10:46:35.89+00:00

Thanks @Ramr-msft, I will reach out to them. One more question: I notice that some text is missing. Is it possible to know when a letter or word was not recognized, instead of having it silently skipped?
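The service only returns words it detected, so a word it never saw will not appear in the output. What you can check is (a) words returned with a low `confidence`, and (b) table cells whose `content` is empty, which on a scorecard usually means a score was expected but not read. A minimal sketch over a saved `analyzeResult` object; `flag_uncertain` is a hypothetical helper, and the 0.8 threshold is an arbitrary assumption you would tune on your own cards:

```python
from typing import Any


def flag_uncertain(
    analyze_result: dict[str, Any], min_confidence: float = 0.8
) -> tuple[list[tuple[Any, str, float]], list[tuple[int, int, int]]]:
    """Return (low-confidence words, empty table cells).

    min_confidence=0.8 is an assumed threshold, not a service default.
    """
    low_conf = []
    for page in analyze_result.get("pages", []):
        for word in page.get("words", []):
            # A missing confidence is treated as 0.0 so the word gets flagged.
            confidence = word.get("confidence", 0.0)
            if confidence < min_confidence:
                low_conf.append((page.get("pageNumber"), word.get("content", ""), confidence))

    empty_cells = []
    for t_idx, table in enumerate(analyze_result.get("tables", [])):
        for cell in table.get("cells", []):
            # Empty cells are the closest proxy for "text that was skipped".
            if not cell.get("content", "").strip():
                empty_cells.append((t_idx, cell["rowIndex"], cell["columnIndex"]))
    return low_conf, empty_cells
```

Low-confidence words and empty score cells together give a short list of spots to review by hand, rather than trusting that every cell was read.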