Hello Ashish Vemula
Thanks for reaching out to us. I understand that you are using the Form Recognizer pre-built layout document model to extract tables from PDF documents, but the service is not detecting tables and space-separated columns correctly.
To improve the accuracy of the Form Recognizer service in detecting tables and columns, you can try the following steps:
- If your documents are receipts, try the "prebuilt-receipt" model instead of the "prebuilt-layout" model. The "prebuilt-receipt" model is designed specifically for receipts and extracts key fields and line items from them, so in my experience it can give better results for that kind of document. For general documents, "prebuilt-layout" remains the model intended for table extraction.
- If the "prebuilt-receipt" model does not work well for your documents, you can try training a custom model on your own labeled documents. This lets the model learn your specific table layouts and can improve its accuracy. I understand your concern, since you mentioned that your documents don't follow a particular format, but a custom model trained on representative samples should still help it recognize your tables better.
- If the service still does not detect tables and columns correctly, please share the document with us, if it is not confidential, so that we can reproduce and investigate the issue.
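If you want to see exactly which tables a model detects before sharing a document, a minimal sketch like the following can help. It assumes the Python `azure-ai-formrecognizer` SDK (version 3.2 or later) and an existing Form Recognizer resource; the helper names `table_to_grid` and `analyze_tables` are hypothetical, and the endpoint, key, and file path are placeholders you need to supply.

```python
"""Sketch: dump the tables a Form Recognizer model detects in a local file.

Assumes azure-ai-formrecognizer >= 3.2 is installed. The helper names are
hypothetical; endpoint, key and path are placeholders supplied by the caller.
"""
from typing import Iterable, List, Protocol


class CellLike(Protocol):
    """Minimal shape of a DocumentTableCell that this sketch relies on."""
    row_index: int
    column_index: int
    content: str


def table_to_grid(row_count: int, column_count: int,
                  cells: Iterable[CellLike]) -> List[List[str]]:
    """Place each cell's text at (row_index, column_index) in a row-major grid.

    Cells spanning several rows/columns are written only at their top-left
    position; positions with no cell stay as empty strings.
    """
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


def analyze_tables(endpoint: str, key: str, path: str,
                   model_id: str = "prebuilt-layout") -> List[List[List[str]]]:
    """Run the given model on a local file and return each detected table as a grid."""
    # Imported lazily so table_to_grid can be used without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint=endpoint,
                                    credential=AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document(model_id, document=f)
        result = poller.result()
    # Some models may return no tables at all, so treat None as an empty list.
    return [table_to_grid(t.row_count, t.column_count, t.cells)
            for t in (result.tables or [])]
```

For example, calling `analyze_tables(endpoint, key, "sample.pdf")` and then the same call with `model_id="prebuilt-receipt"` lets you compare what each model returns for the same file. Writing spanning cells only at their top-left position keeps the sketch short, but it does lose the span information.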
I hope this helps! Please let me know if you have any other questions or concerns. I look forward to hearing from you.
Regards,
Yutong