Form Recognizer failed to detect space-separated columns as two separate columns

Ashish Vemula 0 Reputation points
2023-02-13T21:54:18.3666667+00:00

Subject: Form Recognizer service failing to detect tables and space-separated columns - with attached error screenshots and table image

Hello Azure team,

I am using the Form Recognizer prebuilt layout model to extract tables from PDF documents. In my use case, I have to handle PDF tables in many varying formats. Since the documents don't follow a particular format, I am using the Layout API instead of a custom model. However, the service fails to detect some tables and treats space-separated columns as a single column.

For example, when a table has two columns separated by a space, the service does not recognize them as two separate columns but as one column with a space in between. Additionally, some tables in my documents are not detected at all.

I have attached error screenshots and a sample table image for your reference. Please take a look and let me know if there is a way to handle such issues and provide guidance on how to resolve this problem.

Is there a way to improve the accuracy of the Form Recognizer service in detecting tables and columns in such cases? Can you please provide some guidance on how to resolve this problem? I would greatly appreciate any help and suggestions you can provide.

Thank you for your time and support.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. YutongTie-MSFT 53,936 Reputation points
    2023-02-14T00:50:51.1266667+00:00

    Hello Ashish Vemula

    Thanks for reaching out to us. I understand that you are using the Form Recognizer pre-built layout document model to extract tables from PDF documents, but the service is not detecting tables and space-separated columns correctly.

    To improve the accuracy of the Form Recognizer service in detecting tables and columns, you can try the following steps:

    1. Use the "prebuilt-receipt" model instead of the "prebuilt-layout" model. The "prebuilt-receipt" model is designed to extract fields and line items from receipts, and in my personal experience it may give better results for your use case.
    2. If the "prebuilt-receipt" model does not work well for your documents, you can try training a custom model on your own dataset. This lets you tailor the model to your specific needs and improve its accuracy. I know you have concerns because your documents don't follow a particular format, but I would say a custom model will help the service understand your tables better.
    3. If the service is still not detecting tables and columns correctly, please share the document with us, if it is not confidential, so that we can reproduce and investigate the issue.
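
As a stopgap alongside the steps above, a merged column can also be split in post-processing. The following is only a minimal sketch, not an official fix. `table_to_grid`, `split_merged_column`, and `analyze_layout` are hypothetical helper names, the sample table is made-up data standing in for a layout result, and the sketch assumes the azure-ai-formrecognizer Python package (3.2 or later) with its `DocumentAnalysisClient`. It also assumes the left-hand value in the merged column never contains a space.

```python
import re
from types import SimpleNamespace


def table_to_grid(table):
    """Turn a layout table (row_count, column_count, cells) into a list of rows."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


def split_merged_column(grid, col, sep=r"\s+"):
    """Split column `col` in two at the first whitespace run of each cell."""
    out = []
    for row in grid:
        parts = re.split(sep, row[col].strip(), maxsplit=1)
        left = parts[0]
        right = parts[1] if len(parts) > 1 else ""  # nothing to split: pad with ""
        out.append(row[:col] + [left, right] + row[col + 1:])
    return out


def analyze_layout(path, endpoint, key):
    """Run "prebuilt-layout" on a local file (needs azure-ai-formrecognizer)."""
    # Imported here so the helpers above work without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document("prebuilt-layout", document=f)
        return poller.result()


# Made-up stand-in for one entry of result.tables, with "Qty" and "Price"
# merged into a single column the way the question describes.
sample = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Qty Price"),
        SimpleNamespace(row_index=1, column_index=0, content="Widget"),
        SimpleNamespace(row_index=1, column_index=1, content="2 9.99"),
    ],
)

grid = split_merged_column(table_to_grid(sample), col=1)
for row in grid:
    print(row)
```

With a real document, each entry of `analyze_layout(...).tables` could be passed to `table_to_grid` in place of `sample`. Splitting on the first whitespace run is deliberately naive: it breaks as soon as the left-hand values contain spaces themselves, so treat it as a workaround for simple tables rather than a general solution.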

    I hope this helps! Please let me know if you have any other questions or concerns. I look forward to hearing from you.

    Regards,

    Yutong


