How can I use dozens of SharePoint Syntex document understanding models to classify and extract data from documents in the same library?

HastaManana 1 Reputation point
2022-03-10T07:07:32.46+00:00

Greetings,

Thanks in advance for the community's help:

My objective: My business receives hundreds of documents per week via mail, email, fax, and online portals, e.g., form letters from various government agencies, forms, questionnaires, legal filings, invoices, medical records, etc.

My staff spends hours each day scanning the documents to *.pdf, figuring out what type of document has been received (there are over 130 types, each with its own formatting and predictable phrases), and then visually reviewing the document to enter its data into the correct client file and fields in our document management desktop program.

I want to scan the documents to a folder synced to my SharePoint library and have SharePoint Syntex automatically classify the documents and extract the pertinent data from them. I also plan to use Power Automate Cloud + Desktop to, among other things, change the *.pdf file name to match the name of the applied document understanding model (which is important to my work).

My problem: I successfully created dozens of document understanding models that (with a few exceptions) have training scores of 100% accuracy. I applied my models to my SharePoint library and uploaded several dozen documents of different types to test. In other words, I applied dozens of document understanding models to the same library (because I just want to scan the documents to that library folder daily and let SharePoint do its thing).

I was disappointed when I realized that SharePoint Syntex was only using two or three of the models to try to extract data. This happens even though the models it uses are the wrong type for the document and other applied models are directly on point.

I suspect the issue has to do with this caveat I found in the Microsoft documentation, "If two or more document understanding models are applied to the same library, the uploaded file is classified using the model that has the highest average confidence score. The extracted entities will be from the applied model only."
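To make the quoted rule concrete, here is a minimal sketch of "pick the model with the highest average confidence score". It is only an illustration of the rule as worded, not Syntex's actual implementation: the function name `pick_model`, the model names, and the scores are all hypothetical, and the documentation quoted above does not say exactly which scores are averaged, so the sketch simply assumes each applied model yields a list of confidence scores for the uploaded file.

```python
from statistics import mean


def pick_model(scores_by_model: dict[str, list[float]]) -> str:
    """Return the name of the model with the highest average confidence.

    Hypothetical helper: it only mirrors the wording of the quoted rule.
    """
    if not scores_by_model:
        raise ValueError("no models applied to the library")
    return max(scores_by_model, key=lambda name: mean(scores_by_model[name]))


# Hypothetical confidence scores that three applied models might report
# for a single uploaded file (assumed values, not real Syntex output).
scores = {
    "Invoice": [0.62, 0.58],
    "Legal Filing": [0.91, 0.87],
    "Medical Record": [0.40, 0.45],
}

winner = pick_model(scores)
print(winner)
```

Under this reading, only the winning model's extractors would run on the file, which would match what I am seeing if a few models consistently score highest.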

Seeking: What/where is the error in my scheme? Is there any fix?

Thank you.

SharePoint

1 answer

  1. CaseyYang-MSFT 10,341 Reputation points
    2022-03-10T10:27:14.347+00:00

    Hi @HastaManana ,

    As you noted: "If two or more document understanding models are applied to the same library, the uploaded file is classified using the model that has the highest average confidence score. The extracted entities will be from the applied model only." This behavior is by design. If you would like it changed, you can submit a request through SharePoint Feedback.

