Read OCR Handwritten text extraction

AzureUser-9588 151 Reputation points
2024-02-05T05:49:28.24+00:00

Is it possible to increase the accuracy of handwritten text identification and extraction using Azure AI Document Intelligence? The default Read OCR handwritten text extraction is not satisfactory for the set of documents that I am using.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. navba-MSFT 27,540 Reputation points Microsoft Employee Moderator
    2024-02-05T06:54:48.7033333+00:00

    @AzureUser-9588 Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    Azure AI Document Intelligence Read Optical Character Recognition (OCR) model runs at a higher resolution than Azure AI Vision Read and extracts print and handwritten text from PDF documents and scanned images.

    The Read OCR model and the document layout model extract printed and handwritten text as lines and words. Handwriting extraction applies only to the supported handwritten languages, so please check whether the language of your documents is on the supported list.
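A minimal sketch of how the handwritten portions of a result could be isolated, assuming you already have the analyze result as a Python dict shaped like the REST JSON (`content` plus `styles[]` entries with `isHandwritten` and `spans`). The function name `handwritten_text` and the sample data are hypothetical, and the sketch assumes the span offsets index the Python string directly, which depends on the string index type used in the request.

```python
from typing import Any


def handwritten_text(analyze_result: dict[str, Any]) -> list[str]:
    """Return the substrings of `content` that the service marked as handwritten.

    Assumes span offsets and lengths index the Python string directly.
    """
    content = analyze_result.get("content", "")
    pieces = []
    for style in analyze_result.get("styles", []):
        if not style.get("isHandwritten"):
            continue
        for span in style.get("spans", []):
            start = span["offset"]
            pieces.append(content[start:start + span["length"]])
    return pieces


# Hypothetical sample data, not real service output.
sample_result = {
    "content": "Invoice 42\nPaid in full",
    "styles": [
        {
            "isHandwritten": True,
            "confidence": 0.9,
            "spans": [{"offset": 11, "length": 12}],
        }
    ],
}

print(handwritten_text(sample_result))
```

Separating the handwritten spans this way makes it easier to see whether the weak results come from the handwriting itself or from the printed parts of the page.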

    To extract printed and handwritten text, along with barcodes, formulas, and font styles, from images and documents, you can try both models in Document Intelligence Studio:

    Read model DI studio link: https://documentintelligence.ai.azure.com/studio/read

    Layout model DI studio link: https://documentintelligence.ai.azure.com/studio/layout

    If you have already tried the above and feel that the identification accuracy still needs to improve, follow the action plan below:

    Action Plan: It is possible to improve the accuracy of handwritten text extraction using Azure AI Document Intelligence. Here are some strategies you can consider:

    • Custom Models: Custom models generate an estimated accuracy score when trained. Documents analyzed with a custom model produce a confidence score for extracted fields. You can use these scores to interpret the accuracy and improve the results.
    • Confidence Scores: Document Intelligence analysis results return an estimated confidence for predicted words, key-value pairs, selection marks, regions, and signatures. You can use these confidence scores to determine whether to automatically accept the prediction or flag it for human review.
    • Training Data: Ensure that all variations of a document are included in the training dataset. This can help produce a model with higher accuracy and confidence scores during analysis and reduce the number of documents flagged for human review.
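The confidence-score strategy above can be sketched as follows, assuming the analyze result is available as a Python dict shaped like the REST JSON (`pages[].words[]` with `content` and `confidence`). The function name `words_needing_review`, the 0.80 cutoff, and the sample data are hypothetical; choose a threshold by checking it against your own documents.

```python
from typing import Any

# Hypothetical cutoff; tune it against your own documents.
REVIEW_THRESHOLD = 0.80


def words_needing_review(
    analyze_result: dict[str, Any],
    threshold: float = REVIEW_THRESHOLD,
) -> list[dict[str, Any]]:
    """Collect words whose confidence is below the threshold."""
    flagged = []
    for page in analyze_result.get("pages", []):
        for word in page.get("words", []):
            # A word with no confidence value is treated as needing review.
            if word.get("confidence", 0.0) < threshold:
                flagged.append(
                    {
                        "page": page.get("pageNumber"),
                        "content": word.get("content"),
                        "confidence": word.get("confidence"),
                    }
                )
    return flagged


# Hypothetical sample data, not real service output.
sample_result = {
    "pages": [
        {
            "pageNumber": 1,
            "words": [
                {"content": "Total", "confidence": 0.99},
                {"content": "l5O", "confidence": 0.41},
            ],
        }
    ]
}

for item in words_needing_review(sample_result):
    print(item)
```

Words above the threshold can be accepted automatically, while the flagged ones are routed to a person, which keeps manual review focused on the text the model is least sure about.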

    More info about this is here. Hope this helps. If you have any follow-up questions, please let me know; I would be happy to help.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

