Thanks for reaching out to us. As you mentioned, you want a pretrained model that extracts content directly. I would recommend trying Form Recognizer or Named Entity Recognition (NER).
Unlike Custom Entity Extraction, Named Entity Recognition uses a prebuilt default model, so you don't need to train it: https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/named-entity-recognition/overview
For Form Recognizer, I recommend the general document model. The general document preview model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to extract key-value pairs and entities from documents. General document is only available with the preview (v3.0) API.
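To give an idea of what calling the prebuilt NER model looks like, here is a minimal sketch that only builds the REST request and does not send it. It assumes the Language service `analyze-text` route with `api-version=2022-05-01` and key-based authentication; the endpoint, key, and sample sentence are placeholders, and `build_ner_request` is a hypothetical helper name, so please check the linked documentation for the exact values for your resource.

```python
import json
import urllib.request
from typing import Sequence


def build_ner_request(endpoint: str, key: str, texts: Sequence[str]) -> urllib.request.Request:
    """Build (but do not send) a prebuilt NER request.

    Assumption: the resource exposes the Language service
    `analyze-text` route with api-version 2022-05-01.
    """
    # Each document needs a unique id; the language is assumed to be English here.
    documents = [
        {"id": str(i), "language": "en", "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = {
        "kind": "EntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }
    url = endpoint.rstrip("/") + "/language/:analyze-text?api-version=2022-05-01"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # placeholder key, see below
        },
        method="POST",
    )


# Placeholder endpoint and key; replace them with the values of your own resource.
request = build_ner_request(
    "https://example.cognitiveservices.azure.com/",
    "your-key-here",
    ["Contoso signed the contract in Seattle on 5 May."],
)
```

With a real endpoint and key, passing `request` to `urllib.request.urlopen` would submit it; no training step is involved because the default model is used.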
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-general-document
If you need better performance, custom NER, which you have already tried, should be a better choice. It lets you build custom AI models to extract domain-specific entities from unstructured text, such as contracts or financial documents. By creating a Custom NER project, developers can iteratively tag data, train, evaluate, and improve model performance before making the model available for consumption. The quality of the tagged data greatly impacts model performance.
https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/custom-named-entity-recognition/overview
What you need to do is basically tag your data. Please try the options above to see which one is the best choice for your business.
Hope this helps. Please let us know if you need further assistance.
Please kindly accept the answer if you find it helpful, thank you!
Regards,
Yutong
@Sooraj Sudhakaran
Improving the model is the next step I would recommend. Please see this guidance and try reviewing the test set and examining the data distribution.
https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/custom-named-entity-recognition/how-to/improve-model
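As a rough illustration of examining the data distribution, the sketch below counts tagged entities per category in the training and test sets and lists documents that carry no tags at all. It assumes the project's labels file follows an `assets` / `documents` / `entities` / `labels` layout with a `dataset` field per document; the sample data and both helper names are made up for the example, so adjust the keys if your exported file differs.

```python
from collections import Counter


def count_labels(labels_file: dict) -> dict:
    """Return {dataset name: Counter of entity categories}."""
    counts = {}
    for doc in labels_file["assets"]["documents"]:
        # Assumption: documents without a "dataset" field belong to training.
        per_set = counts.setdefault(doc.get("dataset", "Train"), Counter())
        for region in doc.get("entities", []):
            for label in region.get("labels", []):
                per_set[label["category"]] += 1
    return counts


def unlabeled_documents(labels_file: dict) -> list:
    """Return the locations of documents that have no tagged entity."""
    return [
        doc["location"]
        for doc in labels_file["assets"]["documents"]
        if not any(region.get("labels") for region in doc.get("entities", []))
    ]


# Made-up sample in the assumed layout; load your own exported file with json.load.
sample = {
    "assets": {
        "documents": [
            {"location": "a.txt", "dataset": "Train", "entities": [
                {"regionOffset": 0, "regionLength": 200, "labels": [
                    {"category": "Vendor", "offset": 3, "length": 7},
                    {"category": "Amount", "offset": 40, "length": 5},
                    {"category": "Vendor", "offset": 90, "length": 7},
                ]},
            ]},
            {"location": "b.txt", "dataset": "Train", "entities": []},
            {"location": "c.txt", "dataset": "Test", "entities": [
                {"regionOffset": 0, "regionLength": 80, "labels": [
                    {"category": "Vendor", "offset": 10, "length": 7},
                ]},
            ]},
        ]
    }
}

distribution = count_labels(sample)
missing = unlabeled_documents(sample)
```

A category that appears often in one set but rarely or never in the other, or a large number of untagged documents, is worth checking before retraining.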
Hope this helps!
Regards,
Yutong
When we trained with 20 files, we got a successful model. After we added and tagged more files, the trained model recognizes no entities. Is that an internal error? What could be the reason?
Do we need to tag every occurrence of an entity in a file?