There's an old saying: F around and find out. By changing the labels and SOPs, the previously collected data became less accurate. I'm a firm believer in that old saying; it's a large part of learning and of the attention to detail that entrepreneurial activities demand. I'd say your study was a success. I learned from it, and I hope many other people do too.
Custom NER - Explanation on +70% F1 to 0% F1 after updating the span of ONE entity.
Hi!
I'm building a custom NER model by labelling some documents via Language Services.
To test the tool first, I labelled 30 documents and submitted them to train a custom NER model.
The resulting F1 score was ~70% - this was a baseline.
I then updated about 8 labelled examples in my training data (out of 100+ examples across 6 entity types). Most of the changes just adjusted the span of a named entity; one added a label for a new entity.
My scores crashed to about 3%.
I reverted to my original labelled dataset and recovered my original F1 score. I then downloaded the model data (which was challenging to use, since your offset/length is probably recorded after text processing and doesn't match the original document...) and started making incremental changes to try to explain the sudden drop in scores.
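For anyone hitting the same offset mismatch, one quick sanity check is to slice the original document with each label's offset and length and compare the result with the text you expected to label. Below is a minimal Python sketch. It assumes a simplified, hypothetical label layout (`category`, `offset`, `length`, `text`), not the actual export schema, and the sample document and labels are made up for illustration.

```python
def find_misaligned_spans(document, labels):
    """Return the labels whose offset/length do not select the expected text."""
    misaligned = []
    for label in labels:
        start = label["offset"]
        end = start + label["length"]
        actual = document[start:end]
        if actual != label["text"]:
            # Keep the text that was actually selected, to make the shift visible.
            misaligned.append({**label, "actual": actual})
    return misaligned


# Made-up document and labels (hypothetical layout, not the real export format).
document = "Invoice 42 was issued by Contoso Ltd on 2022-11-09."
labels = [
    {"category": "Vendor", "offset": 25, "length": 11, "text": "Contoso Ltd"},
    {"category": "Date", "offset": 40, "length": 10, "text": "2022-11-09"},
    # Deliberately off by one character to show what a bad span looks like.
    {"category": "InvoiceId", "offset": 9, "length": 2, "text": "42"},
]

for bad in find_misaligned_spans(document, labels):
    print(bad["category"], "expected", repr(bad["text"]), "got", repr(bad["actual"]))
```

If many spans come back shifted by a growing amount, the offsets may be counted in a different unit than Python's code points (for example UTF-16 code units) or against normalized text. That is only a guess about the cause, not something confirmed here.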
I have found that updating the span of ONE element made me lose ~2 points, which is fine. BUT, doing the same thing for another element made my model F1 score crash to 0%.
I have no idea what's happening: no visibility, no explanation (apart from seeing that all my predictions are suddenly None).
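To make the "all predictions are None" symptom concrete: under the usual exact-match definition of entity-level F1, a model that predicts no entities at all has zero true positives, so its F1 is 0 regardless of how good the labels are. The sketch below is a plain Python illustration of that definition. It is an assumption that the service scores in a comparable way, and the entity tuples are made up.

```python
def micro_f1(gold, predicted):
    """Exact-match micro F1 over (doc_id, category, offset, length) tuples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up gold entities for a tiny test set.
gold = [
    ("doc1", "Vendor", 25, 11),
    ("doc1", "Date", 40, 10),
    ("doc2", "Vendor", 3, 8),
    ("doc2", "InvoiceId", 20, 4),
]

# Three exact matches plus one prediction with a wrong span.
mostly_right = gold[:3] + [("doc2", "InvoiceId", 21, 4)]

print(micro_f1(gold, mostly_right))  # 0.75
print(micro_f1(gold, []))            # 0.0: no predictions means no true positives
```

With only a handful of test documents, each entity carries a lot of weight in this score, so small differences in predictions can move F1 by many points.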
I understand Cognitive Services offers a no-code approach for non-data-scientists, which is really great, but I could use a bit more visibility into what's going on in a case like this. All I know is that changing the span of ONE example seems to break everything, which is definitely strange.
Could I please get some help with this?
Thank you!
My data, for this model:
- 6 entity types
- 25 training documents
- 5 test documents
1 answer
John Glaros 1 Reputation point
2022-11-09T07:55:51.22+00:00