Custom NER - Explanation for F1 dropping from ~70% to 0% after updating the span of ONE entity

Jessica Tanon 6 Reputation points
2022-11-07T08:55:36.027+00:00

Hi!

I'm building a custom NER model by labelling some documents via Language Services.

To test the tool first, I labelled 30 documents and submitted them to train a custom NER model.

The resulting F1 score was ~70% - this was a baseline.

I then updated about 8 labelled examples in my training data (out of 100+ examples across 6 entity types). Most of the changes only adjusted the span of a named entity. One added a label for a new entity.

My scores crashed to about 3%.

I reverted back to my original labelled dataset and recovered my original F1 score. I downloaded the model data (which was challenging to use since your offset/length is probably recorded post text processing and doesn't match the original document...) and started making incremental changes to try and explain the sudden drop in scores.
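To illustrate the offset/length problem, here is a minimal sketch of the kind of check I mean. It assumes each label has been flattened into a dict with `category`, `offset`, `length` and the `text` it is supposed to cover; those field names and the `find_mismatches` helper are hypothetical, not the export format of the service.

```python
def span_text(document: str, offset: int, length: int) -> str:
    """Return the slice of the document that a label's offset/length selects."""
    return document[offset:offset + length]


def find_mismatches(document: str, labels: list[dict]) -> list[tuple[dict, str]]:
    """Collect labels whose offset/length does not select the expected text.

    Each label is a hypothetical dict with the keys
    'category', 'offset', 'length' and 'text'.
    """
    mismatches = []
    for label in labels:
        actual = span_text(document, label["offset"], label["length"])
        if actual != label["text"]:
            mismatches.append((label, actual))
    return mismatches


document = "Jessica works at Contoso."
labels = [
    {"category": "Person", "offset": 0, "length": 7, "text": "Jessica"},
    # Deliberately off by one character to show what a mismatch looks like.
    {"category": "Organization", "offset": 16, "length": 7, "text": "Contoso"},
]

for label, actual in find_mismatches(document, labels):
    print(f"{label['category']}: expected {label['text']!r}, got {actual!r}")
```

A label that fails this check against the original file would suggest that the offsets were computed on a processed version of the text.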

I have found that updating the span of ONE element made me lose ~2 points, which is fine. BUT, doing the same thing for another element made my model F1 score crash to 0%.
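For reference, the incremental comparison can be sketched as follows. This is only an outline under assumptions: labels are flattened into dicts with `document`, `category`, `offset` and `length` keys, and `changed_spans` is a hypothetical helper, not part of any Azure SDK.

```python
def label_key(label: dict) -> tuple:
    """Reduce a label to a hashable (document, category, offset, length) tuple."""
    return (label["document"], label["category"], label["offset"], label["length"])


def changed_spans(old_labels: list[dict], new_labels: list[dict]):
    """Return (removed, added) label tuples between two versions of a labelled set."""
    old = {label_key(label) for label in old_labels}
    new = {label_key(label) for label in new_labels}
    return sorted(old - new), sorted(new - old)


old_labels = [
    {"document": "doc1.txt", "category": "Person", "offset": 0, "length": 7},
    {"document": "doc1.txt", "category": "Organization", "offset": 17, "length": 7},
]
new_labels = [
    # Same entity, but the span was widened by two characters.
    {"document": "doc1.txt", "category": "Person", "offset": 0, "length": 9},
    {"document": "doc1.txt", "category": "Organization", "offset": 17, "length": 7},
]

removed, added = changed_spans(old_labels, new_labels)
print("removed:", removed)
print("added:", added)
```

Retraining after applying one entry of `added` at a time is how a single span change can be tied to a change in score.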

I have no idea what's happening, no visibility, no explanation (apart from seeing that all my predictions are suddenly None).

I understand that Cognitive Services offers a no-code approach for non-data-scientists, which is really great, but I could use more visibility into what is going on in a case like this. All I know is that changing the span of ONE example seems to break everything, which is definitely strange.

Could I please get some help with this?

Thank you!

My data, for this model:

  • 6 entity types
  • 25 training documents
  • 5 test documents
Azure AI Language
Azure AI services

1 answer

  1. John Glaros 1 Reputation point
    2022-11-09T07:55:51.22+00:00

    There's an old saying: F around and find out. By changing the labels and SOPS, the previously collected data became less accurate. I'm a firm believer in that old saying. It's a large part of learning and of the attention to detail that entrepreneurial activities require. I'd say your study was a success. I learned from it, and I hope many other people do too.