How to format data for Named Entity Recognition (NER)

NER dataset shapes:

  • Key information file: A file listing the entities that serve as key information for the training data.
  • Training data: A tab-separated file (.txt or .tsv). One column holds the sentence; the remaining columns hold labels for the tokens in that sentence.
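
The training data layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference loader: the function names (read_ner_rows, align_tokens), the position of the sentence column (first), the space-separated label string, and the BIO-style tags in the sample row are all assumptions made for illustration, since the text only says that the columns are tab-separated.

```python
import csv
import io


def read_ner_rows(tsv_text, sentence_col=0):
    """Split tab-separated text into (sentence, label_columns) pairs.

    sentence_col is an assumption: the source does not say which
    column holds the sentence, so it is configurable here.
    """
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        sentence = fields[sentence_col]
        label_columns = [f for i, f in enumerate(fields) if i != sentence_col]
        rows.append((sentence, label_columns))
    return rows


def align_tokens(sentence, label_column):
    """Pair each whitespace token with its label.

    Assumes (hypothetically) one space-separated label per token.
    """
    tokens = sentence.split()
    labels = label_column.split()
    if len(tokens) != len(labels):
        raise ValueError(
            f"{len(tokens)} tokens but {len(labels)} labels in: {sentence!r}"
        )
    return list(zip(tokens, labels))


# Hypothetical sample row: sentence, Tab, one label per token.
sample = "John lives in Paris\tB-PER O O B-LOC\n"
rows = read_ner_rows(sample)
sentence, label_columns = rows[0]
pairs = align_tokens(sentence, label_columns[0])
print(pairs)
```

Checking that the token count matches the label count while loading catches misaligned rows early, before they reach training.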