Cognitive Service for Custom NER: Issues reusing a tags file generated by Language Studio

Martin Danner 1 Reputation point
2022-04-08T15:11:26.937+00:00

Hello,

I recently created some training data to train an NLP model for named entity recognition using Azure Cognitive Service for custom entity recognition.
Tagging the data and training the model worked fine in Language Studio.
However, I wanted to reuse the tagged training data to train a second model for production. So I created a new project in Language Studio, but when I select the tags file I receive the following error message:

Named entity recognition projects must contain non-empty list of entities

I checked the file multiple times and there are no empty entity lists; all labeled entities are there. Since the tags file was generated by Language Studio itself and was already used for training with the same underlying data, I am a little confused and am reaching out for ideas.

Here is a sample snippet (the first few lines) of the tags file in JSON format:

{
  "intentNames": [],
  "entityNames": [
    "Enitity1",
    "Enitity2",
    "Enitity3",
    "Enitity4",
    "Enitity5"
  ],
  "entityHierarchySeparator": null,
  "documents": [
    {
      "text": null,
      "location": "file1.txt",
      "culture": "de",
      "intents": null,
      "entities": [
        {
          "regionStart": 0,
          "regionLength": 455,
          "labels": [
            { "entity": 1, "start": 0, "length": 14, "autoTagged": false },
            { "entity": 1, "start": 29, "length": 8, "autoTagged": false },
            { "entity": 1, "start": 57, "length": 8, "autoTagged": false },
            { "entity": 1, "start": 66, "length": 15, "autoTagged": false },
            { "entity": 1, "start": 82, "length": 19, "autoTagged": false },
            { "entity": 1, "start": 409, "length": 5, "autoTagged": false },
            { "entity": 1, "start": 419, "length": 5, "autoTagged": false },
            { "entity": 3, "start": 433, "length": 22, "autoTagged": false }
          ]
        }
      ],

Many thanks in advance for the help.

Martin

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. GiftA-MSFT 11,151 Reputation points
    2022-04-16T06:04:04.403+00:00

    Hi, following up on this. The tags file saved in your storage container is in the service's own format and is not intended to be used directly by users. Here's the expected tagged file format, along with an example dataset that you can experiment with. Extract the files and upload them to blob storage. Then try creating a new project connected to the container where you uploaded the files. Let me know if you have any further questions. Thanks.
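
    If the existing labels need to be carried over rather than re-tagged, one possible first step is to flatten the service-generated file into plain records that can then be rewritten into the documented format. The Python sketch below is only an illustration: the function name flatten_labels is hypothetical, the field names are taken from the snippet in the question, and it assumes that each label's "entity" value is a zero-based index into "entityNames", which the thread does not confirm. It does not produce the expected tagged file format itself.

```python
import json


def flatten_labels(tags):
    """Yield one record per label, with the entity index resolved to a name.

    Assumption: "entity" is a zero-based index into "entityNames".
    """
    names = tags.get("entityNames") or []
    if not names:
        raise ValueError("entityNames is empty or missing")
    for doc in tags.get("documents") or []:
        for region in doc.get("entities") or []:
            for label in region.get("labels") or []:
                index = label["entity"]
                if not 0 <= index < len(names):
                    raise ValueError(
                        f"{doc.get('location')}: entity index {index} out of range"
                    )
                yield {
                    "location": doc.get("location"),
                    "culture": doc.get("culture"),
                    "entity": names[index],
                    "regionStart": region.get("regionStart"),
                    "start": label["start"],
                    "length": label["length"],
                }


# Shortened copy of the snippet from the question, used as sample input.
sample_tags = {
    "intentNames": [],
    "entityNames": ["Enitity1", "Enitity2", "Enitity3", "Enitity4", "Enitity5"],
    "entityHierarchySeparator": None,
    "documents": [
        {
            "text": None,
            "location": "file1.txt",
            "culture": "de",
            "intents": None,
            "entities": [
                {
                    "regionStart": 0,
                    "regionLength": 455,
                    "labels": [
                        {"entity": 1, "start": 0, "length": 14, "autoTagged": False},
                        {"entity": 3, "start": 433, "length": 22, "autoTagged": False},
                    ],
                }
            ],
        }
    ],
}

records = list(flatten_labels(sample_tags))
for record in records:
    print(json.dumps(record))
```

    For a real file, the dictionary would come from json.load on the tags file downloaded from the storage container. The resulting records still have to be mapped onto the field names that the expected tagged file format requires.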
