Frequently asked questions for Custom Named Entity Recognition
Find answers to commonly asked questions about concepts and scenarios related to custom NER in Azure AI Language.
See the quickstart to quickly create your first project, or view how to create projects for more detailed information.
See the service limits article for more information.
Generally, diverse and representative tagged data leads to better results, provided that the tagging is done precisely, consistently, and completely. There is no set number of tagged instances that will make every model perform well. Performance is highly dependent on your schema and how ambiguous it is: ambiguous entity types need more tags. Performance also depends on the quality of your tagging. The recommended number of tagged instances per entity is 50.
The training process can take a long time. As a rough estimate, the expected training time for files with a combined length of 12,800,000 characters is 6 hours.
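If you need a rough budget for other dataset sizes, the sketch below extrapolates linearly from that single data point. This is a simplification, since schema complexity and service load also affect training time:

```python
# Back-of-the-envelope training-time estimate. Assumes training time
# scales roughly linearly with total character count -- a simplification.
REFERENCE_CHARS = 12_800_000  # documented data point above
REFERENCE_HOURS = 6

def estimate_training_hours(total_chars: int) -> float:
    """Linear extrapolation from the documented reference point."""
    return REFERENCE_HOURS * total_chars / REFERENCE_CHARS

print(f"{estimate_training_hours(3_200_000):.1f} hours")  # -> 1.5 hours
```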
Note
Currently you can only build a model using the REST API or Language Studio.
You can use the REST APIs to build your custom models. Follow this quickstart to get started with creating a project and building a model through the APIs; it includes examples of how to call the Authoring API.
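As a minimal sketch, creating a project through the Authoring REST API looks roughly like the following. The endpoint, key, project name, and container name are placeholders, and the request shape follows the 2022-05-01 API version; verify against the current REST API reference:

```python
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"                                       # placeholder

# Create (or update) a custom NER project. Field names follow the
# 2022-05-01 authoring API; check the current REST reference.
url = f"{ENDPOINT}/language/authoring/analyze-text/projects/my-ner-project"
body = {
    "projectName": "my-ner-project",              # placeholder
    "projectKind": "CustomEntityRecognition",
    "language": "en",
    "multilingual": False,
    "storageInputContainerName": "my-container",  # blob container with your documents
}
resp = requests.patch(
    url,
    params={"api-version": "2022-05-01"},
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json=body,
)
resp.raise_for_status()
```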
When you're ready to start using your model to make predictions, you can use the REST API or the client library.
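For example, with the Python client library (azure-ai-textanalytics 5.2 or later), a prediction call looks like the sketch below; the endpoint, key, project name, and deployment name are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-resource-key>"),            # placeholder
)

# Start an asynchronous custom entity recognition job against a deployed model.
poller = client.begin_recognize_custom_entities(
    documents=["The contract was signed by Contoso Ltd. on May 1, 2023."],
    project_name="my-ner-project",    # placeholder
    deployment_name="my-deployment",  # placeholder
)

for result in poller.result():
    if not result.is_error:
        for entity in result.entities:
            print(entity.text, entity.category, entity.confidence_score)
```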
You can train multiple models on the same dataset within the same project. After you train a model successfully, you can view its performance, and you can deploy and test it within Language Studio. You can also add or remove labels from your data, then train and test a new model as well. View service limits to learn about the maximum number of trained models per project. When you train a model, you determine how your dataset is split into training and testing sets; the sketch below shows how this is specified through the API. You can also have your data split randomly into training and testing sets, but then there is no guarantee that each evaluation is performed on the same test set, so the results are not comparable. It's recommended that you develop your own test set and use it to evaluate both models so you can measure improvement.
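As a sketch of how the split is specified when training through the REST API (2022-05-01 authoring version; names are placeholders, verify against the REST reference), "percentage" requests a random split by the service, while "manual" uses the documents you explicitly assigned to the testing set while labeling:

```python
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"                                       # placeholder

# Start a training job. evaluationOptions controls the train/test split.
url = f"{ENDPOINT}/language/authoring/analyze-text/projects/my-ner-project/:train"
body = {
    "modelLabel": "model-v2",          # placeholder
    "trainingConfigVersion": "latest",
    "evaluationOptions": {
        "kind": "percentage",          # or "manual" to use your own test set
        "trainingSplitPercentage": 80,
        "testingSplitPercentage": 20,
    },
}
resp = requests.post(
    url,
    params={"api-version": "2022-05-01"},
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json=body,
)
resp.raise_for_status()
```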
Model evaluation may not always be comprehensive. This depends on:
- Test set size: if the test set is too small, the scores are not representative of the model's actual performance. Likewise, if a specific entity type is missing or under-represented in your test set, it will affect the measured performance (a quick check is sketched after this list).
- Data diversity: if your data covers only a few of the scenarios or examples of the text you expect in production, your model will not be exposed to all possible scenarios and might perform poorly on the scenarios it hasn't been trained on.
- Data representation: if the dataset used to train the model is not representative of the data that will be introduced to the model in production, model performance will be greatly affected.
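One quick way to spot under-represented entity types is to tally labels per type, as sketched below with a hypothetical labels format (adjust the field names to however your labeled data is stored):

```python
from collections import Counter

# Hypothetical labeled-data format: one dict per test document listing
# the entity labels assigned to it. Adapt to your actual labels file.
test_documents = [
    {"text": "...", "labels": [{"category": "Person"}, {"category": "Date"}]},
    {"text": "...", "labels": [{"category": "Person"}]},
]

counts = Counter(
    label["category"] for doc in test_documents for label in doc["labels"]
)
for category, n in counts.most_common():
    print(f"{category}: {n} tagged instance(s)")
# Entity types with very low counts are under-represented in the test
# set, so their evaluation scores won't be reliable.
```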
See the data selection and schema design article for more information.
View the model confusion matrix. If you notice that a certain entity type is frequently not predicted correctly, consider adding more tagged instances for that type. If you notice that two entity types are frequently predicted as each other, the schema is ambiguous, and you should consider merging them into one entity type for better performance.
Review test set predictions. If one of the entity types has far more tagged instances than the others, your model may be biased toward this type. Add more data to the other entity types, or remove examples from the dominating type.
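If you export predicted and tagged pairs for your own review, a simple tally like the illustrative sketch below (standard Python, not an Azure API) can surface which entity types are being swapped:

```python
from collections import Counter

# Illustrative only: aligned (tagged, predicted) entity-type pairs
# collected while reviewing test set predictions.
pairs = [
    ("Person", "Person"),
    ("Person", "Organization"),
    ("Person", "Organization"),
    ("Date", "Date"),
]

confusion = Counter(pairs)
for (tagged, predicted), n in confusion.most_common():
    note = "" if tagged == predicted else "  <- frequent swaps suggest schema ambiguity"
    print(f"tagged={tagged:<12} predicted={predicted:<12} count={n}{note}")
```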
Learn more about data selection and schema design.
Review your test set to see predicted and tagged entities side by side, so you can get a better idea of your model's performance and decide whether any changes in the schema or the tags are necessary.
When you train your model, you can choose to have your data split randomly into training and testing sets. If you do, there is no guarantee that the evaluation is performed on the same test set each time, so the results are not comparable across training runs.
If you're retraining the same model, your test set will be the same, but you might notice a slight change in the predictions made by the model. This is because the trained model isn't robust enough, which is a function of how representative and distinct your data is, and of the quality of your tagged data.
First, you need to enable the multilingual option when creating your project, or you can enable it later from the project settings page. After you train and deploy your model, you can start querying it in multiple languages. You may get varied results for different languages. To improve the accuracy for a given language, add more tagged instances in that language to your project to expose the trained model to more of that language's syntax.
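For example, with the Python client library you can attach an explicit language code to each document when querying a multilingual project; the endpoint, key, project, and deployment names below are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-resource-key>"),            # placeholder
)

# Each document can carry its own language code when the project
# has the multilingual option enabled.
documents = [
    {"id": "1", "language": "en", "text": "Contoso signed the lease in May."},
    {"id": "2", "language": "fr", "text": "Contoso a signé le bail en mai."},
]
poller = client.begin_recognize_custom_entities(
    documents,
    project_name="my-ner-project",    # placeholder
    deployment_name="my-deployment",  # placeholder
)
```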
You need to deploy your model before you can test it.
After deploying your model, you call the prediction API, using either the REST API or client libraries.
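Calling the prediction REST API directly looks roughly like the sketch below (2022-05-01 API version; endpoint, key, project, and deployment names are placeholders). The job is asynchronous, so you poll the Operation-Location URL for the result:

```python
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-resource-key>"                                       # placeholder

body = {
    "displayName": "Custom NER test",
    "analysisInput": {
        "documents": [
            {"id": "1", "language": "en",
             "text": "The contract was signed by Contoso Ltd. on May 1, 2023."}
        ]
    },
    "tasks": [{
        "kind": "CustomEntityRecognition",
        "taskName": "ner-task",
        "parameters": {
            "projectName": "my-ner-project",    # placeholder
            "deploymentName": "my-deployment",  # placeholder
        },
    }],
}
resp = requests.post(
    f"{ENDPOINT}/language/analyze-text/jobs",
    params={"api-version": "2022-05-01"},
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json=body,
)
resp.raise_for_status()
# Poll this URL until the job status is "succeeded", then read the results.
job_url = resp.headers["Operation-Location"]
```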
Custom NER is a data processor for General Data Protection Regulation (GDPR) purposes. In compliance with GDPR policies, Custom NER users have full control to view, export, or delete any user content either through the Language Studio or programmatically by using REST APIs.
Your data is only stored in your Azure Storage account. Custom NER only has access to read from it during training.
To clone your project, use the export API to export the project assets, then import them into a new project. See the REST API reference for both operations.
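As a rough sketch, assuming the export and import routes of the 2022-05-01 authoring API (verify both against the REST API reference):

```python
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-resource-key>"}    # placeholder
BASE = f"{ENDPOINT}/language/authoring/analyze-text/projects"

# 1) Trigger an export of the source project's assets (asynchronous job).
resp = requests.post(
    f"{BASE}/source-project/:export",  # "source-project" is a placeholder
    params={"api-version": "2022-05-01", "stringIndexType": "Utf16CodeUnit"},
    headers=HEADERS,
)
resp.raise_for_status()
# Poll resp.headers["Operation-Location"] until the export job succeeds,
# then download the exported assets JSON it points to.

# 2) Import the downloaded assets into a new project to complete the clone.
assets = {}  # the assets JSON retrieved from the export job
resp = requests.post(
    f"{BASE}/cloned-project/:import",  # "cloned-project" is a placeholder
    params={"api-version": "2022-05-01"},
    headers=HEADERS,
    json=assets,
)
resp.raise_for_status()
```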