What is custom named entity recognition?

Custom NER is one of the custom features offered by Azure AI Language. It is a cloud-based API service that uses machine learning to let you build your own models for named entity recognition tasks.

Custom NER enables users to build custom AI models that extract domain-specific entities from unstructured text, such as contracts or financial documents. In a Custom NER project, developers iteratively label data, train, evaluate, and improve model performance before making the model available for consumption. The quality of the labeled data strongly affects model performance. To simplify building and customizing your model, the service offers a custom web portal that you can access through Language Studio. To get started with the service, follow the steps in the quickstart.

This documentation contains the following article types:

  • Quickstarts are getting-started instructions to guide you through making requests to the service.
  • Concepts provide explanations of the service functionality and features.
  • How-to guides contain instructions for using the service in more specific or customized ways.

Example usage scenarios

Custom named entity recognition can be used in multiple scenarios across a variety of industries:

Information extraction

Many financial and legal organizations extract and normalize data from thousands of complex, unstructured text sources every day. Such sources include bank statements, legal agreements, and bank forms. For example, when human reviewers extract mortgage application data manually, the process can take several days. Automating these steps with a custom NER model simplifies the process and saves cost, time, and effort.
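As an illustration of the normalization step, the following Python sketch collapses a list of extracted entities into one flat record per document, keeping the highest-confidence value for each category. It assumes each entity is a dict with `text`, `category`, and `confidenceScore` keys, modeled on the shape of entity results the service returns; the category names (`BorrowerName`, `LoanAmount`, `LoanTerm`) and the 0.6 threshold are hypothetical and would come from your own schema and review policy.

```python
from typing import Dict, List

# Hypothetical minimum confidence; lower-scoring entities are skipped.
MIN_CONFIDENCE = 0.6


def normalize_entities(entities: List[dict],
                       min_confidence: float = MIN_CONFIDENCE) -> Dict[str, str]:
    """Collapse extracted entities into one value per category.

    For each category, keep the text of the highest-confidence entity
    whose score is at least `min_confidence`.
    """
    best: Dict[str, dict] = {}
    for entity in entities:
        score = entity["confidenceScore"]
        if score < min_confidence:
            continue  # too uncertain to use without human review
        category = entity["category"]
        if category not in best or score > best[category]["confidenceScore"]:
            best[category] = entity
    # Strip surrounding whitespace so downstream systems get clean values.
    return {category: e["text"].strip() for category, e in best.items()}


# Hypothetical entities extracted from one mortgage application.
extracted = [
    {"text": "Jane Doe", "category": "BorrowerName", "confidenceScore": 0.97},
    {"text": "$350,000", "category": "LoanAmount", "confidenceScore": 0.91},
    {"text": "$35,000", "category": "LoanAmount", "confidenceScore": 0.42},
    {"text": " 30 years ", "category": "LoanTerm", "confidenceScore": 0.88},
]

record = normalize_entities(extracted)
print(record)
# {'BorrowerName': 'Jane Doe', 'LoanAmount': '$350,000', 'LoanTerm': '30 years'}
```

Entities below the threshold are dropped rather than guessed at; in practice you might route them to a human reviewer instead.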

Knowledge mining to enrich semantic search

Search is foundational to any app that surfaces text content to users. Common scenarios include catalog or document search, retail product search, and knowledge mining for data science. Enterprises in many industries want to build a rich search experience over private, heterogeneous content, which includes both structured and unstructured documents. As part of their pipeline, developers can use custom NER to extract entities from the text that are relevant to their industry. These entities can then enrich the file's index for a more customized search experience.
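To make the enrichment idea concrete, here is a minimal Python sketch that attaches extracted entities to a search-index document as an `entities` field (category mapped to sorted, de-duplicated values) and then filters documents by an entity value. The document layout, the `entities` field name, and the categories `Party` and `GoverningLaw` are hypothetical; a real search service would define its own index schema and filter syntax.

```python
from collections import defaultdict
from typing import Dict, List, Set


def enrich_for_index(doc_id: str, content: str, entities: List[dict]) -> dict:
    """Build a search-index document with an extra `entities` field.

    The field maps each entity category to its sorted, de-duplicated values,
    which a search engine could use for filtering or faceting.
    """
    by_category: Dict[str, Set[str]] = defaultdict(set)
    for entity in entities:
        by_category[entity["category"]].add(entity["text"])
    return {
        "id": doc_id,
        "content": content,
        "entities": {cat: sorted(values) for cat, values in by_category.items()},
    }


def filter_by_entity(index: List[dict], category: str, value: str) -> List[str]:
    """Return the IDs of indexed documents tagged with `value` under `category`."""
    return [doc["id"] for doc in index if value in doc["entities"].get(category, [])]


# Hypothetical entities; "Party" and "GoverningLaw" are example categories.
index = [
    enrich_for_index("contract-1", "Contoso and Fabrikam agree to the terms.", [
        {"text": "Contoso", "category": "Party"},
        {"text": "Fabrikam", "category": "Party"},
        {"text": "Washington", "category": "GoverningLaw"},
    ]),
    enrich_for_index("contract-2", "Contoso shall deliver the goods.", [
        {"text": "Contoso", "category": "Party"},
        {"text": "Contoso", "category": "Party"},
    ]),
]
print(filter_by_entity(index, "Party", "Contoso"))            # ['contract-1', 'contract-2']
print(filter_by_entity(index, "GoverningLaw", "Washington"))  # ['contract-1']
```

Nesting the values under one `entities` key avoids collisions between category names and the document's own fields such as `id` or `content`.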

Audit and compliance

Instead of manually reviewing very long text files to audit them and apply policies, IT departments in financial or legal enterprises can use custom NER to build automated solutions. These solutions can help enforce compliance policies and set up the necessary business rules based on knowledge mining pipelines that process structured and unstructured content.
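One way such business rules might sit on top of extracted entities is sketched below in Python: each hypothetical `ComplianceRule` lists entity categories a document must contain and categories it must not contain, and `audit` reports every violation. The rule names and categories (`GoverningLaw`, `TerminationDate`, `AccountNumber`) are illustrative assumptions, not part of the service.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ComplianceRule:
    """A hypothetical rule: entity categories a document must and must not contain."""
    name: str
    required: Set[str] = field(default_factory=set)
    forbidden: Set[str] = field(default_factory=set)


def audit(entities: List[dict], rules: List[ComplianceRule]) -> List[str]:
    """Return a human-readable finding for every rule the document violates."""
    found = {e["category"] for e in entities}
    findings = []
    for rule in rules:
        # Sorted so findings come out in a stable, reviewable order.
        for category in sorted(rule.required - found):
            findings.append(f"{rule.name}: missing {category}")
        for category in sorted(rule.forbidden & found):
            findings.append(f"{rule.name}: contains {category}")
    return findings


rules = [
    ComplianceRule("contract-completeness", required={"GoverningLaw", "TerminationDate"}),
    ComplianceRule("data-handling", forbidden={"AccountNumber"}),
]
entities = [
    {"text": "State of Washington", "category": "GoverningLaw"},
    {"text": "12-3456-789", "category": "AccountNumber"},
]
for finding in audit(entities, rules):
    print(finding)
# contract-completeness: missing TerminationDate
# data-handling: contains AccountNumber
```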

Project development lifecycle

Using custom NER typically involves several different steps.

  1. Define your schema: Know your data and identify the entities you want extracted. Avoid ambiguity.

  2. Label your data: The quality of your labeled data is a key factor in determining model performance. Label precisely, consistently, and completely.

    1. Label precisely: Always assign each entity its correct type. Include only what you want extracted, and avoid unnecessary data in your labels.
    2. Label consistently: The same entity should have the same label across all the files.
    3. Label completely: Label all instances of each entity in all your files.
  3. Train the model: Your model starts learning from your labeled data.

  4. View the model's performance: After training is completed, view the model's evaluation details, its performance, and guidance on how to improve it.

  5. Deploy the model: Deploying a model makes it available for use via the Analyze API.

  6. Extract entities: Use your custom models for entity extraction tasks.
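The labeling guidance in step 2 can be partly checked automatically before training. The Python sketch below flags entity texts that received more than one category across files (the "label consistently" rule) and occurrences of already-labeled texts that were left unlabeled (the "label completely" rule). It assumes a hypothetical in-memory label format of `(offset, length, category)` character spans per file; this is not the service's labels file schema, and the categories `Tenant` and `Landlord` are examples.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical label format: (offset, length, category) character spans per file.
# This is an illustration only, not the service's labels file schema.
Label = Tuple[int, int, str]


def inconsistent_labels(files: Dict[str, str],
                        labels: Dict[str, List[Label]]) -> Dict[str, Set[str]]:
    """Return entity texts that received more than one category across files.

    A hit is a candidate for review, not necessarily an error: the same text
    can legitimately play different roles in different documents.
    """
    categories: Dict[str, Set[str]] = defaultdict(set)
    for name, spans in labels.items():
        for offset, length, category in spans:
            categories[files[name][offset:offset + length]].add(category)
    return {text: cats for text, cats in categories.items() if len(cats) > 1}


def unlabeled_occurrences(files: Dict[str, str],
                          labels: Dict[str, List[Label]]) -> List[Tuple[str, int, str]]:
    """Return (file, offset, text) for occurrences of labeled texts left unlabeled."""
    labeled_texts = {files[name][o:o + n] for name, spans in labels.items() for o, n, _ in spans}
    missing = []
    for name, text in files.items():
        covered = {(o, n) for o, n, _ in labels.get(name, [])}
        for entity_text in sorted(labeled_texts):
            # Require no word character directly before or after the match.
            pattern = r"(?<!\w)" + re.escape(entity_text) + r"(?!\w)"
            for match in re.finditer(pattern, text):
                if (match.start(), len(entity_text)) not in covered:
                    missing.append((name, match.start(), entity_text))
    return missing


files = {
    "a.txt": "Contoso signed the lease with Fabrikam.",
    "b.txt": "Fabrikam owes Contoso a refund.",
}
labels = {
    "a.txt": [(0, 7, "Tenant"), (30, 8, "Landlord")],
    "b.txt": [(0, 8, "Tenant")],  # Fabrikam labeled differently; Contoso not labeled
}
print(inconsistent_labels(files, labels))
print(unlabeled_occurrences(files, labels))  # [('b.txt', 14, 'Contoso')]
```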

Reference documentation and code samples

As you use custom NER, see the following reference documentation and samples for Azure AI Language:

| Development option / language | Reference documentation | Samples |
|---|---|---|
| REST APIs (Authoring) | REST API documentation | |
| REST APIs (Runtime) | REST API documentation | |
| C# (Runtime) | C# documentation | C# samples |
| Java (Runtime) | Java documentation | Java samples |
| JavaScript (Runtime) | JavaScript documentation | JavaScript samples |
| Python (Runtime) | Python documentation | Python samples |
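For orientation on the runtime REST API, the following standard-library Python sketch builds (but does not send) a request that submits an asynchronous custom NER job, and flattens entities out of a completed job's JSON result. The route `/language/analyze-text/jobs`, the `api-version` value `2022-05-01`, the task kind `CustomEntityRecognition`, and the response field names reflect our reading of the runtime API and should be verified against the REST API reference; the endpoint, key, project, and deployment names are placeholders, and the sample result is hand-written, not real service output.

```python
import json
import urllib.request
from typing import List

# Assumption: verify the current version in the runtime REST API reference.
API_VERSION = "2022-05-01"


def build_analyze_request(endpoint: str, key: str, project: str, deployment: str,
                          texts: List[str], language: str = "en-us") -> urllib.request.Request:
    """Build (but do not send) a POST that submits a custom NER analysis job."""
    body = {
        "displayName": "Extracting entities",
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": language, "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
        "tasks": [{
            "kind": "CustomEntityRecognition",
            "taskName": "Entity Recognition",
            "parameters": {"projectName": project, "deploymentName": deployment},
        }],
    }
    url = f"{endpoint.rstrip('/')}/language/analyze-text/jobs?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


def entities_from_job_result(result: dict) -> List[dict]:
    """Flatten entities from a completed job result, tagging each with its document ID."""
    found = []
    for task in result["tasks"]["items"]:
        for doc in task["results"]["documents"]:
            for entity in doc["entities"]:
                found.append({"doc": doc["id"], **entity})
    return found


# Placeholder resource, key, project, and deployment names.
req = build_analyze_request("https://example-resource.cognitiveservices.azure.com/",
                            "<your-key>", "myProject", "production",
                            ["Contoso leases the building from Fabrikam."])

# Hand-written, simplified example of a completed job result.
sample_result = {
    "status": "succeeded",
    "tasks": {"items": [{
        "kind": "CustomEntityRecognitionLROResults",
        "results": {"documents": [{
            "id": "1",
            "entities": [{"text": "Contoso", "category": "Tenant",
                          "offset": 0, "length": 7, "confidenceScore": 0.98}],
            "warnings": [],
        }], "errors": []},
    }]},
}
print(entities_from_job_result(sample_result))
```

To actually submit the job, you would pass the request to `urllib.request.urlopen`. The service is expected to answer with `202 Accepted` and an `operation-location` header, which you poll with a GET request (carrying the same key header) until the job's `status` is `succeeded`; that polling loop is omitted here.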

Responsible AI

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Read the transparency note for custom NER to learn about responsible AI use and deployment in your systems. The related responsible AI articles for Azure AI Language provide more information.

Next steps

  • Use the quickstart article to start using custom named entity recognition.

  • As you go through the project development lifecycle, review the glossary to learn more about the terms used throughout the documentation for this feature.

  • Review the service limits for information such as regional availability.