How to create a custom NER project

Use this article to learn how to set up the prerequisites for custom named entity recognition (NER) and create a project.

Prerequisites

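Before you start using custom NER, you will need an Azure subscription, a Language resource, and a storage account. The following sections describe how to set them up.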

Create a Language resource

Custom NER requires an Azure Language resource. We recommend creating your Language resource and connecting a storage account to it in the Azure portal. Creating a resource in the Azure portal lets you create an Azure storage account at the same time, with all of the required permissions pre-configured. You can also read further in this article to learn how to use a pre-existing resource and configure it to work with custom named entity recognition.

You will also need an Azure storage account, where you upload the .txt documents used to train a model to extract entities.

Note

  • You need the owner role assigned on the resource group to create a Language resource.
  • If you connect a pre-existing storage account, you need the owner role assigned on it.

Create Language resource and connect storage account

You can create a resource in the following ways:

  • The Azure portal
  • Language Studio
  • PowerShell

Note

You shouldn't move the storage account to a different resource group or subscription once it's linked with the Language resource.

Create a new resource from the Azure portal

  1. Go to the Azure portal to create a new Azure Language resource.

  2. In the window that appears, select Custom text classification & custom named entity recognition from the custom features. Then click Continue to create your resource at the bottom of the screen.

    A screenshot showing custom text classification & custom named entity recognition in the Azure portal.

  3. Create a Language resource with the following details.

    | Name | Description |
    |---|---|
    | Subscription | Your Azure subscription. |
    | Resource group | A resource group that will contain your resource. You can use an existing one, or create a new one. |
    | Region | The region for your Language resource. For example, "West US 2". |
    | Name | A name for your resource. |
    | Pricing tier | The pricing tier for your Language resource. You can use the Free (F0) tier to try the service. |

    Note

    If you get a message saying "your login account is not an owner of the selected storage account's resource group", your account needs to have an owner role assigned on the resource group before you can create a Language resource. Contact your Azure subscription owner for assistance.

  4. In the Custom text classification & custom named entity recognition section, select an existing storage account or select New storage account. The values below are meant to help you get started, and aren't necessarily the storage account values you’ll want to use in production environments. To avoid latency while building your project, connect to a storage account in the same region as your Language resource.

    | Storage account value | Recommended value |
    |---|---|
    | Storage account name | Any name |
    | Storage account type | Standard LRS |

  5. Make sure the Responsible AI Notice is checked. Select Review + create at the bottom of the page, then select Create.

Create a new Language resource from Language Studio

If it's your first time logging in, you'll see a window in Language Studio that will let you choose an existing Language resource or create a new one. You can also create a resource by clicking the settings icon in the top-right corner, selecting Resources, then clicking Create a new resource.

Create a Language resource with the following details.

| Instance detail | Required value |
|---|---|
| Azure subscription | Your Azure subscription. |
| Azure resource group | Your Azure resource group. |
| Azure resource name | Your Azure resource name. |
| Location | The region of your Language resource. |
| Pricing tier | The pricing tier of your Language resource. |

Important

  • Make sure to enable Managed Identity when you create a Language resource.
  • Read and confirm the Responsible AI notice.

To use custom named entity recognition, you'll need to create an Azure storage account if you don't have one already.

Create a new Language resource using PowerShell

You can create a new resource and a storage account using the following Azure Resource Manager (ARM) template and parameters files, which are hosted on GitHub.

Edit the following values in the parameters file:

| Parameter name | Value description |
|---|---|
| name | Name of your Language resource. |
| location | Region in which your resource is hosted. For more information, see Service limits. |
| sku | Pricing tier of your resource. |
| storageResourceName | Name of your storage account. |
| storageLocation | Region in which your storage account is hosted. |
| storageSkuType | SKU of your storage account. |
| storageResourceGroupName | Resource group of your storage account. |
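
As a rough illustration of this edit, the following Python sketch writes a parameters file containing the parameter names listed above. The wrapper keys ($schema, contentVersion, parameters) follow the standard ARM deployment parameters layout. Every value shown, the helper name build_parameters, and the output file name parameters.json are placeholders chosen for this example, and the template hosted on GitHub may expect different or additional parameters.

```python
import json

# Parameter names taken from the table above.
PARAMETER_NAMES = (
    "name",
    "location",
    "sku",
    "storageResourceName",
    "storageLocation",
    "storageSkuType",
    "storageResourceGroupName",
)

# Standard schema URI for ARM deployment parameter files.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)


def build_parameters(values):
    """Wrap plain values in the layout of an ARM parameters file."""
    missing = [name for name in PARAMETER_NAMES if name not in values]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": values[name]} for name in PARAMETER_NAMES},
    }


# Placeholder values only (assumptions); replace them before deploying.
example_values = {
    "name": "my-language-resource",
    "location": "westus2",
    "sku": "S",
    "storageResourceName": "mystorageaccount",
    "storageLocation": "westus2",
    "storageSkuType": "Standard_LRS",
    "storageResourceGroupName": "my-resource-group",
}

if __name__ == "__main__":
    # Write the file that -TemplateParameterFile would point to.
    with open("parameters.json", "w", encoding="utf-8") as handle:
        json.dump(build_parameters(example_values), handle, indent=2)
```

Generating the file from a fixed list of names makes a missing parameter fail early on your machine rather than during deployment.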

Use the following PowerShell command to deploy the Azure Resource Manager (ARM) template with the files you edited.

New-AzResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
  -TemplateFile <path-to-arm-template> `
  -TemplateParameterFile <path-to-parameters-file>

See the ARM template documentation for information on deploying templates and parameter files.

Note

  • The process of connecting a storage account to your Language resource is irreversible; it cannot be disconnected later.
  • You can only connect your Language resource to one storage account.

Using a pre-existing Language resource

You can use an existing Language resource to get started with custom NER as long as this resource meets the following requirements:

| Requirement | Description |
|---|---|
| Regions | Make sure your existing resource is provisioned in one of the supported regions. If not, you will need to create a new resource in one of these regions. |
| Pricing tier | Learn more about supported pricing tiers. |
| Managed identity | Make sure that the resource's managed identity setting is enabled. If it isn't, read the next section. |

To use custom named entity recognition, you'll need to create an Azure storage account if you don't have one already.

Enable identity management for your resource

Your Language resource must have identity management enabled. To enable it using the Azure portal:

  1. Go to your Language resource.
  2. From the left-hand menu, under the Resource Management section, select Identity.
  3. On the System assigned tab, make sure to set Status to On.

Enable custom named entity recognition feature

Make sure to enable the Custom text classification / Custom Named Entity Recognition feature from the Azure portal.

  1. Go to your Language resource in the Azure portal.
  2. From the left-side menu, under the Resource Management section, select Features.
  3. Enable the Custom text classification / Custom Named Entity Recognition feature.
  4. Connect your storage account.
  5. Click Apply.

Important

  • Make sure that your Language resource has the Storage Blob Data Contributor role assigned on the storage account you are connecting.

Add required roles

Use the following steps to set the required roles for your Language resource and storage account.

An animated image showing how to set roles in the Azure portal.

Roles for your Azure Language resource

  1. Go to your storage account or Language resource in the Azure portal.

  2. Select Access Control (IAM) in the left navigation menu.

  3. Select Add to Add Role Assignments, and choose the appropriate role for your account.

    You should have the owner or contributor role assigned on your Language resource.

  4. Within Assign access to, select User, group, or service principal.

  5. Select Select members.

  6. Select your user name. You can search for user names in the Select field. Repeat this for all roles.

  7. Repeat these steps for all the user accounts that need access to this resource.

Roles for your storage account

  1. Go to your storage account page in the Azure portal.
  2. Select Access Control (IAM) in the left navigation menu.
  3. Select Add to Add Role Assignments, and choose the Storage Blob Data Contributor role on the storage account.
  4. Within Assign access to, select Managed identity.
  5. Select Select members.
  6. Select your subscription, and Language as the managed identity. You can search for user names in the Select field.

Important

If you have a virtual network or private endpoint, be sure to select Allow Azure services on the trusted services list to access this storage account in the Azure portal.

Enable CORS for your storage account

Make sure to allow the GET, PUT, and DELETE methods when enabling Cross-Origin Resource Sharing (CORS). Set the allowed origins field to https://language.cognitive.azure.com. Allow all headers by adding * to the allowed headers values, and set the maximum age to 500 (in seconds).

A screenshot showing how to use CORS for storage accounts.
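
The settings above can be written down as data and checked before you enter them in the portal. The following Python sketch is only an illustration: the dictionary key names and the helper cors_problems are made up for this example and are not an Azure API schema. The origin, methods, header wildcard, and maximum age are the values from the paragraph above.

```python
REQUIRED_ORIGIN = "https://language.cognitive.azure.com"
REQUIRED_METHODS = {"GET", "PUT", "DELETE"}

# The recommended rule, expressed as a plain dictionary.
# Key names are illustrative assumptions, not an Azure schema.
recommended_rule = {
    "allowed_origins": [REQUIRED_ORIGIN],
    "allowed_methods": ["GET", "PUT", "DELETE"],
    "allowed_headers": ["*"],
    "max_age_in_seconds": 500,
}


def cors_problems(rule):
    """Return a list of ways in which a rule falls short of the settings above."""
    problems = []
    if REQUIRED_ORIGIN not in rule.get("allowed_origins", []):
        problems.append("allowed origins should include " + REQUIRED_ORIGIN)
    allowed = {method.upper() for method in rule.get("allowed_methods", [])}
    missing = REQUIRED_METHODS - allowed
    if missing:
        problems.append("missing methods: " + ", ".join(sorted(missing)))
    if "*" not in rule.get("allowed_headers", []):
        problems.append("allowed headers should include *")
    return problems


if __name__ == "__main__":
    print(cors_problems(recommended_rule))
```

An empty list means the rule covers the origin, methods, and headers described in this section.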

Create a custom named entity recognition project

Once your resource and storage container are configured, create a new custom NER project. A project is a work area for building your custom AI models based on your data. Your project can only be accessed by you and others who have access to the Azure resource being used. If you have labeled data, you can use it to get started by importing a project.

  1. Sign in to Language Studio. A window will appear to let you select your subscription and Language resource. Select the Language resource you created in the above step.

  2. Under the Extract information section of Language Studio, select Custom named entity recognition.

    A screenshot showing the location of custom NER in the Language Studio landing page.

  3. Select Create new project from the top menu in your projects page. Creating a project will let you tag data, train, evaluate, improve, and deploy your models.

    A screenshot of the project creation page.

  4. After you click Create new project, a window will appear to let you connect your storage account. If you've already connected a storage account, you will see the storage account connected. If not, choose your storage account from the dropdown that appears and click Connect storage account; this will set the required roles for your storage account. This step might return an error if you aren't assigned the owner role on the storage account.

    Note

    • You only need to do this step once for each new resource you use.
    • This process is irreversible. If you connect a storage account to your Language resource, you cannot disconnect it later.
    • You can only connect your Language resource to one storage account.

    A screenshot showing the storage connection screen.

  5. Enter the project information, including a name, description, and the language of the files in your project. If you're using the example dataset, select English. You won’t be able to change the name of your project later. Click Next.

    Tip

    Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each in a different supported language. If your dataset contains documents in different languages, or if you expect text in different languages at runtime, select the enable multi-lingual dataset option when you enter the basic information for your project. This option can also be enabled later from the Project settings page.

  6. Select the container where you have uploaded your dataset. If you have already labeled your data, make sure it follows the supported format, select "Yes, my files are already labeled and I have formatted JSON labels file", and then select the labels file from the drop-down menu. Click Next.

  7. Review the data you entered and select Create Project.
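
If you prefer to script project creation instead of using Language Studio, the same information (name, language, description, multi-lingual flag, and container) can be sent to the authoring REST API. The following Python sketch only builds the request and does not send it. The URL path, the api-version value, the header name, and the body field names are assumptions based on the public Language authoring REST API and should be checked against the current REST reference. The endpoint, key, project name, and container shown are placeholders.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2022-05-01"  # assumed authoring API version


def build_create_project_request(endpoint, key, project_name, container,
                                 language="en-us", description="",
                                 multilingual=False):
    """Build (but do not send) a request that creates a custom NER project."""
    url = (
        endpoint.rstrip("/")
        + "/language/authoring/analyze-text/projects/"
        + urllib.parse.quote(project_name, safe="")
        + "?api-version=" + API_VERSION
    )
    body = {
        "projectName": project_name,
        "language": language,
        "projectKind": "CustomEntityRecognition",
        "description": description,
        "multilingual": multilingual,
        "storageInputContainerName": container,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder endpoint, key, and names (hypothetical values).
    request = build_create_project_request(
        "https://example.cognitiveservices.azure.com",
        "<your-resource-key>",
        "my-ner-project",
        "my-container",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; do that only with a real endpoint and key.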

Import project

If you have already labeled data, you can use it to get started with the service. Make sure that your labeled data follows the accepted data formats.

  1. Sign in to Language Studio. A window will appear to let you select your subscription and Language resource. Select your Language resource.

  2. Under the Extract information section of Language Studio, select Custom named entity recognition.

    A screenshot showing the location of the custom NER feature in the Language Studio landing page.

  3. Select Create new project from the top menu in your projects page. Creating a project will let you tag data, train, evaluate, improve, and deploy your models.

    A screenshot of the project creation page.

  4. After you select Create new project, a screen will appear to let you connect your storage account. If you can’t find your storage account, make sure you created a resource using the recommended steps. If you've already connected a storage account to your Language resource, you will see your storage account connected.

    Note

    • You only need to do this step once for each new Language resource you use.
    • This process is irreversible. If you connect a storage account to your Language resource, you cannot disconnect it later.
    • You can only connect your Language resource to one storage account.

    A screenshot of the storage connection screen for new projects.

  5. Enter the project information, including a name, description, and the language of the files in your project. You won’t be able to change the name of your project later. Click Next.

    Tip

    Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each in a different supported language. If your dataset contains documents in different languages, or if you expect text in different languages at runtime, select the enable multi-lingual dataset option when you enter the basic information for your project. This option can also be enabled later from the Project settings page.

  6. Select the container where you have uploaded your dataset.

  7. Select "Yes, my files are already labeled and I have formatted JSON labels file", and then select the labels file from the drop-down menu below to import your JSON labels file. Make sure it follows the supported format.

  8. Click Next.

  9. Review the data you entered and select Create Project.
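
Before importing, it can help to run a quick sanity check on the labels file. The following Python sketch is a minimal example under stated assumptions: it assumes the labels file has an assets section with an entities list (each with a category) and a documents list (each with a location and labeled regions), which is how the accepted data format is commonly laid out. Confirm the field names against the data format documentation. The helper name labels_file_problems and the sample data are hypothetical.

```python
def labels_file_problems(labels):
    """Return basic consistency problems found in a parsed labels file."""
    assets = labels.get("assets", {})
    declared = {entity["category"] for entity in assets.get("entities", [])}
    problems = []
    for document in assets.get("documents", []):
        location = document.get("location", "")
        # Training documents are expected to be .txt files.
        if not location.lower().endswith(".txt"):
            problems.append(f"{location}: expected a .txt document")
        for region in document.get("entities", []):
            for label in region.get("labels", []):
                category = label.get("category")
                # Every label should use an entity type declared in the file.
                if category not in declared:
                    problems.append(
                        f"{location}: undeclared entity type {category!r}"
                    )
    return problems


# Hypothetical sample: "Date" is used in a label but never declared.
sample_labels = {
    "assets": {
        "entities": [{"category": "Person"}],
        "documents": [
            {
                "location": "notes.txt",
                "entities": [
                    {
                        "regionOffset": 0,
                        "regionLength": 40,
                        "labels": [
                            {"category": "Person", "offset": 0, "length": 5},
                            {"category": "Date", "offset": 10, "length": 8},
                        ],
                    }
                ],
            }
        ],
    }
}

if __name__ == "__main__":
    for problem in labels_file_problems(sample_labels):
        print(problem)
```

In practice you would load the file with json.load and pass the result to the function. The check does not replace the validation that the service performs on import.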

Get project details

  1. Go to your project settings page in Language Studio.

  2. You can see the project details.

  3. On this page, you can update the project description and enable or disable the Multi-lingual dataset option in the project settings.

  4. You can also view the storage account and container connected to your Language resource.

  5. You can also retrieve your primary resource key from this page.

    A screenshot of the project settings page in Language Studio.

Delete project

When you no longer need your project, you can delete it using Language Studio. Select Custom named entity recognition (NER) from the top, select the project you want to delete, and then click Delete from the top menu.
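
Retrieving project details and deleting a project can also be scripted. The following Python sketch builds the two requests without sending them. As with any scripted alternative to Language Studio, the URL path, api-version value, and header name are assumptions based on the public Language authoring REST API and should be verified against the current REST reference. The endpoint, key, and project name are placeholders, and the key is the resource key you can copy from the project settings page.

```python
import urllib.parse
import urllib.request

API_VERSION = "2022-05-01"  # assumed authoring API version


def build_project_request(endpoint, key, project_name, method="GET"):
    """Build (but do not send) a GET (details) or DELETE request for a project."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    url = (
        endpoint.rstrip("/")
        + "/language/authoring/analyze-text/projects/"
        + urllib.parse.quote(project_name, safe="")
        + "?api-version=" + API_VERSION
    )
    return urllib.request.Request(
        url,
        method=method,
        headers={"Ocp-Apim-Subscription-Key": key},
    )


if __name__ == "__main__":
    # Placeholder endpoint, key, and project name (hypothetical values).
    endpoint = "https://example.cognitiveservices.azure.com"
    details = build_project_request(endpoint, "<your-resource-key>", "my-ner-project")
    delete = build_project_request(endpoint, "<your-resource-key>", "my-ner-project", "DELETE")
    print(details.get_method(), details.full_url)
    print(delete.get_method(), delete.full_url)
```

Deleting a project cannot be undone, so review the DELETE request carefully before sending it.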

Next steps

  • You should have an idea of the project schema you will use to label your data.

  • After your project is created, you can start labeling your data, which will inform your entity extraction model how to interpret text, and is used for training and evaluation.