Text Analytics PII Extraction - Using custom entities

Bradyn 0 Reputation points
2023-01-20T15:54:42.3433333+00:00

Hi,

I'm trying to use Azure Language Text Analytics to remove PII from a dataset. However, some custom PII entities (such as alphanumeric strings) need to be extracted and are not recognized by the model. Is there a way to extend the PII extraction to include custom entities, or is this outside the scope of the service?

Thanks!

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Answers
  1. Andrew Dello Stritto 85 Reputation points
    2023-01-22T00:56:56.3166667+00:00

    If the pre-built model does not recognize your custom PII entities, such as alphanumeric strings, you have a couple of options:

    1. Write your own regular expressions for the custom PII entities and apply them in your own code alongside the PII call. As far as I know, the pre-built PII feature does not accept user-supplied patterns, so the matching has to happen before or after the service call.
    2. Train a custom model on your own data. The custom named entity recognition (custom NER) feature of Azure Cognitive Service for Language lets you label a dataset with your custom PII entities, train and deploy a model on it, and then call that model to extract those entities.
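
As a rough illustration of the regular-expression option, here is a minimal Python sketch that masks custom identifiers locally using only the standard library. The pattern names and formats (`EmployeeId`, `AccountCode`) and the helper functions are hypothetical, made up for this example; they are not part of any Azure API, and you would swap in patterns that match your own data.

```python
import re

# Hypothetical patterns for illustration only; replace them with
# expressions that match your own custom identifiers.
CUSTOM_PII_PATTERNS = {
    "EmployeeId": re.compile(r"\bEMP-\d{6}\b"),
    "AccountCode": re.compile(r"\b[A-Z]{2}\d{4}[A-Z]{2}\b"),
}


def find_custom_pii(text):
    """Return a list of (category, start, end) tuples, ordered by start offset."""
    matches = []
    for category, pattern in CUSTOM_PII_PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((category, m.start(), m.end()))
    return sorted(matches, key=lambda item: item[1])


def redact_custom_pii(text, mask_char="*"):
    """Replace every matched span with mask_char, keeping the text length unchanged."""
    chars = list(text)
    for _, start, end in find_custom_pii(text):
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    sample = "Contact EMP-123456 about account AB1234CD."
    print(find_custom_pii(sample))
    print(redact_custom_pii(sample))
```

Because the masking preserves string length, the same function could be run on the input before sending it to the service or on the redacted text the service returns, depending on which fits your pipeline.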
