What is Personally Identifiable Information (PII) detection in Azure Cognitive Service for Language?

PII detection is one of the features offered by Azure Cognitive Service for Language, a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. The PII detection feature can identify, categorize, and redact sensitive information in unstructured text. For example: phone numbers, email addresses, and forms of identification. The method for utilizing PII in conversations is different than other use cases, and articles for this use have been separated.

  • Quickstarts are getting-started instructions to guide you through making requests to the service.
  • How-to guides contain instructions for using the service in more specific or customized ways.
  • The conceptual articles provide in-depth explanations of the service's functionality and features.

PII comes into two shapes:

Typical workflow

To use this feature, you submit data for analysis and handle the API output in your application. Analysis is performed as-is, with no additional customization to the model used on your data.

  1. Create an Azure Language resource, which grants you access to the features offered by Azure Cognitive Service for Language. It will generate a password (called a key) and an endpoint URL that you'll use to authenticate API requests.

  2. Create a request using either the REST API or the client library for C#, Java, JavaScript, and Python. You can also send asynchronous calls with a batch request to combine API requests for multiple features into a single call.

  3. Send the request containing your data as raw unstructured text. Your key and endpoint will be used for authentication.

  4. Stream or store the response locally.

Get started with PII detection

To use PII detection, you submit raw unstructured text for analysis and handle the API output in your application. Analysis is performed as-is, with no customization to the model used on your data. There are two ways to use PII detection:

Development option Description
Language studio Language Studio is a web-based platform that lets you try entity linking with text examples without an Azure account, and your own data when you sign up. For more information, see the Language Studio website or language studio quickstart.
REST API or Client library (Azure SDK) Integrate PII detection into your applications using the REST API, or the client library available in various languages. For more information, see the PII detection quickstart.

Reference documentation and code samples

As you use this feature in your applications, see the following reference documentation and samples for Azure Cognitive Services for Language:

Development option / language Reference documentation Samples
REST API REST API documentation
C# C# documentation C# samples
Java Java documentation Java Samples
JavaScript JavaScript documentation JavaScript samples
Python Python documentation Python samples

Responsible AI

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it's deployed. Read the transparency note for PII to learn about responsible AI use and deployment in your systems. You can also see the following articles for more information:

Example scenarios

  • Apply sensitivity labels - For example, based on the results from the PII service, a public sensitivity label might be applied to documents where no PII entities are detected. For documents where US addresses and phone numbers are recognized, a confidential label might be applied. A highly confidential label might be used for documents where bank routing numbers are recognized.
  • Redact some categories of personal information from documents that get wider circulation - For example, if customer contact records are accessible to first line support representatives, the company may want to redact the customer's personal information besides their name from the version of the customer history to preserve the customer's privacy.
  • Redact personal information in order to reduce unconscious bias - For example, during a company's resume review process, they may want to block name, address and phone number to help reduce unconscious gender or other biases.
  • Replace personal information in source data for machine learning to reduce unfairness – For example, if you want to remove names that might reveal gender when training a machine learning model, you could use the service to identify them and you could replace them with generic placeholders for model training.
  • Remove personal information from call center transcription – For example, if you want to remove names or other PII data that happen between the agent and the customer in a call center scenario. You could use the service to identify and remove them.
  • Data cleaning for data science - PII can be used to make the data ready for data scientists and engineers to be able to use these data to train their machine learning models. Redacting the data to make sure that customer data isn't exposed.

Next steps

There are two ways to get started using the entity linking feature:

  • Language Studio, which is a web-based platform that enables you to try several Language service features without needing to write code.
  • The quickstart article for instructions on making requests to the service using the REST API and client library SDK.