The PII feature evaluates unstructured text and can extract and redact sensitive information (PII) and health information (PHI) across several predefined categories.
To use PII detection, you submit text for analysis and handle the API output in your application. Analysis is performed as-is, with no customization to the model used on your data. There are two ways to use PII detection:
Development option | Description |
---|---|
Language Studio | Language Studio is a web-based platform that lets you try PII detection with text examples without an Azure account, and with your own data when you sign up. For more information, see the Language Studio website or Language Studio quickstart. |
REST API or Client library (Azure SDK) | Integrate PII detection into your applications using the REST API, or the client library available in various languages. For more information, see the PII detection quickstart. |
By default, this feature uses the latest available AI model on your text. You can also configure your API requests to use a specific model version.
When you submit documents to be processed, you can specify which of the supported languages they're written in. If you don't specify a language, extraction defaults to English. The API may return offsets in the response to support different multilingual and emoji encodings.
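The reason offsets depend on encoding can be illustrated with a short sketch. It is not part of the service; it only shows how the same substring starts at a different offset depending on whether positions are counted in Unicode code points, UTF-16 code units, or UTF-8 bytes. The helper name `offset_in_units` and the sample text are hypothetical.

```python
def offset_in_units(text: str, needle: str) -> dict:
    """Return the start offset of `needle` in `text` under three counting schemes."""
    # Python strings index by Unicode code point.
    code_points = text.index(needle)
    prefix = text[:code_points]
    return {
        "code_points": code_points,
        # Each UTF-16 code unit is 2 bytes; characters outside the BMP take 2 units.
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
        "utf8_bytes": len(prefix.encode("utf-8")),
    }


# Hypothetical sample: a thumbs-up emoji (U+1F44D) precedes the name.
sample = "\U0001F44D John Doe called"
offsets = offset_in_units(sample, "John Doe")
print(offsets)
```

If your application slices strings using a different unit than the one the offsets were computed in, text containing emoji or other supplementary characters will be cut at the wrong position.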
In version 2024-11-15-preview, you can set the redactionPolicy parameter to choose how text is redacted. The policy field supports three policy types:
- DoNotRedact
- MaskWithCharacter (default)
- MaskWithEntityType
The DoNotRedact policy returns the response without the redactedText field, so the text is left as submitted, for example, “John Doe received a call from 424-878-9192”.
The MaskWithCharacter policy masks the redactedText with a character (such as "*"), preserving the length and offset of the original text, for example, “******** received a call from ************”. This is the existing behavior.
The optional redactionCharacter field sets the character used for redaction when you use the MaskWithCharacter policy.
The MaskWithEntityType policy masks the detected PII entity text with the detected entity type, for example, “[PERSON_1] received a call from [PHONENUMBER_1]”.
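To make the difference between the three policies concrete, here is a minimal local sketch that reproduces the example strings above from a list of detected entities. It is an illustration only, not the service's implementation: the function name `apply_policy` is hypothetical, and the per-category numbering used for MaskWithEntityType labels is an assumption inferred from the single example.

```python
from typing import Dict, List, Optional

POLICIES = {"DoNotRedact", "MaskWithCharacter", "MaskWithEntityType"}


def apply_policy(text: str, entities: List[Dict], policy_kind: str = "MaskWithCharacter",
                 redaction_character: str = "*") -> Optional[str]:
    """Locally mimic the redacted text each policy would produce (illustrative only)."""
    if policy_kind not in POLICIES:
        raise ValueError(f"unknown policy: {policy_kind}")
    if policy_kind == "DoNotRedact":
        return None  # the response carries no redactedText field
    counters: Dict[str, int] = {}
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["offset"]):
        start, end = ent["offset"], ent["offset"] + ent["length"]
        pieces.append(text[cursor:start])
        if policy_kind == "MaskWithCharacter":
            # Same length as the original span, so later offsets stay valid.
            pieces.append(redaction_character * ent["length"])
        else:
            # Assumption: labels are numbered per category, starting at 1.
            label = ent["category"].upper()
            counters[label] = counters.get(label, 0) + 1
            pieces.append(f"[{label}_{counters[label]}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "John Doe received a call from 424-878-9192"
entities = [
    {"category": "Person", "offset": 0, "length": 8},
    {"category": "PhoneNumber", "offset": 30, "length": 12},
]
print(apply_policy(text, entities, "MaskWithCharacter"))
print(apply_policy(text, entities, "MaskWithEntityType"))
```

Note that only MaskWithCharacter keeps the redacted string the same length as the input; with MaskWithEntityType the entity offsets from the response no longer line up with the redacted text.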
Analysis is performed upon receipt of the request. Using the PII detection feature synchronously is stateless. No data is stored in your account, and results are returned immediately in the response.
When you use this feature asynchronously, the API results are available for 24 hours from the time the request was ingested; this availability period is indicated in the response. After this period, the results are purged and are no longer available for retrieval.
The API attempts to detect the defined entity categories for a given document language. To specify which entities are detected and returned, use the optional piiCategories parameter with the appropriate entity categories. This parameter also lets you detect entities that aren't enabled by default for your document language. You can specify one or more entity types to be returned. The following example detects only Person.
Tip
If you don't include default when specifying entity categories, the API returns only the entity categories you specify.
Input:
Note
In this example, the API returns only the Person entity type:
https://<your-language-resource-endpoint>/language/:analyze-text?api-version=2022-05-01
{
    "kind": "PiiEntityRecognition",
    "parameters": {
        "modelVersion": "latest",
        "piiCategories": [
            "Person"
        ]
    },
    "analysisInput": {
        "documents": [
            {
                "id": "1",
                "language": "en",
                "text": "We went to Contoso foodplace located at downtown Seattle last week for a dinner party, and we adore the spot! They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is John Doe) and he is super nice, coming out of the kitchen and greeted us all. We enjoyed very much dining in the place! The pasta I ordered was tender and juicy, and the place was impeccably clean. You can even pre-order from their online menu at www.contosofoodplace.com, call 112-555-0176 or send email to order@contosofoodplace.com! The only complaint I have is the food didn't come fast enough. Overall I highly recommend it!"
            }
        ]
    }
}
To set a redaction policy, add redactionPolicy to parameters. The following fragment shows only the relevant fields; policyKind is one of MaskWithCharacter, MaskWithEntityType, or DoNotRedact:
{
    "kind": "PiiEntityRecognition",
    "parameters": {
        "redactionPolicy": {
            "policyKind": "MaskWithCharacter",
            "redactionCharacter": "*"
        }
    }
}
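As a rough sketch of how an application might assemble and send the request shown above, the following uses only the Python standard library. The function names are hypothetical, and the `Ocp-Apim-Subscription-Key` header is an assumption about how your resource is authenticated; check the quickstart for the method that applies to you. The endpoint and key are placeholders, so the send function is defined but not called here.

```python
import json
import urllib.request


def build_pii_request(text, categories=None, language="en", doc_id="1",
                      model_version="latest"):
    """Build the JSON body for a PiiEntityRecognition request (hypothetical helper)."""
    parameters = {"modelVersion": model_version}
    if categories:
        # Only the listed categories are returned when this field is set.
        parameters["piiCategories"] = list(categories)
    return {
        "kind": "PiiEntityRecognition",
        "parameters": parameters,
        "analysisInput": {
            "documents": [{"id": doc_id, "language": language, "text": text}]
        },
    }


def send_pii_request(endpoint, key, body, api_version="2022-05-01"):
    """POST the body to the analyze-text route and return the parsed JSON response."""
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={api_version}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # assumption: key-based auth
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_pii_request("I think his name is John Doe.", categories=["Person"])
print(json.dumps(body, indent=2))
```

Leaving `categories` unset omits piiCategories from the body, so the service falls back to the categories enabled by default for the document language.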
Output:
{
    "kind": "PiiEntityRecognitionResults",
    "results": {
        "documents": [
            {
                "redactedText": "We went to Contoso foodplace located at downtown Seattle last week for a dinner party, and we adore the spot! They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is ********) and he is super nice, coming out of the kitchen and greeted us all. We enjoyed very much dining in the place! The pasta I ordered was tender and juicy, and the place was impeccably clean. You can even pre-order from their online menu at www.contosofoodplace.com, call 112-555-0176 or send email to order@contosofoodplace.com! The only complaint I have is the food didn't come fast enough. Overall I highly recommend it!",
                "id": "1",
                "entities": [
                    {
                        "text": "John Doe",
                        "category": "Person",
                        "offset": 226,
                        "length": 8,
                        "confidenceScore": 0.98
                    }
                ],
                "warnings": []
            }
        ],
        "errors": [],
        "modelVersion": "2021-01-15"
    }
}
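One way to handle this output in an application is to flatten it into a list of entities and filter by confidence score. The sketch below assumes the response has already been parsed into a Python dictionary with the shape shown above; the function name `extract_entities` and the threshold parameter are hypothetical, and the sample dictionary is a trimmed copy of the output with redactedText omitted.

```python
def extract_entities(response, min_confidence=0.0):
    """Return (document id, category, text, offset, length) for each detected entity."""
    found = []
    for doc in response["results"]["documents"]:
        for ent in doc["entities"]:
            if ent["confidenceScore"] >= min_confidence:
                found.append(
                    (doc["id"], ent["category"], ent["text"], ent["offset"], ent["length"])
                )
    return found


# Trimmed copy of the sample output above.
sample_response = {
    "kind": "PiiEntityRecognitionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {
                        "text": "John Doe",
                        "category": "Person",
                        "offset": 226,
                        "length": 8,
                        "confidenceScore": 0.98,
                    }
                ],
                "warnings": [],
            }
        ],
        "errors": [],
        "modelVersion": "2021-01-15",
    },
}

for doc_id, category, text, offset, length in extract_entities(sample_response):
    print(f"document {doc_id}: {category} '{text}' at offset {offset}, length {length}")
```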
When you get results from PII detection, you can stream the results to an application or save the output to a file on the local system. The API response includes recognized entities, including their categories and subcategories, and confidence scores. The text string with the PII entities redacted is also returned.
For information on the size and number of requests you can send per minute and second, see the service limits article.