What is the logic behind detecting country-specific PII?

Mohsin Khan 60 Reputation points
2025-05-07T18:23:02.7+00:00

I am implementing DLP using the Azure AI Language PII detection service, which supports several entity categories, many of them country-specific.

For example, for the Philippines, the ID format below (Unified Multi-Purpose ID) is valid and is detected as PII:

1234-5678901-2

Each part of the number is explained below.

Structure:

First 4 digits: Agency prefix (e.g., SSSS, GSIS)

Next 7 digits: Unique serial number

Last digit: Check digit (computed via Luhn or custom checksum algorithm)
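
A format-only check for the layout described above can be sketched as follows. This is a minimal illustration, not the service's actual rule: the pattern `UMID_FORMAT` and the helper `find_umid_candidates` are hypothetical names, and the check digit is deliberately not verified because the exact algorithm is not specified here.

```python
import re

# Hypothetical pattern for the 4-7-1 digit layout described above
# (agency prefix, serial number, check digit). Format only: the
# check digit is NOT verified.
UMID_FORMAT = re.compile(r"\b\d{4}-\d{7}-\d\b")


def find_umid_candidates(text: str) -> list:
    """Return every substring that matches the 4-7-1 digit layout."""
    return UMID_FORMAT.findall(text)


if __name__ == "__main__":
    print(find_umid_candidates("My ID is 1234-5678901-2, please update it."))
```

A pattern like this accepts any digits in the right positions, which is why a format-only approach matches `1234-5678901-2` without any checksum work.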

But when I use the same format-based approach for Portugal (Citizen Card Number), the value is not detected until I calculate the checksum according to that country's logic.

Example:

12345678 4 AB1 - will not work

42634925 3 ZY8 - will work

How does the service detect PII? Does it apply format/regex matching only, or does it also calculate checksums?

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

Accepted answer
  1. Saideep Anchuri 9,500 Reputation points Moderator
    2025-05-09T02:02:36.4466667+00:00

    Hi there Mohsin Khan

    Some of these categories do follow specific checksum or validation formats.

    | Category | Uses format/checksum? |
    |---|---|
    | ABA routing number | Yes |
    | IBAN (International Banking Account Number) | Yes |
    | Argentina National Identity (DNI) Number | Yes |
    | Austria identity card | No |
    | Austria tax identification number | Yes |
    | Austria VAT number | Yes |
    | Australia bank account number | Yes |
    | Australian business number | Yes |
    | Australia Company Number | Yes |
    | Australia driver's license | No |
    | Australia medical account number | No |
    | Australia passport number | No |
    | Australia tax file number | Yes |
    | Belgium national number | Yes |
    | Belgium VAT number | Yes |
    | Brazil legal entity number (CNPJ) | Yes |
    | Brazil CPF number | Yes |
    | Brazil National ID Card (RG) | No |
    | Canada bank account number | Yes |
    | Canada driver's license number | No |
    | Canada health service number | No |
    | Canada passport number | No |
    | Canada Personal Health Identification Number (PHIN) | No |
    | Canada social insurance number | Yes |
    | Chile identity card number | Yes |
    | China Resident Identity Card (PRC) number | Yes |
    | EU debit card number | Yes |
    | EU driver's license number | No |
    | EU GPS coordinates | No |
    | EU national identification number | Varies by country |
    | EU passport number | No |
    | EU Social Security Number (SSN) or equivalent ID | Varies by country |
    | EU Tax Identification Number (TIN) | Varies by country |
    | France driver's license number | No |
    | France health insurance number | No |
    | France national ID card (CNI) | Yes |
    | France passport number | No |
    | France Social Security Number (INSEE) | Yes |
    | France tax identification number (Numéro SPI) | Yes |
    | France VAT number | Yes |
    | German Driver's License Number | No |
    | Germany Identity Card Number | Yes |
    | Germany passport number | No |
    | Germany Tax Identification Number | Yes |
    | Germany VAT Number | Yes |
    | Hong Kong Identity Card (HKID) Number | Yes |
    | Hungary Personal Identification Number | Yes |
    | Hungary Tax Identification Number | Yes |
    | Hungary VAT Number | Yes |
    | India Permanent Account Number (PAN) | Yes |
    | Indonesia Identity Card (KTP) Number | Yes |
    | Ireland Personal Public Service (PPS) Number | Yes |
    | Israel National ID | Yes |
    | Israel Bank Account Number | Yes |
    | Italy Driver's License ID | No |
    | Italy Fiscal Code | Yes |
    | Italy VAT Number | Yes |
    | Japan Bank Account Number | Yes |
    | Japan Driver's License Number | No |
    | Japan "My Number" (personal) | Yes |
    | Japan "My Number" (Corporate) | Yes |
    | Japan Resident Registration Number | Yes |
    | Japan Residence Card Number | Yes |
    | Japan Social Insurance Number (SIN) | Yes |
    | Japan Passport Number | No |
    | Luxembourg National Identification Number (Natural persons) | Yes |
    | Luxembourg National Identification Number (Non-natural persons) | Yes |
    | Malta Identity Card Number | Yes |
    | Malta Tax Identification Number | Yes |
    | New Zealand Bank Account Number | Yes |
    | New Zealand Driver's License Number | No |
    | New Zealand Inland Revenue Number | Yes |
    | New Zealand Ministry of Health Number | No |
    | New Zealand Social Welfare Number | No |
    | Philippines Unified Multi-Purpose ID Number | Yes |
    | Portugal Citizen Card Number | Yes |
    | Portugal Tax Identification Number | Yes |
    | Singapore National Registration ID Card (NRIC) Number | Yes |
    | South Africa Identification Number | Yes |
    | South Korea Resident Registration Number | Yes |
    | Spain DNI | Yes |
    | Spain Social Security Number (SSN) | Yes |
    | Spain Tax Identification Number | Yes |
    | Swiss Social Security Number (AHV) | Yes |
    | Taiwan National ID | Yes |
    | Taiwan Resident Certificate (ARC/TARC) | Yes |
    | Taiwan Passport Number | No |

    Thank You.

    1 person found this answer helpful.

1 additional answer

  1. Azar 29,520 Reputation points MVP Volunteer Moderator
    2025-05-07T20:37:39.2766667+00:00

    Hi there Mohsin Khan

    Thanks for using the Q&A platform.

    • Regex and format matching is the first layer; it catches most PII patterns that follow known structures.
    • Checksum validation is applied only for countries and PII types where the structure alone isn't unique or reliable enough (for example, Portugal Citizen Card Numbers).
    • If the checksum fails or is missing, the PII entity may be downgraded in confidence.
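
The layering described in these bullets can be sketched roughly as below. This is only an illustration under stated assumptions: the function `score`, the pattern `CARD`, the scores 0.8/0.9/0.3 and the cut-off `THRESHOLD` are all hypothetical, and the standard Luhn check stands in for whatever category-specific validator the service actually applies.

```python
import re
from typing import Callable, Optional


def luhn_ok(digits: str) -> bool:
    """Standard Luhn check over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def score(text: str, pattern: "re.Pattern",
          validator: Optional[Callable[[str], bool]]) -> float:
    """Hypothetical confidence: format match first, checksum second."""
    if not pattern.fullmatch(text):
        return 0.0                      # layer 1 failed: not a candidate
    if validator is None:
        return 0.8                      # format-only category (assumed value)
    return 0.9 if validator(text) else 0.3  # failed checksum lowers confidence


CARD = re.compile(r"\d{16}")   # hypothetical 16-digit, Luhn-checked category
THRESHOLD = 0.5                # assumed reporting cut-off

if __name__ == "__main__":
    for candidate in ("4111111111111111", "4111111111111112"):
        s = score(candidate, CARD, luhn_ok)
        print(candidate, s, "reported" if s >= THRESHOLD else "dropped")
```

With a cut-off like this, a value that matches the format but fails its checksum falls below the threshold, so from the caller's side it simply looks as if the value was not detected.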

    In your Portugal example, the PII won't be flagged unless it passes the Portuguese ID checksum validation. That’s why 42634925 3 ZY8 works and the other doesn’t.
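
To make the difference between the two sample values concrete, here is a minimal sketch of a Luhn-style check over the 12 characters of a Citizen Card number, with letters mapped to 10-35. It assumes the publicly documented Portuguese validation scheme; this thread does not confirm that Azure AI Language implements exactly this, and the names `char_value` and `cc_number_is_valid` are hypothetical.

```python
def char_value(ch: str) -> int:
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 10-35."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"unexpected character: {ch!r}")


def cc_number_is_valid(number: str) -> bool:
    """Luhn-style check over all 12 characters (assumed scheme)."""
    compact = number.replace(" ", "").upper()
    if len(compact) != 12:
        return False
    total = 0
    for i, ch in enumerate(reversed(compact)):
        v = char_value(ch)
        if i % 2 == 1:      # double every second character from the right
            v *= 2
            if v > 9:
                v -= 9
        total += v
    return total % 10 == 0


if __name__ == "__main__":
    print(cc_number_is_valid("42634925 3 ZY8"))  # True
    print(cc_number_is_valid("12345678 4 AB1"))  # False
```

Under this scheme the weighted sum for `42634925 3 ZY8` is 150, which is divisible by 10, while `12345678 4 AB1` sums to 66, which is consistent with the behaviour reported in the question.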

    So yes, it’s not just regex but also country-specific logic, including checksum or control-digit verification where applicable.

    If this helps, kindly accept the answer. Thanks.

