DLP for South African ID Number

Lethukuthula Shabalala 21 Reputation points
2024-11-20T10:21:04.3266667+00:00

Hi All,

I need assistance in creating a functional Sensitive Info Type (SIT) to prevent the unintended or unauthorized sharing of South African ID numbers. I’ve tried using the existing SIT, but it doesn't detect the ID numbers during testing. Additionally, I attempted to create a custom SIT using a regular expression, and although I've found some expressions that are typically used to validate South African ID numbers, they aren't working as expected. The regular expressions work fine on https://regex101.com, but when I apply them in the SIT configuration, they fail.

Any guidance would be greatly appreciated!

Microsoft 365
Microsoft Purview

Accepted answer
  1. phemanth 11,975 Reputation points Microsoft Vendor
    2024-11-27T15:01:07.52+00:00

    @Lethukuthula Shabalala

    I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    **Solution:** After further investigation, I discovered that Microsoft Purview utilizes a subset of .NET regex, which can cause certain features to behave differently. For instance, I had to avoid using \b (word boundary) and instead used (?<!\d) as an opener and (?!\d) as a closer for the regex.

    Here’s an example comparing the original regex we use on Mimecast and the modified version for Purview:

    Original Regex (Mimecast):

    \b(([0-57-9]\d(0[1-9]|1[012]))|(6[1-9](0[1-9]|1[012]))|(601[012]))(0[1-9]|[12][0-9]|3[01])[ -]?\d\d\d\d[ -]?\d\d\d\b

    Modified Regex (Purview):

    (?<!\d)(([0-57-9]\d(0[1-9]|1[0-2]))|(6[1-9](0[1-9]|1[0-2]))|(601[0-2]))(0[1-9]|[12][0-9]|3[01])[ -]?\d{4}[ -]?\d{3}(?!\d)

    The modified regex successfully worked in my SIT environment and can distinguish between valid and invalid ID numbers.
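
For anyone who wants to sanity-check the pattern's logic outside Purview before pasting it into a SIT, here is a minimal Python sketch. It assumes the second alternative of the pattern reads 6[1-9](0[1-9]|1[0-2]), and the helper name find_candidates and the sample strings are purely illustrative. Python's re module is not the engine Purview uses, so a pass here only confirms the structure of the pattern, not how the SIT will behave.

```python
import re

# Modified pattern from the answer, split across lines for readability.
# Assumption: the second alternative is 6[1-9](0[1-9]|1[0-2]).
PURVIEW_STYLE_PATTERN = re.compile(
    r"(?<!\d)"                        # opener: no digit immediately before
    r"(([0-57-9]\d(0[1-9]|1[0-2]))"   # YYMM where YY does not start with 6
    r"|(6[1-9](0[1-9]|1[0-2]))"       # YYMM for years 61-69
    r"|(601[0-2]))"                   # YYMM for year 60, months 10-12
    r"(0[1-9]|[12][0-9]|3[01])"       # DD
    r"[ -]?\d{4}[ -]?\d{3}"           # remaining 7 digits, optional separators
    r"(?!\d)"                         # closer: no digit immediately after
)


def find_candidates(text):
    """Return every substring of text that the pattern matches (hypothetical helper)."""
    return [m.group(0) for m in PURVIEW_STYLE_PATTERN.finditer(text)]


if __name__ == "__main__":
    samples = [
        "ID: 8001015009087.",    # plausible date portion, 13 digits
        "800101 5009 087",       # same digits with space separators
        "ref 18001015009087",    # embedded in a longer digit run
        "8013015009087",         # month 13
    ]
    for sample in samples:
        print(repr(sample), "->", find_candidates(sample))
```

The lookarounds do the job \b did in the Mimecast version: a 13-digit run that sits inside a longer number is rejected because a digit appears directly before or after it.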

    Thank you once again, and I appreciate the community's continued support.

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.
