Based on Microsoft's official documentation for Sensitive Information Types (SITs), detection of U.S. Social Security Numbers (SSNs) in Microsoft Purview DLP depends heavily on the confidence level and on specific formatting and keyword-proximity conditions.
Why your SSNs may not be detected under Medium confidence
Your DLP policy uses the Medium confidence level, which requires:
- A match to Func_unformatted_ssn (i.e., 9 consecutive digits).
- A keyword (from Keyword_ssn) within 300 characters of the number.
So, formats like 123456789 or 123-45-6789 will not be detected under Medium confidence unless a keyword like “SSN” or “Social Security” appears nearby in the body or attachment.
Also, numbers like 888-88-8888 and 987-65-4321 are commonly used test SSNs and may be deliberately excluded from detection by Microsoft’s default patterns.
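To make the Medium-confidence rule described above concrete, here is a minimal Python sketch of "9 consecutive digits plus a keyword within 300 characters". It is only an illustration for checking your own test data, not Microsoft's implementation: the function name and the two-entry keyword list are assumptions, and the real Keyword_ssn list and Func_unformatted_ssn logic contain more than this.

```python
import re

# Assumption: a tiny stand-in for Keyword_ssn (the real list is longer).
KEYWORDS = ("ssn", "social security")
# Exactly 9 digits, not embedded in a longer digit run.
NINE_DIGITS = re.compile(r"(?<!\d)\d{9}(?!\d)")
WINDOW = 300  # proximity window in characters


def medium_confidence_matches(text):
    """Return 9-digit numbers that have a keyword within WINDOW characters."""
    hits = []
    for m in NINE_DIGITS.finditer(text):
        start = max(0, m.start() - WINDOW)
        end = m.end() + WINDOW
        nearby = text[start:end].lower()
        if any(keyword in nearby for keyword in KEYWORDS):
            hits.append(m.group())
    return hits
```

Running your test messages through a check like this shows quickly whether a miss is caused by a missing keyword or by the keyword sitting too far from the number.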
Options for better detection:
If you want to detect SSNs without needing nearby keywords, consider changing the policy to Low confidence. This level may detect SSNs based only on their pattern match, but it can result in more false positives.
If you're concerned about strict detection, raising the level to High confidence won’t help in this case, because it requires more conditions to be met (e.g., stricter pattern + keyword).
For more control, you can also create a custom sensitive information type (SIT) to match specific SSN formats and decide whether to include or exclude proximity keyword requirements.
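Before creating a custom SIT, it can help to prototype the pattern against sample text. The Python sketch below is one way to do that, assuming you want to accept 123456789, 123-45-6789, and 123 45 6789 and to switch the keyword requirement on or off. The regex, keyword list, and function name are hypothetical, and this is not the Purview custom SIT definition itself; you would still need to enter the final pattern and keyword settings in Purview.

```python
import re

# Hypothetical pattern: 3-2-4 digits with a consistent separator
# (hyphen, space, or none), not embedded in a longer digit run.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")
# Assumption: illustrative keywords only.
KEYWORDS = ("ssn", "social security")


def find_ssn_candidates(text, require_keyword=False, window=300):
    """Return SSN-like strings, optionally requiring a nearby keyword."""
    results = []
    for m in SSN_PATTERN.finditer(text):
        if require_keyword:
            start = max(0, m.start() - window)
            nearby = text[start:m.end() + window].lower()
            if not any(keyword in nearby for keyword in KEYWORDS):
                continue
        results.append(m.group())
    return results
```

Calling it with require_keyword=False and then require_keyword=True on the same sample gives a rough sense of how many extra matches (and potential false positives) you would get by dropping the proximity requirement.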
Note: Your current policy works as designed but misses some test data due to formatting and keyword proximity conditions at the Medium level. Switching to Low confidence or defining a custom SIT would help in detecting more variations of SSNs.
I hope this information helps.