Hi Pedro Simões,
It sounds like you have several detailed questions regarding custom classification rules in Microsoft Purview. Let’s break down your inquiries:
- Column Pattern Regex Evaluation: When a custom classification rule is applied, the column pattern regex is evaluated against both the column name and the column display name. If the two differ, both are checked.
- Minimum Match Threshold and Sparse Columns: You're correct that the minimum match threshold for a data pattern can typically be set no lower than 1%. If the match rate falls below the configured threshold, the column is not classified automatically. Columns with a large share of NULL values are less likely to be classified, so one workaround is to scan data with more populated rows. Alternatively, you can apply the classification manually when even a few populated cells matter, as they do for PII.
- Unexpected Classification Behavior: The column iss_gendername may be going unclassified because it shares a Fully Qualified Name (FQN) with another column. Duplicate FQNs can prevent Purview from telling the two columns apart, which can affect classification. It’s worth checking how these columns are defined in Dataverse, since that definition may affect how they are recognized and classified during scans.
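To get a rough feel for whether a sparse column can clear the threshold, you could estimate the match rate on an exported sample before scanning. The sketch below is only a local approximation in Python, not Purview's scanner: the function name `match_rates`, the sample data, and the pattern are all hypothetical, and it reports two rates because how NULLs and duplicate values are weighted is an assumption here.

```python
import re

def match_rates(values, pattern):
    """Return (row_rate, distinct_rate) as percentages for a sampled column.

    row_rate:      matching cells / all sampled cells (NULLs count as misses).
    distinct_rate: matching distinct values / distinct non-NULL values.
    Both are local estimates; the scanner's exact formula is an assumption.
    """
    regex = re.compile(pattern)
    total = len(values)
    non_null = [v for v in values if v is not None]
    row_hits = sum(1 for v in non_null if regex.fullmatch(str(v)))
    distinct = set(non_null)
    distinct_hits = sum(1 for v in distinct if regex.fullmatch(str(v)))
    row_rate = 100.0 * row_hits / total if total else 0.0
    distinct_rate = 100.0 * distinct_hits / len(distinct) if distinct else 0.0
    return row_rate, distinct_rate

# Hypothetical sparse column: 2 populated cells out of 1000 sampled rows.
sample = ["Female", "Male"] + [None] * 998
row_rate, distinct_rate = match_rates(sample, r"(Male|Female)")
print(f"row rate: {row_rate:.2f}%, distinct rate: {distinct_rate:.2f}%")
```

If the row-based figure is well under 1% while the distinct-value figure is high, that gap is a hint that sparsity, rather than the pattern itself, is what keeps the column from being classified.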
Here's a quick recap of steps you might consider:
- Double-check the regex patterns to ensure they are accurate and correctly formatted.
- Review the data in your columns to confirm that enough distinct, non-NULL values match the data pattern to reach the threshold.
- Check that your scan rule set includes all the necessary custom classification rules.
- If issues persist, consider classifying columns manually where you already know which classification applies.
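For the first step, it can help to test a column pattern locally against the names you expect it to catch. This is a minimal Python sketch, not a Purview API: the helper `matching_names`, the pattern, and the display name "Gender" are hypothetical, and Python's `re` module may differ in places from the regex flavor Purview uses. It assumes the whole name must match; switch `fullmatch` to `search` if you want substring matching.

```python
import re

def matching_names(pattern, names, ignore_case=True):
    """Return the names that the pattern matches in full."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [name for name in names if regex.fullmatch(name)]

# Logical column names plus a hypothetical display name, since both may be checked.
candidates = ["iss_gendername", "Gender", "iss_gender", "createdon"]
hits = matching_names(r".*gender.*", candidates)
print(hits)
```

Running the same check with the exact pattern from your rule shows quickly whether a typo or a missing wildcard is excluding a column such as iss_gendername.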
If you have any further questions or need clarification on any of these points, feel free to ask!
References:
- Classification and sensitivity labels - Missing or incorrectly classified assets
- Custom classifications
- Classification best practices in the Microsoft Purview governance portal
Hope this helps!