Classification based on column pattern not applied consistently in Microsoft Purview
Hi,
We ran a few scans on our ADLS Gen2 data source using a custom scan rule set. The custom classification rules in that rule set are based on column patterns, but they do not seem to be applied consistently. For example, one rule has its column pattern set to just xyz, so if a column in the schema of the scanned files (empty Parquet files in our case) in ADLS has a name containing the substring xyz, we expect the rule to match and the classification to be applied to that column. It was not applied. I then created a new rule set with a single new classification rule that matches a longer substring and scanned again. That worked.
So I modified the earlier rules to follow the same pattern as the rule that had worked, but the scan still classified nothing, even though the rule set now contained a classification rule identical to the one that had worked before. When I scanned again with the rule set that had previously worked, it also failed this time and classified nothing.
We also noticed that, although the counts of discovered assets and classified assets on the data source details and scan details pages are non-zero during a scan, nothing shows up in the Data Estate Insights section under Assets or Classified Assets. It shows 0, a blank, or no available data.
Are we missing something about classification or scanning here, or is this an inconsistency in Purview itself? How do we get it to work consistently?
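For reference, here is a minimal Python sketch of the matching behavior we expected from the column pattern. It assumes the pattern is treated as a regular expression; whether Purview searches for it anywhere in the column name or requires it to match the whole name is something we have not confirmed, so the sketch shows both. The function name `classify_columns` and the column names are hypothetical and only illustrate the difference.

```python
import re


def classify_columns(column_names, pattern, full_match=False):
    """Return the column names that a column-pattern rule would match.

    full_match=False: the pattern may match anywhere in the name (substring search).
    full_match=True:  the pattern must match the entire name.
    Which of the two Purview uses is an assumption to verify, not a known fact.
    """
    regex = re.compile(pattern)
    matcher = regex.fullmatch if full_match else regex.search
    return [name for name in column_names if matcher(name)]


# Hypothetical column names from the schema of a scanned Parquet file.
columns = ["id", "customer_xyz_code", "xyz", "created_at"]

# What we expected: "xyz" matches any column whose name contains it.
substring_hits = classify_columns(columns, "xyz")

# If the pattern has to match the whole name, only the exact name matches.
anchored_hits = classify_columns(columns, "xyz", full_match=True)

# A pattern written with explicit wildcards matches under either behavior.
wildcard_hits = classify_columns(columns, ".*xyz.*", full_match=True)

print(substring_hits)
print(anchored_hits)
print(wildcard_hits)
```

If the pattern turns out to need a whole-name match, a bare xyz would only classify a column named exactly xyz, and a pattern such as .*xyz.* would be needed to get the substring behavior we described above.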