Purview: Correct Resource Set with Pattern Rule

JF 96 Reputation points
2023-04-04T14:07:09.7433333+00:00

While scanning our data lakehouse with Purview, certain assets are automatically detected as part of a resource set. Most of the time this works well, but we have one specific scenario. The data lake contains the folders 1_RAW, 2_STAGE, and 3_CURATED. These are detected as part of a resource set, for example with the following fully qualified name:

https://**.net/datalakehouse/{N}_STAGE/DATABASE/DATABASENAME/file.parquet/{SparkPartitions}

The {SparkPartitions} placeholder is correct in this scenario, but {N}_STAGE should appear as 2_STAGE. I understand that custom pattern rules can override the default behavior, but I cannot find the correct syntax. For example, I tried a custom pattern rule with the qualified name {:int}_STAGE/* and the setting to not group into a resource set, but that did not give the desired result. Could someone provide input on the correct syntax? Thanks in advance.

Microsoft Security | Microsoft Purview

Answer accepted by question author

Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
2023-04-05T20:28:16.07+00:00

Hello JF,

My understanding is that you're trying to create a custom pattern rule to correct the default resource set detection in Purview. Based on your description, you want to match the 2_STAGE folder specifically and not group it into a resource set. Can you please try the following custom pattern rule?

Scope: https://**.net/datalakehouse/
Qualified Name: 2_STAGE/{{DATABASE:string}}/{{DATABASENAME:string}}/file.parquet/{{SparkPartitions:string}}
Do not group as resource set: Enabled

This rule should match the 2_STAGE folder specifically and prevent it from being grouped into a resource set. In the qualified name, {N} is replaced with the literal number 2, which corresponds to the "2_STAGE" folder in your data lake. Make sure to replace DATABASE, DATABASENAME, and SparkPartitions with the appropriate values or replacer types for your data lake structure.

Reference document: https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/purview/how-to-resource-set-pattern-rules.md

I hope this helps!


1 additional answer

  1. JF 96 Reputation points
    2023-04-06T14:27:31.2466667+00:00

    Thank you for your feedback. I was able to get the desired result. In the end, I used a slightly different approach:

    Qualified Name: {{N:int}}_STAGE/{{folder:string}}/{{foldername:string}}/{{file:string}}.parquet/{SparkPartitions:string}

    with "Do not group as resource set" disabled. This worked fine for me. Thanks!
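For anyone who wants to reason about a rule like this before saving it, here is a rough Python sketch of how the qualified name above might be read. It is not Purview code and does not call any Purview API. It assumes, based on my reading of the pattern rule documentation, that double-brace replacers keep the matched value in the resulting name while single-brace replacers are collapsed to a placeholder. The function names (`parse`, `resource_set_name`), the sample partition folder, and the simplified `int`/`string` matchers are all hypothetical choices for illustration.

```python
import re

# Assumed, simplified matchers for two replacer types. The real Purview
# types are richer; "string" is approximated here as one path segment.
TYPE_REGEX = {"int": r"\d+", "string": r"[^/]+"}

# The double-brace form is listed first so it is not read as a single-brace one.
TOKEN = re.compile(r"\{\{(\w+):(\w+)\}\}|\{(\w+):(\w+)\}")


def parse(pattern):
    """Split a qualified-name pattern into literal, static and dynamic tokens."""
    tokens, pos = [], 0
    for m in TOKEN.finditer(pattern):
        if m.start() > pos:
            tokens.append(("literal", pattern[pos:m.start()], None))
        if m.group(1) is not None:
            tokens.append(("static", m.group(1), m.group(2)))
        else:
            tokens.append(("dynamic", m.group(3), m.group(4)))
        pos = m.end()
    if pos < len(pattern):
        tokens.append(("literal", pattern[pos:], None))
    return tokens


def resource_set_name(pattern, path):
    """Return the grouped name for path, or None if the pattern does not match."""
    tokens = parse(pattern)
    regex = "".join(
        re.escape(text) if kind == "literal" else f"({TYPE_REGEX[rtype]})"
        for kind, text, rtype in tokens
    )
    match = re.fullmatch(regex, path)
    if match is None:
        return None
    values = iter(match.groups())
    out = []
    for kind, text, _ in tokens:
        if kind == "literal":
            out.append(text)
        elif kind == "static":
            out.append(next(values))      # keep the matched value, e.g. "2"
        else:
            next(values)                  # discard the matched value
            out.append("{" + text + "}")  # collapse to a placeholder
    return "".join(out)


RULE = ("{{N:int}}_STAGE/{{folder:string}}/{{foldername:string}}/"
        "{{file:string}}.parquet/{SparkPartitions:string}")

# Hypothetical path relative to the rule's scope.
print(resource_set_name(RULE, "2_STAGE/DATABASE/DATABASENAME/file.parquet/part-0001"))
```

Under these assumptions the sample path keeps its 2_STAGE prefix and only the partition part is collapsed to {SparkPartitions}, which is the result asked for in the question. Swapping {{N:int}} for {N:int} in the sketch gives {N}_STAGE again, which mirrors the default behavior described there.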
