Removing all applied system classifications from Purview
Hi,
System classifications are applied to all assets on the first scan in Microsoft Purview. I now want to remove all system classifications, as they are not accurate and don't match our needs either. I have some custom classification rules that work fine, but I can't find any option to delete all the system classifications that have been applied to the assets.
I know I can delete them individually through the UI or through the Purview API, but it's cumbersome to do so for more than 1k assets. I also tried running a custom full scan that excludes the system classifications, but they still exist (maybe because of the previous scans). There has to be a way to disable system classifications and remove them from all assets that already have them.
Azure Data Catalog
Microsoft Purview
-
Bhargava-MSFT 31,116 Reputation points • Microsoft Employee
2023-09-06T19:12:23.05+00:00 Hello ShadowWalker,
Welcome to the Microsoft Q&A forum.
I am contacting my internal team to see if there is a way to disable system classifications and remove them for all assets that already have them. I will get back to you when I hear something from them.
-
Bhargava-MSFT 31,116 Reputation points • Microsoft Employee
2023-09-08T22:40:44.72+00:00 Hello ShadowWalker,
<update>
You can use the below Python script to achieve this.
from azure.identity import ClientSecretCredential
import requests

client_id = "<client_id>"
client_secret = "<client_secret>"
tenant_id = "<tenant_id>"
reference_name_purview = "<purview_name>"
purview_base_url = f"https://{reference_name_purview}.purview.azure.com/catalog/api"
classification_name = "<classification_name>"

# Search for assets with the specified classification
search_url = f"{purview_base_url}/search/query?api-version=2021-05-01-preview"
offset = 0
limit = 1000


def get_credentials():
    """Get a bearer token for Purview using a service principal."""
    credentials = ClientSecretCredential(
        client_id=client_id, client_secret=client_secret, tenant_id=tenant_id)
    access_token = credentials.get_token(
        "https://purview.azure.net/.default").token
    return access_token


def find_key_with_guid(relationshipAttributes):
    """Return the first key whose value is a list of dicts carrying a "guid"."""
    for key in relationshipAttributes:
        print(f"key: {key}")
        if isinstance(relationshipAttributes[key], list):
            for item in relationshipAttributes[key]:
                print(f"item: {item}")
                if isinstance(item, dict) and "guid" in item:
                    return key
    return None


if __name__ == "__main__":
    # Main script
    access_token = get_credentials()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    # GUIDs already handled, so an asset that keeps failing cannot loop forever
    processed = set()

    while True:
        search_payload = {
            "keywords": "*",
            "limit": limit,
            "offset": offset,
            "filter": {"and": [{"classification": classification_name}]}
        }
        search_results = requests.post(
            search_url, json=search_payload, headers=headers)
        if search_results.status_code != 200:
            break
        # offset is never advanced: the loop relies on cleaned assets dropping
        # out of later searches, and stops once a page brings back nothing new.
        entities = [e for e in search_results.json().get("value", [])
                    if e["id"] not in processed]
        print('search_results')
        print(entities)
        if not entities:
            break

        # Remove classification assignment from all assets
        for asset in entities:
            guid = asset["id"]
            processed.add(guid)
            print(f"Work on guid={guid}")
            delete_relationship = requests.delete(
                f"{purview_base_url}/atlas/v2/entity/guid/{guid}/classification/{classification_name}",
                headers=headers)
            print(f"delete_relationship: {delete_relationship.status_code}")
            # A successful delete may return an empty body, so check the
            # status code instead of parsing JSON.
            if delete_relationship.ok:
                continue
            print(
                f"Failed to remove classification from asset {guid} because classification is on schema")

            # Get the asset schema GUID via bulk API
            schema_guid_request = requests.get(
                f"{purview_base_url}/atlas/v2/entity/bulk?guid={guid}",
                headers=headers)
            relationshipAttributes = schema_guid_request.json(
            )["entities"][0]["relationshipAttributes"]

            column_guids = []
            if relationshipAttributes.get("tabular_schema"):
                schema_guid = relationshipAttributes.get(
                    "tabular_schema").get("guid")
                print(f"{guid} has tabular_schema: {schema_guid}")
                # Use bulk API on the schema GUID to retrieve column GUIDs
                column_guid_response = requests.get(
                    f"{purview_base_url}/atlas/v2/entity/bulk?guid={schema_guid}",
                    headers=headers)
                column_guids = [column["guid"] for column in column_guid_response.json(
                )["entities"][0]["relationshipAttributes"]["columns"]]
            else:
                print(
                    f"{guid} has no tabular_schema. Will search for key with array of guid.")
                # Find the key that contains the column GUIDs
                column_guid_key = find_key_with_guid(relationshipAttributes)
                print(f"column_guid_key: {column_guid_key}")
                if not column_guid_key:
                    print(
                        f"ERROR: Could not find column GUIDs for asset {guid}")
                    continue  # skip this asset and move on to the next one
                print(f"{column_guid_key} has array of guid")
                columns = relationshipAttributes.get(column_guid_key)
                print(f"columns: {columns}")
                column_guids = [column["guid"] for column in columns]

            # Delete classification on the columns
            for column_guid in column_guids:
                print(
                    f"Deleting classification {classification_name} from column {column_guid} if exists")
                requests.delete(
                    f"{purview_base_url}/atlas/v2/entity/guid/{column_guid}/classification/{classification_name}",
                    headers=headers)

    # Delete the classification itself
    delete_classification_def_url = f"{purview_base_url}/atlas/v2/types/typedef/name/{classification_name}"
    requests.delete(delete_classification_def_url, headers=headers)
I hope this helps.
If this answers your question, please consider accepting it by clicking Accept answer and up-voting, as this helps the community find answers to similar questions.
-
ShadowWalker 90 Reputation points
2023-09-11T10:39:14.75+00:00 Hi @Bhargava-MSFT ,
Thanks for your response. I would prefer not to delete all classifications through code, especially by pulling all the GUIDs. I need to disable and delete all system classifications, not the custom ones. There should be a feature for this within the Purview portal.
-
Bhargava-MSFT 31,116 Reputation points • Microsoft Employee
2023-09-11T21:09:37.2+00:00 Hello ShadowWalker,
I understand your concern about deleting all classifications through code. However, there is currently no feature in the Purview portal to disable or delete all system classifications.
As you know, the only way to remove system classifications from all assets is to delete them individually through the UI or Purview API.
If you want to remove only system classifications and not custom ones, you can modify the Python script to filter out custom classifications before deleting them.
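As a minimal sketch of that filtering step, the helper below splits a list of classification names into system and custom ones. It assumes that built-in classifications can be recognised by a "MICROSOFT." name prefix; that prefix, the function name split_classifications and the sample custom name are illustrative assumptions, so check them against the classifications in your own account first.

```python
# Assumption: built-in (system) classifications use names starting with
# "MICROSOFT."; verify this against your own Purview account.
SYSTEM_PREFIX = "MICROSOFT."


def split_classifications(names, system_prefix=SYSTEM_PREFIX):
    """Partition classification names into (system, custom) lists."""
    system = [name for name in names if name.startswith(system_prefix)]
    custom = [name for name in names if not name.startswith(system_prefix)]
    return system, custom


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    sample = ["MICROSOFT.PERSONAL.EMAIL", "CONTOSO.EMPLOYEE_ID"]
    system, custom = split_classifications(sample)
    print("would remove:", system)
    print("would keep:", custom)
```

Only the names in the first list would then be passed to the delete calls, so custom classifications are never touched.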
<Please note: I have used AI to generate the below script>
from azure.identity import ClientSecretCredential
import requests

# Replace these variables with your actual values
client_id = "<client_id>"
client_secret = "<client_secret>"
tenant_id = "<tenant_id>"
reference_name_purview = "<purview_name>"
purview_base_url = f"https://{reference_name_purview}.purview.azure.com/catalog/api"

# Assumption: built-in (system) classifications are the ones whose names start
# with "MICROSOFT."; confirm this against your own account before deleting.
SYSTEM_PREFIX = "MICROSOFT."


def get_access_token():
    credentials = ClientSecretCredential(
        client_id=client_id, client_secret=client_secret, tenant_id=tenant_id)
    access_token = credentials.get_token("https://purview.azure.net/.default").token
    return access_token


if __name__ == "__main__":
    access_token = get_access_token()

    # Define authentication headers
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}"
    }

    try:
        # Get all classification definitions
        classifications_url = f"{purview_base_url}/atlas/v2/types/typedefs?type=classification"
        classifications_response = requests.get(classifications_url, headers=headers)
        classifications_response.raise_for_status()  # Check for HTTP errors
        classifications = classifications_response.json().get("classificationDefs", [])

        # Keep only the system classifications
        system_classifications = [
            c for c in classifications if c["name"].startswith(SYSTEM_PREFIX)]

        # Delete the system classification definitions
        for classification in system_classifications:
            classification_name = classification["name"]
            delete_classification_def_url = f"{purview_base_url}/atlas/v2/types/typedef/name/{classification_name}"
            delete_response = requests.delete(delete_classification_def_url, headers=headers)
            if delete_response.status_code == 204:
                print(f"Deleted system classification: {classification_name}")
            else:
                print(f"Failed to delete system classification: {classification_name} "
                      f"(HTTP {delete_response.status_code})")
        print("Finished processing system classifications.")
    except Exception as e:
        print(f"An error occurred: {str(e)}")
I hope this helps. Please let me know if you have any further questions.
-
Bhargava-MSFT 31,116 Reputation points • Microsoft Employee
2023-09-13T15:55:41.2266667+00:00 Hello ShadowWalker,
I am checking to see if the above answer is helpful here.
-
ShadowWalker 90 Reputation points
2023-09-18T13:44:56.0666667+00:00 Hi @Bhargava-MSFT, I already had AI-generated code for this, which is mostly the same as yours; that code is not completely correct and requires some alterations too. However, the answer is not useful to me because, from my perspective:
If Microsoft Purview lets us choose which classifications are applied to assets during a full scan, it should also let us discard previously applied classifications, especially the system classifications that are automatically included in the initial scan. Secondly, I will repeat that it's not safe at all to call the API to delete all classifications, especially with the code snippet above, as it can cause wrong deletions in production. Microsoft should enhance Purview's capabilities if it wants to provide a proper data catalog platform, i.e. instead of introducing new features, complete the existing ones.
I am not even talking about Git integration for storing the current state of Purview, as it deletes all user-provided information for an asset if something changes in the asset's source. But backing up the state of Purview is not related to the question asked, so I will drop it in this thread.
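On the concern about wrong deletions in production: for anyone who does go the API route, one possible safeguard is a dry run that only lists the DELETE calls a cleanup would make, so they can be reviewed before anything is removed. The sketch below is illustrative, not a tested tool. It assumes each search result carries its classification names in a "classification" list and that system classifications start with "MICROSOFT."; the function name plan_removals is hypothetical, and the URL pattern is the one used in the script earlier in this thread.

```python
from urllib.parse import quote


def plan_removals(entities, base_url, system_prefix="MICROSOFT."):
    """Return the DELETE URLs a cleanup run would call, without calling them.

    entities: search result items, each assumed to have an "id" and an
    optional "classification" list of names (an assumption about the payload).
    """
    planned = []
    for entity in entities:
        guid = entity["id"]
        for name in entity.get("classification", []):
            # Only system classifications are planned; custom ones are skipped.
            if name.startswith(system_prefix):
                planned.append(
                    f"{base_url}/atlas/v2/entity/guid/{guid}"
                    f"/classification/{quote(name, safe='')}")
    return planned


if __name__ == "__main__":
    # Hypothetical sample data standing in for real search results.
    sample = [
        {"id": "guid-1", "classification": ["MICROSOFT.PERSONAL.EMAIL", "CONTOSO.ID"]},
        {"id": "guid-2"},
    ]
    for url in plan_removals(sample, "https://<purview_name>.purview.azure.com/catalog/api"):
        print("DELETE", url)
```

Nothing is sent over the network here; the output is just a list to review. Column-level classifications, which the earlier script handles through the schema lookup, are not covered by this sketch.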
-
Bhargava-MSFT 31,116 Reputation points • Microsoft Employee
2023-09-18T23:25:42.2533333+00:00 Hello ShadowWalker,
Thanks for your feedback.
I recommend submitting this feedback in the Purview IDEAS forum here: https://feedback.azure.com/d365community/forum/82d7bddb-fb24-ec11-b6e6-000d3a4f07b8
All the feedback shared in this forum is actively monitored and reviewed by respective product owners. Please share the feedback link once it is posted so that I can up-vote it, and I will pass the feedback to respective product owners.
-
ShadowWalker 90 Reputation points
2023-09-19T11:54:35.66+00:00 Hello @Bhargava-MSFT
Here is the link to the feedback:
https://feedback.azure.com/d365community/idea/21c3c030-e356-ee11-a81c-000d3ae37d1e
-
Bhargava-MSFT 31,116 Reputation points • Microsoft Employee
2023-09-19T19:43:46.98+00:00 Hello ShadowWalker,
Thank you for submitting the feedback item. I will share this with the respective product owners.