Column descriptions in Data Catalog

Prasad Lotankar 1 Reputation point
2022-09-20T09:22:05.937+00:00

Can we add Data Catalog column descriptions automatically through Python? Suppose there are 100-200 columns that I want to add descriptions for. Is there a way to automate it?

Azure Data Catalog
An Azure service that serves as a system of registration and system of discovery for enterprise data assets.
Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

2 answers

  1. KranthiPakala-MSFT 46,512 Reputation points Microsoft Employee
    2022-09-20T22:39:11.853+00:00

    Hello @Prasad Lotankar ,

    Thanks for the question and for using the Microsoft Q&A platform.

    My understanding is that you would like to know how to bulk update column descriptions in Microsoft Purview using Python. Please correct me if I have misunderstood.

    Option 1: Use PyApacheAtlas (https://github.com/wjohnson/pyapacheatlas) to update each existing column from Python, as described here: https://github.com/wjohnson/pyapacheatlas/wiki/Azure-Purview-Tips#update-an-existing-column
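As a rough illustration of Option 1, the sketch below loops over a mapping of column names to descriptions and issues one partial update per column. It assumes the client exposes `partial_update_entity(typeName=..., qualifiedName=..., attributes=...)`, which is how I understand PyApacheAtlas's `PurviewClient` to work; please verify against the wiki page linked above. The helper names (`column_qualified_name`, `update_column_descriptions`, `build_client`) and the `table#column` qualified-name pattern are illustrative assumptions, not part of the library.

```python
from typing import Any, Dict, List


def column_qualified_name(table_qualified_name: str, column_name: str) -> str:
    """Build a column qualified name in the assumed table#column form."""
    return f"{table_qualified_name}#{column_name}"


def update_column_descriptions(
    client: Any,
    type_name: str,
    table_qualified_name: str,
    descriptions: Dict[str, str],
) -> List[str]:
    """Set the description attribute on each column, one call per column.

    `client` is expected to behave like pyapacheatlas.core.PurviewClient
    (assumption: it has a partial_update_entity method with these keywords).
    Returns the qualified names that were updated.
    """
    updated = []
    for column_name, description in descriptions.items():
        qualified_name = column_qualified_name(table_qualified_name, column_name)
        client.partial_update_entity(
            typeName=type_name,
            qualifiedName=qualified_name,
            attributes={"description": description},
        )
        updated.append(qualified_name)
    return updated


def build_client(tenant_id: str, client_id: str, client_secret: str, account_name: str) -> Any:
    """Create a Purview client from service principal credentials.

    Imported lazily so the rest of this sketch runs without PyApacheAtlas installed.
    """
    from pyapacheatlas.auth import ServicePrincipalAuthentication
    from pyapacheatlas.core import PurviewClient

    auth = ServicePrincipalAuthentication(
        tenant_id=tenant_id, client_id=client_id, client_secret=client_secret
    )
    return PurviewClient(account_name=account_name, authentication=auth)
```

With a real client, the call would look something like `update_column_descriptions(build_client(...), "azure_sql_column", "mssql://server/db/schema/table", {"id": "Primary key"})`. The descriptions mapping could be loaded from a CSV or Excel file holding your 100-200 columns.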

    Option 2: Use the Excel interface of PyApacheAtlas, which is a convenient way to load large amounts of data into Purview with a spreadsheet and a little bit of Python.

    Here is a video on how to do bulk uploads with an excel file: https://www.youtube.com/watch?v=27jRUydL6qE

    You'll need the following:

    1. Install PyApacheAtlas and either get a service principal with access to Purview or authenticate with the Azure CLI.
    2. Generate the Excel template using the command python -m pyapacheatlas --make-template ./purview.xlsx
    3. Add the column information to the BulkEntities tab.
      • Each row of the spreadsheet represents one column.
      • You need the qualified name of each column (e.g. mssql://server/db/schema/table#columnName).
      • You need the type name of each column (e.g. azure_sql_column if it’s a column on an Azure SQL table).
    4. In the BulkEntities tab, add a spreadsheet column called “description” and fill in the description for each data column.
    5. You can then modify and run the sample script: https://github.com/wjohnson/pyapacheatlas/blob/master/samples/excel/excel_bulk_entities_upload.py
      • Please note that you should comment out lines 76 and 78 and update the file_path variable.
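For 100-200 columns, the BulkEntities rows from steps 3 and 4 can be generated rather than typed by hand. The sketch below is only an illustration using the Python standard library: it builds one row per column and writes them as CSV text that can be pasted into the BulkEntities tab. The header names `typeName`, `name` and `qualifiedName` are my assumption of what the generated template contains, so check them against your own purview.xlsx; `description` is the extra column from step 4. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Assumed BulkEntities headers plus the added "description" column (step 4).
HEADERS = ["typeName", "name", "qualifiedName", "description"]


def build_bulk_entity_rows(
    type_name: str,
    table_qualified_name: str,
    descriptions: Dict[str, str],
) -> List[Dict[str, str]]:
    """Return one spreadsheet row per data column."""
    return [
        {
            "typeName": type_name,
            "name": column_name,
            # Qualified name follows the table#columnName pattern from step 3.
            "qualifiedName": f"{table_qualified_name}#{column_name}",
            "description": description,
        }
        for column_name, description in descriptions.items()
    ]


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = build_bulk_entity_rows(
        "azure_sql_column",
        "mssql://server/db/schema/table",
        {"id": "Primary key", "name": "Customer name"},
    )
    print(rows_to_csv(rows))
```

Once the rows are in the BulkEntities tab, the sample script from step 5 does the actual upload.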

    Hope this helps.



  2. Bas Pruijn 951 Reputation points
    2022-10-08T15:14:56.05+00:00

    For bulk updates I used this sample code to create an export. I updated the export using whatever tool I liked and then performed an import again. It would be useful to have a look at this approach.

    According to this sample (https://github.com/pietrobr/azure-data-catalog-rest-python), you can also access the Data Catalog directly from Python. Unfortunately, I do not have hands-on experience with that.
