Microsoft Purview automation best practices

While Microsoft Purview governance solutions provide an out-of-the-box user experience through the Microsoft Purview governance portal, not all tasks are suited to the point-and-click nature of a graphical user interface.

For example:

  • Triggering a scan to run as part of an automated process.
  • Monitoring for metadata changes in real time.
  • Building your own custom user experience.

Microsoft Purview provides several tools that we can use to interact with the underlying platform in an automated, programmatic fashion. Because of the open nature of the Microsoft Purview service, we can automate different aspects, from the control plane, which is accessible via Azure Resource Manager, to Microsoft Purview's multiple data planes (catalog, scanning, administration, and more).

This article provides a summary of the options available, and guidance on what to use when.

Tools

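Tool Type                 Tool                              Scenario                Management  Catalog  Scanning  Logs
Resource Management       ARM templates, Bicep, Terraform   Infrastructure as Code
Command Line              Azure CLI                         Interactive
Command Line              Azure PowerShell                  Interactive
API                       REST API                          On-Demand
Streaming (Apache Atlas)  Apache Kafka                      Real-Time
Monitoring                Diagnostic settings               Monitoring
SDK                       .NET, Java, JavaScript, Python    Custom Development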

Resource Management

Azure Resource Manager is a deployment and management service that enables customers to create, update, and delete resources in Azure. When deploying Azure resources repeatedly, Azure Resource Manager (ARM) templates can be used to ensure consistency. This approach is referred to as Infrastructure as Code.

To implement infrastructure as code, we can build ARM templates using JSON or Bicep, or use open-source alternatives such as Terraform.

When to use?

  • Scenarios that require repeated Microsoft Purview deployments. Templates ensure that Microsoft Purview and any dependent resources are deployed in a consistent manner.
  • When coupled with deployment scripts, templated solutions can traverse the control and data planes, enabling the deployment of end-to-end solutions. For example, a single deployment can create a Microsoft Purview account, register sources, and trigger scans.
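As an illustration of the template approach described above, the following minimal sketch builds an ARM template for a single Microsoft Purview account as a Python dictionary and prints it as JSON. The helper name `purview_account_template`, the account name, the region, and the `apiVersion` value are assumptions made for this example; check the current resource reference before deploying anything based on it.

```python
import json


def purview_account_template(account_name: str, location: str) -> dict:
    """Return an ARM template (as a dict) that declares one Microsoft Purview account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Purview/accounts",
                # Assumed API version; confirm against the current template reference.
                "apiVersion": "2021-07-01",
                "name": account_name,
                "location": location,
                # A system-assigned managed identity lets the account authenticate to sources.
                "identity": {"type": "SystemAssigned"},
                "properties": {"publicNetworkAccess": "Enabled"},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account name and region, used only for illustration.
    template = purview_account_template("contoso-purview", "westeurope")
    print(json.dumps(template, indent=2))
```

Generating the template from code keeps the account definition in one place, so the same function can feed several environments with different names and regions.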

Command Line

Azure CLI and Azure PowerShell are command-line tools that enable you to manage Azure resources such as Microsoft Purview. While the list of commands will grow over time, only a subset of Microsoft Purview control plane operations is currently available. For an up-to-date list of commands currently available, check out the documentation (Azure CLI | Azure PowerShell).

  • Azure CLI - A cross-platform tool that allows the execution of commands through a terminal using interactive command-line prompts or a script. Azure CLI has a purview extension that allows for the management of Microsoft Purview accounts. For example, az purview account.
  • Azure PowerShell - A cross-platform task automation program, consisting of a set of cmdlets for managing Azure resources. Azure PowerShell has a module called Az.Purview that allows for the management of Microsoft Purview accounts. For example, Get-AzPurviewAccount.

When to use?

  • Best suited for ad-hoc tasks and quick exploratory operations.
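When an ad-hoc command proves useful enough to repeat, it can be wrapped in a small script. The sketch below assumes that the Azure CLI and its purview extension are installed and that you are already signed in; the helper names and the `--name`, `--resource-group`, and `--output` arguments reflect the usual Azure CLI conventions and should be verified against the command reference.

```python
import json
import subprocess


def show_account_command(account_name: str, resource_group: str) -> list[str]:
    """Build the argument list for `az purview account show`."""
    return [
        "az", "purview", "account", "show",
        "--name", account_name,
        "--resource-group", resource_group,
        "--output", "json",
    ]


def show_account(account_name: str, resource_group: str) -> dict:
    """Run the command and return the parsed JSON description of the account."""
    completed = subprocess.run(
        show_account_command(account_name, resource_group),
        check=True,           # raise CalledProcessError if the CLI reports a failure
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Hypothetical names; printing the command does not require the CLI to be installed.
    print(" ".join(show_account_command("contoso-purview", "rg-data")))
```

Separating the argument list from the call makes the command easy to inspect or log before it runs.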

API

REST APIs are HTTP endpoints that surface different methods (POST, GET, PUT, DELETE), triggering actions such as create, read, update, or delete (CRUD). Microsoft Purview exposes a large portion of the Microsoft Purview platform via multiple service endpoints.

When to use?

  • When the required operations are not available via Azure CLI, Azure PowerShell, or native client libraries.
  • Custom application development or process automation.
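To make the request/response pattern concrete, here is a minimal sketch that prepares a catalog search request using only the Python standard library. The endpoint path, the `api-version` value, and the request body fields are assumptions based on the catalog data plane and should be checked against the REST API reference. The access token is expected to be obtained separately, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_search_request(account_name: str, keywords: str,
                         access_token: str, limit: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that searches the catalog."""
    url = (
        f"https://{account_name}.purview.azure.com"
        "/catalog/api/search/query?api-version=2022-03-01-preview"  # assumed version
    )
    body = json.dumps({"keywords": keywords, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def search_catalog(account_name: str, keywords: str, access_token: str) -> dict:
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(account_name, keywords, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical account and placeholder token; nothing is sent over the network here.
    prepared = build_search_request("contoso-purview", "sales", "<access-token>")
    print(prepared.get_method(), prepared.full_url)
```

Building the request separately from sending it keeps the URL and payload easy to inspect and test without network access.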

Streaming (Apache Atlas)

Each Microsoft Purview account can configure Event Hubs that are accessible via its Atlas Kafka endpoint.

You can follow these steps to configure the Event Hubs namespaces.

Note

Enabling this Event Hubs namespace incurs a cost for the namespace. For specific details, see the pricing page.

Once the namespace is enabled, Microsoft Purview events can be monitored by consuming messages from the event hub. External systems can also use the event hub to publish events to Microsoft Purview as they occur.

  • Consume Events - Microsoft Purview will send notifications about metadata changes to Kafka topic ATLAS_ENTITIES. Applications interested in metadata changes can monitor for these notifications. Supported operations include: ENTITY_CREATE, ENTITY_UPDATE, ENTITY_DELETE, CLASSIFICATION_ADD, CLASSIFICATION_UPDATE, CLASSIFICATION_DELETE.
  • Publish Events - Microsoft Purview can be notified of metadata changes via notifications to Kafka topic ATLAS_HOOK. Supported operations include: ENTITY_CREATE_V2, ENTITY_PARTIAL_UPDATE_V2, ENTITY_FULL_UPDATE_V2, ENTITY_DELETE_V2.

When to use?

  • Applications or processes that need to publish or consume Apache Atlas events in real time.
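A consumer of the ATLAS_ENTITIES topic mostly needs to decode each message and branch on its operation type. The sketch below assumes the Apache Atlas notification layout, in which the payload sits under a top-level `message` key with `operationType` and `entity` fields; the sample message is illustrative rather than captured from a real account, and the Kafka client that would deliver the raw bytes is left out.

```python
import json
from typing import Optional

# Operation types listed above for the ATLAS_ENTITIES topic.
CONSUMED_OPERATIONS = {
    "ENTITY_CREATE", "ENTITY_UPDATE", "ENTITY_DELETE",
    "CLASSIFICATION_ADD", "CLASSIFICATION_UPDATE", "CLASSIFICATION_DELETE",
}


def summarize_notification(raw) -> Optional[dict]:
    """Return a short summary of one notification, or None if it is not recognized."""
    envelope = json.loads(raw)
    # Assumption: the notification payload sits under a top-level "message" key.
    message = envelope.get("message") or {}
    operation = message.get("operationType")
    if operation not in CONSUMED_OPERATIONS:
        return None
    entity = message.get("entity") or {}
    attributes = entity.get("attributes") or {}
    return {
        "operation": operation,
        "typeName": entity.get("typeName"),
        "guid": entity.get("guid"),
        "qualifiedName": attributes.get("qualifiedName"),
    }


# Illustrative payload only; real notifications carry more fields.
SAMPLE = json.dumps({
    "message": {
        "type": "ENTITY_NOTIFICATION_V2",
        "operationType": "ENTITY_CREATE",
        "entity": {
            "typeName": "azure_sql_table",
            "guid": "00000000-0000-0000-0000-000000000000",
            "attributes": {"qualifiedName": "mssql://example/db/dbo/Orders"},
        },
    }
})

if __name__ == "__main__":
    print(summarize_notification(SAMPLE))
```

Returning `None` for unrecognized operations lets the calling loop skip messages it does not handle instead of failing on them.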

Monitoring

Microsoft Purview can send platform logs and metrics via "Diagnostic settings" to one or more destinations (Log Analytics Workspace, Storage Account, or Azure Event Hubs). Available metrics include Data Map Capacity Units, Data Map Storage Size, Scan Canceled, Scan Completed, Scan Failed, and Scan Time Taken.

Once configured, Microsoft Purview automatically sends these events to the destination as a JSON payload. From there, application subscribers that need to consume and act on these events can do so with the option of orchestrating downstream logic.

When to use?

  • Applications or processes that need to consume diagnostic events.
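As a sketch of the downstream logic mentioned above, the following function filters a batch of diagnostic events for records that match a given category and result. It assumes the batch arrives as a JSON object with a `records` array and that each record carries `category` and `resultType` fields, as in the common Azure resource log layout. The category and result values in the sample are illustrative and should be replaced with the values your account actually emits.

```python
import json


def select_records(payload, category: str, result_type: str) -> list:
    """Return the records in a diagnostic batch that match a category and result type."""
    batch = json.loads(payload)
    return [
        record
        for record in batch.get("records", [])
        if record.get("category") == category
        and record.get("resultType") == result_type
    ]


# Illustrative batch only; field values are assumptions, not captured output.
SAMPLE_BATCH = json.dumps({
    "records": [
        {"category": "ScanStatusLogEvent", "resultType": "Failed",
         "time": "2024-01-01T00:00:00Z"},
        {"category": "ScanStatusLogEvent", "resultType": "Succeeded",
         "time": "2024-01-01T00:05:00Z"},
    ]
})

if __name__ == "__main__":
    for record in select_records(SAMPLE_BATCH, "ScanStatusLogEvent", "Failed"):
        # A real subscriber might raise an alert or open a ticket here.
        print("scan failure at", record["time"])
```

Passing the category and result type as arguments keeps the filter reusable if the event names differ from the ones assumed here.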

SDK

Microsoft provides Azure SDKs to programmatically manage and interact with Azure services. Microsoft Purview client libraries are available in several languages (.NET, Java, JavaScript, and Python), designed to be consistent, approachable, and idiomatic.

When to use?

  • Recommended over the REST API where a native client library is available, because the libraries follow the conventions of the target language and feel natural to the developer.
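For comparison with raw REST calls, here is a minimal sketch that lists registered data sources with the Python client library. It assumes the `azure-identity` and `azure-purview-scanning` packages are installed, that the scanning endpoint follows the `https://<account>.scan.purview.azure.com` pattern, and that each returned item exposes a `name` key. The account name and helper names are hypothetical.

```python
def scanning_endpoint(account_name: str) -> str:
    """Build the scanning endpoint URL for an account (assumed URL pattern)."""
    return f"https://{account_name}.scan.purview.azure.com"


def list_data_source_names(account_name: str) -> list[str]:
    """Return the names of the data sources registered in the account."""
    # Imported here so the module still loads when the SDK packages are not installed.
    from azure.identity import DefaultAzureCredential
    from azure.purview.scanning import PurviewScanningClient

    client = PurviewScanningClient(
        endpoint=scanning_endpoint(account_name),
        credential=DefaultAzureCredential(),
    )
    # list_all() pages through the results; each item is a JSON-like dict.
    return [source["name"] for source in client.data_sources.list_all()]


if __name__ == "__main__":
    # Hypothetical account name; printing the endpoint needs no credentials.
    print(scanning_endpoint("contoso-purview"))
```

The client handles authentication, paging, and serialization, which is most of what a hand-written REST wrapper would otherwise have to implement.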
