Cloud-scale analytics data management landing zone overview

The data management landing zone is a management function and is central to cloud-scale analytics. It's responsible for the governance of your analytics platform.

Diagram of data management landing zone overview.

Your data management landing zone is a separate subscription that has the same standard Azure landing zone services. It enables governance of your data through crawlers that connect to the data lakes and polyglot storage in your data landing zones. Virtual network peering connects your data management landing zone to your data landing zones and to your connectivity subscription.
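The peering topology described above can be sketched as a list of peering links. This is a minimal illustration, not deployment code. The VNet names are hypothetical. The sketch relies only on the fact that Azure virtual network peering is configured on each of the two VNets, so every connection needs a link in both directions.

```python
from itertools import chain


def required_peerings(
    management_vnet: str, landing_zone_vnets: list[str], connectivity_vnet: str
) -> list[tuple[str, str]]:
    """Return (local, remote) peering links for the data management landing zone.

    Peering is configured per VNet, so each connection needs one link from the
    management VNet to the remote VNet and one link back.
    """
    links: list[tuple[str, str]] = []
    for remote in chain(landing_zone_vnets, [connectivity_vnet]):
        links.append((management_vnet, remote))
        links.append((remote, management_vnet))
    return links
```

For two data landing zones plus the connectivity subscription, the sketch yields six links: two per connection.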

Use this architecture as a starting point. Download the Visio file and modify it to fit your specific business and technical requirements when planning your data management landing zone implementation.

Note

Polyglot persistence is a storage term that describes using different data storage technologies to meet the storage needs of your various data types. In essence, polyglot persistence means that an application can use more than one core database or storage technology.

Important

Your data management landing zone must be deployed as a separate subscription under the corp management group of an Azure landing zone architecture. This placement lets you control governance across your organization. The Azure landing zone accelerator illustrates how you should approach Azure landing zones.

Data catalog

Resource group: governance-rg

The data catalog registers and maintains information about your data in a central place and makes it available to the organization. It helps prevent the duplicate data products that result when different project teams ingest the same data redundantly.

We recommend that you provision a data catalog service to define the metadata of the data products stored across your data landing zones.
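The following sketch shows how a central catalog can prevent duplicate data products by checking for an existing entry before a new one is registered. It is an in-memory stand-in for illustration only and does not use the Microsoft Purview API. The `qualified_name` key scheme (for example, `dlz01/sales/orders`) and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    qualified_name: str  # hypothetical unique key, e.g. "dlz01/sales/orders"
    owner: str
    landing_zone: str


@dataclass
class DataCatalog:
    _products: dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> bool:
        """Register a product; return False if its qualified name is already taken."""
        if product.qualified_name in self._products:
            return False
        self._products[product.qualified_name] = product
        return True

    def find(self, text: str) -> list[DataProduct]:
        """Case-insensitive substring search over qualified names."""
        needle = text.lower()
        return [p for p in self._products.values() if needle in p.qualified_name.lower()]
```

A second team that tries to register `dlz01/sales/orders` gets `False` back. The team can then discover and reuse the existing product instead of ingesting the same data again.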

Cloud-scale analytics relies on Microsoft Purview to serve as:

  • A system of registration
  • A discovery service for enterprise data sources
  • A data classification engine
  • A policy store
  • An API for registering and reading metadata
  • A compliance dashboard hub

Because the data catalog is part of the data management landing zone, it can communicate with each data landing zone through virtual network (VNet) peering and self-hosted integration runtimes. To discover data products in on-premises stores and other public clouds, deploy additional self-hosted integration runtimes.

Note

Although this documentation focuses primarily on using Microsoft Purview for data catalog capabilities and data classification, enterprises might have invested in other products, such as Alation, Okera, or Collibra. If so, work with your vendor to apply the principles described for a data management landing zone as closely as possible. Be aware that some custom integration might be required.

For more information, see Data catalog and Microsoft Purview deployment best practices for cloud-scale analytics.

Data quality management

Resource group: governance-rg2

Continue to use your existing data quality solution.

Manage data quality as close to your data sources as possible, so that quality issues don't propagate across your analytics and AI estate. Building quality metrics and validation into your data integration processes aligns quality management with the teams closest to your data. These teams have the deepest understanding of your data assets.
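To make "validation in your data integration" concrete, the following sketch applies named rules to records at ingestion time. It returns the valid records together with per-rule failure counts that could serve as quality metrics. The rule names and record fields are hypothetical, and this sketch is not tied to any specific data quality product.

```python
from typing import Any, Callable

Rule = Callable[[dict[str, Any]], bool]


def validate(
    records: list[dict[str, Any]], rules: dict[str, Rule]
) -> tuple[list[dict[str, Any]], dict[str, int]]:
    """Split records into those passing every rule and per-rule failure counts."""
    failures = {name: 0 for name in rules}
    valid: list[dict[str, Any]] = []
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            failures[name] += 1
        if not failed:
            valid.append(record)
    return valid, failures


# Hypothetical rules for an orders feed.
rules: dict[str, Rule] = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
}
```

Running the rules inside the ingestion pipeline means that the team that owns the source sees the failure counts first and can fix problems before bad records reach downstream consumers.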

Data lineage also builds confidence in data quality, so you should provide it for all data products.
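Lineage supports quality confidence because it lets a consumer trace every source that a data product was built from. The following sketch walks a lineage graph upstream with a breadth-first search. The asset names and the dictionary representation of lineage are hypothetical. In practice, lineage would come from your data catalog.

```python
from collections import deque


def upstream(lineage: dict[str, list[str]], product: str) -> set[str]:
    """Return every asset that `product` depends on, directly or transitively.

    `lineage` maps each asset to the assets it is derived from.
    """
    seen: set[str] = set()
    queue = deque(lineage.get(product, []))
    while queue:
        asset = queue.popleft()
        if asset not in seen:  # also guards against cycles in the graph
            seen.add(asset)
            queue.extend(lineage.get(asset, []))
    return seen


lineage = {
    "sales_report": ["orders_curated"],
    "orders_curated": ["orders_raw", "customers_raw"],
}
```

If a quality issue is reported in `orders_raw`, the same graph shows that `sales_report` is affected.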

For more information on data quality management, see Data quality.

Data modeling repository

Resource group: governance-rg2

Capture and store entity-relationship models in a central location within your data management landing zone. This location gives data consumers a single place to find conceptual diagrams.

Many customers use ER Studio and iServer to model their data products before ingestion.

Master data management

Resource group: governance-rg2

Master data management control resides in the data management landing zone. If you adopt a data mesh, review Master data management in data mesh for considerations specific to that architecture.

Many master data management solutions fully integrate with Azure Active Directory. This integration allows you to secure your data and provide different views for different user groups.
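The following sketch shows "different views for different user groups" as a projection of a master data record based on group membership. The group names and attribute lists are hypothetical. In a real deployment, group membership would typically come from Azure Active Directory, while this sketch simply takes a list of group names.

```python
# Hypothetical mapping from directory group names to the attributes each group may see.
GROUP_VIEWS: dict[str, set[str]] = {
    "mdm-stewards": {"customer_id", "name", "email", "credit_limit"},
    "sales-readers": {"customer_id", "name"},
}


def project_record(record: dict, user_groups: list[str]) -> dict:
    """Return only the attributes visible to at least one of the user's groups."""
    visible: set[str] = set()
    for group in user_groups:
        visible |= GROUP_VIEWS.get(group, set())
    return {key: value for key, value in record.items() if key in visible}
```

A user with no matching group sees an empty record. With this design, access is denied by default unless a group explicitly grants it.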

For more information, see Master data management system.

API catalog

Resource group: governance-rg2

Your data application teams will likely create various APIs for their data applications. These APIs can be difficult to discover across your organization. Placing an API catalog in your data management landing zone can solve this problem.

An API catalog can help standardize your documentation and offers a place for internal collaboration on APIs. It can also drive consumption, publishing, and governance controls across your organization.
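The following minimal sketch shows an API catalog with one governance control: publishing fails unless an owning team is recorded. It also supports tag-based discovery. The entry fields, the ownership rule, and the example URL are assumptions for illustration. The sketch isn't modeled on any particular API management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiEntry:
    name: str
    owner_team: str
    spec_url: str  # hypothetical link to the API's OpenAPI document
    tags: frozenset[str]


class ApiCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, ApiEntry] = {}

    def publish(self, entry: ApiEntry) -> None:
        # An API without an owner is hard to govern, so reject it up front.
        if not entry.owner_team:
            raise ValueError(f"API {entry.name!r} has no owning team")
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list[str]:
        """Return the sorted names of APIs that carry the given tag."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)
```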

Data sharing and contracts

Resource group: governance-rg2

Cloud-scale analytics uses Azure AD entitlement management or Microsoft Purview policies to control access to data sharing. Even so, you might still require a sharing and contract repository. This repository is an organizational function and should reside in your data management landing zone.

Your contracts should provide information on data validation, models, and security policies.
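The following sketch represents a contract with the three elements named above. It includes a model (expected column types), validation (required columns), and a security policy (groups allowed to read). The field names and the example contract are hypothetical and don't follow any specific data contract standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    product: str
    schema: dict[str, type]  # model: column name -> expected Python type
    required: set[str] = field(default_factory=set)  # validation: columns that must be present
    allowed_groups: set[str] = field(default_factory=set)  # security: groups granted read access

    def violations(self, row: dict) -> list[str]:
        """List the ways a row breaks the contract (empty list means it conforms)."""
        problems: list[str] = []
        for column, expected in self.schema.items():
            value = row.get(column)
            if value is None:
                if column in self.required:
                    problems.append(f"{column}: missing")
            elif not isinstance(value, expected):
                problems.append(f"{column}: expected {expected.__name__}")
        return problems

    def can_read(self, groups: set[str]) -> bool:
        return bool(self.allowed_groups & groups)
```

Because the contract is plain data, you can store it in the sharing and contract repository and use it on both sides: by the producer before publishing, and by the consumer on receipt.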

For more information, see Data contracts.

Azure Container Registry

Resource group: containers-rg

Your data management landing zone hosts an Azure Container Registry. The registry lets your data platform operations team deploy standard containers for the data science projects that your data application teams consume.
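The following sketch builds pinned references to standard images in the shared registry. It relies on the `<registry-name>.azurecr.io` login server format and on the rule that registry names are 5 to 50 alphanumeric characters. Rejecting the `latest` tag is an assumption of this sketch, meant to ensure that every team gets the same build. The registry and repository names are hypothetical.

```python
import re

# Azure Container Registry names are 5-50 alphanumeric characters.
_REGISTRY_NAME = re.compile(r"^[A-Za-z0-9]{5,50}$")


def image_reference(registry: str, repository: str, tag: str) -> str:
    """Build an image reference such as 'contosodmlz.azurecr.io/datascience/python:1.2.0'."""
    if not _REGISTRY_NAME.match(registry):
        raise ValueError("registry names are 5-50 alphanumeric characters")
    if tag == "latest":
        # Assumed policy: standard images are pinned so every team runs the same build.
        raise ValueError("pin a specific version instead of 'latest'")
    return f"{registry.lower()}.azurecr.io/{repository}:{tag}"
```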

Azure Synapse Analytics Private Link hubs

Resource group: synapse-link-rg

Azure Synapse Analytics Private Link hubs are Azure resources that connect your secured network and the Azure Synapse Studio web experience. Cloud-scale analytics securely connects your Azure Virtual Network to Azure Synapse Studio using private links from these hubs.

Connecting to Azure Synapse Studio through private links takes two steps:

  1. Create a Private Link hub resource.
  2. Create a private endpoint from your Azure Virtual Network to that Private Link hub.

You can then use the private endpoint to communicate securely with Azure Synapse Studio. Integrate your private endpoints with your DNS solution, either your on-premises DNS or Azure Private DNS.
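The following sketch records the two steps and the DNS integration as an ordered plan, because the private endpoint can only target a hub that already exists. The hub and VNet names are hypothetical. The zone `privatelink.azuresynapse.net` is the private DNS zone used for the Synapse Private Link hub web endpoint. Verify it against the current private endpoint DNS documentation before you rely on it.

```python
from dataclasses import dataclass

# Private DNS zone for the Synapse Private Link hub web endpoint (verify against current docs).
SYNAPSE_STUDIO_ZONE = "privatelink.azuresynapse.net"


@dataclass(frozen=True)
class Step:
    action: str
    target: str


def private_studio_access_plan(hub_name: str, vnet_name: str, use_azure_private_dns: bool) -> list[Step]:
    """Ordered steps for private access to Azure Synapse Studio."""
    steps = [
        Step("create Private Link hub", hub_name),
        Step("create private endpoint", f"{vnet_name} -> {hub_name}"),
    ]
    if use_azure_private_dns:
        steps.append(Step("link Azure Private DNS zone", SYNAPSE_STUDIO_ZONE))
    else:
        steps.append(Step("configure on-premises DNS for zone", SYNAPSE_STUDIO_ZONE))
    return steps
```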

For more information, see Connect to Azure Synapse studio using private links.

Automation interfaces (optional)

Your organization might decide to create many automation services to augment cloud-scale analytics capabilities. These automation services drive conformity and onboarding solutions for your analytics estate.

If you decide to build these automation services, provide a user interface that acts as both a data marketplace and an operations console. This interface should rely on an underlying metadata store, as described in Metadata standards.

Your data marketplace or operations console calls a middle tier of microservices to facilitate onboarding, metadata registration, security provisioning, data lifecycle, and observability.

You can provision the automationdb-rg resource group to host your metadata store.
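The following sketch shows the console-to-middle-tier flow as a simple router that dispatches each request to the microservice responsible for it. The service names, handlers, and response shapes are hypothetical placeholders for the onboarding and metadata registration services. A real middle tier would make network calls and read from and write to the metadata store.

```python
from typing import Callable

Handler = Callable[[dict], dict]


# Hypothetical handlers standing in for separate microservices.
def onboard(request: dict) -> dict:
    return {"status": "accepted", "product": request["product"]}


def register_metadata(request: dict) -> dict:
    return {"status": "registered", "product": request["product"]}


ROUTES: dict[str, Handler] = {
    "onboarding": onboard,
    "metadata": register_metadata,
}


def dispatch(service: str, request: dict) -> dict:
    """Route a marketplace or console request to the matching microservice."""
    handler = ROUTES.get(service)
    if handler is None:
        return {"status": "unknown-service", "service": service}
    return handler(request)
```

With this design, the user interface stays thin. You can add a new capability, such as security provisioning or observability, by registering another route.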

Important

None of these automation services are products, and they don't represent roadmap items. They're listed to help you consider which items you might want to automate.

Services

  • Data landing zone provisioning: This service creates a new data landing zone. It's unlikely to see heavy use, but it's included so that the onboarding solution is complete end to end. For more information, see Provision the cloud-scale analytics
  • Data product onboarding: This service creates and amends resource groups that pertain to an onboarded tenant. It can also upgrade and downgrade SKUs and activate and deactivate resource groups for any onboarded tenant or service. It creates a new data landing zone DevOps. For more information, see Provision the cloud-scale analytics
  • Access provisioning: This service creates access packages, access policies, and an asset access approval process (manual or automatic) using SPN/UPN. It can also expose an API that lists the subscription requests (assets) that users have submitted in the past 90 days. For more information, see Data access management
  • Data agnostic ingestion: This microservice creates new data sources for ingestion into your data landing zones. It does so by communicating with an Azure Data Factory SQL Database metastore in each data landing zone. For more information, see How automated ingestion frameworks support cloud-scale analytics in Azure
  • Metadata: This service exposes and creates metadata for the platform. For more information, see Metadata standards
  • Data lifecycle: This service maintains your data lifecycle based on metadata. This maintenance can include moving data to cold storage and deleting records that no longer need to be retained. For more information, see Data lifecycle management
  • Data domain onboarding (data mesh only): This service captures metadata for new domains and onboards them as needed. It can also create, update, activate, and deactivate any domain or service line you might build into a microservice. For more information, see Provision the cloud-scale analytics
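As an illustration of the data lifecycle service, the following sketch decides what to do with a dataset based on its metadata. It deletes the dataset once the retention period has passed and moves it to cold storage after a period without access. The parameter names and thresholds are hypothetical. Real policies would come from your metadata store.

```python
from datetime import date, timedelta


def lifecycle_action(
    last_accessed: date, created: date, retention_days: int, cold_after_days: int, today: date
) -> str:
    """Return 'delete', 'move-to-cold', or 'keep' for a dataset, based on its metadata."""
    # Retention is checked first: data past retention is deleted even if it's still accessed.
    if today - created > timedelta(days=retention_days):
        return "delete"
    if today - last_accessed > timedelta(days=cold_after_days):
        return "move-to-cold"
    return "keep"
```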

Data standardization

Although data standardization isn't a specific feature or product of your data management landing zone, you should call it out across all services. Data standardization defines the format in which your data should land and be stored.

Tip

Use the Delta Lake format wherever possible as the de facto standard across all services and storage.
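To check adherence to this standard, you could audit storage for folders that aren't Delta tables. A Delta Lake table keeps its transaction log in a `_delta_log` directory at the table root. The following sketch uses that convention and assumes a hypothetical layout in which each top-level folder under `root` is meant to be one table. It runs against a local or mounted path. Listing a data lake directly would require a storage SDK.

```python
from pathlib import Path


def non_delta_tables(root: Path) -> list[str]:
    """List top-level folders under `root` that aren't Delta Lake tables.

    A Delta table keeps its transaction log in `_delta_log` at the table root,
    so a folder without that directory isn't a Delta table.
    """
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and not (child / "_delta_log").is_dir()
    )
```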

For more information, see Data standardization.

Next steps