Cloud-scale analytics

With larger, more sophisticated forms of cloud adoption, your journey to the cloud becomes more complex. Azure cloud-scale analytics is a scalable, repeatable framework that meets your organization's unique needs for building modern data platforms.

Cloud-scale analytics covers both technical and non-technical considerations for analytics and governance in the cloud. This guidance strives to support hybrid and multicloud adoption by being cloud agnostic, but the included technical implementation examples focus on Azure products.

Cloud-scale analytics has the following goals:

  • Serve data as a product, rather than a byproduct
  • Provide an ecosystem of data products, rather than a singular data warehouse that might not best fit your data scenario
  • Enforce data governance and security by default
  • Drive teams to consistently prioritize business outcomes rather than focusing only on the underlying technology

Cloud-scale analytics builds on the Microsoft Cloud Adoption Framework and requires an understanding of Azure landing zones. If you don't already have an Azure landing zone implementation, consult your cloud teams about how to meet the prerequisites. For more information, see Ensure the environment is prepared for the cloud adoption plan.

Reference architectures allow you to begin with a small footprint and grow over time, adapting the scenario to your use cases.

Cloud-scale analytics includes repeatable templates that accelerate five core infrastructure and resource deployments. It also adapts to different organization sizes. If you're a small enterprise with limited resources, a centralized operating model combined with some business subject matter experts might fit your situation. If you're a larger enterprise and your goal is autonomous business units, each with its own data engineers and analysts, a distributed operating model such as data mesh or data fabric might better address your needs.

Objectives

Cloud-scale analytics provides a framework that is built on the following principles. These principles address challenges with complex data architectures that don't scale to the needs of organizations.

Allow
  • Scaling without increased complexity
  • Separation of concerns to facilitate governance
  • Creation of self-serve data infrastructure
Follow
  • Best practices for well-architected cloud services
Support
  • On-premises and multicloud scenarios
Adopt
  • Product and vendor agnostic approach
  • Cloud Adoption Framework
Commit
  • Azure landing zones as baseline infrastructure for all workloads
  • Operating model
Enable
  • Common data infrastructure
  • Distributed architecture under centralized governance
  • Secure network line-of-sight

Implementation guidance

Implementation guidance can be broken into two sections:

  • Global guidance that applies to all workloads
  • Cloud-scale specific guidance

Global guidance

Cloud Adoption Framework: Managing and governing data is a lifecycle process that begins with your existing cloud strategy and carries through to your ongoing operations. The Cloud Adoption Framework helps guide the full lifecycle of your data estate.
Azure Well-Architected Framework: Workload architecture and operations directly affect data. Understand how your architecture can improve the management and governance of workload data.

Cloud-scale specific guidance

Build an Initial Strategy: How to build your data strategy and pivot to become a data-driven organization.
Define your plan: How to develop a plan for cloud-scale analytics.
Prepare analytics estate: An overview of data management landing zones and data landing zones, with key design area considerations like enterprise enrollment, networking, identity and access management, policies, and business continuity and disaster recovery.
Govern your analytics: Requirements for governing data, including data catalogs, lineage, master data management, data quality, data sharing agreements, and metadata.
Secure your analytics estate: How to secure your analytics estate with authentication and authorization, data privacy, and data access management.
Organize people and teams: How to organize effective operations, roles, teams, and team functions.
Manage your analytics estate: How to provision the platform and set up observability for a scenario.

Architectures

This section addresses the details of physical implementations of cloud-scale analytics. It maps out the physical architectures of data management landing zones and data landing zones.

Cloud-scale analytics has two key architectural concepts:

  • The data landing zone
  • The data management landing zone

These architectures standardize best practices and minimize deployment bottlenecks for your development teams, and can accelerate the deployment of common cloud-scale analytics solutions. You can adopt their guidance for lakehouse and data mesh architectures. That guidance highlights the capabilities you need for a well-governed analytics platform that scales to your needs.

The following diagram provides an overview of a data platform that contains a central data management landing zone and multiple data landing zones.

Diagram of a high-level design containing both a data management landing zone and data landing zones.

You can start with a single data landing zone, scale to multiple data landing zones, and govern all of them from the data management landing zone.

For more information, see Architectures overview.

Deployment templates

Cloud-scale analytics provides the following reference templates, which you can deploy:

Data management template
  • Content: Central data management services and shared data services, such as a data catalog and a self-hosted integration runtime
  • Required: Yes
  • Deployment model: One per cloud-scale analytics implementation

Data landing zone template
  • Content: Data landing zone shared services, including ingestion, management, and data storage services
  • Required: Yes
  • Deployment model: One per data landing zone

Data integration template - batch processing
  • Content: Additional services necessary for batch data processing
  • Required: No
  • Deployment model: One or more per data landing zone

Data integration template - stream processing
  • Content: Additional services necessary for data stream processing
  • Required: No
  • Deployment model: One or more per data landing zone

Data product template - analytics and data science
  • Content: Additional services necessary for data analytics and AI
  • Required: No
  • Deployment model: One or more per data landing zone
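The Required and Deployment model values above amount to simple cardinality rules for a planned platform. The following Python sketch checks a plan against them. It's a minimal illustration under stated assumptions: the DataLandingZone and PlannedDeployment classes and the validate_plan function are hypothetical and aren't part of any template repository, and the sketch treats the optional templates as zero or more per data landing zone, which is one reading of those two columns.

```python
from dataclasses import dataclass, field


@dataclass
class DataLandingZone:
    """Hypothetical model of one data landing zone and its optional template deployments."""
    name: str
    batch_integrations: int = 0   # Data integration template - batch processing (optional)
    stream_integrations: int = 0  # Data integration template - stream processing (optional)
    data_products: int = 0        # Data product template - analytics and data science (optional)


@dataclass
class PlannedDeployment:
    """Hypothetical plan: number of data management deployments plus the data landing zones."""
    data_management_zones: int
    landing_zones: list[DataLandingZone] = field(default_factory=list)


def validate_plan(plan: PlannedDeployment) -> list[str]:
    """Return rule violations; an empty list means the plan matches the deployment model."""
    problems: list[str] = []
    # Data management template: required, one per cloud-scale analytics implementation.
    if plan.data_management_zones != 1:
        problems.append(
            f"expected exactly 1 data management landing zone, got {plan.data_management_zones}"
        )
    # Data landing zone template: required, so the plan needs at least one data landing zone.
    if not plan.landing_zones:
        problems.append("at least one data landing zone is required")
    # Optional templates (assumed zero or more per zone): only negative counts are invalid.
    for dlz in plan.landing_zones:
        counts = {
            "batch integrations": dlz.batch_integrations,
            "stream integrations": dlz.stream_integrations,
            "data products": dlz.data_products,
        }
        for label, count in counts.items():
            if count < 0:
                problems.append(f"{dlz.name}: {label} cannot be negative")
    return problems


# Usage: one data management landing zone governing one data landing zone.
plan = PlannedDeployment(
    data_management_zones=1,
    landing_zones=[DataLandingZone("dlz-sales", batch_integrations=1, data_products=2)],
)
print(validate_plan(plan))  # []
```

Expressing the rules as a small check like this lets a pipeline reject an invalid plan before any resources are deployed; the exact rules should come from your own platform design.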

The template repositories contain Azure Resource Manager (ARM) templates, their parameter files, and CI/CD pipeline definitions for deploying resources.
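Because the same template is deployed once per data landing zone with different parameter values, one way to keep those parameter files consistent is to generate them. The following Python sketch writes one ARM deployment parameter file per data landing zone. The $schema and contentVersion values follow the standard ARM parameter file format; the build_parameter_file and write_landing_zone_parameters helpers and the parameter names (location, prefix, environment) are hypothetical placeholders, so substitute the parameters that your template actually declares.

```python
import json
import tempfile
from pathlib import Path

# Standard header values for an ARM deployment parameter file.
ARM_PARAMS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"
)
CONTENT_VERSION = "1.0.0.0"


def build_parameter_file(values: dict) -> dict:
    """Wrap plain values in the {"<name>": {"value": ...}} shape that ARM parameter files use."""
    return {
        "$schema": ARM_PARAMS_SCHEMA,
        "contentVersion": CONTENT_VERSION,
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


def write_landing_zone_parameters(zones: dict[str, dict], out_dir: Path) -> list[Path]:
    """Write one <zone>.parameters.json file per data landing zone and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for zone_name, values in zones.items():
        path = out_dir / f"{zone_name}.parameters.json"
        path.write_text(json.dumps(build_parameter_file(values), indent=2))
        written.append(path)
    return written


# Hypothetical parameter names and values; use the parameters your template declares.
zones = {
    "dlz-sales": {"location": "westeurope", "prefix": "sales", "environment": "dev"},
    "dlz-finance": {"location": "westeurope", "prefix": "fin", "environment": "dev"},
}

# Write into a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_landing_zone_parameters(zones, Path(tmp)):
        print(path.name)
```

Generating every file from one dictionary keeps shared values such as the location consistent across data landing zones, at the cost of an extra build step before deployment.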

Templates change over time as new Azure services and requirements emerge. Protect each repository's main branch so that it stays error-free and ready to consume and deploy. Use a development subscription to test template configuration changes before you merge feature enhancements back into your main branch.

For more information, see Deployment templates.

Solution accelerators

Solution accelerators are open-source projects on GitHub. These repositories contain resources and information that help you solve problems with technology more simply and quickly.

For more information, see Solution accelerators.

Best practices

The following advanced, level-300+ articles in the cloud-scale analytics table of contents can help central IT teams deploy tools and manage processes for data management and governance:

Expand the Featured Azure products section in the cloud-scale analytics table of contents to learn about the Azure products that support cloud-scale analytics.

Common customer journeys

The following common customer journeys support cloud-scale analytics:

  • Prepare your environment. Use the Prepare your environment articles as resources. Establish processes and approaches that support the entire portfolio of workloads across your data estate.

  • Improve controls across your data estate. Focus on the Govern your data estate and Secure your data estate articles to integrate cloud-scale analytics into your existing operations.

  • Influence changes to individual workloads. As your cloud-scale analytics processes improve, your central data governance teams will encounter requirements that depend on knowledge of the architecture behind individual workloads. Use the Architecture articles to understand how to apply their scenarios to your use case.

  • Optimize individual workloads and workload teams. Start with the Azure Well-Architected Framework guidance to integrate cloud-scale analytics strategies into individual workloads. This guidance describes best practices and architectures that central IT and governance teams should use to accelerate individual workload development.

  • Use best practices to onboard individual assets. Expand the Best practices section in the cloud-scale analytics table of contents to find articles about processes for onboarding your entire data estate into one cloud-scale analytics control plane.

  • Use specific Azure products. Accelerate and improve your cloud-scale analytics capabilities by using the Azure products in the Featured Azure products section of the cloud-scale analytics table of contents.

Take action

For more information about planning your cloud-scale analytics implementation, see:

Next steps

Begin your cloud-scale analytics journey: