Events
May 19, 6 PM - May 23, 12 AM
Calling all developers, creators, and AI innovators to join us in Seattle @Microsoft Build May 19-22.
Register today
Security is one of the foundational design principles and also a key design area that must be treated as a first-class concern within the mission-critical architectural process.
Given that the primary focus of a mission-critical design is to maximize reliability so that the application remains performant and available, the security considerations and recommendations applied within this design area will focus on mitigating threats with the capacity to impact availability and hinder overall reliability. For example, successful Distributed Denial-of-Service (DDoS) attacks are known to have a catastrophic impact on availability and performance. How an application mitigates those attack vectors, such as Slowloris, will impact the overall reliability. So, the application must be fully protected against threats intended to directly or indirectly compromise application reliability to be truly mission critical in nature.
It's also important to note that there are often significant trade-offs associated with a hardened security posture, particularly with respect to performance, operational agility, and in some cases reliability. For example, the inclusion of inline Network Virtual Appliances (NVA) for Next-Generation Firewall (NGFW) capabilities, such as deep packet inspection, will introduce a significant performance penalty, additional operational complexity, and a reliability risk if scalability and recovery operations are not closely aligned with those of the application. It's therefore essential that additional security components and practices intended to mitigate key threat vectors are also designed to support the reliability target of an application, which will form a key aspect of the recommendations and considerations presented within this section.
Important
This article is part of the Azure Well-Architected mission-critical workload series. If you aren't familiar with this series, we recommend you start with what is a mission-critical workload?
Mission-Critical open source project
The reference implementations are part of an open source project available on GitHub. The code assets adopt a Zero Trust model to structure and guide the security design and implementation approach.
The Microsoft Zero Trust model provides a proactive and integrated approach to applying security across all layers of an application. The guiding principles of Zero Trust strive to explicitly and continuously verify every transaction, assert least privilege, and use intelligence and advanced detection to respond to threats in near real-time. It's ultimately centered on eliminating trust inside and outside of application perimeters, enforcing verification for anything attempting to connect to the system.
As you assess the security posture of the application, start with these questions as the basis for each consideration.
Continuous security testing to validate mitigations for key security vulnerabilities.
Security level across all lower environments.
Authentication and Authorization continuity in the event of a failure.
Automated security compliance and remediation.
Secret scanning to detect secrets before code is committed to prevent any secret leaks through source code repositories.
Secure the software supply chain.
Data protection key lifecycles.
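The secret scanning consideration above can be illustrated with a minimal sketch of a pre-commit style check. This is an illustration only, not how any particular scanning product works: the pattern set and the `scan_text` helper are hypothetical, and real scanners ship far larger, provider-maintained rule sets.

```python
import re

# Hypothetical patterns for illustration only; real scanners use much
# larger rule sets and entropy checks.
SECRET_PATTERNS = {
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "storage_account_key": re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:password|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspected secret in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

A hook built on a check like this would typically block the commit when `scan_text` returns any findings, so that the secret never reaches the repository history.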
CI/CD tooling should require Microsoft Entra service principals with sufficient subscription level access to facilitate control plane access for Azure resource deployments to all considered environment subscriptions.
Use Azure Policy to enforce security and reliability configurations for all services, ensuring that any deviation is either remediated or prohibited by the control plane at configuration-time, helping to mitigate threats associated with 'malicious admin' scenarios.
Use Microsoft Entra Privileged Identity Management (PIM) within production subscriptions to revoke sustained control plane access to production environments. This will significantly reduce the risk posed from 'malicious admin' scenarios through additional 'checks and balances'.
Use Azure Managed Identities for all services that support the capability, since it facilitates the removal of credentials from application code and removes the operational burden of identity management for service to service communication.
Use Microsoft Entra role-based access control (RBAC) for data plane authorization with all services that support the capability.
Use first-party Microsoft identity platform authentication libraries within application code to integrate with Microsoft Entra ID.
Consider secure token caching to allow for a degraded but available experience if the chosen identity platform isn't available or is only partially available for application authorization.
Use Infrastructure-as-Code (IaC) and automated CI/CD pipelines to drive updates to all application components, including under failure circumstances.
Define an appropriate security posture for all lower environments to ensure key vulnerabilities are mitigated.
Enable Microsoft Defender for Cloud (formerly known as Azure Security Center) for all subscriptions that contain the resources for a mission-critical workload.
Embrace DevSecOps and implement security testing within CI/CD pipelines.
Enable secret scanning and dependency scanning within the source code repository.
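The token caching recommendation above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the behavior of any Microsoft authentication library: `TokenCache`, `IdentityProviderUnavailable`, and the `fetch_token` callable are hypothetical names. The sketch refreshes a token shortly before it expires and, if the identity provider can't be reached, keeps serving the cached token until its real expiry while flagging the degraded state.

```python
import time


class IdentityProviderUnavailable(Exception):
    """Raised by the (hypothetical) token fetcher when the provider is down."""


class TokenCache:
    def __init__(self, fetch_token, refresh_margin=600, clock=time.time):
        # fetch_token is a caller-supplied callable returning (token, expires_at).
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin  # seconds before expiry to refresh
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        """Return (token, degraded). degraded=True means a refresh failed."""
        now = self._clock()
        # Fast path: the cached token is valid and not yet close to expiry.
        if self._token is not None and now < self._expires_at - self._refresh_margin:
            return self._token, False
        try:
            self._token, self._expires_at = self._fetch_token()
            return self._token, False
        except IdentityProviderUnavailable:
            # Degraded mode: reuse the cached token only while it is still valid.
            if self._token is not None and now < self._expires_at:
                return self._token, True
            raise
```

The sketch never serves a token past its expiry, so the degraded window is bounded by the token lifetime. The cache itself should be protected like any other credential store.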
Threat modeling provides a risk-based approach to security design, using identified potential threats to develop appropriate security mitigations. There are many possible threats with varying probabilities of occurrence, and in many cases threats can chain in unexpected, unpredictable, and even chaotic ways. This complexity and uncertainty is why traditional technology-requirement-based security approaches are largely unsuitable for mission-critical cloud applications. Expect the process of threat modeling for a mission-critical application to be complex and unyielding.
To help navigate these challenges, a layered defense-in-depth approach should be applied to define and implement compensating mitigations for modeled threats, considering the following defensive layers.
Note
When deploying within an Azure landing zone, be aware that an additional threat mitigation layer through the provisioning of centralized security capabilities is provided by the landing zone implementation.
STRIDE provides a lightweight risk framework for evaluating security threats across key threat vectors.
Allocate engineering budget within each sprint to evaluate potential new threats and implement mitigations.
Conscious effort should be applied to ensure security mitigations are captured within a common engineering criteria to drive consistency across all application service teams.
Start with service-by-service threat modeling, then unify the results by consolidating the service threat models at the application level.
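One way to picture the consolidation step is as a small data model. The sketch below is illustrative only: the `Threat` record, `consolidate`, and `open_threats` are hypothetical names, and a real threat model carries far more detail (assets, trust boundaries, severity). It groups per-service threats by STRIDE category into a single application-level view.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


@dataclass(frozen=True)
class Threat:
    service: str
    category: Stride
    description: str
    mitigated: bool = False


def consolidate(service_models):
    """Merge per-service threat lists into one application-level view by category."""
    by_category = defaultdict(list)
    for threats in service_models.values():
        for threat in threats:
            by_category[threat.category].append(threat)
    return dict(by_category)


def open_threats(app_model):
    """List unmitigated threats, for example as input to sprint planning."""
    return [t for threats in app_model.values() for t in threats if not t.mitigated]
```

Keeping the service name on each record preserves ownership after consolidation, so the unmitigated list can be routed back to the responsible service team.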
Preventing unauthorized access to a mission-critical application and encompassed data is vital to maintain availability and safeguard data integrity.
Zero Trust assumes a breached state and verifies each request as though it originates from an uncontrolled network.
Azure PaaS services are typically accessed over public endpoints. Azure provides capabilities to secure public endpoints or even make them entirely private.
For supported services, Azure Private Link using Azure Private Endpoints addresses data exfiltration risks associated with Service Endpoints, such as a malicious admin writing data to an external resource.
When restricting network access to Azure PaaS services using Private Endpoints or Service Endpoints, a secure network channel will be required for deployment pipelines to access both the Azure control plane and data plane of Azure resources in order to deploy and manage the application.
The completion of administration and maintenance tasks is a further scenario requiring connectivity to the data plane of Azure resources.
Service Connections with a corresponding Microsoft Entra service principal can be used within Azure DevOps to apply RBAC through Microsoft Entra ID.
Service Tags can be applied to Network Security Groups to facilitate connectivity with Azure PaaS services.
Application Security Groups don't span across multiple virtual networks.
Packet capture in Azure Network Watcher is limited to a maximum period of five hours.
Limit public network access to the absolute minimum required for the application to fulfill its business purpose to reduce the external attack surface.
When dealing with private build agents, never open an RDP or SSH port directly to the internet.
Use a DDoS standard protection plan to secure all public IP addresses within the application.
Use Azure Front Door with web application firewall policies to deliver and help protect global HTTP/S applications that span multiple Azure regions.
If additional in-line network security requirements, such as deep packet inspection or TLS inspection, mandate the use of Azure Firewall Premium or Network Virtual Appliance (NVA), ensure it's configured for maximum high availability and redundancy.
If packet capture requirements exist, use Network Watcher packet capture despite the limited capture window.
Use Network Security Groups and Application Security Groups to micro-segment application traffic.
Enable NSG flow logs and feed them into Traffic Analytics to gain insights into internal and external traffic flows.
Use Azure Private Link/Private Endpoints, where available, to secure access to Azure PaaS services within the application design. For information on Azure services that support Private Link, see Azure Private Link availability.
If Private Endpoint isn't available and data exfiltration risks are acceptable, use Virtual Network Service Endpoints to secure access to Azure PaaS services from within a virtual network.
For hybrid application scenarios, access Azure PaaS services from on-premises via ExpressRoute with private peering.
Note
When deploying within an Azure landing zone, be aware that network connectivity to on-premises data centers is provided by the landing zone implementation. One approach is by using ExpressRoute configured with private peering.
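The recommendation to never open an RDP or SSH port directly to the internet lends itself to an automated check. The following is a minimal sketch that assumes security rules have been exported into simple dictionaries. The dictionary keys and the `exposed_management_rules` helper are hypothetical and don't mirror the exact Azure resource schema.

```python
MANAGEMENT_PORTS = {22, 3389}  # SSH and RDP
INTERNET_SOURCES = {"*", "Internet", "0.0.0.0/0"}


def port_range_contains(port_range, port):
    """True if a port expression ('*', '22', or '20-25') covers the given port."""
    if port_range == "*":
        return True
    if "-" in port_range:
        low, high = port_range.split("-", 1)
        return int(low) <= port <= int(high)
    return int(port_range) == port


def exposed_management_rules(rules):
    """Return (rule_name, exposed_ports) for inbound allow rules open to the internet."""
    findings = []
    for rule in rules:
        if rule["direction"] != "Inbound" or rule["access"] != "Allow":
            continue
        if rule["source"] not in INTERNET_SOURCES:
            continue
        hit = sorted(
            p for p in MANAGEMENT_PORTS
            if port_range_contains(rule["destination_port"], p)
        )
        if hit:
            findings.append((rule["name"], hit))
    return findings
```

A check like this could run in a CI/CD pipeline against the planned network configuration, failing the deployment when any finding is returned.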
Encryption is a vital step toward ensuring data integrity and is ultimately one of the most important security capabilities that can be applied to mitigate a wide array of threats. This section will therefore provide key considerations and recommendations related to encryption and key management in order to safeguard data without compromising application reliability.
Azure Key Vault has transaction limits for keys and secrets, with throttling applied per vault within a certain period.
Azure Key Vault provides a security boundary since access permissions for keys, secrets, and certificates are applied at a vault level.
After a role assignment is changed, there's a latency of up to 10 minutes (600 seconds) for the role to be applied.
Azure Key Vault underlying hardware security modules (HSMs) have FIPS 140 validation.
Azure Key Vault provides high availability and redundancy to help maintain availability and prevent data loss.
During a region failover, it may take a few minutes for the Key Vault service to fail over.
If private link is used to connect to Azure Key Vault, it may take up to 20 minutes for the connection to be re-established during a regional failover.
A backup creates a point-in-time snapshot of a secret, key, or certificate, as an encrypted blob that can't be decrypted outside of Azure. To get usable data from the blob, it must be restored into a Key Vault within the same Azure subscription and Azure geography.
With service-managed keys, Azure will perform key management functions, such as rotation, thereby reducing the scope of application operations.
Regulatory controls may stipulate the use of customer-managed keys for service encryption functionality.
When traffic moves between Azure data centers, MACsec data-link layer encryption is used on the underlying network hardware to secure data in transit outside of the physical boundaries controlled by Microsoft or on behalf of Microsoft.
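Because Key Vault throttles requests per vault, callers need to handle throttled responses without amplifying load. The following is a minimal sketch of exponential backoff with jitter. `ThrottledError` is a hypothetical stand-in for a throttling response (for example, HTTP 429), and the delay values are illustrative assumptions rather than documented service settings.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response from the vault."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=16.0,
                      sleep=time.sleep, jitter=random.random):
    """Run operation, retrying on throttling with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Retry budget exhausted; surface the failure to the caller.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so that clients don't retry in lockstep.
            sleep(delay + jitter() * base_delay)
```

Retries alone don't raise the vault's throughput ceiling. Caching retrieved secrets in memory and spreading load across vaults reduce how often the limit is reached in the first place.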
Use service-managed keys for data protection where possible, removing the need to manage encryption keys and handle operational tasks such as key rotation.
Use Azure Key Vault as a secure repository for all secrets, certificates, and keys if additional encryption mechanisms or customer-managed keys need to be considered.
Deploy a separate Azure Key Vault instance within each regional deployment stamp, providing fault isolation and performance benefits through localization, as well as navigating the scale limits imposed by a single Key Vault instance.
Follow a least privilege model by limiting authorization to permanently delete secrets, keys, and certificates to specialized custom Microsoft Entra roles.
Ensure encryption keys and certificates stored within Key Vault are backed up, providing an offline copy in the unlikely event Key Vault becomes unavailable.
Use Key Vault certificates to manage certificate procurement and signing.
Establish an automated process for key and certificate rotation.
Monitor key, certificate, and secret usage.
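An automated rotation process needs a way to decide which items are due. The sketch below shows one possible selection step, assuming expiry dates have already been read from the vault into a dictionary. The `due_for_rotation` helper and the 30-day lead time are hypothetical choices, not a documented default.

```python
from datetime import datetime, timedelta, timezone


def due_for_rotation(items, now=None, lead_time=timedelta(days=30)):
    """Return names of items expiring within lead_time, soonest first.

    items maps an item name to its timezone-aware expiry datetime.
    Items that have already expired are included.
    """
    now = now or datetime.now(timezone.utc)
    due = [
        (expires_on, name)
        for name, expires_on in items.items()
        if expires_on - now <= lead_time
    ]
    return [name for _, name in sorted(due)]
```

Running a check like this on a schedule, and alerting when the returned list isn't empty, also covers the monitoring recommendation for items that an automated rotation missed.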
Security conventions are ultimately only effective if consistently and holistically enforced across all application services and teams. Azure Policy provides a framework to enforce security and reliability baselines, ensuring continued compliance with a common engineering criteria for a mission-critical application. More specifically, Azure Policy forms a key part of the Azure Resource Manager (ARM) control plane, supplementing RBAC by restricting what actions authorized users can perform, and can be used to enforce vital security and reliability conventions across utilized platform services.
This section will therefore explore key considerations and recommendations surrounding the use of Azure Policy driven governance for a mission-critical application, ensuring security and reliability conventions are continuously enforced.
Note
When deploying within an Azure landing zone, be aware that the enforcement of centralized baseline policy assignments will likely be applied in the implementation for landing zone management groups and subscriptions.
Azure Policy can be used to drive automated management activities, such as provisioning and configuration.
Azure Policy assignment scope dictates coverage and the location of Azure Policy definitions informs the reusability of custom policies.
Azure Policy has several limits, such as the number of definitions at any particular scope.
It can take several minutes for the execution of Deploy If Not Exist (DINE) policies to occur.
Azure Policy provides a critical input for compliance reporting and security auditing.
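To make the shape of a policy definition concrete, the following sketch builds a simple deny rule as a Python dictionary and serializes it to JSON. The `build_deny_policy` helper is hypothetical, and the field alias in the usage example is an assumption that should be verified against the current alias list before use. Built-in policies are often more elaborate than this.

```python
import json


def build_deny_policy(display_name, resource_type, field_alias, required_value):
    """Build a custom policy definition that denies non-compliant resources."""
    return {
        "properties": {
            "displayName": display_name,
            "policyType": "Custom",
            "mode": "Indexed",
            "policyRule": {
                "if": {
                    "allOf": [
                        {"field": "type", "equals": resource_type},
                        {"field": field_alias, "notEquals": required_value},
                    ]
                },
                "then": {"effect": "deny"},
            },
        }
    }


if __name__ == "__main__":
    # Assumed alias for illustration; confirm it before relying on it.
    definition = build_deny_policy(
        "Storage accounts must require secure transfer",
        "Microsoft.Storage/storageAccounts",
        "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly",
        True,
    )
    print(json.dumps(definition, indent=2))
```

Generating definitions from code like this is one way to keep custom policies under source control alongside the rest of the Infrastructure-as-Code assets.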
Map regulatory and compliance requirements to Azure Policy definitions.
Define a common engineering criteria to capture secure and reliable configuration definitions for all utilized Azure services, ensuring this criteria is mapped to Azure Policy assignments to enforce compliance.
The Mission Critical reference implementation contains a wide array of security and reliability centric policies to define and enforce a sample common engineering criteria.
For mission-critical scenarios with multiple production subscriptions under a dedicated management group, prioritize assignments at the management group scope.
Use built-in policies where possible to minimize operational overhead of maintaining custom policy definitions.
Where custom policy definitions are required, ensure definitions are deployed at a suitable management group scope to allow for reuse across encompassed environment subscriptions, enabling policy reuse across production and lower environments.
Note
When deploying within an Azure landing zone, consider deploying custom Azure Policy Definitions within the intermediate company root management group scope to enable reuse across all applications within the broader Azure estate. In a landing zone environment, certain centralized security policies will be applied by default within higher management group scopes to enforce security compliance across the entire Azure estate. For example, Azure policies should be applied to automatically deploy software configurations through VM extensions and enforce a compliant baseline VM configuration.
If the application is subscribed to Microsoft Mission-Critical Support, ensure that the applied tagging schema provides meaningful context to enrich the support experience with deep application understanding.
In an Azure landing zone, Microsoft Entra activity logs will also be ingested into the centralized platform Log Analytics workspace. In this case, evaluate whether Microsoft Entra ID logs are still required in the global Log Analytics workspace.
In scenarios where the use of IaaS Virtual Machines is required, some specifics have to be taken into consideration.
Follow and apply security practices for mission-critical application scenarios as described above, when applicable, as well as the Security best practices for IaaS workloads in Azure.
Review the best practices for operational procedures for mission-critical application scenarios.