Reliability in Microsoft Defender for Cloud DevOps security

This article describes reliability support in Microsoft Defender for Cloud DevOps security features, including cross-region recovery and business continuity. For a more detailed overview of reliability in Azure, see Azure reliability.

This article is specific to recovery in the case of a region outage. If you want to move your existing DevOps connector to a new region, see Common questions about Defender for DevOps.

Cross-region disaster recovery and business continuity

Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.

When it comes to DR, Microsoft uses the shared responsibility model. In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fail over from a failed region to another enabled region. For those services, you are responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR, and you can use these service-specific features to support fast recovery as you develop your DR plan.

Microsoft Defender for Cloud DevOps security supports single-region disaster recovery. As such, a multi-region disaster recovery process simply implements the single-region disaster recovery process outlined in this document.

Supported regions

For regions that support DevOps security in Defender for Cloud, see DevOps security region support.

Single-region disaster recovery process

The single-region disaster recovery process for DevOps security features is based on the shared responsibility model, and so includes both customer and Microsoft procedures.

Customer responsibility

When a region goes down, your configurations for the connector in that region are lost. Lost configurations include customer tokens, auto-discovery configurations, and Azure DevOps (ADO) annotation configurations.

To request recovery of a connector created in a downed region:

  1. Create a new connector in a new region. See the onboarding documentation for Azure DevOps, GitHub, or GitLab.

    Note

    You can use an existing connector in the new region, as long as it's authenticated to have access to the scope of DevOps resources in the old connector.

  2. Open a new support request to release ownership of the DevOps resources from the old connector.

    1. In the Azure portal, go to Help + Support.
    2. Fill out the form:
      1. Issue type: Technical
      2. Service type: Microsoft Defender for Cloud
      3. Summary: "Region outage - DevOps Connector recovery"
      4. Problem type: Defender CSPM plan
      5. Problem subtype: DevOps security
  3. Copy the Resource ID of the new and old DevOps connectors. This information is available in Azure Resource Graph. Resource ID format: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Security/securityConnectors/{connectorName}

    To find the Resource ID, run the following query in Azure Resource Graph Explorer:

    // List DevOps security connectors with their Resource IDs and regions.
    resources
     | where type == "microsoft.security/securityconnectors"
     | extend connectorType = tostring(properties["environmentName"])
     | where connectorType in ("AzureDevOps", "Github", "GitLab")
     | project connectorResourceId = id, region = location
    
    
  4. Once the DevOps resources have been released from the old connector and appear for the new connector, reconfigure the pull request annotations as needed.

  5. The new connector will be made primary. When the region recovers from the outage, you can safely delete the old connector.
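
Before you paste the two Resource IDs into the support request, it can help to check that each one matches the format shown in step 3. The following Python sketch is one way to do that, not part of the official procedure. The function names, the layout of the generated text, and the permissive `[^/]+` segment patterns are assumptions made for illustration; they are not Azure's official naming rules.

```python
import re

# Matches the Resource ID format given in step 3. Azure resource IDs are
# case-insensitive, so the match ignores case. The segment patterns ([^/]+)
# are a simplifying assumption, not Azure's full naming rules.
CONNECTOR_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Security/securityConnectors/(?P<connector_name>[^/]+)$",
    re.IGNORECASE,
)


def parse_connector_id(resource_id: str) -> dict:
    """Split a connector Resource ID into its parts, or raise ValueError."""
    match = CONNECTOR_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"Not a security connector Resource ID: {resource_id!r}")
    return match.groupdict()


def support_request_details(old_id: str, new_id: str) -> str:
    """Build text for the support request (hypothetical layout)."""
    old = parse_connector_id(old_id)
    new = parse_connector_id(new_id)
    if old_id.strip().lower() == new_id.strip().lower():
        raise ValueError("Old and new connector Resource IDs must differ")
    return (
        "Region outage - DevOps Connector recovery\n"
        f"Old connector: {old_id.strip()} (name: {old['connector_name']})\n"
        f"New connector: {new_id.strip()} (name: {new['connector_name']})"
    )
```

Call `support_request_details` with the `connectorResourceId` values returned by the Resource Graph query. A `ValueError` means that one of the IDs was copied incompletely or that the same connector was supplied twice.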

Microsoft responsibility

When a region goes down and you have established the new connector, Microsoft recreates all alerts, recommendations, and Cloud Security Graph entities from the old connector in the new connector.

Important

Microsoft doesn't recreate history for some functionalities, such as container mapping data from previous runs, alert data more than one week old, and infrastructure as code (IaC) mapping history data.

Test your disaster recovery process

To test your disaster recovery process, you can simulate a lost connector by creating a second connector and following the support steps above.

Next steps

To learn more about the items discussed in this article, see: