Microsoft Incident Response team ransomware approach and best practices
Human-operated ransomware isn't a malicious software problem - it's a human criminal problem. The solutions used to address commodity problems aren't enough to prevent a threat that more closely resembles a nation-state threat actor who:
- Disables or uninstalls your antivirus software before encrypting files
- Disables security services and logging to avoid detection
- Locates and corrupts or deletes backups before sending a ransom demand
These actions are commonly done with legitimate programs - such as Quick Assist in May 2024 - that you might already have in your environment for administrative purposes. In criminal hands, these tools are used maliciously to conduct attacks.
Responding to the increasing threat of ransomware requires a combination of modern enterprise configuration, up-to-date security products, and the vigilance of trained security staff to detect and respond to the threats before data is lost.
The Microsoft Incident Response team (formerly DART/CRSP) responds to security compromises to help customers become cyber-resilient. Microsoft Incident Response provides onsite reactive incident response and remote proactive investigations. Microsoft Incident Response uses Microsoft's strategic partnerships with security organizations around the world and internal Microsoft product groups to provide the most complete and thorough investigation possible.
This article describes how Microsoft Incident Response handles ransomware attacks for Microsoft customers so that you can consider applying elements of their approach and best practices for your own security operations playbook.
Note
The content of this article was derived from the Microsoft Security team blog posts A guide to combatting human-operated ransomware: Part 1 and A guide to combatting human-operated ransomware: Part 2.
How Microsoft Incident Response uses Microsoft security services
Microsoft Incident Response relies heavily on data for all investigations and uses existing deployments of Microsoft security services such as Microsoft Defender for Office 365, Microsoft Defender for Endpoint, Microsoft Defender for Identity, and Microsoft Defender for Cloud Apps.
Microsoft Defender for Endpoint
Defender for Endpoint is Microsoft's enterprise endpoint security platform designed to help enterprise network security analysts prevent, detect, investigate, and respond to advanced threats. Defender for Endpoint can detect attacks using advanced behavioral analytics and machine learning. Your analysts can use Defender for Endpoint for attacker behavioral analytics.
Your analysts can also perform advanced hunting queries to pivot off indicators of compromise (IOCs) or search for known behavior if they identify a threat actor group.
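As a minimal sketch of pivoting off IOCs, the following Python builds an advanced hunting (KQL) query string from a list of file hashes. The helper name `build_hash_hunt_query` and the sample hash are hypothetical. The sketch assumes the `DeviceFileEvents` table and its `SHA256` column from the Defender advanced hunting schema, so verify both against your own tenant before relying on the query.

```python
import re

_SHA256 = re.compile(r"^[0-9a-fA-F]{64}$")


def build_hash_hunt_query(sha256_hashes, lookback_days=30):
    """Return a KQL query that searches file events for the given SHA-256 IOCs."""
    # Keep only well-formed hex digests so nothing else ends up in the query text.
    cleaned = sorted({h.lower() for h in sha256_hashes if _SHA256.match(h)})
    if not cleaned:
        raise ValueError("no valid SHA-256 hashes supplied")
    hash_list = ", ".join(f'"{h}"' for h in cleaned)
    return (
        "DeviceFileEvents\n"
        f"| where Timestamp > ago({int(lookback_days)}d)\n"
        f"| where SHA256 in ({hash_list})\n"
        "| project Timestamp, DeviceName, FileName, FolderPath, SHA256\n"
        "| order by Timestamp asc"
    )


if __name__ == "__main__":
    # Hypothetical IOC values, for illustration only.
    print(build_hash_hunt_query(["A" * 64, "not-a-hash"], lookback_days=7))
```

The generated text can be pasted into the advanced hunting query editor. Malformed entries are dropped rather than passed through, which keeps a mistyped IOC list from producing a misleading empty result.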
In Defender for Endpoint, you have access to a real-time expert-level monitoring and analysis service by Microsoft Threat Experts for ongoing suspected actor activity. You can also collaborate with experts on demand for more insights into alerts and incidents.
Microsoft Defender for Identity
You use Defender for Identity to investigate known compromised accounts and to find potentially compromised accounts in your organization. Defender for Identity sends alerts for known malicious activity that actors often use such as DCSync attacks, remote code execution attempts, and pass-the-hash attacks. Defender for Identity enables you to pinpoint suspect activity and accounts to narrow down the investigation.
Microsoft Defender for Cloud Apps
Defender for Cloud Apps (previously known as Microsoft Cloud App Security) allows your analysts to detect unusual behavior across cloud apps to identify ransomware, compromised users, or rogue applications. Defender for Cloud Apps is Microsoft's cloud access security broker (CASB) solution that allows for monitoring of cloud services and data access in cloud services by users.
Microsoft Secure Score
The set of Microsoft Defender XDR services provides live remediation recommendations to reduce the attack surface. Microsoft Secure Score is a measurement of an organization's security posture, with a higher number indicating that more improvement actions have been taken. See the Secure Score documentation to learn how your organization can use this feature to prioritize remediation actions based on its environment.
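To illustrate the idea of prioritizing by score, here is a rough sketch that expresses achieved points as a percentage and orders the open improvement actions by the points still available. The `ImprovementAction` type, its fields, and the sample action names are hypothetical and don't represent the Secure Score data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImprovementAction:
    name: str
    max_points: float
    achieved_points: float

    @property
    def remaining_points(self):
        return self.max_points - self.achieved_points


def score_percentage(actions):
    """Achieved points as a percentage of all available points."""
    total = sum(a.max_points for a in actions)
    achieved = sum(a.achieved_points for a in actions)
    return 0.0 if total == 0 else round(100 * achieved / total, 1)


def prioritize(actions):
    """Open actions, largest remaining gain first."""
    open_actions = [a for a in actions if a.remaining_points > 0]
    return sorted(open_actions, key=lambda a: a.remaining_points, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        ImprovementAction("Require MFA for admins", 10, 10),
        ImprovementAction("Enable ASR rules", 9, 0),
        ImprovementAction("Patch legacy servers", 5, 2),
    ]
    print(score_percentage(sample), [a.name for a in prioritize(sample)])
```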
The Microsoft Incident Response approach to conducting ransomware incident investigations
You should make every effort to determine how the adversary gained access to your assets so that vulnerabilities can be remediated. Otherwise, it is highly likely that the same type of attack happens again in the future. In some cases, the threat actor takes steps to cover their tracks and destroy evidence, so it is possible that the entire chain of events might not be evident.
The following are three key steps in Microsoft Incident Response ransomware investigations:
Step | Goal | Initial questions |
---|---|---|
1. Assess the current situation | Understand the scope | What initially made you aware of a ransomware attack? What time/date did you first learn of the incident? What logs are available and is there any indication that the actor is currently accessing systems? |
2. Identify the affected line-of-business (LOB) apps | Get systems back online | Does the application require an identity? Are backups of the application, configuration, and data available? Are the content and integrity of backups regularly verified using a restore exercise? |
3. Determine the compromise recovery (CR) process | Remove attacker control from the environment | N/A |
Step 1: Assess the current situation
An assessment of the current situation is critical to understanding the scope of the incident and for determining the best people to assist and to plan and scope the investigation and remediation tasks. Asking the following initial questions is crucial in helping to determine the situation.
What initially made you aware of the ransomware attack?
If your IT staff identified the initial threat, such as noticing backups being deleted, antivirus alerts, endpoint detection and response (EDR) alerts, or suspicious system changes, it is often possible to take quick decisive measures to thwart the attack, typically by disabling all inbound and outbound Internet communication. This measure might temporarily affect business operations, but that would typically be much less impactful than an adversary deploying ransomware.
If a user call to the IT helpdesk identified the threat, there might be enough advance warning to take defensive measures to prevent or minimize the effects of the attack. If an external entity such as law enforcement or a financial institution identified the threat, it's likely that the damage is already done, and you'll see evidence in your environment that the threat actor has administrative control of your network. This evidence can include ransomware notes, locked screens, or ransom demands.
What date/time did you first learn of the incident?
Establishing the date and time of the initial activity is important because it helps narrow the initial triage to the attacker's likely quick wins. Additional questions might include:
- What updates were missing on that date? It's important to understand what vulnerabilities might have been exploited by the adversary.
- What accounts were used on that date?
- What new accounts have been created since that date?
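The last question above lends itself to a simple filter. The sketch below assumes you have already exported account names and creation times from your directory into a dictionary; the function name and the sample accounts are hypothetical.

```python
from datetime import datetime, timezone


def accounts_created_since(accounts, first_activity):
    """Return names of accounts created on or after the first known activity."""
    return sorted(
        name for name, created in accounts.items() if created >= first_activity
    )


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical export: account name -> creation time.
    exported = {
        "svc-backup": datetime(2024, 1, 5, tzinfo=utc),
        "tmpadmin": datetime(2024, 5, 22, tzinfo=utc),
        "helpdesk2": datetime(2024, 5, 20, tzinfo=utc),
    }
    print(accounts_created_since(exported, datetime(2024, 5, 20, tzinfo=utc)))
```

Every account in the result deserves review, because adversaries often create accounts to keep access after the initial compromise.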
What logs are available, and is there any indication that the actor is currently accessing systems?
Logs, such as antivirus, EDR, and virtual private network (VPN) logs, are an indicator of suspected compromise. Follow-up questions might include:
- Are logs being aggregated in a Security Information and Event Management (SIEM) solution, such as Microsoft Sentinel, Splunk, or ArcSight, and are they current? What is the retention period of this data?
- Are there any suspected compromised systems that are experiencing unusual activity?
- Are there any suspected compromised accounts that appear to be actively used by the adversary?
- Is there any evidence of active command and control (C2) channels in EDR, firewall, VPN, web proxy, and other logs?
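The retention question matters because logs that rolled off before the first known activity can't be recovered. A minimal sketch of that check follows; the function name and the dates are hypothetical, and the retention period is whatever your SIEM is actually configured to keep.

```python
from datetime import datetime, timedelta, timezone


def retention_gap(first_activity, now, retention_days):
    """Return how much of the incident window is no longer covered by retained logs.

    A zero timedelta means the logs still reach back to the first known activity.
    """
    oldest_retained = now - timedelta(days=retention_days)
    return max(oldest_retained - first_activity, timedelta(0))


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2024, 6, 30, tzinfo=utc)
    first_seen = datetime(2024, 5, 21, tzinfo=utc)
    gap = retention_gap(first_seen, now, retention_days=30)
    print(f"Uncovered window: {gap.days} days")
```

A nonzero gap is a signal to look for other sources, such as local event logs on isolated systems or backups, before they are overwritten too.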
As part of assessing the current situation, you might need an Active Directory Domain Services (AD DS) domain controller that was not compromised, a recent backup of a domain controller, or a recent domain controller taken offline for maintenance or upgrades. Also determine whether multifactor authentication (MFA) was required for everyone in the company and if Microsoft Entra ID was used.
Step 2: Identify the LOB apps that are unavailable due to the incident
This step is critical in figuring out the quickest way to get systems back online while obtaining the evidence required.
Does the application require an identity?
- How is authentication performed?
- How are credentials such as certificates or secrets stored and managed?
Are tested backups of the application, configuration, and data available?
- Are the contents and integrity of backups regularly verified using a restore exercise? This check is particularly important after configuration management changes or version upgrades.
Step 3: Determine the compromise recovery process
This step might be necessary if you've determined that the control plane, which is typically AD DS, has been compromised.
Your investigation should always have a goal of providing output that feeds directly into the CR process. CR is the process that removes attacker control from an environment and tactically increases security posture within a set period. CR takes place post-security breach. To learn more about CR, read the Microsoft Compromise Recovery Security Practice team's CRSP: The emergency team fighting cyber attacks beside customers blog article.
After gathering the responses to the questions in steps 1 and 2, you can build a list of tasks and assign owners. A key factor in a successful incident response engagement is thorough, detailed documentation of each work item (such as the owner, status, findings, date, and time), making the compilation of findings at the end of the engagement a straightforward process.
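One possible way to keep that documentation consistent is a small record type for each work item. The sketch below is an illustration only: the `WorkItem` fields mirror the attributes named above (owner, status, findings, date, and time), while the class itself, its method names, and the sample entries are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class WorkItem:
    title: str
    owner: str
    status: str = "open"
    findings: list = field(default_factory=list)
    updated: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def add_finding(self, text, when=None):
        """Record a finding and stamp the item with the time it was added."""
        self.findings.append(text)
        self.updated = when or datetime.now(timezone.utc)


def compile_findings(items):
    """Build the end-of-engagement summary, one block per work item."""
    lines = []
    for item in sorted(items, key=lambda i: i.title):
        lines.append(f"{item.title} [{item.status}] owner={item.owner}")
        lines.extend(f"  - {finding}" for finding in item.findings)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical work items.
    reset = WorkItem("Reset krbtgt", "alice")
    vpn = WorkItem("Collect VPN logs", "bob", status="done")
    vpn.add_finding("No logins after 02:00 UTC")
    print(compile_findings([reset, vpn]))
```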
Microsoft Incident Response recommendations and best practices
Here are Microsoft Incident Response's recommendations and best practices for containment and post-incident activities.
Containment
Containment can only happen once you determine what needs to be contained. In the case of ransomware, the adversary's goal is to obtain credentials that allow administrative control over a highly available server and then deploy the ransomware. In some cases, the threat actor identifies sensitive data and exfiltrates it to a location they control.
Tactical recovery is unique to your organization's environment, industry, and level of IT expertise and experience. The steps outlined below are short-term, tactical containment steps that your organization can take. For long-term guidance, see securing privileged access. For a comprehensive view of ransomware and extortion and how to prepare and protect your organization, see Human-operated ransomware.
The following containment steps can be done concurrently as new threat vectors are discovered.
Step 1: Assess the scope of the situation
- Which user accounts were compromised?
- Which devices are affected?
- Which applications are affected?
Step 2: Preserve existing systems
- Disable all privileged user accounts except for a small number of accounts used by your admins to assist in resetting the integrity of your AD DS infrastructure. If you believe a user account is compromised, disable it immediately.
- Isolate compromised systems from the network, but don't shut them off.
- Isolate at least one known good domain controller in every domain; two is even better. Either disconnect them from the network or shut them down entirely. The objective is to stop the spread of ransomware to critical systems, with identity being among the most vulnerable. If all your domain controllers are virtual, ensure that the virtualization platform's system and data drives are backed up to offline external media that isn't connected to the network, in case the virtualization platform itself is compromised.
- Isolate critical known good application servers, for example SAP, configuration management database (CMDB), billing, and accounting systems.
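The first item in this list amounts to a set calculation: every privileged account outside a small recovery group, plus any account believed to be compromised. A minimal sketch, with hypothetical account names and a hypothetical function name:

```python
def accounts_to_disable(privileged, recovery_admins, compromised):
    """Privileged accounts outside the recovery group, plus anything believed compromised."""
    privileged = set(privileged)
    recovery_admins = set(recovery_admins)
    compromised = set(compromised)
    # A compromised account is disabled even if it belongs to the recovery group.
    return sorted((privileged - recovery_admins) | compromised)


if __name__ == "__main__":
    # Hypothetical account names.
    print(
        accounts_to_disable(
            privileged=["da-alice", "da-bob", "da-carol"],
            recovery_admins=["da-alice", "da-bob"],
            compromised=["da-bob", "jsmith"],
        )
    )
```

Producing the list first and reviewing it before acting helps avoid locking out the few admins you need for recovery.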
These two steps can be done concurrently as new threat vectors are discovered. Disable those threat vectors and then try to find a known good system to isolate from the network.
Other tactical containment actions can include:
Reset the krbtgt password, twice in rapid succession. Consider using a scripted, repeatable process. This script enables you to reset the krbtgt account password and related keys while minimizing the likelihood of Kerberos authentication issues being caused by the operation. To minimize potential issues, the krbtgt lifetime can be reduced one or more times prior to the first password reset so that the two resets are done quickly. Note that all domain controllers that you plan to keep in your environment must be online.
Deploy a Group Policy to the entire domain(s) that prevents privileged login (Domain Admins) to anything but domain controllers and privileged administrative-only workstations (if any).
Install all missing security updates for operating systems and applications. Every missing update is a potential threat vector that adversaries can quickly identify and exploit. Microsoft Defender for Endpoint's Threat and Vulnerability Management provides an easy way to see exactly what is missing, as well as the potential impact of the missing updates.
For Windows 10 (or higher) devices, confirm that the current version (or n-1) is running on every device.
Deploy attack surface reduction (ASR) rules to prevent malware infection.
Enable all Windows 10 security features.
Check that every external facing application, including VPN access, is protected by multifactor authentication, preferably using an authentication application that is running on a secured device.
For devices not using Defender for Endpoint as their primary antivirus software, run a full scan with Microsoft Safety Scanner on isolated known good systems before reconnecting them to the network.
For any legacy operating systems, upgrade to a supported OS or decommission these devices. If these options are not available, take every possible measure to isolate these devices, including network/VLAN isolation, Internet Protocol security (IPsec) rules, and sign-in restrictions, so that their applications are accessible only to the users and devices needed to provide business continuity.
The riskiest configurations consist of running mission-critical systems and applications on legacy operating systems as old as Windows NT 4.0, all on legacy hardware. Not only are these operating systems and applications insecure and vulnerable, but if that hardware fails, backups typically can't be restored on modern hardware. Unless replacement legacy hardware is available, these applications cease to function. Strongly consider converting these applications to run on current operating systems and hardware.
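Returning to the krbtgt reset listed among the actions above: the reason for resetting twice is that AD DS keeps a password history of two for the krbtgt account, so tickets signed with the previous key are still honored after a single reset. The following toy model illustrates only that bookkeeping. It is not the Kerberos protocol or the reset script, and the class name and key labels are hypothetical.

```python
class KrbtgtKeyHistory:
    """Toy model: the account remembers its current key and the one before it."""

    def __init__(self, initial_key):
        self.current = initial_key
        self.previous = None

    def reset(self, new_key):
        # The old current key moves into the history slot; the older one is dropped.
        self.previous, self.current = self.current, new_key

    def accepts(self, ticket_key):
        """A ticket is honored if it was signed with either remembered key."""
        return ticket_key is not None and ticket_key in (self.current, self.previous)


if __name__ == "__main__":
    keys = KrbtgtKeyHistory("key-known-to-attacker")
    keys.reset("new-key-1")
    print(keys.accepts("key-known-to-attacker"))  # still honored after one reset
    keys.reset("new-key-2")
    print(keys.accepts("key-known-to-attacker"))  # rejected after the second reset
```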
Post-incident activities
Microsoft Incident Response recommends implementing the following security recommendations and best practices after each incident.
Ensure that best practices are in place for email and collaboration solutions to make it more difficult for attackers to abuse them while allowing internal users to access external content easily and safely.
Follow Zero Trust security best practices for remote access solutions to internal organizational resources.
Starting with critical impact administrators, follow best practices for account security including using passwordless authentication or MFA.
Implement a comprehensive strategy to reduce the risk of privileged access compromise.
For cloud and forest/domain administrative access, use Microsoft's privileged access model (PAM).
For endpoint administrative management, use the Local Administrator Password Solution (LAPS).
Implement data protection to block ransomware techniques and to confirm rapid and reliable recovery from an attack.
Review your critical systems. Check for protection and backups against deliberate attacker erasure or encryption. It's important that you periodically test and validate these backups.
Ensure rapid detection and remediation of common attacks on endpoint, email, and identity.
Actively discover and continuously improve the security posture of your environment.
Update organizational processes to manage major ransomware events and streamline outsourcing to avoid friction.
PAM
Using the PAM (formerly known as the tiered administration model) enhances Microsoft Entra ID's security posture, which involves:
Breaking out administrative accounts in a "planed" environment, with one account for each level, usually four:
Control Plane (formerly Tier 0): Administration of domain controllers and other crucial identity services, such as Active Directory Federation Services (ADFS) or Microsoft Entra Connect, which also includes server applications that require administrative permissions to AD DS, such as Exchange Server.
The next two planes were formerly Tier 1:
Management Plane: Asset management, monitoring, and security.
Data/Workload Plane: Applications and application servers.
The next two planes were formerly Tier 2:
User Access: Access rights for users (such as accounts).
App Access: Access rights for applications.
Each plane has its own administrative workstation, which only has access to systems in that plane. Accounts from other planes are denied access to workstations and servers in that plane through user rights assignments set on those machines.
The net result of the PAM is that:
A compromised user account only has access to the plane to which it belongs.
More sensitive user accounts won't be logging into workstations and servers with a lower plane's security level, thereby reducing lateral movement.
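The containment effect described above can be expressed as a simple rule: an account may sign in only to machines in its own plane. The sketch below models that rule in Python. The plane identifiers, function names, and machine names are hypothetical, and in a real environment the rule is enforced through user rights assignments rather than application code.

```python
PLANES = ("control", "management", "data_workload", "user_access", "app_access")


def may_sign_in(account_plane, machine_plane):
    """Allow sign-in only when the account and the machine belong to the same plane."""
    for plane in (account_plane, machine_plane):
        if plane not in PLANES:
            raise ValueError(f"unknown plane: {plane}")
    return account_plane == machine_plane


def reachable_machines(account_plane, machines):
    """Machines that a compromised account in the given plane could sign in to."""
    return sorted(
        name for name, plane in machines.items() if may_sign_in(account_plane, plane)
    )


if __name__ == "__main__":
    # Hypothetical inventory: machine name -> plane.
    inventory = {
        "dc01": "control",
        "monitor01": "management",
        "app01": "data_workload",
        "app02": "data_workload",
    }
    print(reachable_machines("data_workload", inventory))
```

With this rule, a compromised data/workload account reaches only the application servers, and the domain controller stays out of reach.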
LAPS
By default, Microsoft Windows and AD DS have no centralized management of local administrative accounts on workstations and member servers. This can result in a common password being used for all these local accounts, or at the very least for groups of machines. This situation enables would-be attackers to compromise one local administrator account, and then use that account to gain access to other workstations or servers in the organization.
Microsoft's LAPS mitigates this by using a Group Policy client-side extension that changes the local administrative password at regular intervals on workstations and servers according to the policy set. Each of these passwords is different and stored as an attribute in the AD DS computer object. This attribute can be retrieved from a simple client application, depending on the permissions assigned to that attribute.
LAPS requires the AD DS schema to be extended to allow for the additional attribute, the LAPS Group Policy templates to be installed, and a small client-side extension to be installed on every workstation and member server to provide the client-side functionality.
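The core idea, a different random password per machine that is replaced at a regular interval, can be sketched in a few lines. This is a conceptual illustration only and not how LAPS is implemented: the dictionary stands in for the attribute on the AD DS computer object, and the function names, password length, and 30-day interval are assumptions.

```python
import secrets
import string
from datetime import datetime, timedelta, timezone

ALPHABET = string.ascii_letters + string.digits


def new_password(length=20):
    """Generate a random password from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def rotate_if_due(store, machine, now, max_age_days=30):
    """Give the machine a fresh password if it has none or the current one is too old."""
    entry = store.get(machine)
    if entry is None or now - entry["set_at"] >= timedelta(days=max_age_days):
        store[machine] = {"password": new_password(), "set_at": now}
        return True
    return False


if __name__ == "__main__":
    store = {}  # stand-in for the per-computer attribute in AD DS
    start = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for name in ("ws01", "ws02"):
        rotate_if_due(store, name, start)
    print(store["ws01"]["password"] != store["ws02"]["password"])
```

Because each machine gets its own value, learning one local administrator password doesn't help an attacker sign in to any other workstation or server.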
You can get LAPS from the Microsoft Download Center.
Additional ransomware resources
Key information from Microsoft:
2023 Microsoft Digital Defense Report (see pages 17-26)
Ransomware: A pervasive and ongoing threat Threat analytics report in the Microsoft Defender portal
Microsoft 365:
- Deploy ransomware protection for your Microsoft 365 tenant
- Maximize Ransomware Resiliency with Azure and Microsoft 365
- Recover from a ransomware attack
- Malware and ransomware protection
- Protect your Windows 10 PC from ransomware
- Handling ransomware in SharePoint Online
- Threat analytics reports for ransomware in the Microsoft Defender portal
Microsoft Azure:
- Azure Defenses for Ransomware Attack
- Maximize Ransomware Resiliency with Azure and Microsoft 365
- Backup and restore plan to protect against ransomware
- Help protect from ransomware with Microsoft Azure Backup (26-minute video)
- Recovering from systemic identity compromise
- Advanced multistage attack detection in Microsoft Sentinel
- Fusion Detection for Ransomware in Microsoft Sentinel
Microsoft Defender for Cloud Apps: