SCOM DR and Agent Cache

Aman Bansal 46 Reputation points
2022-05-11T20:01:32.787+00:00

According to the Microsoft documentation for SCOM 2019 DR, SCOM management servers can be deployed to a secondary (DR) site in standby mode by removing them from the resource pools and disabling the Microsoft Monitoring Agent (MMA), Data Access, and Configuration services.

Question: So we first need to install the management servers at the DR site as part of the same management group, then remove them from the resource pools and disable the services. Is this understanding correct? Will this not generate errors during group calculation, since the management servers in DR are down?

Also, the documentation says:

"Reconfigure the Windows agents to cache only management servers in your primary data center that should manage them to prevent them from attempting to failover to a management server in the secondary data center, which would only delay recovery and reporting. This can be accomplished if you manually deploy the agent in an automated manner with a script (for example, VBScript or better yet, PowerShell) to pre-configure during installation, or post deployment if you push the agent from the console, again using a scripted method managed with your enterprise configuration management solution."

Question: How can we configure the agents so that they do not contact the management servers in DR, and how do we enable them to connect to DR when a real failover or a DR test requires it?

Operations Manager

Accepted answer
  1. SChalakov 10,261 Reputation points MVP
    2022-05-12T06:23:10.347+00:00

    Hi @Aman Bansal ,

    Here are the answers to your questions. I will start with the second one, which I already answered in this post:

    SCOM Deployment DR
    https://learn.microsoft.com/en-us/answers/questions/93844/scom-deployment-dr.html

    If you have a DR site or a second site, it is usually recommended to prevent your agents from failing over to it. There are several reasons for this, such as firewall ports and connectivity in general. You should also remove your DR management servers from all resource pools.
    All this is clearly described here:

    High Availability and Disaster Recovery
    https://learn.microsoft.com/en-us/system-center/scom/plan-hadr-design?view=sc-om-2019

    If one site goes offline, the agent will fail over to the management server in another site, assuming that the agent’s failover configuration allows this. Reconfigure the Windows agents to cache only management servers in your primary data center that should manage them to prevent them from attempting to failover to a management server in the secondary data center, which would only delay recovery and reporting.

    What this means is that you need to use PowerShell to configure the primary and failover management servers for your agents. This is done centrally on a management server, and the primary and failover management servers you assign must be different from your DR management servers.

    Assigning Gateways and Agents to Management Servers using PowerShell
    https://kevinholman.com/2018/08/06/assigning-gateways-and-agents-to-management-servers-using-powershell/
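
    As a rough illustration of the approach described above, here is a minimal sketch that gives every agent an explicit failover list containing only primary-site management servers. The server names are hypothetical placeholders, and I am assuming all agents report directly to a primary-site management server (not to a gateway). Please test it against a few agents before running it across the whole management group.

    ```powershell
    # Sketch only: run on a management server or a machine with the SCOM console installed.
    Import-Module OperationsManager

    # Hypothetical names of the management servers in the PRIMARY data center.
    $primarySiteNames = @('MS01.contoso.local', 'MS02.contoso.local')

    # Resolve the names to management server objects; DR servers are simply not listed.
    $primarySiteServers = Get-SCOMManagementServer |
        Where-Object { $primarySiteNames -contains $_.DisplayName }

    foreach ($agent in Get-SCOMAgent) {
        # The failover list must not contain the agent's current primary server.
        $failover = @($primarySiteServers |
            Where-Object { $_.DisplayName -ne $agent.PrimaryManagementServerName })

        if ($failover.Count -gt 0) {
            # Explicit failover list: only primary-site servers, no DR servers.
            Set-SCOMParentManagementServer -Agent $agent -FailoverServer $failover
        }
    }
    ```

    For a DR test or a real failover, the same cmdlet could presumably be used the other way around, with the DR management servers passed as the primary or failover servers.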

    Now to your first question:

    "Question : So, basically we need to first install Management Servers on DR site as part of same Management Group and then remove them from resource pool and disable service. Is this correct understanding ? Will this not generate errors while group calculation as MS in DR are down ?"

    Yes, you are absolutely correct and this is also officially stated in the MS Docs guide:

    High Availability and Disaster Recovery
    https://learn.microsoft.com/en-us/system-center/scom/plan-hadr-design?view=sc-om-2019

    This will not generate errors, because the DR management servers are not part of ANY resource pool and therefore do not participate in group calculation, agent handling, notifications, or any other workflows that run on the members of the various resource pools.
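
    To put a DR management server into standby once it is out of the resource pools, the three services can be stopped and disabled locally. This is only a sketch, assuming the default service names (HealthService for the Microsoft Monitoring Agent, OMSDK for the Data Access service, cshost for the Management Configuration service); please verify them on your servers first.

    ```powershell
    # Sketch only: run in an elevated session on each standby DR management server.
    # Assumed default service names: HealthService = Microsoft Monitoring Agent,
    # OMSDK = System Center Data Access Service, cshost = Management Configuration.
    $services = 'HealthService', 'OMSDK', 'cshost'

    foreach ($name in $services) {
        Stop-Service -Name $name -Force                 # stop the running service
        Set-Service  -Name $name -StartupType Disabled  # keep it from starting at boot
    }
    ```

    To activate the server during a DR event, the reverse would apply: set the startup type back to Automatic and start the services again.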

    Last but not least, please always back up your SCOM databases. This is where your data lives, and as long as the databases are backed up and you can restore any of the management servers, you are safe.

    Hope I could help you out with that!

    ----------

    (If the reply was helpful please don't forget to upvote or accept as answer, thank you)
    Regards,
    Stoyan
