AmanBansal-8575 asked AmanBansal-8575 commented

SCOM DR and Agent Cache

According to the Microsoft documentation for SCOM 2019 DR, SCOM management servers can be deployed to the secondary (DR) site as standby servers by removing them from the resource pools and disabling the Microsoft Monitoring Agent (MMA), Data Access, and Configuration services.

Question: So, basically, we first need to install management servers at the DR site as part of the same management group, then remove them from the resource pools and disable the services. Is this understanding correct? Will this not generate errors during group calculation, since the management servers in DR are down?

Also, according to the documentation:

"Reconfigure the Windows agents to cache only management servers in your primary data center that should manage them to prevent them from attempting to failover to a management server in the secondary data center, which would only delay recovery and reporting. This can be accomplished if you manually deploy the agent in an automated manner with a script (for example, VBScript or better yet, PowerShell) to pre-configure during installation, or post deployment if you push the agent from the console, again using a scripted method managed with your enterprise configuration management solution."

Question: How can we achieve this and configure the agents not to contact the management servers in DR, and how do we enable them to connect to DR when the need arises or when testing the DR setup?

msc-operations-manager

1 Answer

StoyanChalakov answered AmanBansal-8575 commented

Hi @AmanBansal-8575,

Here are the answers to your questions. I will start with the second one, which I already answered in this post:

SCOM Deployment DR
https://docs.microsoft.com/en-us/answers/questions/93844/scom-deployment-dr.html

If you have a DR site or a second site, it is usually recommended to prevent your agents from failing over to it. There are several reasons for this, such as firewall ports and connectivity in general. You should also remove your DR management server from all resource pools.
All of this is described here:

High Availability and Disaster Recovery
https://docs.microsoft.com/en-us/system-center/scom/plan-hadr-design?view=sc-om-2019

If one site goes offline, the agent will fail over to the management server in another site, assuming that the agent’s failover configuration allows this. Reconfigure the Windows agents to cache only management servers in your primary data center that should manage them to prevent them from attempting to failover to a management server in the secondary data center, which would only delay recovery and reporting.


What this means is that you need to use PowerShell to configure the primary and failover management servers for your agents (done centrally on a management server), and neither the primary nor the failover management server should be your DR management server.

Assigning Gateways and Agents to Management Servers using PowerShell
https://kevinholman.com/2018/08/06/assigning-gateways-and-agents-to-management-servers-using-powershell/
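As a minimal sketch of that approach, the following assigns a primary and a failover management server, both in the primary data center, to every agent in the management group. The server names (`MS01.contoso.com`, `MS02.contoso.com`) are placeholders that do not come from this thread, and it assumes the OperationsManager module is available, for example on a management server or a machine with the console installed.

```powershell
# Sketch only: the server names are hypothetical placeholders.
Import-Module OperationsManager

# Management servers located in the PRIMARY data center.
$primaryMS  = Get-SCOMManagementServer -Name 'MS01.contoso.com'
$failoverMS = Get-SCOMManagementServer -Name 'MS02.contoso.com'

# All agents in the management group; filter this list as needed.
$agents = Get-SCOMAgent

foreach ($agent in $agents) {
    # Primary and failover use different parameter sets,
    # so they are assigned in two separate calls.
    Set-SCOMParentManagementServer -Agent $agent -PrimaryServer $primaryMS
    Set-SCOMParentManagementServer -Agent $agent -FailoverServer $failoverMS
}
```

Because the DR management server is never listed as a failover server, the agents will not try to reach it. For a DR test, the same cmdlet could presumably be used to point a few agents at the DR management server once its services are running again, but validate that in your own environment first.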

Now to your first question:

"Question : So, basically we need to fist install Management Servers on DR site as part of same Management Group and then remove them from resource pool and disable service. Is this correct understanding ? Will this not generate errors while group calculation as MS in DR are down ?"

Yes, you are absolutely correct, and this is also officially stated in the Microsoft Docs guide:

High Availability and Disaster Recovery
https://docs.microsoft.com/en-us/system-center/scom/plan-hadr-design?view=sc-om-2019

This will not generate errors, since the DR management server is not part of any resource pool and does not participate in group calculation, agent handling, notifications, or any other workflows that run on the members of the resource pools.
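One way to put a DR management server into that standby state is sketched below. It assumes the default service names on a SCOM 2019 management server (`HealthService` for the Microsoft Monitoring Agent, `OMSDK` for Data Access, `cshost` for Management Configuration), so confirm them with `Get-Service` before relying on it. Removing the server from the resource pools is a separate step that is not shown here.

```powershell
# Run in an elevated session on the DR management server.
# Service names are assumed defaults; confirm them with Get-Service first.
$services = 'HealthService', 'OMSDK', 'cshost'

foreach ($name in $services) {
    # Stop the service, then prevent it from starting at boot.
    Stop-Service -Name $name -Force
    Set-Service  -Name $name -StartupType Disabled
}
```

To activate the server during a DR event or test, the reverse would be `Set-Service -StartupType Automatic` followed by `Start-Service` for the same three services, and then adding the server back to the relevant resource pools.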

Last but not least, always back up your SCOM databases. This is where your data lives, and as long as the databases are backed up and you can restore any of the management servers, you are safe.

Hope I could help you out with that!


(If the reply was helpful please don't forget to upvote or accept as answer, thank you)
Regards,
Stoyan





I assume that for the Unix resource pool, the DR management servers should first be added to the pool with the certificate exchange completed, and afterwards be removed from the pool and have their services stopped.

Another query: what is the best practice for URL monitoring, a dedicated resource pool or a few individual agents? In our current setup we use individual agents, but we are getting lots of false alerts, mostly for response time, which is not even configured to alert. The only condition is response code > 400, but lots of alerts still appear.

For example: Alert description: http://xxx.xxxx.com has a problem. Please see the alert context tab for details of the failure. The Transaction Response Time was 21.03199647179 seconds.
