SCOM HealthService service stopped on all management servers at 2:00 AM

Jesus Chao 141 Reputation points
2022-12-06T15:33:28.19+00:00

Hi,

We have two separate SCOM environments (one for test/dev servers and one for production) that experienced a strange issue over the past weekend. The HealthService stopped at 2:00 AM on every management server in both environments. There are 4 management servers in Production and 2 in Test/Dev. The environments only share a domain, network, and virtualization infrastructure. Everything else is separate (different databases, management groups, etc.).

Does anyone know of any processes (clean-up, etc.) that run at 2:00 AM and could have caused the health service to stop completely? The following event is recorded in the Operations Manager log just before the service stops.

Provider [Name]: Health Service ESE Store

  • EventID: 327, [Qualifiers]: 0, Level: 4, Task: 1, Keywords: 0x80000000000000
  • TimeCreated [SystemTime]: 2022-12-04T07:00:03.911475400Z, EventRecordID: 185075, Channel: Operations Manager

HealthService (2604,D,51) Health Service Store: The database engine detached a database (1, C:\Program Files\Microsoft System Center\Operations Manager\Server\Health Service State\Health Service Store\HealthServiceStore.edb). (Time=0 seconds)

Revived Cache: 0 0
Additional Data: lgposDetach = 00006B03:000F:0000

Internal Timing Sequence:
[1] 0.000012 +J(0)
[2] 0.000001 +J(0)
[3] 0.000107 +J(0)
[4] 0.000001 +J(0)
[5] 0.0 +J(0)
[6] 0.009554 -0.005641 (3) WT +J(0) +M(C:-84K, Fs:10, WS:-68K # 0K, PF:-64K # 0K, P:-64K)
[7] 0.001785 +J(0)
[8] 0.002098 -0.000857 (1) WT +J(CM:0, PgRf:0, Rd:0/0, Dy:0/0, Lg:4096/2) +M(C:0K, Fs:1, WS:4K # 0K, PF:0K # 0K, P:0K)
[9] 0.010725 -0.005817 (6) WT +J(0) +M(C:0K, Fs:3, WS:-60K # 0K, PF:-68K # 0K, P:-68K)
[10] 0.001253 +J(0)
[11] 0.000387 +J(0) +M(C:0K, Fs:2, WS:0K # 0K, PF:52K # 0K, P:52K).
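
The SystemTime in the event above is in UTC (the trailing Z), while the 2:00 AM stop time is local. As a minimal sketch of how the two can be lined up, the snippet below converts the event timestamp to local time. The UTC-5 offset is an assumption, chosen only because it makes 07:00 UTC equal 2:00 AM; the thread does not state the servers' time zone. The helper name `parse_event_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_event_time(system_time: str) -> datetime:
    """Parse an event SystemTime string such as 2022-12-04T07:00:03.911475400Z."""
    stamp = system_time.rstrip("Z")
    whole, _, fraction = stamp.partition(".")
    # Event logs carry 100 ns precision; keep microseconds and drop the rest.
    micros = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{whole}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# Assumption: the management servers run in a UTC-5 time zone.
ASSUMED_LOCAL = timezone(timedelta(hours=-5))

event_utc = parse_event_time("2022-12-04T07:00:03.911475400Z")
event_local = event_utc.astimezone(ASSUMED_LOCAL)
print(event_local.isoformat())
```

Under that assumed offset, the detach event lands a few seconds after 2:00 AM local time on 4 December, which matches the reported stop time.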


2 answers

  1. SChalakov 10,371 Reputation points MVP
    2022-12-08T09:41:43.21+00:00

    Hi Jesus (@Jesus Chao ),

    Can you please post some more info about the SCOM version and UR level in both environments?
    Does this happen on a regular basis, or did it happen just once?
    When did you install those MGs?
    Do you have MPs (for example VMware) that are installed in both environments and could cause that?

    Reference:
    Unstable Behavior from Ops Mgr Health Service
    https://helpcenter.veeam.com/docs/mp/vmware_guide/unstable_health_service_behavior.html?ver=90

    ----------

    (If the reply was helpful please don't forget to upvote and/or accept as answer, thank you)
    Regards
    Stoyan Chalakov


  2. Dwayne 1 Reputation point
    2023-10-26T02:25:41.6366667+00:00

    In general, scheduled DB maintenance should not stop the agents unless it causes connectivity issues (the OpsMgr logs would show SDK errors). All agents in a SCOM environment should not fail at once unless there is a prolonged database outage. That is my experience with systems of various sizes, up to 30 management servers, over the last 12 years.

    If that turns out to be the issue, Kevin Holman has a page for it; see the DAL section of https://kevinholman.com/2017/03/08/recommended-registry-tweaks-for-scom-2016-management-servers/