How do I maintain SCOM DB health?

Pauciloquent 71 Reputation points
2023-05-16T06:48:57.1233333+00:00

Hi

This post will be bit lengthy but I bet it will help many SCOM admins.

I am running SCOM 2019 with SQL 2019 in a clustered environment. Both OpsMgr and DW DBs are running in different clusters. We are monitoring around 6-7k agents; only Windows servers and applications.

I am writing a weekly, monthly, bi-annual and annual health check plan for SCOM DBs.

So far, I wrote the following plan:

  1. Daily Tasks:
    1. Monitor the Event IDs that can be troublesome in future like 4706 and 2115.
    2. Monitor the disk space
    3. Monitor SQL Server Performance: CPU usage, memory usage, and the number of user connections. Is there any default monitor or rule for the number of user connections?
    4. Check Database Size: Monitor the size of your OperationsManager and OperationsManagerDW databases to ensure they are not growing too quickly.
  2. Weekly Tasks:
    1. Index Maintenance: Rebuild or reorganize indexes based on their fragmentation level. This can help improve the performance of your databases.
  3. Monthly Tasks:
    1. Check Database Integrity: Use the DBCC CHECKDB command to check the logical and physical integrity of your databases.
    2. Backup Database and Transaction Logs: Regular backups are crucial for disaster recovery.
  4. Bi-Annually Tasks:
    1. Review Security Settings: Review and tighten security settings to protect your databases from unauthorized access.
    2. Review Database and Log File Sizes: If your databases or log files are too large, consider archiving old data or increasing the size of your storage.
  5. Annually Tasks:
    1. Review Database Design: Review the design of your databases to ensure they are still meeting your needs. Consider whether any changes or improvements could be made.
    Review Backup and Restore Procedures: Make sure your backup and restore procedures are still appropriate and that they are working correctly.
    1. Perform a Full DR Test: This will ensure that you can recover your databases in the event of a disaster.

My questions are:

  1. Regarding Point 1.a. are there any other event ID that I need to monitor?
  2. Are there any default monitors or rules that I should enable for SQL servers?
  3. Please recommend me any powershell scripts or SQL queries that I can use to build a couple of reports for daily check up? Please point me to the blog.

Please advise as necessary.

System Center Operations Manager
System Center Operations Manager
A family of System Center products that provide infrastructure monitoring, help ensure the predictable performance and availability of vital applications, and offer comprehensive monitoring for datacenters and cloud, both private and public.
1,607 questions
SQL Server Reporting Services
SQL Server Reporting Services
A SQL Server technology that supports the creation, management, and delivery of both traditional, paper-oriented reports and interactive, web-based reports.
3,063 questions
SQL Server | Other
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. XinGuo-MSFT 22,231 Reputation points
    2023-05-17T08:59:36.42+00:00

    Hi,

    Thanks for your sharing. There are some great tips for your reference.

    Troubleshoot event ID 2115-related performance problems in Operations Manager

    SCOM SQL queries

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.