Hi
This post will be bit lengthy but I bet it will help many SCOM admins.
I am running SCOM 2019 with SQL 2019 in a clustered environment. Both OpsMgr and DW DBs are running in different clusters. We are monitoring around 6-7k agents; only Windows servers and applications.
I am writing a weekly, monthly, bi-annual and annual health check plan for SCOM DBs.
So far, I wrote the following plan:
- Daily Tasks:
- Monitor the Event IDs that can be troublesome in future like 4706 and 2115.
- Monitor the disk space
- Monitor SQL Server Performance: CPU usage, memory usage, and the number of user connections. Is there any default monitor or rule for the number of user connections?
- Check Database Size: Monitor the size of your OperationsManager and OperationsManagerDW databases to ensure they are not growing too quickly.
- Weekly Tasks:
- Index Maintenance: Rebuild or reorganize indexes based on their fragmentation level. This can help improve the performance of your databases.
- Monthly Tasks:
- Check Database Integrity: Use the DBCC CHECKDB command to check the logical and physical integrity of your databases.
- Backup Database and Transaction Logs: Regular backups are crucial for disaster recovery.
- Bi-Annually Tasks:
- Review Security Settings: Review and tighten security settings to protect your databases from unauthorized access.
- Review Database and Log File Sizes: If your databases or log files are too large, consider archiving old data or increasing the size of your storage.
- Annually Tasks:
- Review Database Design: Review the design of your databases to ensure they are still meeting your needs. Consider whether any changes or improvements could be made.
Review Backup and Restore Procedures: Make sure your backup and restore procedures are still appropriate and that they are working correctly.
- Perform a Full DR Test: This will ensure that you can recover your databases in the event of a disaster.
My questions are:
- Regarding Point 1.a. are there any other event ID that I need to monitor?
- Are there any default monitors or rules that I should enable for SQL servers?
- Please recommend me any powershell scripts or SQL queries that I can use to build a couple of reports for daily check up? Please point me to the blog.
Please advise as necessary.