SCOM 1807 - Unable to add new agents/clients

vafran 121 Reputation points
2020-10-14T06:36:31.127+00:00

Hello,

Suddenly I am unable to add new agents to the SCOM 2016 environment. They are installed correctly, but appear as not monitored in the console.

[Screenshot]

On the agents I get the following events:

20070
20071
21016 (less frequently)

On the SCOM management server I get event 20000.

The agent Control Panel applet shows the FQDN of the management server and port 5723, which is open. Also, there are no certificates in use in the environment.
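To confirm from the agent itself that the management server answers on TCP 5723, a minimal Python sketch like the following could be run on the agent. It only checks that a TCP connection can be opened. It does not prove that the agent and the management server can complete their handshake, so treat it as a first connectivity check only.

```python
import socket


def port_is_open(host: str, port: int = 5723, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    5723 is the port the agent uses to reach its management server.
    This proves TCP reachability only, not a successful agent handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, `port_is_open("ms01.contoso.local")` (a hypothetical FQDN; use your management server's name) should return `True` from an agent that can reach the server on 5723.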

I searched for the issue and ended up removing all objects in maintenance mode.

This seems to have started when I added a new SQL cluster, but it may be a coincidence. The first two nodes were added just fine, but the third one, added a few weeks later (a reinstall of a node from another cluster, keeping the same server name), was the first client/agent I found to fail. Also, the instances of this cluster are discovered from the proxy node, but they also appear as not monitored.

The agents are pushed from the SCOM console, but if I install an agent manually and then approve it from the console, the result is exactly the same.

This is the sequence of events on a completely new agent installed on a new server:

[Screenshot]

Any advice?

  1. Leon Laude 85,666 Reputation points
    2020-10-14T06:51:53.443+00:00

    Hi @AaronVazquez-7771,

    Does this happen to all new agents you're trying to install either manually or by pushing from the Operations Console?

    Which Update Rollup are you running in your SCOM 2016 environment?

    If an agent computer has been upgraded but kept its name, did you make sure to uninstall the agent, confirm that it was gone from the Operations Console, and then reinstall it on the new computer?

    A few things to check:



    Best regards,
    Leon


  2. SChalakov 10,266 Reputation points MVP
    2020-10-14T09:06:52.567+00:00

    Hi,

    I absolutely agree with Leon: clearing the cache of a management server is a standard troubleshooting procedure and should not affect the functionality of the management server in any way. Can you please confirm two more things:

    • Please make sure that your "Health Service State" folder (the cache) is not being scanned by antivirus software. This is important because AV programs lock files, which can cause cache corruption and follow-on issues. Make sure the proper exclusions are in place:

    Configuring antivirus exclusions for agent and components

    • The second thing, mentioned by Leon, is to make sure you have no database connectivity or performance issues. These are usually indicated by a specific warning, event ID 2115:

    Troubleshoot event ID 2115-related performance problems in Operations Manager
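    One way to look for event 2115 warnings in bulk is to export the Operations Manager event log from Event Viewer to CSV and count the matching entries. This Python sketch is only an illustration: the column names "Event ID" and "Source" are assumptions about the export format and may need adjusting to match your file.

```python
import csv
from collections import Counter
from typing import Iterable, Optional, Set

# Assumed column names in the Event Viewer CSV export; adjust if yours differ.
EVENT_ID_COL = "Event ID"
SOURCE_COL = "Source"


def count_events(
    lines: Iterable[str],
    event_ids: Set[str],
    source: Optional[str] = None,
) -> Counter:
    """Count exported events whose ID is in event_ids, optionally from one source."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        event_id = (row.get(EVENT_ID_COL) or "").strip()
        if event_id not in event_ids:
            continue
        # Skip rows from other sources when a source filter is given.
        if source is not None and (row.get(SOURCE_COL) or "").strip() != source:
            continue
        counts[event_id] += 1
    return counts
```

    For example, `count_events(open("opsmgr.csv", newline=""), {"2115"})` (a hypothetical file name) returns how many 2115 events the export contains; the optional `source` argument restricts the count to a single event source.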

    Can you please verify this?

    Thanks and Regards,
    Stoyan


  3. SChalakov 10,266 Reputation points MVP
    2020-10-14T10:52:24.903+00:00

    Hi Aaron,

    I still think it might be related to your database. Did you check for event 2115 warnings on the management server?

    Please also check the suggested actions for when this fires:

    Causes
    This can happen when:
    • The database or database server is unavailable (networking issue, firewall, disk space, etc.).
    • The System Center Management Configuration Windows service account no longer has the required access to the database.
    • The “AgentPoolAssignment” work item has been disabled in ConfigService.config. The ConfigService.config file is located in “%Program Files%\Microsoft System Center 2012 R2\Operations Manager\Server”.

    Resolutions
    To further investigate the issue, consider the following:
    • Review the Operations Manager event log for errors indicating problems with the System Center Management Configuration service. Filter the event log on the source “OpsMgr Management Configuration” to search for errors.
    • Confirm that you are not seeing connection errors to the Operations Manager database from the management server in the Operations Manager event log.
    • Using the Operations Manager Console and SQL Server Management Studio, validate that the Default Action Account has the correct access to the SQL Server instance where the Operations Manager database is installed. For more information about configuring the Default Action Account, see the Operations Manager Security Guide.
    • Open the ConfigService.config file and search for “AgentPoolAssignment” under WorkItems. Make sure its Enabled property is set to true. The ConfigService.config file is located in “%Program Files%\Microsoft System Center 2012 R2\Operations Manager\Server”.
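    The ConfigService.config check can also be scripted. The thread does not show the file's layout, so this Python sketch assumes only that the AgentPoolAssignment work item is an XML element with some attribute (such as an Id or Name) equal to "AgentPoolAssignment", and that it carries an Enabled attribute. A None result means "not found under these assumptions", not "disabled".

```python
import xml.etree.ElementTree as ET
from typing import Optional


def work_item_enabled(root: ET.Element, name: str = "AgentPoolAssignment") -> Optional[bool]:
    """Return the Enabled flag of the first element identified by `name`.

    Assumption: the work item is an element with some attribute whose value
    equals `name`, and it has an "Enabled" attribute ("true"/"false").
    Returns None if no such element, or no Enabled attribute, is found.
    """
    for elem in root.iter():
        if name in elem.attrib.values():
            enabled = elem.get("Enabled")
            if enabled is None:
                return None
            return enabled.strip().lower() == "true"
    return None
```

    For example, `work_item_enabled(ET.parse(path).getroot())`, where `path` points to ConfigService.config on the management server, should return `True` when the work item is enabled.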

    Can you please verify this?

    Regards,
    Stoyan


  4. vafran 121 Reputation points
    2020-10-14T07:38:32.81+00:00

    Hi @Leon Laude,

    Does this happen to all new agents you're trying to install either manually or by pushing from the Operations Console?
    All of them, either way.

    Which Update Rollup are you running in your SCOM 2016 environment?
    Sorry, it is SCOM 1807, not 2016.

    I cleared the cache as described in the article.
    Then I reviewed the SCOM management group, and after a few minutes all states are greyed out.

    I received this alert: "The All Management Servers Pool has not reported availability since Wed, 14 Oct 2020 07:19:09 GMT. This adversely affects all availability calculation for the entire management group." But I hope this will fix itself after a while?

    [Screenshots]

    The first thing I did was increase the OperationsManager database maximum size.

    I already had checked the last two points in my previous troubleshooting.

    I can see a snapshot synchronization error in the management server event log. I'm not sure how relevant this is.

    [Screenshot]


  5. vafran 121 Reputation points
    2020-10-14T08:40:00.36+00:00

    Hey there. The management group was not greyed out until after I deleted the cache.

    I do not see any other error events on the management server itself, but this information event is creeping me out:

    Event 21023
    OpsMgr has no configuration for management group XXXXXX and is requesting new configuration from the Configuration Service.

    This has only been happening since the cache was deleted on the management server, around 90 minutes ago.

    [Screenshot]