SCOM and APM - the simplest workaround.
Hi there,
here I go again to help you out. In this post I will walk you through a temporary workaround for the long-running Application Performance Monitoring (APM) issue affecting Internet Information Services (IIS) and SharePoint servers. Again, this is a temporary workaround that I am sharing to unblock the upgrade, update or new installation of System Center Operations Manager (SCOM) environments where IIS and SharePoint are in use and APM will not be used.
I call it temporary because it does not fix the APM issue. Unfortunately, should you need APM monitoring, this workaround cannot help you, and you must wait for the official solution from the Product Group.
NOTE: I got in touch with the SCOM Product Group and they are already working on a permanent fix that is in line with the changes described below. Hence, once the fix is released, make sure you install it and remove the changes made for this workaround.
The topic:
As you surely know, there is a long-running issue impacting SharePoint servers monitored through SCOM. This issue has already been discussed in several posts, such as:
- https://blogs.technet.microsoft.com/momteam/2017/03/21/apm-feature-in-scom-2016-agent-may-cause-a-crash-for-the-iis-application-pool-running-under-net-2-0-runtime/
- https://blogs.technet.microsoft.com/momteam/2017/05/31/apm-fix-for-agent-crashing-issue-shipped-in-ur3-is-not-completely-resolved/
- https://blogs.technet.microsoft.com/momteam/2017/06/06/update-on-apm-fix-for-agent-crashing-issue-shipped-in-ur3/
- https://blogs.technet.microsoft.com/kevinholman/2017/08/05/reinstalling-your-scom-agents-with-the-noapm-switch/
Reading through the blog posts listed above, it looks like this issue only affects SCOM 2016 and newer agents. One of the suggested workarounds is to stay on the SCOM 2012 R2 agent version. Well, in some cases this does not work, or it is not enough.
In fact, during one of our customer visits, my colleague Antonio Canitano and I ran into a previously undocumented scenario that triggers this known issue. While performing an in-place upgrade of SCOM from version 2012 R2 to version 2016, knowing about the existing issue, we left the agents at the 2012 R2 version.
Everything seemed to work fine, but after a couple of days the customer started experiencing issues on his SharePoint servers (the same servers which had been working like a charm for 2 years with the same SCOM 2012 R2 agent version) even without having the APM monitoring configured.
The issue:
The customer's problem was due to the well-known APM issue discussed several times, but in a different scenario where the failure was not supposed to happen and was therefore unexpected.
Basically, when an agent is installed with the default settings, the APM binaries are deployed and the APM service is created but left disabled. This means that on the agent machine the APM components are present but not used (dormant state). As you can imagine, in this normal scenario one never expects to see failures (Application Pool crashes) caused by APM. But you know that theory and reality are 2 different things.
Since we had diligently left the agents at 2012 R2, as recommended in the official posts above, we did not expect this situation and had to investigate the real reason why it happened. The only thing we noticed was a server restart around the time the issue started, which made us think about some configuration that became effective after the reboot.
We started looking around to see what had changed, but nothing came to mind. Then we looked at the only things that had changed: the Management Server and Management Pack versions. In particular, we noticed that among the new Management Packs, Microsoft.SystemCenter.Apm.Infrastructure had changed. We compared the 2012 R2 and 2016 versions and, among the several differences between them, one in particular caught our attention.
A new parameter was added to the rule Apply APM Agent configuration: EnableRTIA Profiler, with a default setting of True.
This parameter is responsible for the silent activation of the APM Profiler, which becomes active after the World Wide Web Publishing Service is restarted, through the creation of 2 new REG_MULTI_SZ registry keys: one under the W3SVC\Environment path and another under the WAS\Environment path, both with the same values:
- Reg Key1 == HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W3SVC\Environment
- Reg Key2 == HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WAS\Environment
- Values ==
COR_PROFILER={AD5651A8-B5C8-46ca-A11B-E82AEC2B8E78}
Cor_Enable_Profiling=1
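To see quickly whether the profiler has been switched on for a given server, one can read those two registry locations and look for the values listed above. What follows is only a minimal Python sketch, not an official tool. It assumes that `Environment` is a REG_MULTI_SZ value stored under each service key (that is how the paths above are interpreted here), and the function names `profiler_entries` and `read_environment` are made up for illustration.

```python
"""Sketch: detect the APM profiler variables in the IIS service environment."""
import sys

# Assumption: "Environment" is a REG_MULTI_SZ value under each service key.
SERVICE_KEYS = (
    r"SYSTEM\CurrentControlSet\Services\W3SVC",
    r"SYSTEM\CurrentControlSet\Services\WAS",
)
PROFILER_CLSID = "{AD5651A8-B5C8-46ca-A11B-E82AEC2B8E78}"


def profiler_entries(environment):
    """Return the NAME=VALUE strings that switch the APM profiler on."""
    found = []
    for entry in environment:
        name, _, value = entry.partition("=")
        name, value = name.strip().upper(), value.strip()
        if name == "COR_ENABLE_PROFILING" and value == "1":
            found.append(entry)
        elif name == "COR_PROFILER" and value.upper() == PROFILER_CLSID.upper():
            found.append(entry)
    return found


def read_environment(service_key):
    """Read the Environment value of a service key (Windows only)."""
    import winreg  # imported lazily so the parsing logic runs anywhere

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, service_key) as key:
            value, _kind = winreg.QueryValueEx(key, "Environment")
    except FileNotFoundError:
        return []  # key or value missing: nothing is configured
    return list(value)


if __name__ == "__main__" and sys.platform == "win32":
    for service_key in SERVICE_KEYS:
        hits = profiler_entries(read_environment(service_key))
        print(service_key, "->", hits if hits else "profiler not configured")
```

Run from an elevated prompt on the SharePoint server, an empty result for both services would suggest that the profiler variables are not set. The matching is kept case-insensitive because the two variable names appear with different casing in the registry.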
The proof and the repro:
It is very easy to repro the issue, if you would like. In a test environment where SCOM 2016 or 1801 is installed, install the SCOM agent (2012 R2, 2016 or 1801) on a SharePoint server using the default proposed settings. You will see that, as expected, the APM service is created but left disabled.
At this point:
Log on to the SharePoint server
From an elevated Command Prompt, perform an IISReset
Open the SharePoint Central Administration site and see the result:
The workaround:
With that amount of information in our hands, we thought about an easy workaround which is broadly applicable. With this workaround in place, customers who do not rely on APM can move ahead with updating, migrating or simply installing SCOM 2016 or 1801 with no known blockers.
Here are the steps to put the workaround in place:
Create an override for the rule Apply APM Agent configuration (yes, the above-mentioned rule), targeted to the .NET Application Monitoring Agent class, that sets the parameter EnableRTIA Profiler to False.
Wait for the new configuration to be delivered and applied. To make sure it is in place, check for the presence of the following events in the Operations Manager Event Log:
Event ID 1201 (containing the name and the version of the management pack you just used to store the override)
Event ID 1210
From an elevated Command Prompt, perform an IISReset
Check the registry (the 2 keys should have disappeared)
Open the SharePoint Central Administration site and see the result … BOOOM, back to work!!!
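The event check in the steps above can also be done from a script instead of Event Viewer. The following Python sketch simply builds and runs a `wevtutil qe` query against the Operations Manager log for event IDs 1201 and 1210. The helper names `build_query` and `build_command` are hypothetical, and the sketch only lists the matching events: it does not verify that event 1201 mentions the management pack holding your override, so you still need to read the output.

```python
"""Sketch: query the Operations Manager log for events 1201 and 1210."""
import subprocess
import sys

LOG_NAME = "Operations Manager"  # log name as mentioned in the steps above
EVENT_IDS = (1201, 1210)


def build_query(event_ids):
    """Build an XPath filter matching any of the given event IDs."""
    clause = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    return f"*[System[({clause})]]"


def build_command(event_ids=EVENT_IDS, count=10):
    """Return the wevtutil argument list (newest events first, text output)."""
    return [
        "wevtutil", "qe", LOG_NAME,
        f"/q:{build_query(event_ids)}",
        f"/c:{count}", "/rd:true", "/f:text",
    ]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(build_command(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Passing the arguments as a list avoids quoting problems with the space in the log name.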
Antonio and I really hope that this post will make your update, upgrade or fresh install experience easier by relieving you of the need to manage the agent version. You can safely go ahead with whatever agent version you like, with APM installed.
Comments
- Anonymous
November 28, 2018
This is all fine and dandy, but I'm not finding the rule "Apply APM Agent configuration". I'm running SCOM 2016 UR4
- Anonymous
November 29, 2018
Hi, to have that rule shown, if you haven't done so already, you should change the scope in the Authoring panel to include the ".NET Application Monitoring Agent" class. Thanks, Bruno.
- Anonymous