NS-2746 asked GurhyJohn-5524 answered

Distribution list expansion causes high CPU usage in Exchange 2016

Environment details:

2 Exchange 2016 servers, each with 12 CPUs / 64 GB RAM; 1,500 users
Running the latest CU18 and OS patches
Outlook 2013 in Cached Mode

We observed that CPU usage can spike to over 95% when the news mailbox sends an email to 1,000+ users. Initially we found that the sender had copied and pasted every recipient email address individually, so Outlook did the heavy lifting: it froze for everyone while it resolved all 1,000 addresses in the To field through AD queries. The sender may also have expanded the DL before clicking Send.

I'm wondering whether anyone has seen this with Exchange 2016 and whether there is a setting to prevent it. Even when the server did the expansion (the email was sent to a DL), CPU went back to the same high usage and Outlook was locked for up to 1 minute. CU18 was installed recently because it contained a fix specific to MSExchangeMapiMailboxAppPool threads going into lock contention and causing 100% CPU usage.

Thanks in advance


Did this start after CU18? Spikes are normal, but they should be brief and should not affect clients.

NS-2746 answered AndyDavid edited

Not really. CPU usage was high before, but we only linked it to these types of email later. CU18 was installed for performance and security reasons, and also because it fixed that thread-contention issue.

CPU spikes are OK, but not when they last long enough for users to notice; that is the main issue right now. After the email is processed, I believe things return to baseline.


1,500 users is pretty small, and Exchange will handle that easily. Are these virtual machines? If so, are they configured correctly, specifically the vCPU and memory allocation?
If message expansion is causing the clients to lock up, then something is not configured correctly.

https://docs.microsoft.com/en-us/exchange/plan-and-deploy/virtualization?view=exchserver-2019



NS-2746 answered AndyDavid edited

I would think so too. Yes, these are virtual machines, and I have confirmed that resources are not over-allocated. Is there any specific log under the main logging folder related to group expansion, or any registry settings in Exchange 2016?

Thanks


Not ones that you could really tie back to a configuration issue.
Which specific service is spiking? Is there anything in the event logs when it happens?



I see event 1005 from MSExchangeDiagnostics around that time:

Processor total % processor time sustained a value of 85.46 for the 5-minute interval.

Time in GC seems high (over 10) for:
MSExchangeSyncAppPool
NodeRunner (indexing)
MSExchangeDelivery
MSExchangeMapiMailboxAppPool

Total CPU spent in kernel: 15.2

Similar numbers on the second server.
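Event 1005 reports that processor time sustained a value over a 5-minute interval, which is different from a momentary spike. The Python sketch below shows one way a sustained-average check can be computed from periodic CPU samples. It is not the actual MSExchangeDiagnostics logic: the 15-second sample interval, the 85% threshold, and the function name sustained_windows are assumptions for illustration.

```python
from collections import deque


def sustained_windows(samples, interval_s=15, window_s=300, threshold=85.0):
    """Yield (end_index, average) for each full window whose mean CPU % exceeds threshold.

    samples: CPU % readings taken every interval_s seconds (hypothetical data).
    window_s and threshold are assumed values, not Exchange's real trigger settings.
    """
    size = window_s // interval_s  # number of samples in one window
    window = deque(maxlen=size)
    total = 0.0
    for i, value in enumerate(samples):
        if len(window) == size:
            total -= window[0]  # the oldest sample is about to be evicted
        window.append(value)
        total += value
        if len(window) == size:
            avg = total / size
            if avg > threshold:
                yield i, avg
```

With these assumptions, a brief burst (for example, three 15-second samples at 100% against a 50% baseline) never lifts the 5-minute average above 85%, while several minutes at 98% does. That is one plausible reason a single mass email might not log the event while several in a row do, though the real trigger may differ.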


By the way, is anything else installed on these servers?
Anti-malware, disclaimer software, third-party hooks into Exchange, etc.?

NS-2746 answered

Also event 8019: creating extra connections for the idle queue (SMTP delivery to mailbox).

I saw another discussion from 2016 where someone mentioned frequently getting this event and also having issues with random clients getting disconnected.

NS-2746 answered AndyDavid commented

I was told that vSphere power management settings are disabled. The Exchange VMs are set to high performance. On the CPU side there is no MHz limit, but also no reservation, per VMware best practices. The host has plenty of available MHz capacity for the guests.

This came to light when users complained about Outlook freezing or being slow; we noticed that users see it whenever these emails go out. Originally the 2 servers had 8 vCPUs and 48 GB, as the Exchange calculator advised when they were built. Then 4 additional cores and 16 GB were added. I don't think this started with CU18, but CU18 does fix a bug that can compound this condition. Before installing CU18, I had seen the MSExchangeMapiMailboxAppPool w3wp process with the highest CPU time.

This issue is very reproducible: you can see the alarms on the vSphere side as CPU usage turns from green to red. We will discuss whether more cores can be added; memory seems to be doing OK.

Baseline CPU usage is between 60% and 70%, so there is headroom. In your opinion, is it fair to say that when the mailbox workload profile suddenly changes, the current server resources are stretched? I'm also thinking about a third server to scale out rather than scale up.
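A rough way to reason about scale-out versus scale-up is to treat the current CPU load as a fixed number of busy cores and divide it over the new capacity. The sketch assumes, hypothetically, that load spreads evenly across servers and scales linearly with core count; neither holds exactly for a DAG with uneven active-database placement, and the function name projected_utilization is illustrative only.

```python
def projected_utilization(util_pct, servers_now, servers_after, cores_now=12, cores_after=12):
    """Project average per-server CPU % after adding servers and/or cores.

    Assumptions (hypothetical): total CPU demand stays fixed, is spread evenly
    across servers, and scales linearly with core count.
    """
    busy_cores = util_pct / 100 * cores_now * servers_now  # total cores' worth of demand
    capacity_after = cores_after * servers_after  # total cores after the change
    return 100 * busy_cores / capacity_after
```

For example, a 65% baseline on two 12-core servers projects to roughly 43% per server across three 12-core servers, or about 49% on two 16-core servers. The thread later reports that going from 12 to 16 vCPUs barely changed the mass-email spikes, which suggests those spikes do not follow this linear model.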





Yeah, those baselines are pretty high for CPU. You should be able to run all 1,500 users on one server.
That said, if you are thinking of adding another server and spreading the load, I would do that regardless. I'm not a fan of two-node DAGs (this is a DAG, yes?).
I recommend at least three nodes in the cluster so you can do maintenance on one and still have redundancy.

EricYin-MSFT answered

This blog post on troubleshooting high CPU issues in Exchange might help: https://techcommunity.microsoft.com/t5/exchange-team-blog/troubleshooting-high-cpu-utilization-issues-in-exchange-2013/ba-p/603753


NS-2746 answered

I started troubleshooting with that blog. Things got better after 4 additional cores were added (from 8 to 12), and CU18 probably helped with its bug fix too. This issue was first seen when the sender did not use a DL, so the advice was to always use one when possible. But then the same thing happened when a DL was used. And last week, when 4 similar emails were sent within the same hour, event 1005 was logged on both servers. With just one email, there is no event.

NS-2746 answered EricYin-MSFT commented

Honestly, you shouldn't have to limit anything; the servers appear to have the capacity.
How big are the databases? I like to keep them around 250 GB max.


The database sizes are not per best practices. I know that at some point this will cause performance issues.

Interestingly, environments with ideally sized databases have experienced this too, as mentioned in the Reddit link.
I will try to closely monitor NodeRunner and the parser.


Hi,
Did you find anything with NodeRunner?

NS-2746 answered NS-2746 commented

I observed two conditions:

1. Mass email sent to all users, typically under 1 MB, with no attachment or meeting invite: CPU spikes briefly, but the impact is low, lasting 1 to 3 minutes.

2. Mass email with calendar invites: this really pins the CPU and costs up to 12 minutes of degraded performance, with CPU staying at 95-100%. I also think that when the sender expands the DL or copies and pastes email addresses into the To box, it compounds the CPU usage further. I saw NodeRunner use as much as 40-50% CPU, though not sustained. The VMs got 4 additional vCPUs, so each server now has 16; we will see how much that helps with usage and queue length under load.
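To compare episodes like the 12-minute calendar-invite case with the shorter plain-email spikes, it helps to measure the longest continuous stretch above a threshold rather than just the peak. A minimal Python sketch, assuming CPU samples at a fixed 15-second interval (for example, taken from a Performance Monitor log exported to CSV) and a 95% threshold; both values and the function name longest_run_above are hypothetical.

```python
def longest_run_above(samples, threshold=95.0, interval_s=15):
    """Return the longest continuous stretch, in seconds, with CPU % >= threshold.

    samples: CPU % readings taken every interval_s seconds (hypothetical data).
    """
    longest = current = 0
    for value in samples:
        if value >= threshold:
            current += 1  # extend the current run of hot samples
            longest = max(longest, current)
        else:
            current = 0  # a sample below threshold breaks the run
    return longest * interval_s
```

A run-length measure separates "briefly touched 100%" from "stayed pinned for minutes", which is the distinction users actually feel as Outlook freezes.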

I circled back to the power management settings Andy mentioned above. The VM team initially said no power management was on, but I looked and found the host set to Balanced. I have submitted a change to set it to High Performance and to review the HP and Dell BIOS settings at the same time. I'm hoping this will fix it.

There is a VMware KB article:

https://kb.vmware.com/s/article/1018206


Cool. Let us know if things look better once power management is set to high performance. This bit me once, and changing it made a huge difference.


Fingers crossed.

16 vCPUs didn't do much. We had 3 mass emails today with the same result. Maybe it helped a tiny bit, but if users get disconnected and notice freezes, it's a fail.

NS-2746 answered EricYin-MSFT commented

Can anyone try to replicate this in a lab or test environment, if possible?

It is weird that emails with calendar invites can consume so many resources when so many built-in checks are present. I also checked the mail queue file; although it is big (over 110 GB), it probably contains a lot of white space due to its design. Thanks.


Sorry, my lab isn't at that scale. When sending similar invites to all users, I saw a similar rapid increase in CPU usage, but nowhere near 90%.
Did you find any events logged by back pressure?


NS-2746 answered NS-2746 commented

No back pressure events. We noticed that when no DL is used, the email includes a calendar invite, and all 950+ addresses are added individually in the To box, CPU goes over 95% and stays hot for 5-7 minutes.

Server CPU usage typically hovers under 60% during the day. A mass email sent via BCC without a calendar invite is OK too; it has little impact on CPU. The email size is 1 MB. Could this be related to the Availability service in Exchange trying to pull scheduling information for all the recipients? If so, the impact should occur before the email is sent, and the sender did report that this email takes time to send.


AndyDavid: setting vSphere to high performance didn't do much; in fact, the server BIOS was already set to performance when deployed. Also, the CTSProcessoraffinity registry value (15) is present; the project team probably added it during deployment.

Thanks




Bummer. Consider opening a ticket with Microsoft. What you are describing is not normal, given what seem to be adequate specs for your mailbox count.


Hi NS-2746,
Any updates?
Did you install any third-party tools that might affect performance?


I used PAL (https://github.com/clinthuffman/PAL) and saw more than 30% of CPU time spent in privileged time. These servers do have McAfee Solidcore and McAfee AV; in total, 9 McAfee drivers are installed that could be responsible for this CPU time. McAfee was also installed on Exchange 2010, but those servers were not multi-role, and the Exchange 2010 architecture is quite different from Exchange 2013/2016/2019, which are more CPU-heavy. AV exclusions are set up, as mentioned earlier.
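PAL's privileged-time finding can be approximated by comparing the Processor counters % Privileged Time and % Processor Time. The sketch below computes privileged time as a share of total busy CPU time across samples; the input format (pairs of counter values), the function name privileged_share, and the 0.30 cutoff mirroring the 30% figure in the post are assumptions, not PAL's actual analysis rules.

```python
def privileged_share(samples):
    """Return privileged (kernel) time as a fraction of busy CPU time.

    samples: list of (pct_processor_time, pct_privileged_time) pairs,
    e.g. read from a perf log (hypothetical input format).
    """
    busy = sum(total for total, _ in samples)
    kernel = sum(priv for _, priv in samples)
    return kernel / busy if busy else 0.0


def kernel_heavy(samples, cutoff=0.30):
    """Flag the sample set if kernel time exceeds an assumed 30% share."""
    return privileged_share(samples) > cutoff
```

A high kernel share is consistent with filter drivers such as antivirus doing work in the I/O path, but the ratio alone does not identify which driver is responsible.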

There are also some ASP.NET exceptions, and time in GC >10 sec is flagged as a warning: the servers are spending extra time in garbage collection, which adds to the high usage. The default .NET setting is workstation GC. I read that in Exchange 2019 the default is now server GC, which is a more efficient garbage-collection mode. Anyway, my advice is to use a DL when possible or scale out. Completing the M365 project within 6 months is possible.


8 virtual Exchange servers with an active/passive DAG

We had the same issue today: a calendar invite sent to a huge number of DLs, with about 15,000 recipients. All servers with active databases hit 100% CPU usage.

This caused mail delays and Outlook freezes for about 25 minutes.

Did you ever find the cause or a solution?
