MaxConcurrentAPI - or, how fast can you authenticate users?
MaxConcurrentAPI.. what is it?
It is referenced a few places:
How to configure your ISA Server for a very large number of authentication requests
Edit 11/21/2011 NOTE: Please see https://support.microsoft.com/kb/975363 for updated informaton
Background:
ISA Servers were very very slow when processing web requests only when domain authentication was turned on.
First, enable the netlogon log ( can you tell from previous posts, that I really like his log? ) on the ISA server and examine the log during the time the ISA response was slow.
Second - parse this log for interesting data during the time of failure(s)
Y:\data>findstr /i NlAllocateClientApi Server02Netlogon.log > AllocateAPI.txt
AllocateAPI.txt shows:
12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
We see a repeated pattern here:
12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:58 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User7161 from ClientMachine16151 Returns 0xC000005E
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User233 from ClientMachine16333 Entered
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User3433 from ClientMachine16123 Returns 0x0
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Entered
12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User546456 from ClientMachine453 Returns 0x0
12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User54642 from ClientMachine565 Entered
12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:59 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.
12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Returns 0xC000005E
err.exe shows us ( https://www.microsoft.com/downloads/details.aspx?FamilyId=BE596899-7BB8-4208-B7FC-09E02A13696C&displaylang=en )
F:\tools\err.exe\i386>err.exe 0xC000005E
# for hex 0xc000005e / decimal -1073741730 :
STATUS_NO_LOGON_SERVERS ntstatus.h
# There are currently no logon servers available to service
# the logon request.
# 1 matches found for "0xC000005E"
A review of this area in netlogon shows us we have a standing timeout period of 45000 ms before we will post the first error ( and subsequent error for NlpUserValidateHigher and the return error 0xC000005E)
The number of outstanding RPC calls to the DC is throttled via the MaxConcurrentApi setting.
We use a semaphore to manage this queue, so when the netlogon log shows us that the API timed out – it was waiting longer than 45 sec for a slot to open up, and finally gave up and quit.
They had already increased the setting for MaxConcurrentAPI per q326040, to 10.
The theory here, was that the DC simply was not responding fast enough to satisfy the queued requests, even with this value increased.
We took a userdumpof LSASS on the ISA server and the DC which the ISA server held a secure channel to ( found via Nltest) .
This was important because we need to see what ISA is doing at this time, as well as what the DC was doing at this time.
The ISA LSA dump confirms that we are very busy with 283 threads in LSA - about 115 of them waiting for one of these slots to open.
What's the DC doing? It had 118 threads .
Some are processing the logon requests which takes in the data from netlogon on the DC, sends the data to the NTLM SSP (msv1_0.dll) and eventually calls into the Authentication directory .
But, most ( if not all ) of the busy threads were waiting.
So what are all the DC threads waiting on? A critical section which is needed when we allocate memory from the heap. Normally this would not be an issue, but someone had set gflags on LSASS on this machine.
0:037> !gflag
Current NtGlobalFlag contents: 0x00001040
hpc - Enable heap parameter checking
ust - Create user mode stack trace DB
UST should not cause a huge perf hit, but HPC will.
Disabled via running: "gflags -i lsass.exe -hpc -ust"
After disabling and rebooting the DC – things ran smoothly.
PS: The netlogon log is really really cool...
Spat
Comments
Anonymous
January 12, 2006
One other angle on this is the DC is trying to contact trusted domains over slow links or the like.
If there are a lot of trusts and the user managed to send their logon name as NULLuser
- OR -
There is a trusted domain and pass through has to really bad link your DC will take forever to time out.
Look at the NEVERPING netlogon parameter that added a while back.Anonymous
January 13, 2006
Excellent point Slight has - if there are trusted domains you basically follow the paths back - and in these cases MaxConcurrentApi applies to the trusting DC ( as opposed to a member server as in the case I posted about )
The neverping refers to http://support.microsoft.com/?kbid=825107Anonymous
May 18, 2006
In my last post I laid out the basic flow of NTLM authentication with SharePoint when all the accounts...Anonymous
May 20, 2006
Is that value 4500ms or 45000ms? Is this value in the registry?Anonymous
May 20, 2006
Matheesha
Thanks! Sorry - its 45000 ( 45 seconds )
And no its not configurable ( that I know of - I need to go back and review it to be 100% )
PS
I corrected the value in the post thanks to you
spatAnonymous
March 09, 2007
Kerberos is well… Kerberos. And NTLM is NTLM right? Right. Most of the time. This post is from a recentAnonymous
March 13, 2007
Note - you can set the reg keys for maxconcurrent and then cycle the netlogon svcAnonymous
April 14, 2007
In the truted domains scenario, how is possible to verify the effect of this Key? Perfomon "NDTSNTLM Authentications" item could be useful? It could be nice to have a performance baseline before regkey optimization and after.