What is a deadlock?
A deadlock is a situation in which two or more competing actions are each waiting for another to finish, so none of them ever does. Avoiding deadlocks is all about making sure that any locks acquired in a series (A, B, C, etc.) are always acquired in the same order.
For example, say we have locks A and B. Thread 1 always acquires lock A first, then B. Thread 2 always acquires lock B first, then A. If both threads run at the same time, and thread 1 has grabbed lock A but not yet lock B at the moment thread 2 has grabbed lock B and is ready to grab lock A, then each thread is waiting for a lock the other holds and both are stuck forever. This is a deadlock.
The chances of this occurring depend on how often the locks are acquired in different orders and on the time that passes between acquiring the first lock and acquiring the second. The more time that passes between the point thread 1 acquires lock A and the point it acquires lock B, the more likely thread 2 is to run in that window and cause the deadlock.
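The scenario above can be reproduced in a few lines. The following is only a minimal sketch in Python, not code from the case discussed in this post: the names lock_a, lock_b, worker and both_hold_first are made up for illustration. A barrier forces the unlucky timing (each thread holds its first lock before either asks for its second), and a timeout stands in for the "stuck forever" wait so the demo can finish.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
# Reusable two-party barrier, used to force the unlucky interleaving.
both_hold_first = threading.Barrier(2)
results = {}

def worker(name, first, second):
    with first:
        # Do not continue until the other thread also holds its first lock.
        both_hold_first.wait()
        # A real deadlock would block here forever; the timeout ends the demo.
        got_second = second.acquire(timeout=0.5)
        results[name] = got_second
        if got_second:
            second.release()
        # Keep holding the first lock until both threads have given up,
        # so neither one can sneak in after the other times out.
        both_hold_first.wait()

# Thread 1 takes A then B; thread 2 takes B then A.
t1 = threading.Thread(target=worker, args=("thread 1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("thread 2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)
```

Both entries in results come back False: neither thread was able to get its second lock while the other held it. Without the timeout, both threads would simply wait forever.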
Example:
The following thread (thread 1) was attempting to apply group policy. While doing this operation it called into shell32 to get a known folder path. Shell32 needed to load a DLL to complete the operation (the stack shows a delay load through SHELL32!_tailMerge_ole32_dll). To accomplish the load, this thread needed to acquire the LdrpLoaderLock (this is lock 2). Although you cannot see this yet, this thread has already acquired a different critical section (lock 1) in the SHELL32!kfapi code.
0: kd> !mex.t fffffa80132087b0
Process Thread CID TEB UserTime KernelTime ContextSwitches Wait Reason WaitTime State
svchost.exe (GPSvcGroup) (fffffa800fb73040) fffffa80132087b0 (E/K) c360.c694 000007fffff92000 .016 .078 987 UserRequest 13:57:51.562 Waiting
WaitBlockList:
Object Type Other Waiters
fffffa8011df5040 SynchronizationEvent 40
# Child-SP Return Call Site Info
0 fffffa6025f24980 fffff80001857dfa nt!KiSwapContext+0x7f
1 fffffa6025f24ac0 fffff8000184ca0b nt!KiSwapThread+0x13a
2 fffffa6025f24b30 fffff80001ac5428 nt!KeWaitForSingleObject+0x2cb
3 fffffa6025f24bc0 fffff800018555b3 nt!NtWaitForSingleObject+0x98
4 fffffa6025f24c20 0000000077166b5a nt!KiSystemServiceCopyEnd+0x13
5 000000000249c928 00000000771454aa ntdll!ZwWaitForSingleObject+0xa
6 000000000249c930 00000000771453a1 ntdll!RtlpWaitOnCriticalSection+0xea
7 000000000249c9e0 000000007716d637 ntdll!RtlEnterCriticalSection+0xf4 Critical Section: ntdll!LdrpLoaderLock Owning Thread: e2d8
8 000000000249ca10 00000000771539c9 ntdll!LdrLockLoaderLock+0x137
9 000000000249ca50 0000000076f3bfc0 ntdll!LdrLoadDll+0xf9
a 000000000249cd40 0000000076f48c26 kernel32!LoadLibraryExW+0x3a2
b 000000000249cdd0 000007fefd970b71 kernel32!LoadLibraryA+0x46
c 000000000249ce00 000007fefd970af7 SHELL32!__delayLoadHelper2+0x85
d 000000000249ce90 000007fefd961969 SHELL32!_tailMerge_ole32_dll+0x3f
e 000000000249cf00 000007fefd961140 SHELL32!kfapi::CRegistryKeyProvider::OpenDefinitionKey+0x61
f 000000000249cfb0 000007fefd9617f6 SHELL32!kfapi::CFolderDefinitionStorage::LoadRegistry+0x92
10 000000000249d1c0 000007fefd96159d SHELL32!kfapi::CFolderDefinitionStorage::Load+0x62
11 000000000249d3d0 000007fefd9606cc SHELL32!kfapi::CFolderDefinitionCache::Load+0x111
12 000000000249d5a0 000007fefd97bff0 SHELL32!kfapi::CFolderPathResolver::GetPath+0xb8
13 000000000249d940 000007fefd97c492 SHELL32!kfapi::CFolderCache::GetPath+0x153
14 000000000249da40 000007fefd97c3b6 SHELL32!kfapi::CKFFacade::GetFolderPath+0x9a
15 000000000249daf0 000007fefd91d7bd SHELL32!SHGetKnownFolderPath_Internal+0x8c
16 000000000249db60 000007fef2469b70 SHELL32!SHGetKnownFolderPath+0x1c
17 000000000249db90 000007fef246796b fdeploy!CFileCacher::Init+0x70
18 000000000249dc10 000007fef246333f fdeploy!CPolicyComputant::GetRedirectionInfo+0x1a7
19 000000000249e400 000007fef2465571 fdeploy!CEngine::ProcessGroupPolicyEx+0x20b
1a 000000000249e4c0 000007fefbaf1e73 fdeploy!ProcessGroupPolicyEx+0x1f9
1b 000000000249e570 000007fefbaf0088 gpsvc!ProcessGPOList+0x637
1c 000000000249e8f0 000007fefbaebfd5 gpsvc!ProcessGPOs+0x2c50
1d 000000000249f720 000007fefbb381ad gpsvc!ApplyGroupPolicy+0x7d5
1e 000000000249f9c0 000007fefbb3b645 gpsvc!CDefaultPolicyApplier::ApplyGroupPolicy+0x4d
1f 000000000249fa10 000007fefbb3b124 gpsvc!CGroupPolicySession::ApplyGroupPolicyForPrincipal+0x4e1
20 000000000249fae0 0000000076f3aefd gpsvc!CGroupPolicySession::ApplyGroupPolicyThread+0x30
21 000000000249fb20 0000000077146591 kernel32!BaseThreadInitThunk+0xd
22 000000000249fb50 0000000000000000 ntdll!RtlUserThreadStart+0x1d
In order for thread 1 (above) to move forward, we need to find out what the current owner of the loader lock (lock 2) is doing. Frame 7 above reports that the owning thread is e2d8. The thread below is in the loader (look for ntdll!LdrLoadDll, which leads to gpprefcl!DllMain). So this thread (thread 2) acquired the loader lock (lock 2), but is waiting on a critical section (the one for SHELL32!kfapi) owned by the thread above (thread 1).
0: kd> !mex.t -t e2d8
Process Thread CID TEB UserTime KernelTime ContextSwitches Wait Reason WaitTime State
svchost.exe (GPSvcGroup) (fffffa800fb73040) fffffa8012193bb0 (E/K) c360.e2d8 000007fffffd8000 .016 .047 717 UserRequest 13:57:50.921 Waiting
WaitBlockList:
Object Type Other Waiters
fffffa801063b370 SynchronizationEvent 3
# Child-SP Return Call Site Info
0 fffffa602529e980 fffff80001857dfa nt!KiSwapContext+0x7f
1 fffffa602529eac0 fffff8000184ca0b nt!KiSwapThread+0x13a
2 fffffa602529eb30 fffff80001ac5428 nt!KeWaitForSingleObject+0x2cb
3 fffffa602529ebc0 fffff800018555b3 nt!NtWaitForSingleObject+0x98
4 fffffa602529ec20 0000000077166b5a nt!KiSystemServiceCopyEnd+0x13
5 000000000188d2f8 00000000771454aa ntdll!ZwWaitForSingleObject+0xa
6 000000000188d300 00000000771453a1 ntdll!RtlpWaitOnCriticalSection+0xea
7 000000000188d3b0 000007fefd97bb53 ntdll!RtlEnterCriticalSection+0xf4 Critical Section: 0000000003fce220 Owning Thread: c694
8 000000000188d3e0 000007fefd9606cc SHELL32!kfapi::CFolderDefinitionCache::Load+0x4b
9 000000000188d5b0 000007fefd96206d SHELL32!kfapi::CFolderPathResolver::GetPath+0xb8
a 000000000188d950 000007fefd97c492 SHELL32!kfapi::CFolderCache::GetPath+0x33b
b 000000000188da50 000007fefd97c3b6 SHELL32!kfapi::CKFFacade::GetFolderPath+0x9a
c 000000000188db00 000007fefd97d03c SHELL32!SHGetKnownFolderPath_Internal+0x8c
d 000000000188db70 000007fefd9623d9 SHELL32!SHGetFolderPathEx+0x32
e 000000000188dbc0 000007feee4ae19e SHELL32!SHGetFolderPathW+0xed
f 000000000188dc30 000007feee4623f2 gpprefcl!apmSHGetFolderPath+0x7a
10 000000000188dc70 000007feee462843 gpprefcl!apmClientTraceBase::initialize+0x102
11 000000000188dda0 000007feee462215 gpprefcl!apmClientTraceBase::TraceFormat+0x27
12 000000000188ddd0 000007feee4857b1 gpprefcl!apmTraceFormat+0x49
13 000000000188de90 000007feee49753f gpprefcl!DllMain+0x6d
14 000000000188dec0 000000007715422d gpprefcl!__DllMainCRTStartup+0xbf
15 000000000188e020 0000000077161a28 ntdll!LdrpRunInitializeRoutines+0x1f6
16 000000000188e200 0000000077153a06 ntdll!LdrpLoadDll+0x4b1
17 000000000188e520 0000000076f3bfc0 ntdll!LdrLoadDll+0x136
18 000000000188e810 000007fefbaf165c kernel32!LoadLibraryExW+0x3a2
19 000000000188e8a0 000007fefbaf1c66 gpsvc!LoadGPExtension+0x40
1a 000000000188e8d0 000007fefbaf0088 gpsvc!ProcessGPOList+0x42a
1b 000000000188ec50 000007fefbaebfd5 gpsvc!ProcessGPOs+0x2c50
1c 000000000188fa80 000007fefbb381ad gpsvc!ApplyGroupPolicy+0x7d5
1d 000000000188fd20 000007fefbb3b645 gpsvc!CDefaultPolicyApplier::ApplyGroupPolicy+0x4d
1e 000000000188fd70 000007fefbb3b124 gpsvc!CGroupPolicySession::ApplyGroupPolicyForPrincipal+0x4e1
1f 000000000188fe40 0000000076f3aefd gpsvc!CGroupPolicySession::ApplyGroupPolicyThread+0x30
20 000000000188fe80 0000000077146591 kernel32!BaseThreadInitThunk+0xd
21 000000000188feb0 0000000000000000 ntdll!RtlUserThreadStart+0x1d
Note that this condition would not have occurred if these two threads had not run at the same time.
Now, how do you fix this? In this particular case the module gpprefcl.dll is breaking the rules for what is allowed inside the DllMain function. DllMain runs while the loader lock is held, so anything it calls that takes another lock (here, the SHELL32!kfapi critical section) can end up acquiring locks in the opposite order from the rest of the process. If you refer to the MSDN documentation for DllMain, you will see it clearly says to keep it short and simple. The issue above resulted in a hotfix that removed the logging code from the DllMain function. This removed DllMain's dependency on shell32, which avoided the issue altogether.
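The analysis of the two stacks amounts to following "waits on" and "owned by" links from thread to thread until a thread shows up a second time. Here is a minimal sketch of that walk in Python. The thread IDs and lock names are transcribed from the two dumps above; the function name find_deadlock_cycle and the two dictionaries are made up for illustration and are not part of any debugger extension.

```python
# What each thread is blocked on (frame 7 of each stack above).
waits_on = {
    "c694": "ntdll!LdrpLoaderLock",   # thread 1
    "e2d8": "0000000003fce220",       # thread 2 (SHELL32!kfapi critical section)
}
# Who currently owns each lock ("Owning Thread" in the same frames).
owned_by = {
    "ntdll!LdrpLoaderLock": "e2d8",
    "0000000003fce220": "c694",
}

def find_deadlock_cycle(start, waits_on, owned_by):
    """Follow thread -> lock -> owner links; return the cycle of threads, or None."""
    path = []
    thread = start
    while thread is not None:
        if thread in path:
            # We are back at a thread we already visited: that is the cycle.
            return path[path.index(thread):]
        path.append(thread)
        lock = waits_on.get(thread)
        thread = owned_by.get(lock) if lock is not None else None
    return None  # the chain ended at a thread that is not waiting on anything

cycle = find_deadlock_cycle("c694", waits_on, owned_by)
print(cycle)  # ['c694', 'e2d8']
```

If the chain ends at a thread that is not waiting on a lock, the function returns None, and the thing to look at is why that last thread is slow rather than a deadlock.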
Not all deadlocks involve the loader lock and DllMain. In other cases the rule is "make sure locks are acquired in the same order every time": lock 1, then lock 2. If you have code that acquires the same locks in reverse order while another thread acquires them in the normal order, you have the potential to deadlock. The time between the acquisition of lock 1 and lock 2, combined with how frequently the locks are acquired, determines how likely a deadlock is to occur.
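One way to enforce the "same order every time" rule is to give every lock a fixed rank and route all multi-lock acquisitions through a helper that sorts by rank. The following is a minimal sketch in Python under that assumption. RankedLock and acquire_in_order are hypothetical names invented for this example, not an existing library API.

```python
import threading
from contextlib import ExitStack, contextmanager

class RankedLock:
    """A lock with a fixed rank; lower ranks are always acquired first."""
    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()

@contextmanager
def acquire_in_order(*ranked_locks):
    """Acquire all given locks in rank order, whatever order the caller used."""
    with ExitStack() as stack:
        for ranked in sorted(ranked_locks, key=lambda r: r.rank):
            stack.enter_context(ranked.lock)
        yield  # every lock is held here; ExitStack releases them afterwards

lock_1 = RankedLock(1)
lock_2 = RankedLock(2)
counter = 0

def worker(first, second, iterations):
    global counter
    for _ in range(iterations):
        with acquire_in_order(first, second):
            counter += 1

# The two threads name the locks in opposite orders, but the helper
# still takes lock 1 before lock 2 in both of them.
t1 = threading.Thread(target=worker, args=(lock_1, lock_2, 1000))
t2 = threading.Thread(target=worker, args=(lock_2, lock_1, 1000))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)  # 2000
```

This only helps when all the locks involved are under your control. It would not have helped in the loader lock case above, where one of the two locks is taken by the operating system before your code ever runs.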