Preventing Build Hangs and File Locking on Build Servers When Running Unit Tests
Note: An updated version of this post can be found here.
Recently I was tasked with changing our daily and Continuous Integration builds so that they would also execute our unit tests. This seemed like a straightforward task, and indeed it was. Except for one thing: every so often a build would fail because another process was locking the test results directory.
When we looked at the build server we saw a number of instances of a process called DW20 running on the server. This process was locking the test results directory. Terminating these processes allowed further builds that included unit tests to run.
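A check like the one we did by hand can be scripted so that a build notices leftover DW20 processes before it starts. The following is only a minimal sketch in Python, not something we ran on our servers: it assumes the standard Windows `tasklist` command is available, and the function names `parse_tasklist_csv` and `find_stuck_reporters` are made up for illustration.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output: str) -> list[tuple[str, int]]:
    """Return (image name, PID) pairs from `tasklist /FO CSV /NH` output."""
    processes = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines and the single-field "INFO: No tasks are running..."
        # message that tasklist prints when nothing matches the filter.
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        processes.append((row[0], int(row[1])))
    return processes


def find_stuck_reporters(image_name: str = "DW20.EXE") -> list[tuple[str, int]]:
    """Ask Windows for running processes with the given image name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `find_stuck_reporters()` on the build server returns an empty list when no DW20 process is waiting. A build step could fail early, or raise an alert, when the list is not empty.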
DW20 is the Windows Error Reporting program. What was happening was that Windows Error Reporting was asking for permission to send error reports to Microsoft, and waiting for a response from a user. Being on a server this was never going to happen, especially as the prompt could not be seen!
So, two questions needed to be addressed. The first was to find out what was causing the errors that were triggering Windows Error Reporting in the first place, and the second was to see if there was a way to stop DW20 waiting for a prompt when there was an error.
The easier of the two to solve was preventing DW20 from waiting and locking the test results directory. On Windows Server 2008 you can configure Windows Error Reporting not to wait for input from the user. To find the setting, run Server Manager and then click on Turn on Windows Error Reporting (you may need to scroll down to find it). On our build server it was configured to ask about sending reports every time there is an error. Choose either the first or the last option in that dialog to be sure that DW20 will not wait and lock your test results directory.
The other problem was to work out why we were getting these errors in the first place. For this I used the Windows Sysinternals Process Explorer tool. Hovering the mouse over each DW20 process showed me the command line parameters that were being passed to it. These pointed me to a file in a temporary directory which contained the following:
Version=131072
General_AppName=QTAgent32.exe
EventType=VSTEExecutionFrameworkUE
LoggingFlags=0
UIFlags=1
EventLogSource=Team Test Error Reporting
UI LCID=127
P1=QTAgent32.exe
P2=v2.0.50727
P3=10.0.0.0
P4=Unknown
P5=ArgumentNullException
P6=27F84F9B
FilesToDelete=C:\Users\build\AppData\Local\Temp\tmp7C26.tmp|C:\Users\build\AppData\Local\Temp\tmp7C29.tmp|C:\Users\build\AppData\Local\Temp\tmp7C3A.tmp
ReportingFlags=15
Main_Intro_Bold=An unexpected condition has occurred.
Main_Intro_Reg=An unexpected condition has occurred in the test execution framework. Information about the condition has been gathered.
Main_Plea_Bold=Please tell Microsoft about this problem.
Main_Plea_Reg=We have created an error report that you can send to help us fix bugs. We will treat this report as confidential and anonymous.
Queued_EventDescription=An exception has occurred in the test execution framework component: Value cannot be null.
Parameter name: certificateFindKey
The important thing to note in this file is that the process that crashed was running QTAgent32.exe. This is the test runner used by the TestToolsTask task to run unit tests on the build server without requiring Visual Studio to be installed.
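If there are several of these files to go through, the interesting fields can be pulled out with a few lines of script. This is a rough sketch in Python that assumes the file has already been read into a string (the encoding on disk is not covered here) and that P5 holds the exception type, as it does in the file above. The names `parse_report_manifest` and `summarize` are hypothetical.

```python
def parse_report_manifest(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an error report file into a dict.

    A line without '=' is treated as a continuation of the previous value,
    as with the two-line Queued_EventDescription in the file shown above.
    """
    fields: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        key, separator, value = line.partition("=")
        if separator:
            last_key = key.strip()
            fields[last_key] = value
        elif last_key is not None:
            fields[last_key] += "\n" + line
    return fields


def summarize(fields: dict[str, str]) -> str:
    """Return 'application: exception type' for a parsed report."""
    app = fields.get("General_AppName", "unknown application")
    # Assumption: P5 carries the exception type, as in the file from the post.
    exception_type = fields.get("P5", "unknown exception")
    return f"{app}: {exception_type}"
```

For the file above, `summarize(parse_report_manifest(text))` gives `QTAgent32.exe: ArgumentNullException`, which is enough to see at a glance that the test runner itself is the process that failed.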
This immediately reminded me that two of the unit tests were failing with an “Error” state rather than the more usual “Failed” state. When the test runner reports an Error state for a test it means that the test caused an error in the test runner itself. The most common reason I have seen for this is in tests that use threads, and indeed in this case the problem tests were using threads. Removing those tests stopped the QTAgent32 errors from happening in the first place.
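I have not shown the problem tests here, so the following is only a generic illustration of one way to make a threaded test safer, written in Python rather than the .NET test code this post is about. The idea is to make sure that an exception raised on a worker thread is brought back to the thread running the test, so that the test framework records an ordinary failure against the test instead of the exception escaping on another thread. The `parse_port` function and the test class are hypothetical.

```python
import unittest
from concurrent.futures import ThreadPoolExecutor


def parse_port(value: str) -> int:
    """Hypothetical code under test; raises ValueError for bad input."""
    return int(value)


class ParsePortTests(unittest.TestCase):
    def test_bad_input_on_worker_thread(self):
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(parse_port, "not-a-number")
            # result() re-raises the worker's exception on the test thread,
            # so the framework sees it as part of this test. The timeout
            # stops a stuck worker from hanging the test indefinitely.
            with self.assertRaises(ValueError):
                future.result(timeout=5)
```

The same principle applies whatever the test framework: the test should wait for its threads to finish, with a timeout, and surface their exceptions itself.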
So, the conclusion is simple.
- Watch out for unit tests which fail with an Error state, and fix them as a matter of urgency.
- Make sure that on all of your build servers Windows Error Reporting is not configured to prompt.
I hope this has been useful and helps you to avoid problems running unit tests as part of your daily and Continuous Integration builds.
Written by Rob Jarratt
Comments
- Anonymous
November 01, 2011
Spent some time going through our build servers after this and found that most did indeed have Error Reporting turned on by default. Have switched these off and will hopefully be following up on the other point as a matter of urgency as well to flush out any other potential issues with unit tests. As an aside, to do this on windows 2k3 devices go to Control Panel->System->Advanced and then click "Error Reporting" to disable error reporting for 2K3 servers.