Jake P’s Favorite Bug
You can call me Ishmael, err...Jake. We had been getting a bizarre error reported intermittently from the lab: a modal 'No Disc Detected' message box. What made it really bizarre was that the tests and machines it hit seemed random, with no correlation (Word tests, Excel tests, etc.). The incident rate was also very low (0.1%), which frustrated our attempts to track it down. Lastly, 'No Disc Detected' could allegedly be reported for any number of reasons, all of which seemed impossible in the lab.
Finally Rya... err, Raphael was able to repro it in his office on one of his machines (a total accident: he was doing a local BVT run to validate one of his changes to our intrinsic debugger). Although he could reproduce it, he couldn't figure out why the test was causing the error. The error occurred only when the CD tray was open. After combing through the test code, we couldn't see anything that should ever hit the disk, let alone the CD drive.
After a few minutes I asked Raphael to run the test with the application under the debugger too. After the run we noticed in the externals window that it had loaded a lot of DLLs. Most were system or known Office DLLs. However, one stood out...PopularApplication32.DLL. I asked Raphael to step through the run again and see what happened when that dependency loaded. Sure enough, we got the error.
Although we seemed to have found the culprit, we still didn't know why. Finally I asked Raphael to crack PopularApplication32.dll open with an EXE reader. We scrolled down to the PDB section... there it was. As I had suspected, PopularApplication32.dll was stamped with a PDB path, a path that started with "G:". Raphael's CD drive happened to be mapped to G:.
So what had happened? Someone at the leading third-party software vendor who makes PopularApplication had evidently either built it on a non-build machine or not properly cleared the PDB stamp. Hitting the bug took three things at once. First, PopularApplication had to be enabled, which was a special config, so we didn't always run with it on. Second, you had to be running with our intrinsic debugger turned on; without the debugger, symbol loading wouldn't be triggered. Third, you had to be on a box with an empty CD drive mapped to G:, so that the symbol lookup touched the empty drive and brought up 'No Disc Detected'; without the empty G: drive, the lookup would simply fail silently and the run would continue.
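For the curious, the check we did by hand can be roughed out in a few lines of Python. This is only a sketch under some loud assumptions, not the tool we used: it assumes the binary carries a CodeView "RSDS" debug record (signature, 16-byte GUID, 4-byte age, then a NUL-terminated PDB path), and it scans the raw bytes for that signature instead of properly walking the PE debug directory, so it can produce false positives. The names find_pdb_paths and pdb_drive are made up for this example.

```python
import ntpath

RSDS_SIGNATURE = b"RSDS"
# Assumed record layout: 4-byte signature + 16-byte GUID + 4-byte age,
# followed by the NUL-terminated PDB path.
RSDS_HEADER_SIZE = 4 + 16 + 4


def find_pdb_paths(data):
    """Return every PDB path that follows an 'RSDS' signature in data.

    Heuristic byte scan (hypothetical helper): it does not parse the PE
    headers, so any stray 'RSDS' bytes would also be picked up.
    """
    paths = []
    start = 0
    while True:
        idx = data.find(RSDS_SIGNATURE, start)
        if idx == -1:
            break
        path_start = idx + RSDS_HEADER_SIZE
        end = data.find(b"\x00", path_start)
        if end > path_start:
            paths.append(data[path_start:end].decode("utf-8", errors="replace"))
        start = idx + len(RSDS_SIGNATURE)
    return paths


def pdb_drive(path):
    """Return the drive part of a Windows-style path, e.g. 'G:'."""
    drive, _rest = ntpath.splitdrive(path)
    return drive.upper()


def scan_file(filename):
    """Map each PDB path stamped into filename to its drive."""
    with open(filename, "rb") as handle:
        data = handle.read()
    return {path: pdb_drive(path) for path in find_pdb_paths(data)}
```

Pointing scan_file at a suspect DLL and looking for drive letters that map to removable drives on the test box would have flagged this one early. A real check would want a proper PE parser rather than a byte scan.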
Totally bizarre and proof that you should not take your build system lightly. We noticed that the next update from that vendor didn't seem to have that issue. :)
-- Jake P.
Do you have a bug whose story you love to tell? Let me know!