How to make ANY code in ANY system unit-test-friendly
[I added this example in a later post]
There are lots of pieces of code that are embedded in places that make them very hard to test. Sometimes these bits are essential to the correct operation of your program and can have complex state machines, timeout conditions, error modes, and who knows what else. Unfortunately, they are often used in some subtle context such as a complex UI, an asynchronous callback, or other complex system. This makes them very hard to test, because you might have to induce the appropriate failures in system objects to do so. As a consequence these systems are often not very well tested, and if you bring up the lack of testing you are not likely to get a positive response.
It doesn’t have to be this way.
I offer below a simple recipe to allow any code, however complex, however awkwardly inserted into a larger system, to be tested for algorithmic correctness with unit tests.
Step 1:
Take all the code that you want to test and pull it out of the system in which it is being used so that it is in separate source files. You can build these into a .lib (C/C++) or a .dll (C#/VB/etc.); it doesn’t matter which. Do this in the simplest way possible and just replace the occurrences of the code in the original context with simple function calls to essentially the same code. This is just an “extract function” refactor, which is always possible.
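As a concrete sketch (all names here are invented for illustration), suppose some retry/backoff arithmetic lived inline in a UI callback. Step 1 just moves it into its own function in the new source file, and the callback calls it instead:

```cpp
// Hypothetical example: retry/backoff logic that used to live inline in a
// UI callback. After the "extract function" refactor it is a plain
// function in a separate source file; the callback now just calls it.
int ComputeRetryDelayMs(int attempt, int baseMs, int maxMs) {
    // Exponential backoff, capped at maxMs.
    int delay = baseMs;
    for (int i = 0; i < attempt; ++i) {
        delay *= 2;
        if (delay >= maxMs) return maxMs;
    }
    return delay;
}
```

Nothing about the behavior changes at this point; the code is merely relocated so the later steps have something to work with.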
Step 2:
In the new library code, remove all uses of ambient authority and replace them with a capability that does exactly the same thing. More specifically, every place you see a call to the operating system, replace it with a call to a method on an abstract class that takes the necessary parameters. If the calls always happen in some fixed patterns, you can simplify the interface so that instead of being fully general like the OS it just does the patterns you need with the arguments you need. Simplifying is actually better and will make the next steps easier.
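For instance (a hypothetical sketch, names invented here): if the code only ever loads its settings once, the capability does not need to mirror the whole filesystem API; it can expose just that one pattern.

```cpp
#include <map>
#include <string>

// Hypothetical pattern-shaped interface: one coarse call instead of
// general open/read/parse/close primitives.
class SettingsSource {
public:
    virtual ~SettingsSource() {}
    virtual std::map<std::string, std::string> LoadAllSettings() = 0;
};

// An in-memory implementation; a test can fill it with fake data.
class InMemorySettings : public SettingsSource {
public:
    std::map<std::string, std::string> values;
    std::map<std::string, std::string> LoadAllSettings() override {
        return values;
    }
};

// The code under test sees only this small, well-defined surface.
int ReadTimeoutMs(SettingsSource& source) {
    std::map<std::string, std::string> s = source.LoadAllSettings();
    std::map<std::string, std::string>::const_iterator it = s.find("timeout_ms");
    return it == s.end() ? 1000 : std::stoi(it->second);  // 1000 = default
}
```

The real implementation (many file reads, many parsers) would hide behind the same single call.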
If you don’t want to add virtual function calls you can do the exact same thing with a generic or a template class using the capability as a template parameter.
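A minimal sketch of the template variant (all names invented for illustration): the capability becomes a policy type supplied as a template parameter, so there are no virtual calls and no run-time cost.

```cpp
// Hypothetical "real" environment; in practice this would consult
// actual configuration or OS state.
struct RealEnvironment {
    static int MaxRetries() { return 3; }
};

// The code under test refers to "some environment of this shape"
// rather than any particular global; a unit test substitutes its own.
template <typename Env>
bool ShouldRetry(int attemptsSoFar) {
    return attemptsSoFar < Env::MaxRetries();
}

// What a unit test might plug in instead:
struct TestEnvironment {
    static int MaxRetries() { return 1; }
};
```

The compiler resolves everything statically, so the production build pays nothing for the seam.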
If it makes sense to do so you can use more than one abstract class or template to group related things together.
Use the existing code to create one implementation of the abstract class that just does the same calls as before.
This step is also a mechanical process, and the code should be working just as well as it ever did when you’re done. And since most systems use only very few OS features in any testable chunk, the abstract class should stay relatively small.
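Putting step 2 together in one hedged sketch (interface and names are hypothetical, not from any particular system): a library routine that used to call the OS clock and sleep functions directly now receives them as a capability, and one implementation simply does the same calls as before.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical capability: every direct OS call the library made (here,
// clock and sleep) becomes a method on an abstract class.
class SystemServices {
public:
    virtual ~SystemServices() {}
    virtual int64_t NowMs() = 0;            // replaces a direct OS clock call
    virtual void SleepMs(int64_t ms) = 0;   // replaces a direct OS sleep call
};

// Library code after the transform: all authority arrives as a parameter.
// Returns false if the deadline had already passed (the timeout path).
bool WaitUntil(SystemServices& sys, int64_t deadlineMs) {
    int64_t now = sys.NowMs();
    if (now >= deadlineMs) return false;
    sys.SleepMs(deadlineMs - now);
    return true;
}

// The one implementation that just does the same calls as before;
// in step 3 it moves back into the original code base.
class RealSystemServices : public SystemServices {
public:
    int64_t NowMs() override {
        using namespace std::chrono;
        return duration_cast<milliseconds>(
            steady_clock::now().time_since_epoch()).count();
    }
    void SleepMs(int64_t ms) override {
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    }
};
```

With the real implementation plugged in, behavior is unchanged; the seam only pays off in step 4.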
Step 3:
Take the implementation of the abstract class and pull it out of the new library and back into the original code base. Now the new library has no dependencies left. Everything it needs from the outside world is provided to it on a silver platter and it now knows nothing of its context. Again everything should still work.
Step 4:
Create a unit test that drives the new library by providing a mock version of the abstract class. You can now fake any OS condition, timeouts, synchronization, file system, network, anything. Even a system that uses complicated semaphores and/or internal state can be driven to all the hard-to-reach error conditions with relative ease. You should be able to reach every basic block of the code under test with unit tests.
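A sketch of what such a test can look like, assuming a hypothetical clock/sleep capability of the kind step 2 describes (all names invented): the mock clock only advances when the code under test "sleeps", so even timeout paths are reached instantly and deterministically.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical capability the library depends on.
class SystemServices {
public:
    virtual ~SystemServices() {}
    virtual int64_t NowMs() = 0;
    virtual void SleepMs(int64_t ms) = 0;
};

// The code under test, depending only on the capability.
// Returns false when the deadline already passed (a hard-to-reach branch
// in production, trivial to reach with the mock below).
bool WaitUntil(SystemServices& sys, int64_t deadlineMs) {
    int64_t now = sys.NowMs();
    if (now >= deadlineMs) return false;
    sys.SleepMs(deadlineMs - now);
    return true;
}

// Mock: no real waiting; records every call so the test can assert on it.
class MockSystemServices : public SystemServices {
public:
    int64_t nowMs = 0;
    std::vector<int64_t> sleeps;
    int64_t NowMs() override { return nowMs; }
    void SleepMs(int64_t ms) override { sleeps.push_back(ms); nowMs += ms; }
};
```

A test sets `nowMs` past the deadline to exercise the timeout branch, then before it to verify the sleep arithmetic, all without touching a real clock.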
In the future, you can repeat these steps against the same “authority free” library, merging in as many components as is reasonable so you don’t get a proliferation of testable libraries.
Step 5:
Use your code in the complex environment with confidence! Enjoy all the extra free time you will have now that you’re more productive and don’t have bizarre bugs to chase in production.
Comments
Anonymous
November 20, 2014
The comment has been removed
Anonymous
November 20, 2014
While the title leads one to believe that there's some magic involved here, all I see is big words and phrases used to describe the obvious but difficult solution. It amounts to saying, "If you have code that is hard to unit test, remove all the things that make it hard and then it will be easy." In a typical programming environment, making all these changes can introduce bugs. This is really only worth it if one is going to have to maintain the code for long enough to make it worthwhile AND substantial requirements and/or environment changes are expected during that period that may introduce bugs for the unit tests to catch.
Anonymous
November 20, 2014
It's not really magic, and it isn't necessarily easy, but it's pretty mechanical. And it survives inspection: even in a very big codebase you could reasonably do this, and you end up with code that's much less entangled. Lots of times the kernel of what needs to be tested is very small indeed and it's simply all the entanglements that make it hard. But those entanglements are breakable. Doing it in increments helps a lot. I am fond of big words. Sorry about that :)
Anonymous
November 20, 2014
Interestingly, I was just writing a presentation on unit testing that talked about a variation of this technique known as "Port - Adapter - Simulator". The port-adapter part is based on Cockburn's hexagonal architecture, and I think the key part is the adaptation.
Anonymous
November 20, 2014
> This is just an “extract function” refactor which is always possible.
Global variables. Statics. Singletons.
Anonymous
November 20, 2014
Those are all the same thing. They don't really complicate the situation in step 1, when you're doing the refactor: you keep referring to them. But in step 2 you replace each usage of the real global variable with a reference, or, if it makes sense, with a getter/setter. The reference provided can then be changed in the template or the class that's providing the linkage. No different from the function case, really. All of these steps are essentially doing the same thing: you change the code so that rather than referring to a particular global object (method or data) it refers to some global of that type. This gives you the flexibility to change it in the unit test. If you do it with a template there is literally no run-time cost, and the transform is entirely mechanical.
Anonymous
November 20, 2014
This clearly works. The trick is finding the right level of abstraction to slip in the capability. For example you could abstract away all filesystem APIs. Or, you could have a function like "LoadAllDataFromFilesystem" that simply returns a complex object that contains everything necessary. In the real implementation this would call the filesystem many times and use many different APIs and parsers. In the testing implementation it would simply return an in-memory constructed instance with fake data. The trick is placing the abstraction layer at a spot where the surface is small and well-defined.
Anonymous
November 21, 2014
Note that there is a tension. If the "real" version of the abstraction is complex then that also has to be tested...
Anonymous
November 21, 2014
mechanical you say... would love to see a tool for a large c++ code base. fond of creating one?
Anonymous
November 21, 2014
Ugh. I would love to see a tool for it too. And though it is "mechanical" it's not easy to automate in general. Unlike the .NET languages and Java, C++ has that @#(%* preprocessor, which creates all manner of convolutions that are not readily untangled. Even something as simple as renaming a variable can have bizarre side-effects because of the possibility of macro expansions that no longer match, or now match but shouldn't. So, yes, you can do it, and it is mechanical, but it isn't exactly free of drama. You often find you have more external dependencies than you thought. I'm convinced that #define is the devil :)
Anonymous
November 25, 2014
An actual example would be nice.
Anonymous
November 26, 2014
Sure
Anonymous
December 08, 2014
Example added and linked