10 Golden Rules Of Good OOP
A good architecture means money saved in learning, maintaining, testing, fixing, extending and scaling source code. It requires more time and care during initial development, but it quickly repays the investment with interest.
Inevitably, even the best-designed architectures need some adjustment and refactoring over time.
Patches, modifications and last-minute changes are part of the IT business. Therefore ‘quick and dirty’ solutions, when needed, should always be confined to an isolated place where they will be easy to refactor later on and will affect the other components as little as possible.
The following guidelines are not exhaustive and are meant to be applied on top of the SOLID principles and proper use of OO Design Patterns.
When designing a new class or refactoring an existing one, a developer should make a list of all of the tasks performed by that class and come up with a short name that clearly represents what the class does.
If the class does too many things, a representative name is usually something monstrous like PageInfoBuilderAndConfigurationLoaderAndLinkAnalyzerAndCacheManager. That means the class has too many responsibilities and should be broken into multiple components, ideally with one responsibility each.
When using OO Design Patterns, it is much easier to come up with meaningful names, since the design pattern itself frequently identifies the type of responsibility.
For example:
- If the class creates objects, then use the suffix Factory or Builder
- If the class is responsible for coordination and communication between other business classes, then use the suffix Mediator or Façade
- If the class is used to control the use of a resource class, then Proxy would be a good suffix
- If a class is wrapping another in order to adjust its use for a consumer class, then use the Adapter suffix
Excessive conditional logic makes the head of the developer spin like Regan in The Exorcist. Things become uglier when the same conditional logic is applied in many places in the application. It means that different business behaviors are managed with ifs and switches instead of being properly managed using inheritance or composition, separating the different behaviors into different implementations of a common abstract class or interface. For instance, suppose a configuration variable determines whether data should be stored in a database or in a file. Then, wherever the code has to save something, we use the conditional logic:
if (ConfigParameters.SaveTo == StorageType.DataBase)
{
    // … save something in the db
}
else
{
    // … save something to file
}
Imagine that we need to add a different behavior (for instance, save to a distributed cache): now we have to extend the conditional logic everywhere in the code. Then imagine multiple configuration variables determining multiple behaviors and leading to a jungle of intricate, unmaintainable conditional statements.
The way to go, in the above example, should be:
- Create an abstract class or interface representing a Storage object (e.g., IStorage), with save and load methods.
- Create different implementations of the abstraction, such as DBStorage, FileStorage, CacheStorage, etc.
- Create a factory that will instantiate the right storage implementation based on configuration and return it as the abstraction (IStorage).
- When in need of saving data, just write myStorage.SaveSomething(…), where myStorage is an IStorage variable created through the factory.
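The steps above can be sketched as follows. This is only an illustration under stated assumptions: the method names Save and Load, the StorageFactory class and the in-memory base class are hypothetical, and the three implementations keep data in a dictionary so that the sketch runs without a real database, file system or cache.

```csharp
using System;
using System.Collections.Generic;

public enum StorageType { DataBase, File, Cache }

// The abstraction that every consumer depends on.
public interface IStorage
{
    void Save(string key, string value);
    string Load(string key);
}

// Assumption: a shared in-memory stand-in so the example is runnable.
// Real implementations would talk to a database, a file or a cache.
public abstract class InMemoryStorage : IStorage
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public void Save(string key, string value) => _items[key] = value;
    public string Load(string key) => _items[key];
}

public sealed class DBStorage : InMemoryStorage { }
public sealed class FileStorage : InMemoryStorage { }
public sealed class CacheStorage : InMemoryStorage { }

// The only place that knows which concrete class matches which setting.
public sealed class StorageFactory
{
    public IStorage Create(StorageType type)
    {
        switch (type)
        {
            case StorageType.DataBase: return new DBStorage();
            case StorageType.File: return new FileStorage();
            case StorageType.Cache: return new CacheStorage();
            default: throw new ArgumentOutOfRangeException(nameof(type));
        }
    }
}
```

With this in place, the configuration value is read once, when the factory is called. The rest of the code only ever sees an IStorage and contains no storage-related conditionals.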
Sure, it’s a lot of work but still much less work than maintaining conditional jungles. It also has one big advantage that is explained in the following guideline.
Developers that are abusing conditional logic are digging their own graves, and usually end up being overstressed.
A seam is an area of source code where the behavior can be changed without editing the code in that place. Seams provide flexibility and they are great for one simple reason: they encourage extending the architecture instead of changing it.
Why is extending better than changing? Pretend that you are a new developer and you are asked to add new storage options to an application.
Would you prefer to go through the code and change it wherever something is saved, finding your way around code that someone else has written and that you do not fully understand yet, at the risk of breaking something or missing one of the hundred places that need to be changed?
Or would you rather write a new class from scratch that implements the IStorage interface and be done, risk free, without even looking at someone else’s code?
It is no surprise that a developer's confidence is higher in the latter case.
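As a minimal sketch of such a seam, assume an existing consumer (the hypothetical ReportArchiver below) that receives an IStorage in its constructor. Adding a storage option then means writing one new class, here a CacheStorage backed by a plain dictionary as a stand-in for a real distributed cache. The Save and Load signatures are assumptions.

```csharp
using System.Collections.Generic;

public interface IStorage
{
    void Save(string key, string value);
    string Load(string key);
}

// Existing consumer: written once, not edited when storage options change.
public sealed class ReportArchiver
{
    private readonly IStorage _storage;

    public ReportArchiver(IStorage storage) { _storage = storage; }

    public void Archive(string reportId, string body)
    {
        _storage.Save("report:" + reportId, body);
    }
}

// New option added later as a brand-new class (the extension).
// Assumption: a dictionary stands in for the real cache client.
public sealed class CacheStorage : IStorage
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public void Save(string key, string value) => _cache[key] = value;
    public string Load(string key) => _cache[key];
}
```

The seam is the constructor parameter: whoever builds the ReportArchiver decides which IStorage it gets, and ReportArchiver itself stays untouched.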
COBOL developers in the ‘80s already knew that abusing global variables is a bad, bad practice. Global variables are rarely justified, and their damning side effects can lead to mental insanity. That wise old rule is still extremely valuable in modern object-oriented languages, such as C#.
One of the many issues is that they create a global state of the application that compromises the deterministic behavior of functions/methods; in other words, calling the same function twice with the same parameters can give completely different results. Hence, the code is fragile, difficult to debug, and extremely difficult to run on multiple threads.
Use of any sort of global variables/objects is therefore highly discouraged.
Ideally, a good architecture is stateless. Practically speaking though, almost every architecture needs some sort of state (e.g. a database, files, etc.).
The solution is state isolation, which consists of two simple rules:
- Keep the life and scope of a state as short as possible (which means, for instance, that even class members should be kept to a minimum).
- Isolate the state by wrapping it in a separate layer that presents itself through abstractions (base classes or interfaces).
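The loss of determinism can be shown with a small sketch. GlobalSettings, TaxRate and PriceCalculator are hypothetical names invented for the illustration.

```csharp
// Anti-pattern: mutable state visible to, and writable by, the whole application.
public static class GlobalSettings
{
    public static decimal TaxRate = 0.20m;
}

public sealed class PriceCalculator
{
    // Non-deterministic: the result depends on hidden global state,
    // so the same argument can produce different results over time.
    public decimal GrossFromGlobal(decimal net)
    {
        return net * (1 + GlobalSettings.TaxRate);
    }

    // Deterministic: everything that influences the result is a parameter.
    public decimal Gross(decimal net, decimal taxRate)
    {
        return net * (1 + taxRate);
    }
}
```

If any other part of the program assigns GlobalSettings.TaxRate between two calls, GrossFromGlobal(100m) returns two different values, while Gross(100m, 0.20m) always returns the same one.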
For instance, a database can present data in the form of business entities through a well-defined interface of a Data Access Layer. The isolation will make life much easier when it comes to testing or troubleshooting a problem.
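A minimal sketch of this kind of isolation follows. All names (Customer, ICustomerRepository, InMemoryCustomerRepository, GreetingService) are hypothetical. The in-memory class plays the role that a database-backed Data Access Layer would play in production, and it can also serve as a test double.

```csharp
using System.Collections.Generic;

// Business entity exposed by the data access layer.
public sealed class Customer
{
    public Customer(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

// The abstraction behind which the state lives.
public interface ICustomerRepository
{
    Customer FindById(int id); // returns null when the customer is unknown
    void Save(Customer customer);
}

// Assumption: a dictionary replaces the real database for the example.
public sealed class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly Dictionary<int, Customer> _byId = new Dictionary<int, Customer>();

    public Customer FindById(int id)
    {
        return _byId.TryGetValue(id, out Customer found) ? found : null;
    }

    public void Save(Customer customer) => _byId[customer.Id] = customer;
}

// Business logic sees only the interface, never the storage technology.
public sealed class GreetingService
{
    private readonly ICustomerRepository _customers;

    public GreetingService(ICustomerRepository customers) { _customers = customers; }

    public string Greet(int customerId)
    {
        Customer customer = _customers.FindById(customerId);
        return customer == null ? "Hello, guest" : "Hello, " + customer.Name;
    }
}
```

When troubleshooting, GreetingService can be exercised against a repository pre-filled with known data, without a database connection.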
Developers are frequently tempted to create static classes, especially when it comes to helpers, utilities and so on. The reason why a developer resorts to a static class is not much different from the reason a COBOL developer would use a global variable: it gives immediate access to data from anywhere in the code, no effort needed.
Unfortunately, static classes frequently act as a global state and create the non-determinism that should be avoided. But it gets worse. Since static classes can be used everywhere in the code without being passed explicitly as parameters, they create secret dependencies that are not revealed by the API documentation. The code behavior then becomes less and less declarative, and both the clarity and maintainability of the code drop dramatically. Static classes come with global variables and dependencies, creating tight coupling in all the consumers (coupling is transitive).
Last, but not least, code using static classes is not testable in isolation, making unit testing a nightmare.
Unless strictly needed for performance reasons, the use of static classes should be avoided. Static variables are still OK for constant objects (although a static property without setter would be better) or to hold private references to objects inside factory classes.
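One common example of a secret static dependency is the system clock. The sketch below assumes a hypothetical Greeter class. A version calling DateTime.Now directly would hide its dependency on the current time and could not be tested deterministically. The version shown declares that dependency through a hypothetical IClock interface.

```csharp
using System;

// Explicit abstraction over "what time is it?".
public interface IClock
{
    DateTime Now { get; }
}

// Production implementation: the only class that touches the static DateTime.Now.
public sealed class SystemClock : IClock
{
    public DateTime Now => DateTime.Now;
}

// Test implementation: always reports the instant it was given.
public sealed class FixedClock : IClock
{
    public FixedClock(DateTime now) { Now = now; }
    public DateTime Now { get; }
}

public sealed class Greeter
{
    private readonly IClock _clock;

    // The dependency is visible in the constructor signature.
    public Greeter(IClock clock) { _clock = clock; }

    public string Greet()
    {
        return _clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
    }
}
```

A unit test can pass a FixedClock set to 9:00 or 15:00 and check both branches, which would otherwise depend on when the test happens to run.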
One of the main principles of a good architectural design is that the logic of creating an object and the business logic (what the object really does) are two different concerns that should be kept as far apart as possible.
The creation of objects is a concern that belongs to specialized classes, such as factories or builders. They alone should have a monopoly on creating objects.
The objects, on the other hand, should only be concerned with performing business logic (ideally, a single business concern only) and should not worry about creating other objects (dependencies).
For instance, an object called HTMLAnalyzer (designed to analyze HTML links) needs a LinkAnalyzer in order to work. That means it depends upon the LinkAnalyzer class; therefore, an abstraction of LinkAnalyzer should be explicitly passed as a parameter to the constructor of HTMLAnalyzer or to the methods that use it. A developer may instead be tempted to create the LinkAnalyzer inside the HTMLAnalyzer using the new operator.
Double mistake:
- First of all, the new operator creates a dependency on a specific type (no abstraction); therefore, the behavior is carved in stone and impossible to change without major refactoring.
- Second, testing HTMLAnalyzer in isolation is now impossible because we will have to test LinkAnalyzer at the same time.
The correct way is to use dependency injection to explicitly and declaratively inject all the needed dependencies when and where they are needed, for instance, in the constructor.
In this case, the object that uses an HTMLAnalyzer will invoke a factory to obtain an abstraction (interface or base class) of a LinkAnalyzer object and inject it into the HTMLAnalyzer constructor.
At any time, the behavior of LinkAnalyzer can be changed by creating alternative implementations and the only change required will be in the factory, not in the actual code of the business logic where changes are expensive and dangerous.
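A minimal sketch of this arrangement follows. Only HTMLAnalyzer and LinkAnalyzer come from the text. The ILinkAnalyzer interface, its IsValid method, CountValidLinks, AnalyzerFactory and the AlwaysValidLinkAnalyzer test double are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Abstraction injected into HTMLAnalyzer.
public interface ILinkAnalyzer
{
    bool IsValid(string url);
}

// Default implementation: accepts well-formed absolute URIs.
public sealed class LinkAnalyzer : ILinkAnalyzer
{
    public bool IsValid(string url) => Uri.IsWellFormedUriString(url, UriKind.Absolute);
}

// Test double: lets HTMLAnalyzer be tested without real link checking.
public sealed class AlwaysValidLinkAnalyzer : ILinkAnalyzer
{
    public bool IsValid(string url) => true;
}

// Business logic only: no "new LinkAnalyzer()" anywhere in this class.
public sealed class HTMLAnalyzer
{
    private readonly ILinkAnalyzer _linkAnalyzer;

    public HTMLAnalyzer(ILinkAnalyzer linkAnalyzer) { _linkAnalyzer = linkAnalyzer; }

    public int CountValidLinks(IEnumerable<string> urls)
    {
        int count = 0;
        foreach (string url in urls)
        {
            if (_linkAnalyzer.IsValid(url)) count++;
        }
        return count;
    }
}

// Creational logic only: the single place to change when a different
// ILinkAnalyzer implementation is needed.
public sealed class AnalyzerFactory
{
    public ILinkAnalyzer CreateLinkAnalyzer() => new LinkAnalyzer();
    public HTMLAnalyzer CreateHtmlAnalyzer() => new HTMLAnalyzer(CreateLinkAnalyzer());
}
```

Swapping in another link analyzer means editing one line of AnalyzerFactory. HTMLAnalyzer does not change, and it can be unit tested with the test double.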
One simple implication of this is that the Singleton design pattern is intrinsically wrong and should never be used. It mixes in itself the creational logic (it creates itself) and the business logic, not to mention that it holds itself as a static global object that will never make it to the garbage collector (a singleton, like love, is forever) and constitutes the nasty kind of global state discussed earlier.
Factories are the long, simple, easy to control and boring pieces of code where all the changes are simple.
Business objects contain all the magic tricks and complexity, so the less we change here, the less we break.
Keeping them separated will lead to less business logic modification and more business logic extension.
As a corollary of the previous point, business logic should never be coded into the constructors of an object. The purpose of a constructor is only to assign some properties, initialize variables and possibly hook up events. If constructors are actively doing something business-related, we will never be able to separate the creational logic from the business logic, and the architecture will always be messy.
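As an illustration, the hypothetical PriceCatalog below only stores its dependency in the constructor and performs the actual work (reading prices) in an explicit Load method. IPriceSource and CountingPriceSource are likewise assumptions invented for this sketch.

```csharp
using System;
using System.Collections.Generic;

public interface IPriceSource
{
    IDictionary<string, decimal> ReadAll();
}

public sealed class PriceCatalog
{
    private readonly IPriceSource _source;
    private IDictionary<string, decimal> _prices;

    // Constructor: assignment and argument checking only, no I/O and no parsing.
    public PriceCatalog(IPriceSource source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    // Business work happens in an explicit, separately testable method.
    public void Load()
    {
        _prices = _source.ReadAll();
    }

    public decimal PriceOf(string sku)
    {
        if (_prices == null) throw new InvalidOperationException("Call Load() first.");
        return _prices[sku];
    }
}

// Test double that records how many times it was read.
public sealed class CountingPriceSource : IPriceSource
{
    public int Reads { get; private set; }

    public IDictionary<string, decimal> ReadAll()
    {
        Reads++;
        return new Dictionary<string, decimal> { ["shirt"] = 19.99m };
    }
}
```

Because construction is free of side effects, a factory can build a PriceCatalog at any time, and the moment at which prices are actually read remains under the caller's control.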
This principle is mostly unknown among developers and, of course, one of the most useful. Without going into formal definitions, the principle states that a class should depend only and strictly upon what it is using; therefore, only what is really used should be injected into a class.
A great explanation that I have found in a Google TechTalk (http://www.youtube.com/watch?v=RlfLCWKxHJ0) is the following:
In an eCommerce system, each user has a Wallet object containing a collection of CreditCard objects and other financial info. The class responsible for the online payment has a method like this:
bool Pay(Wallet wallet, string ccNumber, double amount) {
    CreditCard cCard = wallet.GetCreditCards().GetCard(ccNumber);
    return ProcessPayment(cCard, amount);
}
What would you think of a payment method like this? When you are paying for your new shirt at Macy’s, would you give your entire wallet full of credit cards and cash to the cashier and let them pick the right credit card or bills from it? Of course not. So if a payment class needs only a CreditCard object, a CreditCard should be given: nothing more, nothing less. There is no need to create a dependency by passing a huge container object (the Wallet) full of unnecessary objects, which creates tight coupling and security risks.
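A corrected version of the method might look like the sketch below. The CreditCard members, the PaymentService class name and the body of ProcessPayment are hypothetical (the original does not show them). ProcessPayment here simply approves positive amounts within an assumed credit limit.

```csharp
public sealed class CreditCard
{
    public CreditCard(string number, double creditLimit)
    {
        Number = number;
        CreditLimit = creditLimit;
    }

    public string Number { get; }
    public double CreditLimit { get; }
}

public sealed class PaymentService
{
    // Depends only on what it uses: one CreditCard, never the whole Wallet.
    public bool Pay(CreditCard card, double amount)
    {
        return ProcessPayment(card, amount);
    }

    // Hypothetical stand-in for the real payment processing.
    private bool ProcessPayment(CreditCard card, double amount)
    {
        return amount > 0 && amount <= card.CreditLimit;
    }
}
```

The lookup wallet.GetCreditCards().GetCard(ccNumber) moves to the caller, which already owns the Wallet. The payment class can then be tested with a single CreditCard and no wallet at all.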
Container objects (usually named with vague suffixes like Container, Context, Service or ServiceLocator, Portal, Environment, etc.) should not be passed as parameters of constructors or methods (unless the whole content is needed).
The deepest level of nested loops, switches and ifs is a measure of how complex a method is. For instance, an if inside a foreach loop inside another foreach loop scores a complexity of 3. The complexity of a method is the maximum nesting depth found in its code and should never be above 4. If it is more than 4, then it is refactoring time!
As a rule of thumb, every method should fit on one screen without vertical scrolling. If it does not, it is time to refactor. Refactoring long methods will make a developer very appreciative of the absence of global variables.
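The usual way to reduce nesting is to extract the inner block into its own method. The sketch below uses a hypothetical OrderScanner class that counts item quantities above a threshold: first with an if inside two foreach loops (depth 3), then with the inner loop extracted so that each method has a depth of at most 2.

```csharp
using System.Collections.Generic;

public sealed class OrderScanner
{
    // Before: if inside foreach inside foreach, nesting depth 3.
    public int CountLargeItemsNested(IEnumerable<IEnumerable<int>> orders, int threshold)
    {
        int count = 0;
        foreach (IEnumerable<int> order in orders)
        {
            foreach (int quantity in order)
            {
                if (quantity > threshold) count++;
            }
        }
        return count;
    }

    // After: the outer loop delegates, nesting depth 1.
    public int CountLargeItems(IEnumerable<IEnumerable<int>> orders, int threshold)
    {
        int count = 0;
        foreach (IEnumerable<int> order in orders)
        {
            count += CountLargeItemsInOrder(order, threshold);
        }
        return count;
    }

    // Extracted inner loop, nesting depth 2.
    private int CountLargeItemsInOrder(IEnumerable<int> order, int threshold)
    {
        int count = 0;
        foreach (int quantity in order)
        {
            if (quantity > threshold) count++;
        }
        return count;
    }
}
```

Both public methods return the same result. The refactored pair is simply easier to read, name and test piece by piece.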
- Martin Fowler | Refactoring - Improving the Design of Existing Code http://martinfowler.com/books/refactoring.html
- Misko Hevery | Clean Code - Google Tech Talks http://www.youtube.com/watch?v=RlfLCWKxHJ0
- Robert C. Martin | Clean Code: A Handbook of Agile Software Craftsmanship http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
- Dominic Betts, Grigori Melnik, Fernando Simonazzi, Mani Subramanian | Dependency Injection with Unity http://www.microsoft.com/en-us/download/details.aspx?id=39944
- Chad Parry | Do-It-Yourself Dependency Injection http://blacksheep.parry.org/wp-content/uploads/2010/03/DIY-DI.pdf
- Kent Beck | Test Driven Development: By Example http://www.barnesandnoble.com/w/test-driven-development-kent-beck/1111349772?ean=9780321146533
- Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides | Design Patterns: Elements of Reusable Object-Oriented Software http://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented/dp/0201633612