June 2009
Volume 24 Number 06
Entity Framework - Anti-Patterns To Avoid In N-Tier Applications
By Daniel Simmons | June 2009
This article uses the following technologies: Entity Framework
Contents
Understanding N-tier
Anti-Pattern #1: Tight Coupling
Anti-Pattern #2: Assuming Static Requirements
Anti-Pattern #3: Mishandled Concurrency
Anti-Pattern #4: Stateful Services
Anti-Pattern #5: Two Tiers Pretending to be Three
Anti-Pattern #6: Undervaluing Simplicity
As a member of the Entity Framework team, I frequently talk to customers about building applications that use the Entity Framework. Probably the topic I get asked about more than anything else is designing n-tier applications. In this article, I will try to set a foundation on which you can build for success in this part of your applications. The majority of the article is devoted to design anti-patterns for n-tier, which are usually the most important issues that I find. This is a topic where there are a lot of options and many issues to consider, so it is important to understand the overall space before making decisions for your particular application. In future articles, I will examine n-tier patterns for success and some of the key APIs and issues specific to the Entity Framework, and provide a sneak peek at features coming in the Microsoft .NET Framework 4 that should make n-tier significantly easier.
Understanding N-tier
Before I dive into the anti-patterns, it is important to have a common understanding of n-tier.
The first point to be clear on is the difference between tiers and layers. A well-designed application will have multiple layers with carefully managed dependencies. Those layers could live in a single tier or be split across multiple tiers. A layer is just an organizational concept in an application, while a tier denotes physical separation or at least a design that will allow physical separation if needed.
Any application that talks to a database has more than one tier unless that database runs in-process, but the application is not called n-tier unless it involves more tiers than just the database and application. Similarly, every ASP.NET application that involves a database is technically n-tier because there is the database, the Web server, and the browser. Unless you introduce Windows Communication Foundation (WCF) or Web services, you would not call that application n-tier since for most purposes the Web server and browser can be thought of as a single client tier. N-tier applications are those that have at a minimum a database tier, a middle tier that exposes a service, and a client tier.
While it does not sound like that big a deal on the surface, it turns out that implementing applications split across multiple tiers is difficult. There are a lot more pitfalls than you think. These pitfalls led Martin Fowler, in his book Patterns of Enterprise Application Architecture (Addison-Wesley, 2002), to make a very strong statement on the subject:
Don't distribute your objects!
Martin calls it the First Law of Distributed Objects.
As with every design rule, though, there are times when the law must be set aside. Your application may have a scalability problem that requires multiple tiers so you can apply more computing resources. Maybe you need to exchange data with a business partner or customer application. It may just be that you have security or infrastructure constraints that divide your application onto multiple computers or prevent one part of your application from talking directly to another part. When you need multiple tiers, you really need them.
While the anti-patterns presented in this article can be applied to a wide range of applications and technologies, the main focus will be creating and consuming custom WCF services that persist data using the Entity Framework.
Not surprisingly, many n-tier anti-patterns are a result of losing focus on the goal of your application. If you do not keep in mind what motivated you to use an n-tier architecture in the first place, or if you neglect critical persistence concerns, then it is all too easy to get in trouble. The following sections will look at some common problems.
Custom Service or RESTful Service?
REST, or Representational State Transfer, is a type of Web service that is rapidly gaining in popularity. So you might ask yourself what the difference is between RESTful services and custom Web services, and why you might choose one type over the other. The key difference between the two types is that REST services are resource-centric while custom services are operation-centric. With REST, you divide your data into resources, give each resource a URL, and implement standard operations on those resources that allow creation, retrieval, update, and deletion (CRUD). With custom services, you can implement any arbitrary method, which means that the focus is on the operations rather than the resources, and those operations can be tailored to the specific needs of your application.
Some services fit very naturally into the REST model—usually when the resources are obvious and much of the service involves management of those resources. Exchange Server, for instance, has a REST API for organizing e-mail and calendar items. Similarly, there are photo-sharing Web sites on the Internet that expose REST APIs. In other cases, the services less clearly match REST operations, but can still be made to fit. Sending e-mail, for example, can be accomplished by adding a resource to an outbox folder. This is not the way you would most naturally think about sending e-mail, but it is not too much of a stretch.
In other cases, though, REST operations just do not fit well. Creating a resource in order to initiate a workflow that drives monthly payroll check printing, for example, would be much less natural than having a specific method for that purpose.
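The contrast between the two styles can be sketched in a few lines. This is only an illustration, not how any particular framework exposes services; the names `rest`, `resources`, and `print_monthly_payroll_checks`, along with the URLs, are hypothetical.

```python
# Resource-centric: a small, fixed set of verbs applied to addressable resources.
resources = {}  # hypothetical in-memory store keyed by URL

def rest(verb, url, body=None):
    if verb == "PUT":        # create or replace the resource at this URL
        resources[url] = body
        return body
    if verb == "GET":        # retrieve it (None if absent)
        return resources.get(url)
    if verb == "DELETE":     # remove it
        return resources.pop(url, None)
    raise ValueError(f"unsupported verb: {verb}")

# "Sending e-mail" expressed as creating a resource in an outbox folder.
rest("PUT", "/outbox/1", {"to": "pat@example.com", "subject": "Hello"})

# Operation-centric: the method itself names the business action.
def print_monthly_payroll_checks(year, month, employees):
    return [f"{year}-{month:02d} check for {name}" for name in employees]
```

The e-mail case fits the resource model with a little squinting. The payroll run reads more naturally as a named operation.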
If you can fit your service into the constraints of REST, doing so will buy you a lot of advantages. ADO.NET Data Services in combination with the Entity Framework makes it easy to create both RESTful services and clients to work with them. The framework can provide more functionality to RESTful services automatically because the services are constrained to follow a specific pattern. In addition, RESTful services have a very broad reach because they are so simple and interoperable. They work especially well when you do not know in advance who the clients might be. Finally, REST can be made to scale to handle very large volumes of operations.
For many applications, the constraints of REST are just too much. Sometimes the domain does not divide clearly into a single pattern of resources or the operations involve multiple resources at once. Sometimes the user actions and business logic around them do not map well to RESTful operations, or more precise control is required than can fit into those operations. In these cases, custom services are the way to go.
You can always build an application that has a mix of REST and custom services. Often the ideal solution for an application is a mixture of both.
Anti-Pattern #1: Tight Coupling
Chances are you have heard about the evils of tight coupling. So you always strive to keep your components as loosely coupled as possible, right? Yeah, right.
Loose coupling is more difficult than tight coupling, and often the performance is not as good. You start off with the best of intentions, but end up asking if the benefit is worth the cost. Why introduce an interface and dependency injection when you could just create an instance of the class and call the method directly? Why build an abstraction with custom objects mapped to the database instead of filling a DataTable and passing it around?
To make matters worse, you do not usually feel the pain of tight coupling until much later. In the short term, you gain some efficiency and get the job done, but in the long run evolving the application can become almost impossible.
If you have been building software for any time at all, you probably understand coupling tradeoffs fairly well when it comes to modules within a tier. When you have modules that work together closely, sometimes tight coupling is the right choice, but in other cases, components need to be kept at arm's length from one another so that you can contain the ripple effect of changes to the application.
When it comes to splitting parts of your application into separate tiers, the significance of coupling becomes much greater. The reason for this is simple. Tiers do not always change at the same rate. If you have a service that is consumed by many clients and you cannot guarantee all the clients will upgrade on demand, then you had better make sure you can change that service without having to change the clients. If not, you will encounter a problem that is sometimes called shearing rates of change. Imagine your application being pulled in two different directions until it is forcefully ripped apart.
The trick is to identify which parts of the application might have different rates of change and which parts are tightly coupled to each other. First, consider the boundary between the database and the mid-tier. As your application grows, there is a good chance you will need to adjust the database to improve performance, and if you ever share the database between multiple applications, there is a very good chance you will want to evolve one application's mid-tier without changing the other one. Fortunately, using the Entity Framework already helps here because its mapping system provides an abstraction between your mid-tier code and the database. The same questions should be considered between the mid-tier and the client.
A particularly common and painful example of this anti-pattern in action is an architecture that uses table adapters to retrieve data from the database and Web services that exchange DataSets with the client. The table adapter moves the data into a DataSet with the same schema (thus tightly coupling the database to the mid-tier) and then the Web service exchanges that same DataSet with the client (thus tightly coupling the mid-tier to the client). This kind of system is easy to create—there are Visual Studio tools that lead you through the process nicely. But if you build a system that way, changes to any part of the system are likely to ripple to all other parts.
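One way to picture the alternative is a mid-tier that translates the storage shape into a separate wire contract. The sketch below is a language-neutral illustration in Python, not Entity Framework code. The column names, `CustomerContract`, and `to_contract` are all made up for the example.

```python
from dataclasses import dataclass, asdict

# Hypothetical storage shape: column names as they exist in the database today.
db_row = {"cust_id": 7, "cust_nm": "Contoso", "internal_flags": 3}

@dataclass(frozen=True)
class CustomerContract:
    """Wire contract the client depends on; deliberately not the table schema."""
    id: int
    name: str

def to_contract(row: dict) -> CustomerContract:
    # The only place that knows the column names.
    # A renamed or split column is absorbed here and the client never notices.
    return CustomerContract(id=row["cust_id"], name=row["cust_nm"])

def get_customer_tightly_coupled(row: dict) -> dict:
    # The anti-pattern: the storage shape travels straight to the client,
    # so every schema change ripples through all tiers.
    return dict(row)

wire = asdict(to_contract(db_row))  # what a loosely coupled service would send
```

The mapping function costs a few lines now, but it gives each tier room to change at its own rate.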
Anti-Pattern #2: Assuming Static Requirements
Speaking of changes to the system, sometimes you design around an assumption that requirements will remain static, but there are two cases where changing requirements have an especially significant impact. One comes from treating the client as trusted, and the other occurs when the mid-tier service assumes the client will be implemented using a particular technology.
While it is unlikely that trust boundaries will change unexpectedly, when it comes to data integrity, security, and trust, the consequences of getting it wrong are just too great. If you perform validation only on the client, for instance, and on the mid-tier trust that the data you receive is OK to send directly to the database without revalidating, the chance that something will eventually go wrong is much larger than you might think. Even knowing the service only runs within your intranet is not enough to keep your information safe. Someone might create another client using the same service or modify the first client to call the service from a different code path that skips validation. Who knows what might happen.
Further, once you have a service, it is more likely than regular code to be used in ways that you did not anticipate, so much so that the generally accepted wisdom is that you should always validate and enforce some degree of security on the mid-tier, even though that may mean validating or performing access control more than once.
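As a rough sketch of what revalidating on the mid-tier means, the service method below checks its input itself before anything is persisted, whether or not the client claims to have validated it. The request type, the rules, and the `saved` list standing in for the database are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class OrderRequest:
    customer_id: int
    quantity: int

class ValidationError(Exception):
    """Raised when a request fails the mid-tier's own checks."""

def validate(req: OrderRequest) -> None:
    # Hypothetical business rules; the point is where they run, not what they are.
    if req.customer_id <= 0:
        raise ValidationError("customer_id must be positive")
    if not 1 <= req.quantity <= 1000:
        raise ValidationError("quantity must be between 1 and 1000")

def place_order(req: OrderRequest, saved: list) -> None:
    # Every call is validated here, even if the client already did so.
    validate(req)
    saved.append(req)  # stand-in for persisting to the database
```

A second client, or a modified first client, that skips its own checks still cannot push a bad order past `place_order`.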
The second issue, locking the client into a particular technology, is even more likely to be a problem. Technologies always change. If an application survives long enough, something will happen that forces technology adjustments, and clients are especially susceptible. You may initially design your application as a rich client desktop application and then later find you need to move it to a mobile phone or Silverlight. If that were the case, and you designed your service to exchange DataSets, then major surgery would be needed for the service and all existing clients.
Anti-Pattern #3: Mishandled Concurrency
While there is a tight coupling downside to exchanging DataSets, concurrency is a complex-but-important area that the DataSet handles well. Unfortunately many developers do not understand the nuances of managing concurrency, and to make things worse, a mistake with concurrency is the kind of problem that often only shows up once the application is in production. If you are lucky, it will manifest as an obvious failure. If not, it may cause corruption to your data over a long period of time without being detected.
At its core, concurrency management is fairly simple: guarantee data integrity even if two clients try to modify the same data at roughly the same time. Particularly attentive readers will note that these problems also come up in cases that are unrelated to n-tier, but concurrency issues are particularly relevant to the Entity Framework n-tier designs, because the Entity Framework's handling of n-tier scenarios creates unique concurrency challenges.
For most applications, the concurrency management technique of choice is optimistic concurrency. Even though many clients may access the database simultaneously, the number of times when the exact same entity is modified in conflicting ways is quite small. So you assume everything will work out, but take steps to detect if something goes wrong.
Detection is driven by one or more properties, collectively called the concurrency token, that change whenever any part of the entity changes. When the application reads an entity, it saves the value of the concurrency token. Later, when it wants to write that entity back to the database, it first checks to make sure that the value of the concurrency token in the database is the same now as it was when the entity was originally read. If it is, the update proceeds. If not, the update halts and throws an exception.
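The mechanism can be sketched outside the Entity Framework with a plain SQL table and an integer `version` column acting as the concurrency token. This is a minimal illustration using Python's built-in `sqlite3` module. The table, column, and function names are assumptions, and the check is folded into the `WHERE` clause of the `UPDATE` rather than performed as a separate step.

```python
import sqlite3

class ConcurrencyConflict(Exception):
    """Raised when the row changed after the caller read it."""

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO customer (id, name, version) VALUES (1, 'Contoso', 1)")
    conn.commit()
    return conn

def read_customer(conn, customer_id):
    row = conn.execute(
        "SELECT id, name, version FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    # The version read here is the "original value" of the concurrency token.
    return {"id": row[0], "name": row[1], "version": row[2]}

def update_name(conn, entity, new_name):
    # The update only matches if the token still has the value the caller read.
    cur = conn.execute(
        "UPDATE customer SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, entity["id"], entity["version"]),
    )
    if cur.rowcount == 0:
        conn.rollback()
        raise ConcurrencyConflict(f"customer {entity['id']} was modified by someone else")
    conn.commit()
```

If two callers read the same row and both call `update_name`, the first succeeds and bumps the version, and the second matches zero rows and gets a `ConcurrencyConflict`.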
The Entity Framework supports optimistic concurrency by transparently tracking the original value of concurrency tokens when entities are queried and checking for conflicts prior to database updates. The problem with n-tier applications is that this process works transparently only as long as a single ObjectContext instance is used to track the entity from the time it is queried until the time SaveChanges is called. If you serialize entities from one tier to another, the recommended pattern is to keep the context around on the mid-tier only long enough for a single service method call. Subsequent calls will spin up a new instance of the context to complete each task. (Creating a new context instance for every service operation is an important recommendation in its own right, by the way. For more information, see Anti-Pattern #4: Stateful Services.)
Once developers begin to learn how the Entity Framework APIs work for this kind of disconnected operation—disconnected in the sense that the entities are disconnected from the context after the query, sent to another tier, and then re-connected when it is time to save—there is a tendency to fall into a nasty pattern:
- Query the entity and serialize it to the client. At this point, the concurrency token's current value is the same as the original value, and that is the only value sent to the client.
- The client receives the entity, makes changes, and sends a modified version of the entity back to the mid-tier.
- Since neither the client nor the service explicitly kept track of the concurrency token or what properties have been modified, the service queries the database to get the current state of the entity into a newly created context, then compares values between the current entity from the database and the one sent back from the client.
- The service calls SaveChanges, which performs optimistic concurrency checks while persisting the changes.
Did you see the problem? Actually there are two problems.
First, every time an entity is updated, it has to be read from the database twice—once when it is first queried and a second time right before the update—which creates a significant extra load on the system.
Second, and more importantly, the "original value" used by the Entity Framework to check if the entity has been modified in the database comes from the second query instead of the first one. That is, it comes from the query that happens right before the update. So the result is that the optimistic concurrency check made by the Entity Framework will almost never fail. If someone else modifies the entity between the first query and the second one, the system will not detect the conflict because the value used for the concurrency check is from after the other client's modification instead of before it.
There is still a small window (between the second query and the update) when the optimistic concurrency check could detect a problem, so you still have to write your program to handle the exception, but you will not really have protected your data from corruption.
The correct pattern is either to make a copy of the entity on the client and send back both the original version unmodified and the modified version or to write the client in such a way that it does not modify the concurrency token. If the concurrency token is updated by a server trigger or automatically because it is a row version number (probably the best plan anyway), then there is no reason to modify it on the client. The current value of the property can be left untouched and used as storage for the original value.
This is a reasonably sound approach because if a bug in the client causes the value to accidentally be modified, it is highly unlikely that you will get a false success. That bug might cause you to get a false failure, but that is much more acceptable than false success.
To make this approach work, when the mid-tier receives the entity from the client, you need to attach it to the context and then go over its properties, manually marking them as modified. In either case, though, you will fix both of the problems with the anti-pattern at once. You will no longer query the database twice, and the concurrency check will be based on the correct value of the token (from the initial query) rather than some later value.
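The difference between the anti-pattern and the corrected approach can be shown with a toy model. This is a sketch only: `Store` stands in for the database, the dictionaries stand in for serialized entities, and none of the names correspond to Entity Framework APIs.

```python
class ConcurrencyConflict(Exception):
    pass

class Store:
    """Stand-in for the database: one row per id, with a row-version token."""
    def __init__(self):
        self.rows = {1: {"name": "Contoso", "credit": 100, "version": 1}}

    def read(self, row_id):
        return dict(self.rows[row_id], id=row_id)  # copy, as if serialized

    def update(self, row_id, values, original_version):
        row = self.rows[row_id]
        if row["version"] != original_version:
            raise ConcurrencyConflict(f"row {row_id} changed since it was read")
        row.update(values)
        row["version"] += 1

def save_with_requery(store, entity):
    # Anti-pattern: requery, diff, and check against the token just read.
    current = store.read(entity["id"])
    changes = {k: v for k, v in entity.items()
               if k not in ("id", "version") and current[k] != v}
    store.update(entity["id"], changes, current["version"])

def save_with_client_token(store, entity):
    # Corrected: no second query; the untouched token from the first read
    # is the original value, and every property is treated as modified.
    changes = {k: v for k, v in entity.items() if k not in ("id", "version")}
    store.update(entity["id"], changes, entity["version"])

def second_writer_outcome(save):
    store = Store()
    alice, bob = store.read(1), store.read(1)  # both read version 1
    bob["credit"] = 0
    save(store, bob)                           # Bob commits first
    alice["credit"] = 500                      # Alice edits stale data
    try:
        save(store, alice)
        return "overwrote"
    except ConcurrencyConflict:
        return "conflict"
```

With `save_with_requery`, the second writer silently overwrites the first writer's change. With `save_with_client_token`, the stale token is caught and the conflict surfaces as an exception.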
Anti-Pattern #4: Stateful Services
Given the comparative ease of developing client-server solutions, the next anti-pattern comes up when developers try to simplify things by keeping the context around across multiple service calls. This seems nice at first because it sidesteps the concurrency problems. If you keep the context alive on the mid-tier, then it will contain the correct original entity values. When you receive an entity back from the client, you can compare the updated entity with the version of the entity in the context and apply changes as appropriate. When you save the entity, the correct concurrency check will be made and no extra database query is required.
While this approach seems easy on the surface, there are a number of problems lurking. Managing the context lifetime can get tricky quickly. When you have multiple clients calling the services, you have to maintain a separate context for each client or risk collisions between them. And even if you solve those issues, you will end up with major scalability problems.
These scalability problems are not only the result of tying up server resources for every client. In addition you will have to guard against the possibility that a client might start a unit of work, but never complete it, by creating an expiration scheme. Further, if you decide that you need to scale your solution out by introducing a farm with multiple mid-tier servers, then you will have to maintain session affinity to keep a client associated with the same server where the unit of work began.
A lot of effort and specialized technology has been expended on addressing these issues when, in fact, the best solution is to avoid them altogether by keeping your mid-tier service implementations stateless. Each time a service call is made, the mid-tier should spin up the necessary resources, handle the call, and then release all resources specific to that call. If some information needs to be maintained for a unit of work that extends across multiple service calls, then that information should be maintained by the client rather than the mid-tier so there is no session affinity, no need to expire unfinished units of work, and no server resources in use for a particular client in between service calls.
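A stateless service method has a simple shape: create the context, do the work, and release the context, all within one call. The sketch below illustrates that shape with a hypothetical `Context` class and a dictionary standing in for the database. It is not the Entity Framework's ObjectContext, and the `live` counter exists only to make the lifetime visible.

```python
from contextlib import contextmanager

class Context:
    """Hypothetical per-call unit of work over a dict-backed 'database'."""
    live = 0  # how many contexts are currently open (instrumentation only)

    def __init__(self, database):
        self.database = database
        self.pending = {}

    def save_changes(self):
        self.database.update(self.pending)
        self.pending.clear()

@contextmanager
def new_context(database):
    Context.live += 1
    try:
        yield Context(database)
    finally:
        Context.live -= 1  # released even if the call fails

def get_price(database, sku):
    # Each service call spins up its own context and lets go of it on return.
    with new_context(database) as ctx:
        return ctx.database[sku]

def set_price(database, sku, price):
    with new_context(database) as ctx:
        ctx.pending[sku] = price
        ctx.save_changes()
```

Because nothing survives between calls, any mid-tier server in a farm can handle any request, and an abandoned unit of work leaves nothing behind to expire.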
Anti-Pattern #5: Two Tiers Pretending to be Three
Another anti-pattern I encounter fairly often is also an attempt to simplify this process. Usually it shows up as a request something like, "Why can't you make the Entity Framework serialize queries across tiers?" followed almost immediately by, "Oh, and while you are at it, can you support initiating updates from another tier as well?"
These are probably features Microsoft could add to the Entity Framework, but if you stop and think about them for a minute in light of the other issues I have discussed, you would have to question whether this is a good idea.
If you could create an Entity Framework ObjectContext on the client tier, execute any Entity Framework query to load entities into that context, modify those entities, and then have SaveChanges push an update from the client through the mid-tier to the database server—if you could do all that, then why have the mid-tier at all? Why not just expose the database directly?
Remember Fowler's First Law of Distributed Objects. Keep in mind that the only time this kind of architecture makes sense is when you really, really need it. If you really need it, then you need better security, the ability to scale out with multiple servers, or some other capability that, I suggest, will not really be provided by a mid-tier that is simply a thin proxy for the database. You might use this technique to subvert a restriction placed on you by a corporate policy, but it is hardly capturing the spirit of an n-tier application. My suggestion is to either invest in building an n-tier application to meet a particular need or, if you can get away with it, avoid n-tier altogether.
Anti-Pattern #6: Undervaluing Simplicity
This brings me to the last n-tier anti-pattern. In the name of avoiding all the anti-patterns discussed previously, it is easy to decide that you need to create the most carefully architected, multi-tier, fully separated, re-validating, super design that you can come up with. Then you can spend all your time building infrastructure and none of your time actually delivering value to your users.
It is important to think over your goals and consider whether you are going to need the investment n-tier requires. Simple is good. Sometimes a two-tier app is just the thing. Sometimes you need more tiers than that, but everything is under your control and trusted or you have an AJAX, Silverlight, or ClickOnce client that auto-deploys so that you do not have to worry about shearing rates of change.
If you can make the problem simpler, do so. Put in all the effort for the full solution if you must, but by the same token make sure you put in enough effort to do the job in a way that meets your goals.
Danny Simmons is dev manager for the Entity Framework team at Microsoft. You can read more of his thoughts on the Entity Framework and other subjects at blogs.msdn.com/dsimmons/.