Too many tiers? Writing efficient multi-tier web applications.

A while ago, we were all convinced that dividing our applications into multiple tiers was the way to go. This was great because it allowed you to scale up the parts of your application that might represent a bottleneck, and gave you more control over what resources were allocated to which tier. Windows DNA was built on this concept, as was the whole “Duwamish” architecture. The problem is that this ideal invites a very “sloppy” implementation that creates a much larger bottleneck than the one it was meant to relieve, especially when slow mechanisms are used to transport data across tiers.

Suppose you have a SQL Server backend that pretty much spends its time running stored procedures and returning datasets. You’ve written a stored proc for anything anyone might want to do. You have a middleware component that connects to your SQL backend and provides a programmable API to access your SQL data via web services. This abstracts your actual data store from anyone who wants to program against it, and third parties can simply call into managed APIs to get the data they need. No one needs to run some SQL statement or stored proc that might change in the next version. There are no SQL injection attacks or any of the other problems that come with front ends running raw SQL commands. Life is great.
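Concretely, a web method in that middle tier might look something like the sketch below. The service name, stored procedure, parameter, and connection string are all hypothetical; the point is just the one-web-method-wraps-one-stored-proc shape.

    using System.Data;
    using System.Data.SqlClient;
    using System.Web.Services;

    public class OrderService : WebService
    {
        // Hypothetical; in practice this would come from configuration.
        private const string connectionString = "server=...;database=Orders;...";

        [WebMethod]
        public DataSet GetOrders(int customerId)
        {
            // One web method wraps one stored procedure call.
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                SqlDataAdapter adapter = new SqlDataAdapter("GetOrders", conn);
                adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
                adapter.SelectCommand.Parameters.AddWithValue("@CustomerId", customerId);

                DataSet ds = new DataSet();
                adapter.Fill(ds);   // Fill opens and closes the connection itself
                return ds;          // serialized to XML in the SOAP response
            }
        }
    }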

Then you have a front end written in ASP.NET. This front end calls into your web services to get the data it needs. The problem comes along when the front end (the web application) is too tightly coupled with the business layer (the web services). The web application might call several of these web methods during various stages of rendering a page: it might call some APIs to validate the user’s credentials, then call a web method that returns a dataset to bind to a control, then fetch some menu data, and so on. For each of these web service calls, a stored procedure is run, so we pretty much have a one-to-one mapping between web service calls and stored procedure calls.

For large datasets, a high percentage (like 40-50%) of the page rendering time is spent simply deserializing these datasets across the wire. This is especially inefficient when the web server is running on the same physical machine as the business layer: ASP.NET will open a socket, connect to IIS, “POST” in a SOAP envelope, get the results, deserialize them back into a DataSet, and then return a reference to that object. While this is happening, the thread serving your web page request is blocked waiting for the result (unless you’re using asynchronous web service calls, but those are a huge challenge to design properly). If you have more users hitting your site than you have available threads, this causes all sorts of perf problems. Yes, it’s true that remoting can solve some of these problems, as can Indigo. Meanwhile, until the next web service call comes along, your business layer machine is sitting around twiddling its thumbs.
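To make that cost concrete, here’s roughly what such a page ends up doing. Everything here is hypothetical (assume OrderService is a wsdl.exe-generated SOAP proxy, and userId and ordersGrid are members of the page); it just shows where the round trips pile up.

    protected void Page_Load(object sender, EventArgs e)
    {
        OrderService svc = new OrderService();   // generated SOAP proxy class

        // Each call is a full SOAP round trip: open a socket to IIS, POST an
        // envelope, wait, then deserialize the XML back into a DataSet. The
        // worker thread serving this page blocks for the duration of each one.
        bool valid     = svc.ValidateUser(userId);   // round trip #1
        DataSet menu   = svc.GetMenuData();          // round trip #2
        DataSet orders = svc.GetOrders(userId);      // round trip #3

        ordersGrid.DataSource = orders;   // binds the freshly deserialized copy
        ordersGrid.DataBind();
    }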

When this architecture was designed, the ideal was that the higher tiers (the data and business layers) were also the less frequently called tiers. The business layer would load all the data it needed, cache it, and provide an in-memory representation of the data to work with. At an appropriate time, the data could be written back to the database. The presentation layer would request the data it needed from the business layer and then render it into a form suitable for display to the user. In other words, a call into your business layer wouldn’t necessarily mean a call into your data layer unless new data was needed.
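In code, that ideal business tier looks something like this sketch: load once, serve reads from memory, write back later. The class, schema, and table names are hypothetical.

    using System.Data;
    using System.Data.SqlClient;

    public class CatalogCache
    {
        private DataSet catalog = new DataSet();   // in-memory representation

        public void Load(string connectionString)
        {
            // One trip to the data layer; everything after this is in-memory.
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                SqlDataAdapter adapter =
                    new SqlDataAdapter("SELECT * FROM Products", conn);
                adapter.Fill(catalog, "Products");
                DataTable t = catalog.Tables["Products"];
                t.PrimaryKey = new DataColumn[] { t.Columns["ProductId"] };
            }
        }

        public DataRow GetProduct(int productId)
        {
            // No database round trip: served from the cached DataSet.
            return catalog.Tables["Products"].Rows.Find(productId);
        }

        public void Flush(string connectionString)
        {
            // At an appropriate time, push accumulated changes back.
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                SqlDataAdapter adapter =
                    new SqlDataAdapter("SELECT * FROM Products", conn);
                SqlCommandBuilder builder = new SqlCommandBuilder(adapter);
                adapter.Update(catalog, "Products");
            }
        }
    }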

For load-balancing reasons, though, everyone seems to want their middleware components to be stateless, so they never have to worry about caching data or about cached data falling out of sync with the database.

Proper abstraction means that a single call doesn’t drill down through 80 different layers every time it’s made; otherwise, why have the abstraction at all? The same goes for “abstracting” your application into logical tiers: if every call is going to drill down through every single layer every time, why have the layers?

I think a better approach for certain applications, especially those with very simple business logic (i.e., just grab some data from SQL Server), is to write the business objects as a sort of “data adapter.” My data adapter knows how to get my data over a SQL connection and provides an API for me to read and manipulate it. It should take the form of a DLL that my web application links to directly; the DLL knows how to connect to the SQL Server it’s configured to use. Multiple instances of my web server, each with its own data adapter connecting to a cluster of SQL back ends, can run to scale up to the demands of my users. This is still a layer of abstraction, but without the massive bottleneck of marshalling the data across the wire between tiers. If my front end is this dependent on the data access API, each call should be as fast as I can make it, and that means running it in the same process so I can just return a live pointer to my object.
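Here’s a sketch of that “data adapter” shape: a plain class library the web application references directly, so each call stays in-process. The names, schema, and config key are hypothetical.

    using System.Configuration;
    using System.Data;
    using System.Data.SqlClient;

    public class OrdersAdapter
    {
        private readonly string connectionString;

        public OrdersAdapter()
        {
            // Each web server instance is configured with its own SQL back end,
            // so a farm of front ends can fan out across a cluster of databases.
            connectionString = ConfigurationManager
                .ConnectionStrings["OrdersDb"].ConnectionString;
        }

        public DataSet GetOrders(int customerId)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                SqlDataAdapter adapter = new SqlDataAdapter("GetOrders", conn);
                adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
                adapter.SelectCommand.Parameters.AddWithValue("@CustomerId", customerId);

                DataSet ds = new DataSet();
                adapter.Fill(ds);
                return ds;   // a live object reference; nothing crosses a process boundary
            }
        }
    }

The page code-behind then just calls it in-process, e.g. ordersGrid.DataSource = new OrdersAdapter().GetOrders(customerId), which is the same abstraction with no SOAP round trip.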

I could also write a separate web service that links to my data adapter DLL and provides a SOAP-based interface to my API. My application doesn’t need to use this, but third parties that wish to program against my backend can easily use it. There’s really no reason for my own application to access my own business layer via SOAP: it’s inefficient, and it means blocked threads and wasted CPU. The amount of effort it takes to marshal a large chunk of data across SOAP is simply not worth the extra scalability, which I’m not convinced I’d really get anyway. In fact, two tightly coupled tiers that I own and control entirely should not be talking to each other in an extensible, human-readable, 7-bit protocol, especially if I’m doing it 30 times for every page I serve up.
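That facade can be as thin as a web service class that links the same hypothetical OrdersAdapter DLL from the sketch above and just delegates to it:

    using System.Data;
    using System.Web.Services;

    public class OrdersWebService : WebService
    {
        private readonly OrdersAdapter adapter = new OrdersAdapter();

        // Third parties get a SOAP interface; the in-house ASP.NET front end
        // skips this entirely and calls OrdersAdapter in-process.
        [WebMethod]
        public DataSet GetOrders(int customerId)
        {
            return adapter.GetOrders(customerId);
        }
    }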

If you’re building such an application, I’d be interested in hearing about your architectural decisions and scalability, and, if you’ve done any stress testing with large amounts of data, where your bottlenecks were. It’s my opinion that flattening these tiers into a single process, running the ASP.NET code and an adapter that connects to the data backend together, is the way to go, especially when the alternative is SOAP between them. I’d love to hear your comments!

Mike