November 2010
Volume 25 Number 11
Data Points - Using the Entity Framework to Reduce Network Latency to SQL Azure
By Julie Lerman | November 2010
At first glance, switching from a locally managed SQL Server database to the Microsoft cloud-based SQL Azure database sounds like it could be difficult. But in reality, it’s a snap: Simply switch the connection string and you’re done! As we developers love to see in these situations, it “just works.”
However, making the switch introduces network latency, which can substantially affect overall application performance. Fortunately, a good understanding of the effects of network latency leaves you in a powerful position to use the Entity Framework to reduce that impact in the context of SQL Azure.
Profiling Your Data Access Code
I use Visual Studio 2010 profiler tools (msdn.microsoft.com/library/ms182372) to compare different data access activities against the AdventureWorksDW database residing on both my local network and my SQL Azure account. I use the profiler to investigate calls to load some customers from the database using the Entity Framework. One test queries only the customers and then retrieves their sales information after the fact using the lazy loading feature of the Entity Framework 4.0. A later test eager loads customer sales along with the customers using the Include method. Figure 1 shows the console app that I used to execute these queries and enumerate the results.
Figure 1 Queries Designed to Explore Performance
{
using (var context = new AdventureWorksDWEntities())
{
var warmupquery = context.Accounts.FirstOrDefault();
}
DoTheRealQuery();
}
private static void DoTheRealQuery()
{
using ( var context=new AdventureWorksDWEntities())
{
var query = context.Customers.Where(c => c.InternetSales.Any()).Take(100);
var customers = query.ToList();
EnumerateCustomers(customers);
}
}
private static void EnumerateCustomers(List<Customer> customers)
{
foreach (var c in customers)
{
WriteCustomers (c);
}
}
private static void WriteCustomer(Customer c)
{
Console.WriteLine
("CustomerName: {0} First Purchase: {1} # Orders: {2}",
c.FirstName.Trim() + "" + c.LastName, c.DateFirstPurchase, c.InternetSales.Count);
}
I begin with a warm-up query in order to get past the expense of loading the Entity Data Model (EDM) metadata into memory, pre-compiling views and other one-time operations. The DoTheRealQuery method then queries a subset of Customers entities, executes the query into a list and enumerates the results. During the enumeration, the customer’s sales are accessed, forcing—in this case—a lazy load that goes back to the database to get the sales for each customer throughout the iteration.
Looking at the Performance on a Local Network
When running this on my local network against an on-premises SQL Server database, the first call to execute the query takes 233 ms. That’s because I’m only retrieving the customers. When the code runs the enumeration that forces the lazy load, it takes 195 ms.
Now I’ll change the query so that it eager loads the InternetSales along with Customers:
context.Customers.Include("InternetSales").Where(c => c.InternetSales.Any()).Take(100);
Now those 100 customers, along with all of their sales records, will be returned from the database. That’s a lot more data.
The query.ToList call now takes about 350 ms, nearly 33 percent longer than returning just the 100 customers.
There’s another effect of this change: As the code enumerates through the customers, the sales data is already there in memory. That means the Entity Framework doesn’t have to make an extra 100 round-trips to the database. The enumeration, along with writing out the details, takes only about 70 percent of the time it took when lazy loading was doing its job.
Taking into account the amount of data, the computer and local network speed, overall, the lazy-loading route in this particular scenario is a little faster than when you eager load the data. However, it’s still fast enough that the difference isn’t even noticeable between the two. They both appear to run blazingly fast thanks to the Entity Framework.
Figure 2 shows a comparison of eager and lazy loading in the local network environment. The ToList column measures the query execution, which is the code line: var customers = query.ToList();. The Enumeration measures the EnumerateCustomers method. And finally, the Query & Enumeration column measures the complete DoTheRealQuery method, which combines the execution, enumeration, instantiation of the context and declaration of the query itself.
Figure 2 Comparing Eager Loading to Lazy Loading from a Local Database
Switching to the SQL Azure Database
Now I’ll switch the connection string to my SQL Azure database in the cloud. It shouldn’t be surprising that network latency between my computer and the cloud database kicks in, making queries take longer than against the local database. You can’t avoid latency in this scenario. However, what is notable is that the increase isn’t equal among the various tasks. For some types of requests, the latency is much more exaggerated than others. Take a look at Figure 3.
Figure 3 Comparing Eager Loading to Lazy Loading from SQL Azure
Eager loading the graphs is still slower than only loading the customers up front. But where it was about 30 percent slower locally, on SQL Azure it now takes about three times as long as lazy loading.
But as you can see in Figure 3, it’s the lazy loading that’s most impacted by the network latency. After the InternetSales data was in memory thanks to the eager loading, enumerating over the data is as quick as the first set of tests. But the lazy loading is causing 100 additional round-trips to the cloud database. Because of the latency, each of these trips takes longer—and in combination, the resulting time to retrieve the results is visibly noticeable.
The enumeration takes more time than the in-memory enumeration by orders of magnitude. Each trip to the database to get a single customer’s InternetSales data takes a significant amount of time. Overall—even though it’s surely much faster to load only the Customers up front—in this environment it took almost six times longer to retrieve all of the data with lazy loading.
The point of all this isn’t to incriminate SQL Azure, which in fact is highly performant, but to point out that your choice of Entity Framework query mechanism can have a negative impact on overall performance because of the latency issues.
The particular use case of this demo isn’t typical for an application, because, typically, you wouldn’t lazy load related data for a large series of objects. But it does highlight that, in this case, returning a lot of data all at once (via eager loading) is much more efficient than returning that same amount of data across a multitude of trips to the database. When you’re using a local database, the difference may not be as significant as it is when your data is in the cloud, so it merits a close look when you switch from your local database to SQL Azure.
Depending on the shape of the data that you’re returning—perhaps a much more complex query with numerous Includes, for example—there are times when eager loading is more expensive and times when lazy loading is more expensive. There are even times when it will make sense to distribute the load: eager load some data and lazy load others based on what you learn from performance profiling. And in many cases, the ultimate solution is to move your application to the cloud, as well. Azure and SQL Azure were designed to work as a team. By moving your application to Azure and having that application get its data from SQL Azure, you’ll maximize performance.
Use Projections to Fine-Tune Query Results
When using applications that are running locally, in some scenarios, you may be able to revise the queries to further refine the amount of data returned from the database. One technique to consider is using projections, which grant you much more control over what related data is brought back. Neither eager loading nor deferred/lazy loading in the Entity Framework allows you to filter or sort the related data being returned. But with a projection, you can.
For example, the query in this modified version of TheRealQuery method returns only a subset of InternetSales entities—those greater than or equal to 1,000:
private static void TheRealQuery()
{
using ( var context=new AdventureWorksDWEntities())
{
Decimal salesMinimum = 1000;
var query =
from customer in context.Customers.Where(c =>
c.InternetSales.Any()).Take(100)
select new { customer, Sales = customer.InternetSales };
IEnumerable customers = query.ToList();
context.ContextOptions.LazyLoadingEnabled = false;
EnumerateCustomers(customers);
}
}
The query brings back the same 100 customers as the previous query, along with a total of 155 InternetSales records compared to the 661 sales records that are returned without the SalesAmount filter.
Mind this important note about projections and lazy loading: When projecting the data, the context doesn’t recognize the related data as having been loaded. That only happens when it’s loaded via the Include method, the Load method or lazy loading. Therefore, it’s important to disable lazy loading prior to the enumeration as I’ve done in the TheRealQuery method. Otherwise, the context will lazy load the InternetSales data even though it’s already in memory, causing the enumeration to take much longer than necessary.
The modified enumeration method takes this into account:
private static void EnumerateCustomers(IEnumerable customers)
{
foreach (var item in customers)
{
dynamic dynamicItem=item;
WriteCustomer((Customer)dynamicItem.customer);
}
}
The method also takes advantage of the dynamic type in C# 4 to perform late binding.
Figure 4 demonstrates the significant performance gain realized by the more finely tuned projection.
Figure 4 Comparing Eager Loading to a Filtered Projection from SQL Azure
It may seem obvious that the filtered query would be faster than the eager-loaded query that returns more data. The more interesting comparison is the fact that the SQL Azure database processes the filtered projection about 70 percent faster, while on a local database the filtered projection is only about 15 percent faster than the eager-loaded query. I suspect that the eager-loaded InternetSales collection causes the in-memory enumeration to be faster because of the way the Entity Framework accesses it internally compared to the projected collection. But as the enumeration in this case is occurring completely in memory, it has no bearing on the network latency. Overall, the improvement seen with projection outweighs the small enumeration price when using the projection.
On your network, switching to a projection to fine-tune the query results may not seem necessary, but against SQL Azure, this type of tuning can realize significant performance gains in your application.
All Aboard the Cloud
The scenarios I’ve discussed revolve around locally hosted apps or services that use SQL Azure. You may instead have your application or service hosted in the cloud in Azure, as well. For example, end users may be using Silverlight to hit an Azure Web Role running Windows Communication Foundation that in turn accesses data in SQL Azure. In this case, you’ll have no network latency between the cloud-based service and SQL Azure.
Whatever the combination, the most important point to remember is that even though your application continues to function as expected, performance can be affected by network latency.
Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other Microsoft .NET topics at user groups and conferences around the world. Lerman blogs at thedatafarm.com/blog and is the author of the highly acclaimed book, “Programming Entity Framework” (O’Reilly Media, 2010). Follow her on Twitter.com: julielerman.
Thanks to the following technical experts for reviewing this article: Wayne Berry, Kraig Brockschmidt, Tim Laverty and Steve Yi