Performance Quiz #13 -- Linq to SQL compiled query cost -- solution

Well is there really a "solution" at all in general?  This particular case I think I constrained enough that you can claim an answer but does it generalize?  Let's look at what I got first, the raw results are pretty easy to understand.

The experiment I conducted was to run a fixed number of queries (5000 in this case) but to break them up so that the compiled query was reused a decreasing amount.  The first run is the "best" 1 batch of 5000 selects all using the compiled query.  Then 2 batches of 2500, and so on down to 5000 batches of 1.  As a control I also run the uncompiled case at each step expecting of course that it makes no difference.  Note the output indicates we selected a total of 25000 rows of data -- that is 5 per select as expected.  Here are the raw results:

Testing 1 batches of 5000 selects
5000 selects uncompiled 9200.0ms 25000 records total 543.48 selects/sec
5000 selects compiled 5401.0ms 25000 records total 925.75 selects/sec

Testing 2 batches of 2500 selects
5000 selects uncompiled 9181.0ms 25000 records total 544.60 selects/sec
5000 selects compiled 5402.0ms 25000 records total 925.58 selects/sec

Testing 5 batches of 1000 selects
5000 selects uncompiled 9169.0ms 25000 records total 545.32 selects/sec
5000 selects compiled 5432.0ms 25000 records total 920.47 selects/sec

Testing 100 batches of 50 selects
5000 selects uncompiled 9184.0ms 25000 records total 544.43 selects/sec
5000 selects compiled 5511.0ms 25000 records total 907.28 selects/sec

Testing 1000 batches of 5 selects
5000 selects uncompiled 9166.0ms 25000 records total 545.49 selects/sec
5000 selects compiled 6526.0ms 25000 records total 766.17 selects/sec

Testing 2500 batches of 2 selects
5000 selects uncompiled 9165.0ms 25000 records total 545.55 selects/sec
5000 selects compiled 7892.0ms 25000 records total 633.55 selects/sec

Testing 5000 batches of 1 selects
5000 selects uncompiled 9157.0ms 25000 records total 546.03 selects/sec
5000 selects compiled 10825.0ms 25000 records total 461.89 selects/sec

And there you have it.  Even at 2 uses the compiled query still wins but at 1 use it loses.  In fact, the magic number for this particular query is about 1.5 average uses to break even.  But why?  And how might it change?

Well, as has been observed in the comments, Linq query compilation isn't like regular expression compilation.  In fact compiling the query doesn't do anything that isn't going to have to happen anyway.  In fact, actually creating the compiled query with Query.Compile hardly does anything at all, it's all deferred until the query is run just as it would have been had the query not been compiled.  So what is the overhead?  Why is it slower at all?  And what's the point of it?

Well the main purpose of that compiled query object is to have an object, of the correct type, that also has the correct lifetime.  The compiled query can live across DataContexts, in fact it could potentially live for the entire life of your program.  And since it has no shared state in it, it's thread-safe and so forth.  It exists to:

1) Give the Linq to SQL system a place to store the results of analyzing the query (i.e. the actual SQL plus the delegate that will be used to extract data from the result set)

2) Allow the user to specify the "variable parts" of the query.  The most common case isn't that the query is exactly the same from run to run, usually it's "nearly" the same... That is it's the same except that perhaps the search string is different in the where clause, or the ID being fetched is different.  The shape is the same.  Creating a delegate with parameters allows you to specify which things are fixed and which things are variable.

Now there was some debate about how to make compiled queries durable, automatically caching them was considered, but this was something I was strongly against.  Largely because of the object lifetime issues it would cause.  First, you would have to do complicated matching of a created query against something that was already in the cache -- something I'd like to avoid.  Secondly you have to decide where to store the cache, if you associate it with the DataContext then you get much less query re-use because you only get a benefit if you run the same query twice in the same data context.  To get the most benefit you want to be able to re-use the query across DataContexts.  But then, do you make the cache global?  If you do you have threading issues accessing it, and you have the terrible problem that you don't know when is a good time to discard items from the cache.  Ultimately this was my strongest point, at the Linq data level we do not know enough about the query patterns to choose a good caching policy, and, as I've written many times before, when it comes to caching good policy is crucial.  In fact, analogously, we had to make changes in the regular expression caching system back in Whidbey precisely because we were seeing cases where our caching assumptions were resulting in catastrophically bad performance (Mid Life Crisis due to retained compiled regular expressions in our cache) --  I didn't want to make that mistake again.

So that's roughly how we end up at our final design.  Any Linq to SQL user can choose how much or how little caching is done.  They control the lifetime, they can choose an easy mechanism (e.g. stuff it in a static variable forever) or a complicated recycling method depending on their needs.  Usually the simple choice is adequate.  And they can easily choose which queries to compile and which to just run in the usual manner.

Let's get back to the overhead of compiled queries.  Besides the one-time cost of creating the delegate there is also an little extra delegate indirection on each run of the query plus the more complicated thing we have to do: since the compiled query can span DataContexts we have to make sure that the DataContext we are being given in any particular execution of a compiled query is compatible with the DataContext that was provided when the query was compiled the first time.

Other than that the code path is basically the same, which means you come out ahead pretty quickly.  This test case was, as usual, designed to magnify the typical overheads so we can observe them.  The result set is a small number of rows, it is always the same rows, the database is local, and the query itself is a simple one.  All the usual costs of doing a query have been minimized.  In the wild you would expect the query to be more complicated, the database to be remote, the actual data returned to be larger and not always the same data.  This of course both reduces the benefit of compilation in the first place but also, as a consolation prize, reduces the marginal overhead.

In short, if you expect to reuse the query at all, there is no performance related reason not to compile it.

Comments

  • Anonymous
    January 14, 2008
    PingBack from http://geeklectures.info/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution/

  • Anonymous
    January 14, 2008
    I, for one, would be interested in seeing the code associated with this. -- Thanks

  • Anonymous
    January 14, 2008
    I think the results that others have posted in the previous post and their differences shows that it depends on the query, the database and how you implement it.  Whether or not to use (or not use) a compiled query should be based on observed metrics.

  • Anonymous
    January 14, 2008
    It's just two nested loops around the queries I gave in the problem statement.  The timer goes around the whole thing.  Couldn't be any dumber. start timer for (j=0;j<batches;j++)   compile query   for (i=0;i<runs;i++)     create new data context     run query in new context stop timer print number of selects and time etc. batches*runs always = 5000 In the uncompiled case it really doesn't matter how many in each batch because the same code runs the same number of times plus or minus some loop overhead. start timer for (j=0;j<batches;j++)   for (i=0;i<runs;i++)     create new data context     run query in new context stop timer print number of selects and time etc.

  • Anonymous
    January 14, 2008
    Peter writes: "I think the results that others have posted in the previous post and their differences shows that it depends on the query, the database and how you implement it.  Whether or not to use (or not use) a compiled query should be based on observed metrics." Well you'll never get me to disagree with "you should make your choice based on observed metrics."   I think what you'll find is that you break even very easily so you'll end up making your choice on the basis of:  

  1. is there any re-use at all
  2. how much uglier is the compiled code pattern and is the benefit worth the hassle The more complicated the query, the greater the benefits of not having to re-analyze it.   On the other hand, slower connections and/or more data processing will tend to make all the Linq overheads smaller by proportion, to the point where it could all be moot. But the point here is to illustrate that, unlike other compilation methods, this one has very modest overhead because the compiled path is nearly the same as the uncompiled path.  There isn't a seperate "compilation" vs. "interpretation" process like in say regular expressions. Or, said yet another way, you actually always compile the query.  The only question is, do you save the results of compilation or not?
  • Anonymous
    January 14, 2008
    Umm, what does the "In fact, the magic number for this particular query is about 1.5 average uses to break even." tell me? Isn't that discrete, you can either run it once or twice. How do I run this query 1.5 times? The 1.5 average is pretty useless in this case.

  • Anonymous
    January 14, 2008
    It tells you that when your code is executed, if the average number of times a particular query is run is >1.5, then you should consider using compiled queries.

  • Anonymous
    January 14, 2008
    The comment has been removed

  • Anonymous
    January 14, 2008
    > the main purpose of that compiled query object is to have an object, of the correct type, that also has the correct lifetime If there was some mechanism to reference anonymous types like this: Expression<Func<?>> expr = from o in nw.Orders select new { OrderID = o.OrderID, CustomerID = o.CustomerID, EmployeeID = o.EmployeeID, ShippedDate = o.ShippedDate }; there would be no need for compilation you think? And what is the difference between CompiledQuery.Compile and the following: public static Func<TArg0, TResult> MyCompile<TArg0, TResult>(Expression<Func<TArg0, TResult>> query) where TArg0 : DataContext { return(query.Compile()); } var myQuery = MyCompile((Northwind nwDb) => (from o in nwDb.Orders select new { OrderID_C = o.OrderID, CustomerID = o.CustomerID, EmployeeID = o.EmployeeID, ShippedDate = o.ShippedDate })); I suppose where TArg0 : DataContext restriction might be very important! It allows to introspect TArg0 and emit direct code to manipulate it!

  • Anonymous
    January 14, 2008
    LINQ query versus compiled LINQ query

  • Anonymous
    January 14, 2008
    LINQ query versus compiled LINQ query

  • Anonymous
    January 14, 2008
    >Any Linq to SQL user can choose how much or how little caching is done.  They control the lifetime, they can choose an easy mechanism (e.g. stuff it in a static variable forever) or a complicated recycling method depending on their needs. Rico, can you give us an example of how to control the lifetime of this particular compiled query? The resulting compiled query is an anonymous type. This generally limits its usage to the scope of the method that created it. It seems that the caveat is that there's a lot less flexibility in controlling the lifetime of a compiled query based on anonymous types.

  • Anonymous
    January 15, 2008
    You have to use the more formal syntax and store the thing in a static or something like that.  Or store it as a member of some other helper type with suitable lifetime. In this case I was able to use the simpler 'var' syntax because I had a synthetic lifetime. Normally you don't get such a luxury. I think you could stuff it in a local, using var, then use intellisense to see the exact type, and make something of that type for instance.

  • Anonymous
    February 07, 2008
    The comment has been removed

  • Anonymous
    May 04, 2008
    10 Tips to Improve your LINQ to SQL Application Performance

  • Anonymous
    October 27, 2008
    If you are using Linq to SQL, instead of writing regular Linq Queries, you should be using Compiled Queries

  • Anonymous
    October 27, 2008
    If you are using Linq to SQL, instead of writing regular Linq Queries, you should be using Compiled Queries

  • Anonymous
    November 10, 2008
    Heythere,backagain.InmyfirstpostaboutLINQItriedtoprovideabrief(okay,bitdetailed)in...