Transparent Lazy Loading for Entity Framework – part 1

This post is part of a series that describes the EFLazyLoading library.

The first release of Entity Framework supports only explicit loading. This means that if you are navigating a relationship, you have to make sure it is loaded, either by explicitly calling Load() on EntityReference<T> or EntityCollection<T> objects, or by pre-loading your relationships with Include() on your query.

If you try to navigate a many-to-one or one-to-one relationship that is not loaded, the navigation property returns null, and you will get a NullReferenceException as soon as you dereference it. An EntityCollection<T> that has not been loaded will silently give you an empty collection, which may lead to subtle bugs.

One of the benefits of explicit loading is that you can easily locate all places in your code that cause database round-trips. Unfortunately, general-purpose code that can be used in multiple units of work (such as validation or permission checks) typically does not know which relationships have been loaded. Because of that, it always has to check whether the relationship being navigated has been loaded, and call Load() if not.

As you can imagine, this can easily lead to code that is cluttered with IsLoaded/Load() calls:

var prod = entities.Products.First();
prod.SupplierReference.Load();
var supplier = prod.Supplier;
prod.OrderDetails.Load();
foreach (OrderDetail det in prod.OrderDetails)
{
    if (!det.OrderReference.IsLoaded)
        det.OrderReference.Load();
    Console.WriteLine("{0} {1}", det.Product.ProductName, det.Order.OrderDate);
}

Transparent lazy loading is a way to make your business logic code more readable by handling loads under the hood. As a result, you get the illusion of a fully populated object graph, so the above example can be rewritten as:

var prod = entities.Products.First();
var supplier = prod.Supplier;
foreach (OrderDetail det in prod.OrderDetails)
{
    Console.WriteLine("{0} {1}", det.Product.ProductName, det.Order.OrderDate);
}

This simplicity comes at a cost:

- Database queries are more difficult to locate (potentially any relationship navigation can lead to a query).

- The object graph appears fully populated, so you cannot easily serialize parts of it without using DTOs (Data Transfer Objects). Carelessly returning an object from a web service could potentially bring in the entire database with it.
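To make the second point more concrete, here is a minimal sketch of the DTO approach. The Product and Supplier classes below are simplified stand-ins for the generated entity classes, and ProductDto and ProductMapper are hypothetical names used only for illustration; they are not part of Entity Framework or EFLazyLoading.

```csharp
using System;

// Simplified stand-ins for generated entity classes (assumption: in the real
// classes, navigation properties such as Supplier may trigger lazy loads).
public class Supplier
{
    public string CompanyName { get; set; }
}

public class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public Supplier Supplier { get; set; }
}

// Hypothetical DTO: a flat snapshot with no navigation properties, so a
// serializer cannot walk from it into the rest of the object graph.
public class ProductDto
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public string SupplierName { get; set; }
}

public static class ProductMapper
{
    // Copies only the values the caller is meant to see.
    public static ProductDto ToDto(Product product)
    {
        if (product == null) throw new ArgumentNullException("product");

        ProductDto dto = new ProductDto();
        dto.ProductID = product.ProductID;
        dto.ProductName = product.ProductName;
        dto.SupplierName = product.Supplier != null ? product.Supplier.CompanyName : null;
        return dto;
    }
}
```

A web service would then return ProductMapper.ToDto(product) rather than the entity itself, which keeps the serialized payload limited to the three copied values.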

As we said, Entity Framework v1 supports explicit loading only, but the object layer code is something the developer can control, either by writing it by hand or by creating a tool to generate it. We just need to inject a Load() call in a few places. Sounds simple?

Strategies for implementing transparent lazy loading

There are two main strategies when implementing transparent lazy loading. One approach is to fully materialize related objects whenever you access them – let’s call this approach Lazy Initialization.

Lazy Initialization is easy to do in Entity Framework – all you have to do is add extra code that calls Load() in the property getters used to navigate relationships (see Danny’s post about codegen events).

The following code checks whether the relationship has been loaded and calls Load() if it has not, which frees the business logic to focus on business rules rather than plumbing. Note that this change only works with attached objects; detached objects require special handling, which is not shown here:

[EdmRelationshipNavigationProperty("NorthwindEFModel", "Products_Supplier", "Supplier")]
[XmlIgnore]
[SoapIgnore]
[DataMember]
public Supplier Supplier
{
    get
    {
        // added code 
        if (!SupplierReference.IsLoaded)
            SupplierReference.Load();
        return ((IEntityWithRelationships)(this)).RelationshipManager.
            GetRelatedReference<Supplier>("NorthwindEFModel.Products_Supplier", "Supplier").Value;
    }
    set
    {
        ((IEntityWithRelationships)(this)).RelationshipManager.
            GetRelatedReference<Supplier>("NorthwindEFModel.Products_Supplier", "Supplier").Value = value;
    }
}

The result is that product.Supplier is always accessible, which is what we wanted. Unfortunately, fully materializing related objects is not always desirable for performance reasons. There are cases where you do not care about the attributes of a related object, only about its identity. Consider an example function, ShareManager, that returns true when two employees share the same manager and false otherwise:

bool ShareManager(Employee emp1, Employee emp2)
{
    if (emp1.Manager == emp2.Manager)
        return true;
    else
        return false;
}

By merely touching emp1.Manager and emp2.Manager, we have potentially caused two Manager entities to materialize (and that means two database queries), while we were just interested in checking whether they are the same object.

In Entity Framework you can reason about the identities of related objects without materializing them by examining the EntityKey property on EntityReference<T>. So our example can be rewritten for performance as:

bool ShareManager(Employee emp1, Employee emp2)
{
    if (emp1.ManagerReference.EntityKey == emp2.ManagerReference.EntityKey)
        return true;
    else
        return false;
}

But that is not nearly as nice as the first code snippet, because you now have to deal with EntityKeys.
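The cost difference between the two versions can be made visible with a small simulation. LazyReference<T> below is a hypothetical stand-in for EntityReference<T> (it is not the EF class), and QueryLog is an invented counter: every simulated "database query" simply increments it. The loader delegate is assumed to return the same instance for the same key, as an identity map would.

```csharp
using System;

// Hypothetical counter standing in for database round-trips.
public static class QueryLog
{
    public static int Count;
}

// Hypothetical stand-in for EntityReference<T>: the key of the related object
// is known up front, the object itself is fetched only when Load() is called.
public class LazyReference<T> where T : class
{
    private readonly Func<int, T> loader; // simulated database lookup
    private T value;

    public LazyReference(int key, Func<int, T> loader)
    {
        Key = key;
        this.loader = loader;
    }

    public int Key { get; private set; }
    public bool IsLoaded { get; private set; }
    public T Value { get { return value; } }

    public void Load()
    {
        QueryLog.Count++;
        value = loader(Key);
        IsLoaded = true;
    }
}

public class Employee
{
    public Employee(int id, int managerId, Func<int, Employee> loader)
    {
        EmployeeID = id;
        ManagerReference = new LazyReference<Employee>(managerId, loader);
    }

    public int EmployeeID { get; private set; }
    public LazyReference<Employee> ManagerReference { get; private set; }

    // Lazy Initialization: the getter loads the related object on first access.
    public Employee Manager
    {
        get
        {
            if (!ManagerReference.IsLoaded)
                ManagerReference.Load();
            return ManagerReference.Value;
        }
    }
}

public static class Checks
{
    // Readable, but materializes both managers (two simulated queries).
    public static bool ShareManagerByObject(Employee emp1, Employee emp2)
    {
        return emp1.Manager == emp2.Manager;
    }

    // No queries at all, at the price of exposing keys to the caller.
    public static bool ShareManagerByKey(Employee emp1, Employee emp2)
    {
        return emp1.ManagerReference.Key == emp2.ManagerReference.Key;
    }
}
```

In this simulation, calling ShareManagerByKey on two freshly created employees leaves QueryLog.Count untouched, while ShareManagerByObject raises it by two.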

Fortunately, it turns out that with some clever code generation it is possible to keep the first syntax and not pay the price for object materialization except when it is absolutely needed. Intrigued? Stay tuned for Part 2, where I will introduce a lazy loading framework (code generator and supporting library) for EF.

The strategy that will be used is based on an observation that you do not need to materialize an object if you do not access its non-key properties…
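One way to picture that observation is an object that starts out as an empty shell holding only its key and fetches the rest of its data on first access to a non-key property. The sketch below is a self-contained illustration under that assumption: EmployeeStore, EmployeeRow, the identity map and the row-fetching delegate are all hypothetical, and this is not necessarily how EFLazyLoading implements the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row as it would come back from the database.
public class EmployeeRow
{
    public EmployeeRow(string name, int? managerId)
    {
        Name = name;
        ManagerID = managerId;
    }

    public string Name { get; private set; }
    public int? ManagerID { get; private set; }
}

public class Employee
{
    private readonly EmployeeStore store;
    private EmployeeRow row; // null until a non-key property is touched

    internal Employee(EmployeeStore store, int id)
    {
        this.store = store;
        EmployeeID = id;
    }

    // Key property: available without any query.
    public int EmployeeID { get; private set; }

    // Non-key property: the first access fetches this employee's row.
    public string Name
    {
        get { EnsureLoaded(); return row.Name; }
    }

    // Navigation: needs this employee's row (for the foreign key), but hands
    // back only a shell for the manager, whose own row is not fetched.
    public Employee Manager
    {
        get
        {
            EnsureLoaded();
            return row.ManagerID.HasValue ? store.GetShell(row.ManagerID.Value) : null;
        }
    }

    private void EnsureLoaded()
    {
        if (row == null)
            row = store.Fetch(EmployeeID);
    }
}

public class EmployeeStore
{
    // One shell per key, so reference equality doubles as identity comparison.
    private readonly Dictionary<int, Employee> identityMap = new Dictionary<int, Employee>();
    private readonly Func<int, EmployeeRow> fetchRow;

    public EmployeeStore(Func<int, EmployeeRow> fetchRow)
    {
        this.fetchRow = fetchRow;
    }

    public int QueryCount { get; private set; }

    public Employee GetShell(int id)
    {
        Employee shell;
        if (!identityMap.TryGetValue(id, out shell))
        {
            shell = new Employee(this, id);
            identityMap.Add(id, shell);
        }
        return shell;
    }

    internal EmployeeRow Fetch(int id)
    {
        QueryCount++;
        return fetchRow(id);
    }
}
```

With this shape, emp1.Manager == emp2.Manager compares two shells taken from the identity map, so the manager's row is fetched only if some code later reads a non-key property such as Name.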

Updates:

The code for EFLazyLoading library can be downloaded from https://code.msdn.microsoft.com/EFLazyLoading

The second part of this article is available here