Data Points

Entity Framework Q&A

John Papa

Code download available at: DataPoints2008_05.exe (962 KB)
Browse the Code Online

Contents

EntityClient + Entity SQL
Object Services + Entity SQL
Object Services + LINQ
Wrapping Up

Judging by the hundreds of questions I've received on the topic, developers are very interested in the Entity Framework and its data access and modeling implications. While I cannot possibly respond to every one of them, in this month's column I will answer some of the more common questions I have been asked.

Understanding entity modeling, the mapping of entity models to relational databases, and Entity Data Model (EDM) design are the first steps in grasping the Entity Framework. I'll start this Q & A off by answering some questions about the foundations of the Entity Framework, including the ObjectContext, and then discuss when and where it is appropriate to use Entity Client with Entity SQL. I will also explain the differences between EntityClient and Object Services and cover the implications of using LINQ and Entity SQL with these services.

Analysis of the queries created both in code targeting the Microsoft® .NET Framework and in the native SQL code is also an important part of the Entity Framework, so I'll examine explicit loading and eager loading by reviewing the generated SQL. All code examples and the sample NorthwindEF database are available for download with this column from the MSDN® Magazine Web site.

Why use Entity SQL when I can use LINQ to get to entities? Every time I give a presentation about using Entity SQL with EntityClient or Object Services, someone asks this question. (I can't blame them. It's one of the first questions I had when I began diving into the Entity Framework!) The strong typing and query syntax of LINQ is so compelling that developers cannot help but wonder why they need a new language to interact with entities.

To fully answer this question, I first must discuss the three major techniques that you can use to interact with an EDM:

  • Writing Entity SQL queries using the EntityClient provider
  • Writing Entity SQL queries using Object Services
  • Writing LINQ queries using Object Services

Each of these techniques has common characteristics; for example, all use the EntityClient provider either directly or indirectly. But they differ in the results they produce and how they go about getting those results.

The EntityClient provider has a series of objects that should be familiar to you if you know the ADO.NET object model. EntityConnection is used to connect to the EDM, EntityCommand is used to issue a query against the EDM, and the results of the command are returned via DbDataReader. Whether you are using EntityClient directly or indirectly via Object Services, EntityClient ultimately issues the query and returns the results.

So, restating the question, why use Entity SQL when you have LINQ? The answer lies in the pros and cons of each technique.

EntityClient + Entity SQL

Insights: Entity Framework Data Loading

Many object-relational mapping frameworks support two well-known behaviors that help in writing less persistence "plumbing" code and in improving how many times the application hits the database and how much data is transferred:

Eager Loading Each time the program needs to work with a set of related objects, the whole graph is hydrated from the database in a single upfront request.

Lazy Loading Data is loaded from the database automatically as the program navigates the properties that connect the object graph, potentially hitting the store multiple times but loading only what is necessary.

A question that the Data Programmability team at Microsoft often gets (and one I used to ask myself—see forums.microsoft.com/MSDN/ShowPost.aspx?PostID=644979 and forums.microsoft.com/MSDN/ShowPost.aspx?PostID=644808) is why the Entity Framework comes out of the box without automatic lazy loading support. The answer has its roots in the past, but it also has some implications for the future.

Some History

In 2000, when I first learned about the ADO.NET object model (called ADO+ at the time), my impression was that DataSets were very cool, but I could not understand why the connected and the disconnected APIs were so different from each other. I thought I would always miss the "location transparency" of .NET ADO Recordsets before the existence of the Microsoft .NET Framework.

By that time, one of the best ways to learn about .NET was to spend time in the DOTNET discussion list, so I posted my question there (discuss.develop.com/archives/wa.exe?A2=ind0009B&L=DOTNET&D=0&I=-3&P=55148). Omri Gazitt gave me the answer: network roundtrips should not be hidden from the developer. This was a novel idea for me at the time. But, in fact, this is one of the foundational tenets of ADO.NET. No wonder the Entity Framework builds on that!

Back to the Present

Now, eight years later, the Entity Framework Team is readying the first version of a very ambitious data access platform on top of the now very successful ADO.NET framework. Many of the same principles still apply, and, once more, we want it to be a solid foundation for the future.

Following the "no hidden network roundtrips" principle, Entity Framework avoids automatic lazy loading. The sole action of traversing a graph composed of our code-generated classes will never trigger a query against the store. Instead, Entity Framework requires user code to explicitly hydrate any object before use.

It is possible to load as you go, but you need to invoke a method called Load on references or collection every time you want to do that. In fact, by interfering with the code generation process or by writing your own classes, it is still possible to get implicit lazy loading working.

However, there is more to lazy loading than just hidden roundtrips. It is important for you to know that every time your program uses lazy loading, it may easily end up with a graph that contains inconsistent data. Also, if a program keeps loaded things around without refreshing them for a long time, it runs into the risk of having stale data in the graph.

The opposite strategy, eager loading, is fully supported on a query-by-query basis, explicitly using a query builder method named Include that you can use to "span" your query across additional related entities. The parameter for Include is a string containing a dot-separated path of navigation properties.

Include is generally translated to a JOIN query between the source tables, so it will usually cause data redundancy over the wire. Because of this, remember that you should also take great care when you use Include.

—Diego Vega
Program Manager, Microsoft Data Programmability Team

Writing code with the EntityClient API provides you the most granular control of the three techniques. You can create an EntityConnection to connect to the EDM, write a query in Entity SQL and execute it with an EntityCommand, and return the results via a DbDataReader. This technique is also slightly more lightweight, cutting through a little bit of the syntactic sugar that LINQ and Object Services provide.

The greatest advantage of Entity SQL is its flexibility. Its string-based syntax lends itself easily to building dynamic queries. This can be very useful when you need to create an ad hoc query.

One effect of this flexibility and light weight, however, is that you can only return results via the DbDataReader. It is not possible to return pure entities from the EDM using EntityClient with Entity SQL. The DbDataReader can be retrieved and used to iterate through the collection of rows that satisfy the Entity SQL query. The code in Figure 1 allows you to iterate through customer records via a DbDataReader, not Customers entities.

Figure 1 Iterating through Rows Via DbDataReader

string city = "London";
using (EntityConnection cn = new EntityConnection("Name=Entities"))
{
  cn.Open();
  EntityCommand cmd = cn.CreateCommand();
  cmd.CommandText = @"SELECT VALUE c FROM Entities.Customers AS c WHERE 
                     c.Address.City = @city";
  cmd.Parameters.AddWithValue("city", city);
  DbDataReader rdr = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
  while (rdr.Read())
    Console.WriteLine(rdr["CompanyName"].ToString());
  rdr.Close();
}

Here's a good tip: when in debug mode, you may receive an error that states you cannot read from column ordinal X. This error only occurs in debug mode and can be avoided if you close the autos window in debug mode. This is a known issue with the Entity Framework beta 3.

There is currently no comparable Data Manipulation Language (DML) for Entity SQL. This means you cannot directly express Insert, Update, or Delete statements against the EDM (see Figure 2).

Figure 2 Entity Framework APIs

  EntityClient with Entity SQL Object Services with Entity SQL Object Services with LINQ
Direct to EntityClient Provider Yes No No
Good for Ad Hoc Yes Yes No
Can Issue DML Directly No No No
Strongly Typed No No Yes
Can Return Entities as Results No Yes Yes

Object Services + Entity SQL

The next technique is using Object Services to execute queries with Entity SQL. This moves away from direct interaction with the EntityClient provider (although it still communicates with the provider under the covers). You would use ObjectContext and ObjectQuery<T> to issue queries against the EDM.

This technique is good for issuing ad hoc queries, as in the first technique. However, rather than returning data via the DbDataReader, using Object Services with Entity SQL allows you to return entities from the EDM. This gives you a solid combination: flexible queries that return first-class entities.

Since Entity SQL currently lacks any DML constructs, it is not possible to issue Insert, Update, or Delete commands using Entity SQL and Object Services. You could, however, retrieve an entity from the EDM using this technique and then update the entity using the SaveChanges method on the ObjectContext. As you can see, the following code sample iterates through a collection of Customers:

string city = "London";
using (Entities entities = new Entities()) 
{
  ObjectQuery<Customers> query = entities.CreateQuery<Customers>(
    "SELECT VALUE c FROM Customers AS c WHERE c.Address.City = @city",
    new ObjectParameter("city", city)
  );

  foreach (Customers c in query)
    Console.WriteLine(c.CompanyName);
}

Object Services + LINQ

Using Object Services with LINQ is not as good for ad hoc queries as the other techniques. The following code sample returns a collection of Customers from the EDM:

string city = "London";
using (Entities entities = new Entities()) 
{
  var query = from c in entities.Customers
              where c.Address.City == city
              select c;

  foreach (Customers c in query)
    Console.WriteLine(c.CompanyName);
}

Just like Entity SQL, LINQ also does not support direct syntax for DML statements. Currently, it is only possible to update entities against a database when using Object Services (by way of the SaveChanges method). This is done by returning entities from the EDM, whose changes are tracked by the Entity Framework. In short, neither LINQ nor Entity SQL perform update operations; it's the ObjectContext of the EDM that does this.

I summarized the differences in these techniques in Figure 2. So why would you want to use Entity SQL when you already have LINQ? Entity SQL is a solid choice when you have a need for ad hoc queries or want to create a more flexible query than what can be done with LINQ. Otherwise, I suggest using LINQ with Object Services so you can benefit from its strong typing and ability to return entities and projections.

Along with the strongly typed syntax of LINQ, you also get design time views of many errors that you would not otherwise catch until running the application. I really appreciate this feature—it allows me to keep writing code instead of having to build and run to catch errors.

What is the role of ObjectContext? ObjectContext is the gateway to the EntityConnection for Object Services. It provides access to the EDM via the underlying EntityConnection. For example, you can access the entities via the ObjectContext, interrogate ObjectContext to find out information about the state of objects, and create ObjectQuery<T> queries using the CreateQuery method.

Another purpose of ObjectContext is to provide a way for objects to get update information about database entries. For instance, you can use the methods of ObjectContext to add entities to the ObjectContext, delete entities, manipulate entities, and ultimately save the changes of the entities to the database (via the SaveChanges method).

How do explicit and eager loading work in the Entity Framework? Explicit loading is the default behavior in LINQ to Entities and the Entity Framework. When a query is executed within the Entity Framework, the entities returned by the query are fully accessible, but any associated entities are not immediately loaded. For example, if you write a query that retrieves all Orders in the EDM, a SQL query will be executed under the hood that grabs the order records and returns a collection of Orders entities. However, the customer record associated with an order is not fetched as a part of this query, and as a result of this, an Orders entity's associated Customers entity is not loaded. Therefore, the following code sample will throw an exception when it tries to access the Order's Customers since it is not loaded:

using (Entities entities = new Entities())
{
  var query = (from o in entities.Orders
               where o.Customers.CustomerID == "ALFKI"
               select o).First<Orders>();

  Orders order = query as Orders;
  Console.WriteLine(order.OrderID);

  Console.WriteLine(order.Customers.CompanyName);
}

The Entity Framework provides a Load method on each instance of the EntityReference class. This method can be used to explicitly load a collection that is associated with another entity. For example, you could change the previous code example by telling the Entity Framework that it needs to go get the Customers record for the Order. The revised code that performs the explicit load is shown in Figure 3. It first checks to see if the Customers entity is loaded. If not, it loads the Customers for that Order. This technique is known as explicit loading.

Figure 3 Explicit Loading

using (Entities entities = new Entities())
{
  var query = (from o in entities.Orders
               where o.Customers.CustomerID == "ALFKI"
               select o);

  foreach (Orders order in query)
  {
    if (!order.CustomersReference.IsLoaded)
      order.CustomersReference.Load();
    Console.WriteLine(order.OrderID + " --- " + 
      order.Customers.CompanyName);
  }
}

The example in Figure 3 will execute a SQL query to get the Customer record for each order that is processed. If iterating over hundreds or even just dozens of orders, this would cause a lot of individual queries against the database. If you know in advance that you will need more data than what will be returned by your query (such as customer information for an order), you can load this information ahead of time.

For some further thoughts on this topic, direct from the Microsoft team responsible for the Entity Framework, see the sidebar, "Insights: Entity Framework and Date Loading."

Figure 4 demonstrates a technique known as eager loading. The Include method being invoked on the Orders entity in the LINQ query accepts an argument that in this case directs the query to retrieve not only the Orders but their related Customers as well. This technique will generate a single SQL statement that loads all Orders and Customers meeting the criteria in the LINQ query.

Figure 4 Eager Loading

using (Entities entities = new Entities())
{
  var query = (from o in entities.Orders.Include("Customers")
               where o.ShipCountry == "USA"
               select o);

  foreach (Orders order in query)
    Console.WriteLine(order.OrderID + " --- " + 
      order.Customers.CompanyName);
}

You must understand that the type of loading is important for you to consider. If you use the Load method (as shown in Figure 3) to explicitly load entities during an iteration of a collection, this will cause several queries to hit the database, one for each call to the Load method.

This is a perfectly good technique if all you need to do is access the data once or twice. However, eager loading using the Include method (as shown in Figure 4) is a much better choice if you know that for a given entity, you will always need to have access to the data in related entities.

The best course of action is to profile the performance of the different options at your disposal and test them to see which best suits your scenarios. As with most decisions, the decision to use eager loading or explicit loading depends on your situation.

How do I see the SQL that will execute? I have often wanted to see what SQL would execute when I created a query with ObjectQuery and I've found two helpful techniques. The first is to use the SQL Server® Profiler tool (or whatever database engine's profiling tool is available to you). The second technique is to use the ObjectQuery class's ToTraceString method.

Figure 5 demonstrates how to invoke the ToTraceString method on an ObjectQuery<T>. Notice that the ObjectContext's connection is open (the EntityConnection). The ToTraceString method requires that the connection be open so it can determine the query that will execute. The ToTraceString method is available on both the ObjectQuery<T> and the EntityCommand classes.

Figure 5 Using ToTraceString

string city = "London";

using (Entities entities = new Entities())
{
  ObjectQuery<Customers> query = entities.CreateQuery<Customers>(
    "SELECT VALUE c FROM Customers AS c WHERE c.City = @city",
    new ObjectParameter("city", city)
  );
  entities.Connection.Open();

  Console.WriteLine(query.ToTraceString());

  foreach (Customers c in query)
    Console.WriteLine(c.CompanyName);
}

What can I do with complex types? The Entity Framework offers a data structure known as a complex type for representing a set of properties, usually closely related to each other. Consider an Address type. A complex type could be used to create an Address type for a customer so that the Customer entity's address-related properties (such as City, Region, and Phone) would now fall under the Address type rather than directly under the Customer type. Complex types provide for a logical grouping of similar scalar properties, which makes it easier to find closely related properties on an entity and keeps them logically grouped in the EDM. Like entities, complex types contain scalar properties; unlike them, they do not have identity (key values), and they cannot be persisted to the database via the ObjectContext. Complex types are an aspect of entities, not entities themselves. They are a good tool when grouping logically related properties of an entity.

How do I create a complex type? Since the EDM designer does not support visually creating complex types right now, creating them involves cracking open the EDMX file in an XML editor. The first step is to create the ComplexType in the Conceptual Scheme Definition Language (CSDL); then, modify the EntityType that will reference the complex type. Figure 6 shows the Customers EntityType in the CSDL with the newly created AddressType complex type. Notice that the Customers EntityType has had its Address, City, Region, and other address-related properties removed and replaced with a new property called Address. This new Address property refers to the AddressType complex type.

Figure 6 Creating a ComplexType

<EntityType Name="Customers">
  <Key>
    <PropertyRef Name="CustomerID" />
  </Key>
  <Property Name="CustomerID" Type="String" Nullable="false" 
    MaxLength="5" FixedLength="true" />
  <Property Name="CompanyName" Type="String" Nullable="false" 
    MaxLength="40" />
  <Property Name="ContactName" Type="String" MaxLength="30" />
  <Property Name="ContactTitle" Type="String" MaxLength="30" />
  <Property Name="Address" Type="Self.AddressType"  
    Nullable="false"/>
  <NavigationProperty Name="Orders" 
     Relationship="NorthwindEFModel.FK_Orders_Customers" 
      FromRole="Customers" ToRole="Orders" />
</EntityType>
<ComplexType Name="AddressType">
  <Property Name="Address" Type="String" MaxLength="60" />
  <Property Name="City" Type="String" MaxLength="15" />
  <Property Name="Region" Type="String" MaxLength="15" />
  <Property Name="PostalCode" Type="String" MaxLength="10" />
  <Property Name="Country" Type="String" MaxLength="15" />
  <Property Name="Phone" Type="String" MaxLength="24" />
  <Property Name="Fax" Type="String" MaxLength="24" />
</ComplexType>

The next step is to modify the mapping in the mapping specification language (MSL) to take into consideration the new complex type. Figure 7 shows the mapping of the Customers EntityType with the new AddressType complex type. The project will build and you can now reference the complex types in code. For example, the following LINQ query can be executed to filter only the customers in the city of London:

Figure 7 Mapping a Complex Type

<EntitySetMapping Name="Customers">
  <EntityTypeMapping 
    TypeName="IsTypeOf(NorthwindEFModel.Customers)">
    <MappingFragment StoreEntitySet="Customers">
      <ScalarProperty Name="CustomerID" ColumnName="CustomerID" />
      <ScalarProperty Name="CompanyName" ColumnName="CompanyName" />
      <ScalarProperty Name="ContactName" ColumnName="ContactName" />
      <ScalarProperty Name="ContactTitle" ColumnName="ContactTitle" />
      <ComplexProperty Name="Address" 
        TypeName="NorthwindEFModel.AddressType">
        <ScalarProperty Name="Address" ColumnName="Address" />
        <ScalarProperty Name="City" ColumnName="City" />
        <ScalarProperty Name="Region" ColumnName="Region" />
        <ScalarProperty Name="PostalCode" ColumnName="PostalCode" />
        <ScalarProperty Name="Country" ColumnName="Country" />
        <ScalarProperty Name="Phone" ColumnName="Phone" />
        <ScalarProperty Name="Fax" ColumnName="Fax" />
      </ComplexProperty>
    </MappingFragment>
  </EntityTypeMapping>
</EntitySetMapping>

var query = from c in entities.Customers
            where c.Address.City == "London"
            select c;

Wrapping Up

In this month's column I compared and contrasted the use of EntityClient and Object Services along with Entity SQL and LINQ. I also touched on how and why you create complex types, and I demonstrated how eager and explicit loading works. I've received so many questions from interested developers regarding the Entity Framework that I intend to discuss similar topics and practical tips in future installments of Data Points.

Send your questions and comments for John to mmdata@microsoft.com.

John Papa is a Senior .NET Consultant with ASPSOFT (aspsoft.com) and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. John, a C# MVP and INETA speaker, has authored several books about ADO, XML, and SQL Server. He can often be found speaking at industry conferences such as DevConnections and VSLive or blogging at johnpapa.net.