Building a LINQ Provider

By Pedram Rezaei, Microsoft Corporation

You can download the code for this article from here.

Throughout this article, we will build a simple IQueryable provider similar to the LINQ to SQL provider demonstrating what is needed to build a custom provider that is capable of executing LINQ queries and performing create, update and delete operations.

Language-Integrated Query (LINQ) provides a unified querying model for accessing data regardless of where that data is stored. This groundbreaking innovation bridges the gap between the world of objects and the world of data. If you are new to LINQ, Don Box and Anders Hejlsberg have a great overview here.

There are a few articles either on MSDN or on other websites describing the steps required to build a custom LINQ provider – most notably this one. In here, you will get an end-to-end and in-depth understanding of most of the elements involved in building a mini provider that is capable of performing simple queries and managing write operations (i.e. create, update and delete) as well as implementing a simple object tracker that helps with those operations.

One of the more popular LINQ providers is the LINQ to SQL provider that translates LINQ queries to T-SQL queries for Microsoft SQL Server. This provider also manages updates, deletes and inserts. In here, I will create a mini LINQ to SQL provider that with a few tweaks should also work with other relational databases such as Oracle and MySql.

The techniques used by LINQ for querying data differ based on whether the data containers are in-memory objects or the data source is a separate system such as Microsoft SQL Server or Active Directory.

In simple terms, say one is performing a query on a collection of in-memory Customer objects to find all those that live in London. This is usually achieved by enumerating all customer objects and selecting the ones that satisfy this predicate. This technique however is not efficient when dealing with data that is stored in a durable data store such as SQL Server. In such a case, we need a way of converting the original query to T-SQL (the domain-specific language for SQL Server) and benefit from SQL Server’s powerful querying engine. After all, it does not make sense to load all customers as objects into the managed heap and then loop through them to find the ones that live in London – what if we had a few million customers in the database?

Based on the above challenges, LINQ takes two different approaches to executing queries so that it is possible to target almost any source of data:

LINQ to Objects

If the source of data is a collection of in-memory objects, as long as the source collection implements the IEnumerable<T> interface, LINQ uses a set of extension methods (known as query operators) defined in the static System.Linq.Enumerable class to perform the query. This technique is known as “LINQ to Objects”.

With LINQ to Objects, the execution of queries is performed in managed code (IL) which makes this technique inefficient when querying data that is not stored as objects in the managed memory. For instance when querying data stored in a relational database, it might not be reasonable to load all the entities into the managed memory just to perform some simple operations that could have been better performed by the database engine. Obviously a better approach is to take the Query Expression and translate it to a SQL statement with a proper WHERE clause and let the relational database engine to perform the query. That is where the IQueryable interface plays a big part:

LINQ with IQueryable

If the source of a query expression is an IQueryable<T> instead of the raw IEnumerable<T> then the query is represented by a set of objects that preserves the structure of the actual query. This tree of objects is called an Expression Tree.

An expression tree can be given to a LINQ provider where it is usually translated to a domain-specific query language – or anything else for that matter. For example in the case of SQL Server, it can be translated to T-SQL, whereas in the case of Active Directory the same expression tree can be converted to an LDAP filter. With this approach, it is possible to develop all sorts of providers such as the ones shown here.

This article demonstrates how to create a mini IQueryable LINQ provider.

Before moving on, it is imperative to have a deep understanding of expression trees. If you are puzzled by what they are, let me show you a simple example. Imagine the following simple expression (where ‘i’ is an integer):

i <= 10

The expression tree representing this expression is a tree of objects that is shown below:


1 : Expression Tree for “ i <= 10”

At its root, this expression is captured by an instance of the BinaryExpression class. This object has a Left and a Right property of type Expression. The Left expression is a ParameterExpression called ‘i’ of type ‘Int32’ and the Right expression is a ConstantExpression representing the constant integer ‘10’. Now given this tree of objects, different providers can translate them into different domain-specific languages.

If the source of the LINQ Query is of type IQueryable<T> then the C# compiler automatically – at compile time – converts the query to an expression tree. Reflecting on the generated IL will show the expression tree representing the original expression. For example the query expression below is automatically converted to its expression tree representation at compile time:

var q = from c in dc.Customers
        where   c.City == "London"
        select c;


ParameterExpression pc =     Expression.Parameter(typeof(Customer), "c");
IQueryable<Customer> q3 =
      Expression.Lambda<Func<Customer, bool>>(
        Expression.Property(pc,   typeof(Customer).GetProperty("City")),
          Expression.Constant("London", typeof(string))
      ), new   ParameterExpression[] { pc }));

Nothing stops you of building an expression tree manually, however it is simpler to let the compiler to manage this for you. The following shows how to manually build an expression tree that represents the ‘ i <= 10 ’ expression:

BinaryExpression e = Expression.LessThanOrEqual(
    Expression.Parameter(typeof(int), "i"),
    Expression.Constant(10, typeof(int)));

Before moving on, there is a simple Visual Studio debug-time Visualizer (shown in Figure 1 ) that helps to visualize expression trees. I found this to be extremely useful when developing this custom LINQ provider.  Please see here for how to install and enable this for Visual Studio 2008.

As Figure 2 shows, our custom provider will translate LINQ queries (expression trees) to T-SQL and the result-sets returned from the database are mapped to objects using an object-relational mapper (ORM) that is discussed next. Write operations on the other hand, are tracked in memory and only executed when the application explicitly requests to submit the changes.


2 : Custom provider in action

As this provider is also an Object-Relational Mapper (ORM), it must provide the set of functionality described in this section.

The mapping information between objects and database entities (tables, views, rows and associations) must be captured. The technique used here is the same as the one used by LINQ to SQL in which TableAttribute and ColumnAttribute custom attributes are used to decorate classes and their members that correspond to database tables and columns respectively. Here is an example of such a mapper:

[Table(Name = "dbo.Products")] public partial class Product {  // ...  [Column(Storage = "_ProductID",           DbType = "Int NOT NULL IDENTITY",           IsPrimaryKey = true,           IsDbGenerated = true)]  public string ProductID { get; set; }

This class is mapped to a database table called ‘Products’ in the ‘dbo’ schema. The ‘ProductID’ property is associated to the ‘ProductID’ column of the ‘Products’ table. Throughout the rest of this article I use the terms “Entity Classes” to refer to mapping classes and “Entities” to refer to their instances.

LINQ to SQL uses SqlMetal.exe that comes as part of the Windows SDK or the “LINQ to SQL Classes” item in Visual Studio to automatically generate the entity classes. As our provider uses a very similar mapping technique to LINQ to SQL, we can still use these tools to auto-generate the entity classes.

An ORM also requires a mechanism for converting database result-sets to collections of objects. A basic implementation of such a component is also discussed later in this article.

Other components necessary for this custom provider are:

  • An implementation of IQueryable<T> that can be used as the source of LINQ queries
  • An implementation of IQueryProvider to execute expression trees. This class should eventually achieve the following:
    1. Convert expression trees to T-SQL (domain-specific language)
    2. Execute T-SQL
    3. Map the results of the query to a single object or a sequence of objects

One last element to add is an object tracker. This is required to track entities instantiated by the mapper. Without it, how would the provider know that an entity has been changed by the application so that it can generate a SQL UPDATE statement?

The first step of building a custom provider is to define a source, and the source has to implement IQueryable. The generic IQueryable<T> is identical to the non-generic IQueryable with the only difference that it also implements IEnumerable<T>. The non-generic version of IQueryable<T> interface is defined as:

public interface IQueryable : IEnumerable
  Type ElementType { get; }
  Expression Expression { get; }
  IQueryProvider Provider { get; }

The Expression property gives the expression tree that represents the current user query. System.Linq.Queryable is the equivalent class to System.Linq.Enumerable used for LINQ to Objects and provides a set of LINQ query operators such as Where and Select for IQueryable LINQ providers. Each query operator defined in this class represents the requested operation as a node of within the expression tree. For example the Where query operator looks similar to the following:

public static IQueryable<T> Where<T>(
  this IQueryable<T> source,
  Expression<Func<T, bool>> predicate)
  // Use reflection and find the current method that we are in
  MethodInfo currentMethod = (MethodInfo)MethodBase.GetCurrentMethod();
  // Build the expression tree node representing a method call
  // to this Where method
  MethodCallExpression e =
       new Expression[] {
         Expression.Quote(predicate) });

      // ...
When this Where query operator is used as a filter to an IQueryable<T>, it simply builds a new IQueryable<T> adding the above MethodCallExpression as its root node. This is how expression trees are built. For example the following simple query builds a pipeline of IQueryables each adding new nodes to the expression tree:

var q =
    .Where(c => c.City == "London")
    .Select(c => c.ContactName);

So far, we established that we need an IQueryable source for our provider. As database queries usually start from a database table, it is only logical to define a source representing a database table. 

The only query source provided by this mini provider is the DatabaseTable<TEntity> class that directly represents a database table. An instance of this class can be used as the source for LINQ queries:


Figure 3 : DatabaseTable<TEntity> and its interfaces

The DatabaseTable<TEntity> sits at the heart of our expression trees and is represented by a simple constant expression:

public sealed class DatabaseTable<TEntity> : IQueryable<TEntity>, ...
   public Expression Expression
     get { return Expression.Constant(this); }

DatabaseTable<TEntity> is an internal class meaning that an instance of this class cannot be created by others. The only way to use it is through another class called DatabaseDataContext which is the only entry point to this custom provider. When the DatabaseDataContext is instantiated, the user should pass in a connection string to the database. They can then call the GetTable<T>() method to access an instance of the DatabaseTable<T> which is the IQueryable<T> source for any subsequent LINQ queries:

SqlConnection cnn = new SqlConnection(Settings.Default.NorthwindCnnString); DatabaseDataContext dc = new DatabaseDataContext(cnn); IQueryable<Customer> customers = dc.GetTable<Customer>();

The main job of the DatabaseDataContext is to be the only facade that users would need in order to consume this LINQ provider.

What happens when GetTable<TEntity>() is called?

If this is the first time that an IQueryable of type TEntity is asked for, reflection is used on TEntity to create a tree of objects that represent the mapping metadata declared through the use of TableAttribute and ColumnAttribute custom attributes. This metadata is represented by the following three classes:


Figure 4 : Classes representing mapping metadata

A DatabaseMetaTable represents a mapper entity class such as the Product class that was shown earlier. It also includes a collection of type DatabaseMetaDataMember that are objects representing all fields and properties defined in the TEntity mapper class (e.g. the ProductID property in the Product class).

The PersistentDataMembers property of the DatabaseMetaTable is a subset of DataMembers representing a collection of properties or fields that are directly mapped to a database column (remember, not all data members necessarily have the ColumnAttribute meaning that it is possible to have data members that are not associated to a database column). The IdentityMembers property contains metadata for columns that are used as primary keys by the entity.

The DatabaseMetaModel holds an in-memory cache of DatabaseMetaTables to avoid the expensive cost of reflection every time that GetTable<TEntity>() is called (trading memory consumption for speed).

DatabaseMetaDataMember has some important properties such as IsDbGenerated that tells the provider whether the value of the column is automatically generated by the database engine and therefore should be ignored when performing write operations. These properties are usually mapped directly to properties of the ColumnAttribute.

Here is a description of some of the other interesting properties of the DatabaseMetaDataMember:




Is Null an acceptable value for this item? For instance, if the data member is mapped to a database column then is this a null-able column?


If a database column then the DbType represents the real SQL data type for this column.


The actual entity type (class) that contains this field or property


Is this member associated to a database column?


The actual name of the column in the database. For example the column name could be “ContactName” whereas the property representing this column might have been called “CustomerName”.


The MemberInfo object referencing the actual property or field of the entity class. This is used to gain access to the value of this member.


Similar to the Member property, with the difference that it represents the underlying member that should be used by the mapper to set the value of a cell read from the database – should the setter of a property be called when the value of the column is read from the database or should the mapper find the underlying field and set its value directly?

Based on the mapping metadata for TEntity, GetTable<TEntity>() creates and returns an instance of DatabaseTable<TEntity>:

public DatabaseTable<TEntity> GetTable<TEntity>() where TEntity : class
  DatabaseMetaTable metaTable = _metaModel.GetTable(typeof(TEntity));
  if (metaTable == null)
    throw new Exception(
      string.Format("{0} is not decorated with the TableAttribute.",
  IDatabaseTable table = GetTable(metaTable);
  if (table.ElementType != typeof(TEntity))
    throw new Exception(
      string.Format("It was not possible to find a table for type {0}",
  return (DatabaseTable<TEntity>)table;
private IDatabaseTable GetTable(DatabaseMetaTable metaTable)
  IDatabaseTable table;
  if (!_tables.TryGetValue(metaTable, out table))
    table = (IDatabaseTable)Activator.CreateInstance(
        new Type[] { metaTable.EntityType }),
        BindingFlags.NonPublic | BindingFlags.Public |        BindingFlags.Instance, null,
        new object[] { this, metaTable }, null);
    _tables.Add(metaTable, table);
  return table;

Phew! We now have an instance of IQueryable<T> that gives access to the expression tree representing the query. The next step is to define an IQueryProvider to execute the query.

The IQueryable interface has a property called Provider that gives access to an object implementing the IQueryProvider interface. The IQueryProvider has 2 methods (with a couple of overloads) one that takes an expression tree and creates an IQueryable for it and the other which executes an expression tree:

public interface IQueryProvider
  IQueryable<TElement> CreateQuery<TElement>(Expression expression);
  TResult Execute<TResult>(Expression expression);

For cleanness of design, our IQueryable<T> implementer (DatabaseTable<T>) is also an IQueryProvider, so the implementation of the properties required by IQueryable<T> should look similar to the following:

public sealed class DatabaseTable<TEntity> : IQueryable<TEntity>, ...
   ...   public Type ElementType
     get { return typeof(TEntity); }

   public Expression Expression
     get { return Expression.Constant(this); }

   public IQueryProvider Provider
     get { return this; }

Query Execution

There are 2 ways that a LINQ query can start executing. You can explicitly call the Execute method of the provider, or rely on the call to IEnumerable.GetEnumerator() when enumerating over the IQueryable – remember that an IQueryable is also an IEnumerable.

Not all queries result in a sequence of entities. For those query operators such as Sum, Max, Min, Count and First, a call is explicitly made to the Execute method of the provider. However for other queries, the enumerator implicitly executes the query.

Because both Execute() and GetEnumerator() attempt to execute the query, it is only logical to factor the execution logic into one central place. For this purpose, we will create a class called DatabaseProvider which DatabaseTable<T> will delegate the calls to:

public sealed class DatabaseTable<TEntity> : IQueryProvider ...
  public IEnumerator<T> GetEnumerator()
    return ((IEnumerable<T>)provider.
  object IQueryProvider.Execute(Expression expression)
    return provider.Execute(expression);

As you can see, most of the work is delegated to the DatabaseProvider object accessible through the provider member. The provider model now looks like Figure 5 .


5 : Providers

The task of the Execute method is well defined. It receives an expression tree representing the user query and returns a sequence of entities or a single entity whichever the query has asked for:

public object Execute(Expression query)
  // translate
  QueryTranslator translator = new QueryTranslator(_dataContext.Model);
  QueryInfo info = translator.Translate(query);
  // execute
  return Execute(info);

As you can see, it first has to translate the expression tree to an executable T-SQL statement that is returned as plain text (accessible through info.QueryText ). The call to the second Execute method will finally execute the plain T-SQL statement. Here is a simplified version of this Execute method:

private object Execute(QueryInfo info)
  // get a database connection
  DbConnection connection = _cnnManager.GetOpenConnection();
    // execute the query
    DbCommand command = connection.CreateCommand();
    command.CommandText = info.QueryText;

    DbDataReader reader =         command.ExecuteReader(CommandBehavior.SingleResult);

    // map the database rows into objects (entities)
    // ...

    // return the mapped entities
    // ...

Query Translation

Before looking at how the mapping is done, let’s focus on the query translation process. This is where expression trees are converted to T-SQL queries. There are a number of classes that are involved in this process:


6 : Query Translation Classes

The QueryTranslator is responsible for walking the expression tree and find any method calls to Where, Select, OrderBy, ThenBy, OrderByDescending, ThenByDescending, Take, First and FirstOrDefault extension methods (LINQ query operators) defined in the System.Linq.Queryable class. You may have noticed that this mini LINQ provider does not support other query operators such as Skip and Distinct. However you should be able to add many of those relatively simply.

At this point, it is worth mentioning some of the limitations of this mini custom LINQ Provider:

  • It does not support sub-queries or queries that require references to more than one database table
  • It does not support queries that may return more than one result-set
  • For a full list of query operators offered by System.Linq.Queryable see here. Any operator that is not mentioned above is not supported.
  • No more than one Where, Select, Take, First or FirstOrDefault operator can exist in an expression tree.
  • Take, First or FirstOrDefault operators can only appear at the bottom of an expression (the root of the expression tree)
  • Not all SQL Server and CLR data types are supported. For instance conversion from some SQL data types to CLR types may fail.
  • Partial evaluation of expression trees is not carried out meaning that, for example, access to local variables can cause exceptions. See here for a generic implementation of a partial evaluator.

Earlier in the Expression Trees section, it became apparent that expression trees can represent any type of expressions. In fact, they can even include method calls (MethodCallExpression) and lambda expressions (LambdaExpression). Although this level of flexibility makes expression trees an incredibly powerful tool to describe expressions, it adds to the complexity of LINQ providers as they need to analyse the expression tree and usually convert it to a domain-specific language.

In order to reduce this complexity, many LINQ providers only support a subset of the query operators and expression tree combinations. For example the first version of the LINQ provider for ADO.NET Data Services released with .NET Framework 3.5 SP1 does not support the Sum query operator:

dm.Books.Sum(b => b.Price);

To make this process even easier, it is possible to codify a common pattern for visiting all nodes in an expression tree. Visiting a tree is a well-known topic and there are many algorithms that make it trivial to visit all the nodes and receive a notification when a node of a given type is found.

A very interesting approach used by the .NET Framework 3.5 implementation of LINQ to SQL is to define a generic expression visitor that can be inherited by all interested parties (the abstract ExpressionVisitor class shown in Figure 6 ). It consists of many virtual methods such as VisitBinary and VisitConstant. If you are looking for a ConstantExpression then the VisitConstant is called when the expression tree is being visited by the ExpressionVisitor, so all you need to do is to override the VisitConstant method of the base class (ExpressionVisitor).

Here is the code listing of one of main entry method of this class: the Visit method that starts the whole process:

protected virtual Expression Visit(Expression exp)
  if (exp == null)
    return exp;
  switch (exp.NodeType)
    case ExpressionType.Add:
    case ExpressionType.AddChecked:
    case ExpressionType.And:
    case ExpressionType.AndAlso:
    case ExpressionType.ArrayIndex:
    case ExpressionType.Coalesce:
    case ExpressionType.Divide:
    case ExpressionType.Equal:
    case ExpressionType.ExclusiveOr:
    case ExpressionType.GreaterThan:
    case ExpressionType.GreaterThanOrEqual:
    case ExpressionType.LeftShift:
    case ExpressionType.LessThan:
    case ExpressionType.LessThanOrEqual:
    case ExpressionType.Modulo:
    case ExpressionType.Multiply:
    case ExpressionType.MultiplyChecked:
    case ExpressionType.NotEqual:
    case ExpressionType.Or:
    case ExpressionType.OrElse:
    case ExpressionType.Power:
    case ExpressionType.RightShift:
    case ExpressionType.Subtract:
    case ExpressionType.SubtractChecked:
      return this.VisitBinary((BinaryExpression)exp);
    case ExpressionType.ArrayLength:
    case ExpressionType.Convert:
    case ExpressionType.ConvertChecked:
    case ExpressionType.Negate:
    case ExpressionType.NegateChecked:
    case ExpressionType.Not:
    case ExpressionType.Quote:
    case ExpressionType.TypeAs:
      return this.VisitUnary((UnaryExpression)exp);
    case ExpressionType.Call:
      return this.VisitMethodCall((MethodCallExpression)exp);
    case ExpressionType.Conditional:
      return this.VisitConditional((ConditionalExpression)exp);
    case ExpressionType.Constant:
      return this.VisitConstant((ConstantExpression)exp);
    case ExpressionType.Invoke:
      return this.VisitInvocation((InvocationExpression)exp);
    case ExpressionType.Lambda:
      return this.VisitLambda((LambdaExpression)exp);
    case ExpressionType.ListInit:
      return this.VisitListInit((ListInitExpression)exp);
    case ExpressionType.MemberAccess:
      return this.VisitMemberAccess((MemberExpression)exp);
    case ExpressionType.MemberInit:
      return this.VisitMemberInit((MemberInitExpression)exp);
    case ExpressionType.New:
      return this.VisitNew((NewExpression)exp);
    case ExpressionType.NewArrayInit:
    case ExpressionType.NewArrayBounds:
      return this.VisitNewArray((NewArrayExpression)exp);
    case ExpressionType.Parameter:
      return this.VisitParameter((ParameterExpression)exp);
    case ExpressionType.TypeIs:
      return this.VisitTypeIs((TypeBinaryExpression)exp);
      throw new Exception(
        string.Format("Unhandled expression type: '{0}'", exp.NodeType));

All of the translators shown in Figure 6 , including both the QueryTranslator and the WhereTranslator classes, inherit from the abstract ExpressionVisitor. The QueryTranslator class shown in the same figure is only interested in finding calls to Where, Select and other similar query operators. So the only overridden method from its base class (ExpressionVisitor) is the VisitMethodCall method:

protected override Expression VisitMethodCall(MethodCallExpression mc)
  Type declaringType = mc.Method.DeclaringType;
  if (declaringType != typeof(Queryable))
    throw new NotSupportedException(
      "The type for the query operator is not Queryable!");
  switch (mc.Method.Name)
    case "Where":
      // is this really a proper Where?
      var whereLambda = GetLambdaWithParamCheck(mc);
      if (whereLambda == null)
      VisitWhere(mc.Arguments[0], whereLambda);
    case "OrderBy":
    case "ThenBy":
      // is this really a proper Order By?
      var orderLambda = GetLambdaWithParamCheck(mc);
      if (orderLambda == null)
      VisitOrderBy(mc.Arguments[0],          orderLambda, OrderDirection.Ascending);
    case "OrderByDescending":
    case "ThenByDescending":
      // is this really a proper Order By Descending?
      var orderDescLambda = GetLambdaWithParamCheck(mc);
      if (orderDescLambda == null)
      VisitOrderBy(mc.Arguments[0],          orderDescLambda, OrderDirection.Descending);
    case "Select":
      // is this really a proper Select?
      var selectLambda = GetLambdaWithParamCheck(mc);
      if (selectLambda == null)
      VisitSelect(mc.Arguments[0], selectLambda);
    case "Take":
      if (mc.Arguments.Count != 2)
      VisitTake(mc.Arguments[0], mc.Arguments[1]);
    case "First":
      // This custom provider does not support the use of a First operator
      // that takes a predicate. Therefore we check to ensure that no more
      // than one argument is provided.
      if (mc.Arguments.Count != 1)
      VisitFirst(mc.Arguments[0], false);
    case "FirstOrDefault":
      // This custom provider does not support the use of a FirstOrDefault
      // operator that takes a predicate. Therefore we check to ensure that
      // no more than one argument is provided.
      if (mc.Arguments.Count != 1)
      VisitFirst(mc.Arguments[0], true);
      return base.VisitMethodCall(mc);
  return mc;

This article will not expand on the implementation of the supported query operators except for the more interesting WhereTranslator. When a call to Queryable Where operator is discovered, an instance of the WhereTranslator class is created and its Translate method is called passing in the lambda expression representing the where condition (predicate):

private void VisitWhere(Expression queryable, LambdaExpression predicate)
  // this custom provider cannot support more
  // than one Where query operator in a LINQ query
  if (_whereTranslator != null)
    throw new NotSupportedException(
       "You cannot have more than one Where operator in this expression");
  _whereTranslator = new WhereTranslator(_model);

The WhereTranslator also inherits from the ExpressionVisitor and is only interested in Binary, Constant, MemberAccess and Unary expressions. When its Translate method is called, it creates a StringBuilder that is then used to build the WHERE clause of the T-SQL query. Here is an example of how binary operations can be converted to T-SQL:

StringBuilder _sb; protected override Expression VisitBinary(BinaryExpression b)
  switch (b.NodeType)
    case ExpressionType.And:
    case ExpressionType.AndAlso:
      _sb.Append(" AND ");
    case ExpressionType.Or:
    case ExpressionType.OrElse:
      _sb.Append(" OR ");
    case ExpressionType.Equal:
      if (IsComparingWithNull(b))
        _sb.Append(" IS ");
        _sb.Append(" = ");
    case ExpressionType.GreaterThan:
      _sb.Append(" > ");
  return b;

In this method, the Left expression is first visited to identify any constant or member access expressions that could fill the left part of a binary operator. The binary operator comes next followed by a visit to the Right expression. Note that in SQL Server, due to its three-valued logic, “IS” or “IS NOT” operators should be used when testing for null values (“ City = NULL ” has a different meaning to “ City IS NULL ”, namely will always be false).

Once translated, you can access the WhereClause property of the WhereTranslator to get the string that represents the actual WHERE clause:

internal string WhereClause
  get { return _sb.ToString(); }

The same technique is also used to construct ORDER BY, projection and other parts of a SELECT statement. Once all clauses are built, they need to be assembled into a SELECT statement representing the original query (the following method is defined in QueryTranslator):

private QueryInfo ConvertToExecutableQuery(Expression query)
  // Find the query source
  DatabaseMetaTable source;
  if (!GetSourceTable(query, out source))
    throw new NotSupportedException(       "This query expression is not supported!");
  StringBuilder sb = new StringBuilder();
  bool useDefault = false;
  sb.Append("SELECT ");
  // TOP
  if (_takeTranslator != null && _takeTranslator.Count.HasValue)
    useDefault = _takeTranslator.UseDefault;
    sb.Append("TOP ");
    sb.Append(" ");
  if (_selectTranslator == null || !_selectTranslator.DataMembers.Any())
    // project on all the mapped columns
    _selectTranslator = new ProjectionTranslator(         _model, source.PersistentDataMembers);
  if (!_selectTranslator.DataMembers.Any())
    throw new Exception(        "There are no items for projection in this query!");
  // FROM
  sb.Append(" FROM ");
  // WHERE
  if (_whereTranslator != null)
    string where = _whereTranslator.WhereClause;
    if (!string.IsNullOrEmpty(where))
      sb.Append("WHERE ");
  if (_orderByTranslator != null)
    string orderby = _orderByTranslator.OrderByClause;
    if (!string.IsNullOrEmpty(orderby))
      sb.Append("ORDER BY ");
  return new QueryInfo
      QueryText = sb.ToString(), // The actual SQL query
      SourceMetadata = source,
      LambdaExpression = _selectTranslator.ProjectionLambda,
      ResultShape = GetResultShape(query),
      UseDefault = useDefault

The query is now ready to be executed. See the Query Execution section for more information on how this query is executed.

Once executed, the result-set returned from the database needs to be converted or mapped to a single instance of an entity or a sequence of entities. This is the job of the DatabaseResultMapper<T> class.

DatabaseResultMapper<T> is an IEnumerable<T> that takes a DbDataReader instance and maps database rows to objects. This conversion can be performed one row at a time when MoveNext is called on the enumerator created by the source of the query. It could perhaps use the deferred execution model that is commonly used by LINQ to Objects, for mapping rows to entities (objects).


Figure 7 : The result-set mapper class

Before demoing the code of the mapper, it is important to explain how projection is implemented in this custom provider. Imagine the following simple query:

IEnumerable<string> phoneNumbers =
   from c in dc.GetTable<Customer>()
   select c.Phone;

This query is actually equivalent to the following:

IEnumerable<string> phoneNumbers =    dc.GetTable<Customer>()    .Select(c => c.Phone);

As you can see, the call to the Select operator is converted to a lambda expression that takes an instance of a Customer and returns a string which happens to be the phone number of the customer. This is called projection because we are returning only a few of the cells for a given customer row (projecting on customer). Here are some more examples:


Selecting the name of all customers

IEnumerable <string> names =
    .Select(c => c.CustomerName);


Projecting all customers to instances of the already existing Contact type

IEnumerable <Contact> contacts =


    c => newContact


        Name = c.CustomerName,

        Phone = c.Phone



Projecting all customers to instances of a newly created C# anonymous type (a type that is created by the compiler at the compile type)

var customerInfos =

    c => new


        ID = c.CustomerID,

        Name = c.CustomerName



Projecting and changing the return value at the same time by performing a string concatenation

IEnumerable <string> phoneNumbers =


    .Select(c => "+44 " + c.Phone);

One thing that all of the lambda expressions that are used by the above Select operators have in common is that they only take a single parameter of type Customer. Therefore in order to perform the operations defined by these lambda expressions, we need to execute the lambdas passing in an instance of Customer and remembering the result of the call. Here is a list of all the 4 Select lambda expressions shown above:

  • c => c.CustomerName
  • c => newContact {Name = c.CustomerName,Phone = c.Phone }
  • c => new { ID = c.CustomerID,  Name = c.CustomerName }
  • c => "+44 " + c.Phone

In order to be able to perform the specifics defined in these lambda expressions, we first need to compile their expression tree representations into executables (IL). Fortunately, the expression tree API provides an efficient mechanism for compiling a LambdaExpression into IL using Reflection.Emit. You can simply call the Compile method on the expression tree:

Delegate f = lambdaExpression.Compile();

Now that the lambda expression is compiled, we can invoke it dynamically passing in an instance of a Customer:


The lambda expressions above take an instance of Customer. Therefore our result mapper should always create an instance of Customer and pass that to the compiled code to perform the projection.

The final step is to read each result from the returned result-set and convert it to an instance of Customer (entity) and if needed perform a projection. The code below illustrates how this is done by the DatabaseResultMapper<TEntity> class. Please note the use of reflection APIs to dynamically create instances of TEntity:

public IEnumerator<TEntity> GetEnumerator()
  // compile the lambda expression into a function
  Delegate projectionFunction = null;
  if (_info.LambdaExpression != null)
    projectionFunction = _info.LambdaExpression.Compile();
  bool isFirst = true;
  MemberInfo[] members = null;
  BindingFlags bindingFlags = BindingFlags.NonPublic |
                              BindingFlags.Public |
  while (_reader.Read())
    if (isFirst)
      // find the order of the columns returned by the database
      members = new MemberInfo[_reader.FieldCount];
      var persistentDataMembers =           _info.SourceMetadata.PersistentDataMembers;
      for (int i = 0; i < _reader.FieldCount; i++)
        string colName = _reader.GetName(i);
        DatabaseMetaDataMember mem =
            p => string.Compare(p.MappedName, colName,                    true, CultureInfo.InvariantCulture) == 0);
        if (mem == null)
          throw new Exception(string.Format(
            "It was not possible to find a mapping column for {0}",
        members[i] = mem.StorageMember ?? mem.Member;
      isFirst = false;
    // create a single instanace of the Entity (the mapper object)
    object entity = Activator.CreateInstance(
      _info.SourceMetadata.EntityType, bindingFlags, null, null, null);
    // populate its members with values from the result-set
    for (int i = 0; i < members.Length; i++)
      // Do magic conversion from SQL type to CLR type!
      // NOTE: I am using a very simplied conversion technique here.       // You may want to use a more complex one...
      Type memberType = TypeHelper.GetMemberType(members[i]);
      // is this a Nullable type? if yes, then get       // its generic type argument for conversion
      if (TypeHelper.IsNullableType(memberType))
        memberType = memberType.GetGenericArguments()[0];
      object value = Convert.ChangeType(_reader.GetValue(i), memberType);
      // set the value of the member on the entity instance to 'value'
      TypeHelper.SetMemberValue(entity, members[i], value);
    // project on the entity by calling the     // compiled projection funcation
    if (projectionFunction == null)
      // this entity needs to be tracked
      // some tracking code to go here ...
      yield return (TEntity)trackedEntity;
      yield return (TEntity)projectionFunction.DynamicInvoke(entity);

The steps taken by this mapper are:

  1. If the query expression has projection, compile the projection’s lambda expression to IL
  2. Find the order of the columns in the result-set returned from the database
  3. For each row of the result-set:
    • Create an instance of the entity
    • Populate its members with values from the current row
    • If the query expression has projection, invoke the compiled projection delegate

Some queries may return a sequence of objects and other may only be able to return a single object. For instance, as shown below, the first expression returns a sequence of customers (IEnumerable<Customer>) whereas the second one returns a single Customer object:


IEnumerable<Customer> customers =
  from c in dc.GetTable<Customer>()
  where c.City == "London"
  select c;


Customer firstCustomer = dc.GetTable<Customer>().First();

Therefore it is necessary to find a way of distinguishing between the shapes of the results. For that reason, you will find an enum that declares the shape of the result of a query. This is determined when the query expression is examined by the QueryTranslator:

internal enum ResultShape
  None,       // The query is not expected to have a return value
  Singleton,  // It returns a single entity
  Sequence    // It returns a sequence of entities

Here is a simplistic implementation of a function that identifies the shape of the result of an expression tree:

private ResultShape GetResultShape(Expression query)
  LambdaExpression lambda = query as LambdaExpression;
  if (lambda != null)
    query = lambda.Body;
  if (query.Type == typeof(void))
    return ResultShape.None;
  if (query.Type == typeof(IMultipleResults))
    throw new NotSupportedException(       "Multiple result shape is not supported");
  MethodCallExpression methodExp = query as MethodCallExpression;
  if (methodExp != null &&       ((methodExp.Method.DeclaringType == typeof(Queryable))  ||
      (methodExp.Method.DeclaringType == typeof(Enumerable))))
    string str = methodExp.Method.Name;
    if (str != null && (str == "First" || str == "FirstOrDefault" ||                         str == "Single" || str == "SingleOrDefault"))
      return ResultShape.Singleton;
  return ResultShape.Sequence;

As you can see, if the expression tree includes a First, FirstOrDefault, Single or SingleOrDefault query operator as its root element, then it is assumed that only a single object should be returned. Based on this, the implementation of the second Execute method of the DatabaseProvider class must have different logic for each possible ResultShape:

private object Execute(QueryInfo info)
  // Get an open connection to the database
  DbConnection connection = _cnnManager.GetOpenConnection();
    // Build a SQL Command
    DbCommand command = connection.CreateCommand();
    command.CommandText = info.QueryText;
    // Attempt to excute the query if no result was expected from the query
    if (info.ResultShape == ResultShape.None)
      DbDataReader reader =        command.ExecuteReader(CommandBehavior.SingleResult);
      // What is the CLR type of the returned result-set rows?
      Type resultEntityType =
        (info.LambdaExpression == null) ?           info.SourceMetadata.EntityType :           info.LambdaExpression.Body.Type;
      // Build a pipeline so that we can       // read and map the results returned from the DB.
      // Do this by dynamically creating an instance       // of DatabaseResultMapper<> class.
      // The followig use of reflection is equivalent to:
      //    IEnumerable mappedResults =       //         new DatabaseResultMapper<T>(      //           info, reader, _dataContext.ChangeTracker);
      IEnumerable mappedResults =       (IEnumerable)Activator.CreateInstance(typeof(DatabaseResultMapper<>)
          .MakeGenericType(new Type[] { resultEntityType }),
          BindingFlags.NonPublic | BindingFlags.Public |          BindingFlags.Instance, null,           new object[] { info, reader, _dataContext.ChangeTracker }, null);
      // Are we expecting a single entity or a sequence of entities?
      if (info.ResultShape == ResultShape.Sequence)
        // Read the results by enumerating through all mappedResults
        // This is not perfect but as I am sharing the database
        // connection between all queries and even write operations,
        // I need to load all entities into memory by fully reading
        // them using the DbDataReader (mappedResults) – that is exactly         // what the constructor of List<> does here.
        // The followig use of reflection is equivalent to:
        //    return new List<T>(mappedResults);
        return Activator.CreateInstance(typeof(List<>).MakeGenericType(
          new Type[] { resultEntityType }),           new object[] { mappedResults });
      else if (info.ResultShape == ResultShape.Singleton)
        IEnumerator enumerator = mappedResults.GetEnumerator();
        if (enumerator.MoveNext())
          return enumerator.Current;
        // If First or Single query operators are used        // throw an excpetion because no rows where returned.
        // You should always expect at least one element
        // when First or Single are used
        else if (!info.UseDefault)
          throw new Exception(            "No entity was found meeting the specified criteria.");
    return null;

As you can see, the implementation of this method is extremely simple. It basically decides how many times the result mapper should be enumerated (none, once, or multiple times) based on the shape of the result that was discovered earlier.

This provider is also capable of executing dynamic SQL or stored procedures that return a sequence of entities. As long as a mapper class exists for the returned entity then the provider automatically uses the result mapper to map the returned result-set to a sequence of objects. Here is an example of executing a dynamic query:

SqlConnection cnn = new SqlConnection(Settings.Default.NorthwindCnnString);
DatabaseDataContext dc = new DatabaseDataContext(cnn);
var ps = dc.ExecuteQuery<Product>(
    @"SELECT p.ProductID, p.ProductName, p.UnitPrice FROM dbo.Products p
      JOIN dbo.Categories c on c.CategoryID = p.CategoryID
      WHERE c.CategoryName = 'Beverages'");
foreach (var pr in ps)
  Console.Write(pr.ProductName + ",");

The query part of the provider is now complete. Let’s see a few examples:



Generated SQL

Display a list of customer phone numbers

DatabaseDataContext dc =

var customers =
  dc.GetTable<Customer>().Select(c => c.Phone);

foreach (var c in customers)

  Console.Write(c + ",");

SELECT [Phone] FROM dbo.Customers

Display a list of customers living in London with no phone number

var q =

  from c in dc.GetTable<Customer>()

  where c.City == "London" && c.Phone != null

  orderby c.CustomerName, c.Country

  select c.CustomerName;

SELECT [ContactName] FROM dbo.Customers

WHERE (( [City] ='London')AND([Phone] ISNOTNULL))

ORDER BY [ContactName] ASC, [Country] ASC

Take the first 5 customers

var q = dc.GetTable<Customer>().Take(5);


  [CustomerID], [CompanyName], [ContactName],

  [ContactTitle], [Address], [City], [Region],

  [PostalCode], [Country], [Phone], [Fax]

FROM dbo.Customers

So far this custom provider has only been able to build and execute T-SQL queries. The next step is to introduce a mechanism to handle write operations.

Entity tracking (also known as object tracking) is a technique used to track all entity instances created or known to a provider. This is heavily used to organise and manage write operations. It is necessary to keep track of all entity instances known to this custom provider because:

  • Most changes to an entity object representing a row in the database should be tracked so that they can be applied to the database.
  • Write requests (update, insert and delete) are all kept in memory until the user explicitly asks the provider to submit the changes.
  • Entity objects representing the same row in the database should also be the same object in the managed heap. With this approach, any changes to one entity is reflected by all other references so that when changes are being submitted to the database only a single UPDATE operation incorporating all changes is executed.

One may argue that when a change occurs, it should be possible to generate the DML statement for the change at that point in time. Unfortunately that is not a workable solution because of the performance overhead it would impose on applications (every write would require execution of a DML statement).

The entity tracking technology used by our provider applies a batch of all write operations in a transaction when the SubmitChanges method of the provider’s entry point (DatabaseDataContext) is manually called.

There are many states that an entity can participate in. LINQ to SQL has a comprehensive list of states that its objects can be in. The only tracking states used by this simple custom provider are ToBeInserted, ToBeDeleted and ToBeUpdated.

As was mentioned earlier, every single object created by the result mapper (DatabaseResultMapper<T>) must be tracked by the change tracking system. This means keeping a reference to those objects. Over time this list of objects will grow and as our change tracker is keeping a strong reference to all of those objects, they are never garbage collected even if they are no longer referenced anywhere else.

In order to avoid this possible memory issue, the implementation of the tracker uses a collection of WeakReference objects. In one sentence, a WeakReference references an object while still allowing that object to be reclaimed by the garbage collector:

List <WeakReference> _trackedItems;

If an object has been reclaimed by the garbage collector, the WeakReference’s IsAlive property would be set to false. However if the original object is still in memory, you can use the Target property to access that original object. In the sample below, assuming that obj1 is still alive in memory, both obj1 and obj2 variables should be referencing the same object:

WeakReference wr = new WeakReference(obj1);
  object obj2 = wr.Target;

Change Tracking

The entity tracking and change processing is managed by two classes, the ChangeTracker and the ChangeProcessor classes:


Figure 8 : The main classes used to manage write requests

When an entity instance is created by the result mapper, or is newly inserted by the user, it is attached to the ChangeTracker by calling the Attached method.

In order to receive change notifications from entity instances, the entity classes must implement the INotifyPropertyChanging interface, and also any changes to their properties should raise the PropertyChanging event defined by this interface:

[Table(Name = "dbo.Products")]
public partial class Product : INotifyPropertyChanging {  public event PropertyChangingEventHandler PropertyChanging;
  protected virtual void SendPropertyChanging()
    if ((this.PropertyChanging != null))
      this.PropertyChanging(this, emptyChangingEventArgs);
  public string ProductName
    get { return this._ProductName;  }
    set     {
      if ((this._ProductName != value))
        this._ProductName = value;

The change tracking is performed by the ChangeTracker class that subscribes to the PropertyChanging event of all mapped entity instances. When the provider is notified of the first change to an object, it creates an instance of the TrackedObject and adds it to a collection of objects that require processing. This collection includes a list of newly created, updated and deleted objects.


Figure 9 : Tracked objects and their states

The code listing below demonstrates the logic used by the ChangeTracker’s event handler for all PropertyChanging events:

private Dictionary<object, TrackedObject> _items; ... private void OnPropertyChanging(
        object sender, PropertyChangingEventArgs args)
  // Is this object already being tracked?
  TrackedObject trackedObj;
  if (!_items.TryGetValue(sender, out trackedObj))
    DatabaseMetaTable metaTable = CheckAndGetMetaTable(sender.GetType());
    // This has not been tracked in the past so create a new tracked object
    trackedObj = ...;
    _items.Add(sender, trackedObj);

This custom provider is unable to keep track of changes to objects that do not implement INotifyPropertyChanging.

It is also worth mentioning that the ChangeTracker should often compress its in-memory collection of WeakReferences and remove the ones with garbage collected targets.

At this point, I would like to define the scope in which changes are tracked by the ChangeTracker. All changes and in fact all write operations such as delete and insert, are cached by the ChangeTracker until the AcceptChanges method of this class is called. That method in turn can only be called by the SubmitChanges method of the DatabaseDataContext.

When SubmitChanges is called, all changes, newly created entities and delete requests are converted to database commands and are executed inside a transaction. The transaction is only committed when all commands succeed.

The conversion of TrackedObjects to database commands is done by the ChangeProcessor class:

internal class ChangeProcessor
  internal void SubmitChanges()
    foreach (var obj in _tracker.GetInterestingObjects())
      DbCommand command = null;
      switch (obj.TrackingState)
        case TrackingState.ToBeInserted:
          command = BuildInsertCommand(obj);
        case TrackingState.ToBeUpdated:
          command = BuildUpdateCommand(obj);
        case TrackingState.ToBeDeleted:
          command = BuildDeleteCommand(obj);
      if(command != null)

And here is how the ChangeProcessor converts an insert request to a SQL INSERT command:

private DbCommand BuildInsertCommand(TrackedObject obj)
  if (obj == null)
    throw new ArgumentNullException("obj");
  DbCommand command = GetNewDbCommand();
  StringBuilder sb = new StringBuilder("INSERT INTO ");
  sb.Append(" (");
  // Find all the insertable columns (excluding
  // all the auto generated data members)
  var columns =
      .Where(c => !c.IsDbGenerated);
  StringBuilder columnValuesSb = new StringBuilder();
  bool isFirst = true;
  // Include all the columns in the INSERT statement
  foreach (var col in columns)
    if (!isFirst)
      sb.Append(", ");
      columnValuesSb.Append(", ");
      isFirst = false;
    // What if the name of the column has a space character in it?
    // Get the value from the entity for this column
    MemberInfo memberInfo = col.StorageMember ?? col.Member;
    // Format the value of the column to
    // its acceptable SQL representation
    object value = FormatHelper.FormatDbValue(
      TypeHelper.GetMemberValue(obj.Entity, memberInfo));
  sb.Append(") VALUES (");
  command.CommandText = sb.ToString();
  return command;

DatabaseDataContext.SubmitChanges() now looks similar to the following – it effectively delegates the actual translation and execution of the database commands to the ChangeProcessor and has the responsibility of creating a new transaction or reusing an existing one:

public void SubmitChanges()
  // Get an open database connection
  DbConnection connection =
    // Create or use a transaction
    using (TransactionScope ts = new TransactionScope())
      // Enlist in this transaction
      // Process all changes and apply them to the database
      ChangeProcessor processor =
        new ChangeProcessor(_changeTracker, this);
      // Vote to commit the transaction
      // If all changes were committed successfuly
      // to the DB then accept all the changes

ChangeTracker.AcceptChanges() is called only when all SQL commands are executed successfully. When called, deleted objects are no longer tracked and the ChangeTracker subscribes to the PropertyChanging event of all the newly created objects.

Both delete and update operations rely heavily on the primary keys of the entity. In fact, if you look at the database commands generated for these operations, you will notice that the generated WHERE clause only specifies the primary key columns. It is worth pointing out that no delete or update operations can be performed by this mini provider if the entity does not have a primary key defined (an exception is thrown).

Also, it is not possible to modify a primary key using this provider as it does not keep a copy of the original primary key values so that they could be used by the UPDATE statement to locate the original record.

Here are some examples of write operations against the Northwind database:



Generated SQL

Inserting a new customer

DatabaseDataContext dc = newDatabaseDataContext(dbConnection);

p = newCustomer


    CustomerID = "12345",

    CompanyName = "Microsoft",

    City = "London",

    CustomerName = "Pedram Rezaei"




INSERT INTO dbo.Customers

( [CustomerID], [CompanyName],
 [ContactName], [ContactTitle],
 [Address], [City], [Region],
 [PostalCode], [Country],
 [Phone], [Fax])

 'Pedram Rezaei',NULL,


Updating a customer entity

DatabaseDataContext dc = newDatabaseDataContext(dbConnection);

// get a customer from the database
Customer p = ...

p.City = "Southampton";


UPDATE dbo.Customers


 [CustomerID] ='12345',
 [CompanyName] ='Microsoft',

 [ContactName] ='Pedram Rezaei',
 [ContactTitle] ='',

 [Address] ='',
 [City] ='Southampton',

 [Region] ='',
 [PostalCode] ='',

 [Country] ='',
 [Phone] ='',

 [Fax] =''

WHERE ([CustomerID] ='12345')

LINQ is an amazing technology and its design makes it an incredibly flexible and extensible platform. As we saw, the LINQ’s provider model supports different types of providers. This article showed how it is possible to write a simple IQueryable LINQ provider that also supported write operations.

You can download the code for this article from here.

I would like to thank Stuart Leaks, Carl Nolan and Charlie Calvert for their help in reviewing this article.

Disclaimer* *: Any code or technique shown here is for illustrative purposes only and the referenced solution should only be seen as a throw-away prototype. The code has not been fully tested therefore it should not be assumed as production ready. You may find issues and bugs so please understand that the purpose of this exercise was not to write a fully working and bulletproof provider. The aim is to demonstrate the steps required to write a custom IQueryable LINQ provider.