LINQ to DataSet - LinqDataView and indexes

Index usage

One of the features of the DataView that often gets overlooked is its ability to very quickly find a DataRow (or multiple DataRows) by an index value. The index is built on the sort criteria, so if you sort based on the last name column, the index will also be on the last name. In the 2.0 version of the DataSet, this was restricted to a column value. Now, with LinqDataView, this can be the result of a functional evaluation, that is, anything at all, which really opens up new possibilities.

Why an Index?

The nature of an index means that we can use it to find specific rows very quickly, based on the key of the index, or the value on which the sort occurs. We can see how this works with LinqDataView by performing a number of searches. Let’s say that we retrieve and cache a DataSet that contains our customer list, and we need to find the CustomerID from the Account number. The quick and dirty way to do this would be to have a method like the following.

private int FindCustomerID(string accountNum)

{

    var query = from cust in adventureWorksDS1.Customer

                where cust.AccountNumber == accountNum

                select cust;

    if (query.Count() > 0)

    {

  return query.First().CustomerID;

    }

    else

    {

        throw new Exception("Account Number not found!");

    }

}

This method works fine, and it might be ok if your load isn’t too high, and the results arrive quickly enough. However, you are repeatedly doing the same search so you can speed things up quite a bit by using an index. I am going to refer to the DataView as an index, but be aware that it’s not a special DataView, it’s just a regular old DataView (or LinqDataView).

static DataView index;

private int FindCustomerID(string accountNum)

{

    if (index == null)

    {

        var query = from cust in adventureWorksDS1.Customer

                    orderby cust.AccountNumber

                    select cust;

        index = query.AsDataView();

    }

    if (index.FindRows(accountNum).Count() > 0)

    {

        return index.FindRows(accountNum).First().Row.Field<int>("CustomerID");

    }

    else

    {

        throw new Exception("Account Number not found!");

    }

}

While this code is a lot longer, it will perform quite a bit better. Without an index, finding a particular row value will cost O(N) (as we might have to go through every DataRow) With the index, the cost to find the DataRow with the matching keys is O(log N). This is a huge gain, especially if your DataSet is fairly large.

How high is too high?

When should you use an index, as opposed to a much simpler query? Like most important questions in life – it depends. You will need to measure, and see if your code will speed up. For this simple example, the gains would likely be very small, and would likely not benefit from an index. However, if your sort criteria was something complex (long running calculations, etc), then the benefit shows up very quickly.

If you’d like to see more on this topic, please let me know. Stress and performance are complex beasts, and the usage of DataSet can have a big impact on your overall application.

Thanks,
Erick