Algorithms and .Net collections

Algorithms and .Net collections

Recently I was asked by a colleague about an InvalidOperation exception with the message:

Collection was modified; enumeration operation may not execute.

It would seem that my colleague like many of us expected code similar to this to work

foreach (Item s in collection)
{
if (s.expired == true)
{
collection.Remove(s);
}
}

And yet it gave an Invalid operation exception with the text: "Collection was modified; enumeration operation may not execute."

Updating collections of one sort or another is probably the most common thing that I do in my code, I seem to be always adding to and removing from collections in .NetFramework 1.0 and 1.1 System.Collections.ArrayList and System.Collections.HashTable were my closest allies in keeping my application state under control. And yet I still ran into the dreaded InvalidOperationException all of the time.

It turns out that there are plenty of ways to fix code with this exception probably the one involving the least typing is ToArray()

foreach (Item s in collection .ToArray() )
{
if (s.expired == true)
{
collection.Remove(s);
}
}

If the collection is a reasonable size then this is probably as good a solution as you can get, however, if the collection is large relative to the size of edits performed then ToArray() is not the way to go, the reason for this is that a reasonable implementation of ToArray() would look like this::

Item[] array = new Item[collection.Count];
int i=0;
foreach (Item item in collection)
{
array[i++] = item;
}
return array;

Clearly this would mean that the fix I proposed above would entail two complete enumerations through our data which would certainly make our processors cache put in some effort, and it also bloats our applications working set by having two copies of the collection in memory (on the other hand the second copy will be an array which mitigates it's presence somewhat).

When the collection is large compared to the number of updates being implemented it is probably best to create a seperate list to hold the edits and then enumerate them like so:

List<Item> updates = new List<Item>();
foreach (Item s in collection)
{
if (s.expired == true)
{
updates .Add(s);
}
}

foreach (Item s in updates)
{
collection.Remove(s);
}

Recently I have been thinking about algorithms and .Net collections. In this case I am using algorithm in the C++ STL (Standard Template Library) sense to mean some code that applies to all members of a collection, for instance a RemoveItems algorithm could be used for the example above to encapsulate the code that updates the collection away from my app code to a utility class somewhere. The only problem is that I may want to remove items from a collection for lots of different reasons not just because the expired field is set to true, so how then can a developer parametize the decision for an algorithm, it turns out that a delegate is ideal for this type of customization.

My requirements for a RemoveItems algorithm are as follows:

  1. Work with all .Net collections
  2. Parameterisable decision point
  3. Not use a two pass enumeration scheme

The code I finally came up with looked like:

public static class RemoveItems<T>
{
public delegate bool Decide(T item);
public static void Execute(ICollection<T> collection, Decide decide)
{
List<T> removed= new List<T> ();
foreach (T item in collection)
{
if(decide(item))
removed.Add(item);
}
foreach(T item in removed)
{
collection.Remove(item);
}
removed.Clear();
}
}

I can now rewrite my function that needs to eliminate expired items to call my algorithm like so:

RemoveItems<Item>.Execute(collection, delegate(Item item) { return item.expired == true; });

Note that I am using the new anonymous method feature of C# to allow me to express the decide method inline in my caller code.

The complete example program looks like this:

#region Using directives

using System;

using System.Collections;

using System.Collections.Generic;

using System.Text;

#endregion

namespace UpdateIteration

{

    public static class RemoveItems<T>

    {

        public delegate bool Decide(T item);

        public static void Execute(ICollection<T> collection, Decide decide)

        {

            List<T> removed = new List<T>();

            foreach (T item in collection)

            {

                if(decide(item))

                    removed.Add(item);

            }

            foreach (T item in removed)

            {

                collection.Remove(item);

            }

            removed.Clear();

        }

    }

    class Program

    {

        class Item

        {

            public Item(string value, bool expired) { this.value = value; this.expired = expired; }

            public string value;

            public bool expired = false;

        }

        static void Main(string[] args)

        {

            List<Item> collection = new List<Item>();

            collection.Add(new Item("one", true));

            collection.Add(new Item("two", false));

            collection.Add(new Item("three", true));

            collection.Add(new Item("four", false));

            collection.Add(new Item("five", true));

            collection.Add(new Item("six", false));

            collection.Add(new Item("seven", true));

            collection.Add(new Item("eight", false));

            collection.Add(new Item("nine", true));

 

            RemoveItems<Item>.Execute(collection, delegate(Item item) { return item.expired == true; });

        }

    }

}

I have noticed that using delegates to parametize these types of algorithms is allowing me to write much more reusable and readable code.  You should take a look at the anonymous methods features and generic collections in the Beta 1 and soon the Beta 2 whidbey compilers and see how they affect your own programming style.