January 2013
Volume 28 Number 01
The Working Programmer - .NET Collections: Getting Started with C5
By Ted Neward | January 2013
I have a confession to make.
By day, I work as a mild-mannered .NET developer for Neudesic LLC, a consulting firm. But by night, after the wife and my sons are asleep, I sneak out of the house, laptop in hand, and journey to my secret hideout (a Denny’s on 148th), and I … I write Java code.
Yes, folks, I live a double life as both a .NET developer and a Java—or, more accurately, a Java Virtual Machine (JVM)—developer. One of the interesting perks of living such a dual lifestyle is that I can see areas where the Microsoft .NET Framework has some great ideas that can be brought back to the JVM. One such idea was that of custom attributes, which the JVM adopted in Java5 (back in 2005 or so) with the name “annotations.” But the reverse is also true: The JVM did, in fact, get a few things right that the CLR and the .NET Base Class Library (BCL) didn’t (or at least didn’t get quite as right, if you’re feeling a little defensive). One such area lies at the heart of the .NET BCL: collections.
Collections: A Critique
Part of the flaw in .NET collections lies in the fact that the BCL team had to write the silly things twice: once for the .NET Framework 1.0/1.1 release, before generics were available, and again for the .NET Framework 2.0, after generics were part of the CLR, because collections without strongly typed versions are just kind of silly. This automatically meant that one of these was bound to be left to rot, essentially, in favor of whatever enhancements or additions to the library were to come along. (Java ducked this particular problem by essentially “replacing” the non-genericized versions with genericized versions, which was only possible because of the way Java did generics—which isn’t something I’m going to get into here.) And, aside from the enhancements that came via LINQ in Visual Studio 2008 and C# 3.0, the collections library never really got much love after the 2.0 release, which itself more or less just re-implemented the System.Collections classes into a new namespace (System.Collections.Generic, or SCG) of strongly typed versions.
More importantly, though, the design of the .NET collections seems to have been more focused on getting something practical and useful out into the wild as part of the 1.0 release, rather than trying to think deeply about the design of collections and how they might be extended. This was one area in which the .NET Framework really (I suspect unintentionally) paralleled the Java world. When Java 1.0 shipped, it included a set of basic, utilitarian collections. But they had a few design flaws (the most egregious of which was the decision that had the Stack class, a last-in-first-out collection, directly extend the Vector class, which was basically an ArrayList). After Java 1.1 shipped, a few engineers at Sun Microsystems worked hard on a rewrite of the collections classes—which became known as the Java Collections—and shipped it as a part of Java 1.2.
Anyway, the .NET Framework is past due for a revamp of its collection classes, ideally in a way that’s at least mostly compatible with the existing SCG classes. Fortunately, researchers at the IT University of Copenhagen in Denmark have produced a worthy successor and complement to the SCG classes: a library they call the Copenhagen Comprehensive Collection Classes for C#, or C5 for short.
C5 Logistics
To begin, C5 is found on the Web at itu.dk/research/c5 if you want to see the version history or get a link to a book (PDF) on C5, though the book is a few releases old. Or, alternatively, C5 is available through NuGet through the (by-now ubiquitous) Install-Package command, simply by typing “Install-Package C5.” Note that C5 is written to be available for both Visual Studio and Mono, and when NuGet installs the package, it will add references to both the C5.dll assembly as well as the C5Mono.dll assembly. These are redundant to each other, so delete the one you don’t want. To explore C5 collections through a series of exploration tests, I created a Visual C# Test Project and added C5 to that project. Beyond that, the only notable change to the code is two “using” statements, which the C5 documentation also assumes:
using SCG = System.Collections.Generic;
using C5;
The reason for the alias is simple: C5 “re-implements” a few interfaces and classes that are named the same in the SCG version, so aliasing the old stuff leaves it available to us but under a very short prefix (IList<T> is the C5 version, for example, whereas SCG.IList<T> is the “classic” version from SCG).
By the way, in case the lawyers ask, C5 is open sourced under an MIT license, so you’re far more able to modify or enhance some of the C5 classes than you would be under a GNU General Public License (GPL) or GNU Lesser General Public License (LGPL).
C5 Design Overview
Looking at the C5 design approach, it seems similar to the SCG style, in that collections are broken into two “levels”: an interface layer that describes the interface and behavior expected of a given collection, and an implementation layer that provides the actual backing code for the one or more interfaces desired. The SCG classes approach this idea, but in some cases they don’t follow through on it particularly well—for example, we don’t have any flexibility in terms of the implementation of the SortedSet<T> (meaning, the choice of array-based, linked-list-based or hash-based, each of which has different characteristics regarding performance of insertion, traversal and so on). In some cases the SCG classes are simply missing certain collection types—a circular queue, for example (in which, when the last item in the queue is traversed, the iteration “wraps around” to the front of the queue again), or a simple “bag” collection (which offers no functionality except to contain items, thus avoiding unnecessary overhead of sorting, indexing and so on).
Granted, to the average .NET developer, this doesn’t seem like much of a loss. But in many applications, as performance begins to become a central focus, choosing the right collection class to match the problem at hand becomes more critical. Is this a collection that will be established once and traversed frequently? Or is this a collection that will be added or removed frequently but traversed rarely? If this collection is at the heart of an application feature (or the app itself), the difference between these two could mean the difference between “Wow, this app screams!” and “Well, the users liked it, but just thought it was too slow.”
C5, then, holds as one of its core principles that developers should “code to interfaces, not implementations,” and, as such, offers more than a dozen different interfaces that describe what the underlying collection must provide. ICollection<T> is the base of it all, guaranteeing basic collection behavior, but from there we find IList<T>, IIndexed<T>, ISorted<T> and ISequenced<T>, just to start. Here’s the full list of interfaces, their relationships to other interfaces and their overall guarantees:
- An SCG.IEnumerable<T> can have its items enumerated. All collections and dictionaries are enumerable.
- An IDirectedEnumerable<T> is an enumerable that can be reversed, giving a backward enumerable that enumerates its items in the opposite order.
- An ICollectionValue<T> is a collection value. It doesn’t support modification, is enumerable, knows how many items it has and can copy them to an array.
- An IDirectedCollectionValue<T> is a collection value that can be reversed into a backward collection value.
- An IExtensible<T> is a collection to which items can be added.
- An IPriorityQueue<T> is an extensible whose least and greatest items can be found (and removed) efficiently.
- An ICollection<T> is an extensible from which items can also be removed.
- An ISequenced<T> is a collection whose items appear in a particular sequence (determined either by insertion order or item ordering).
- An IIndexed<T> is a sequenced collection whose items are accessible by index.
- An ISorted<T> is a sequenced collection in which items appear in increasing order; item comparisons determine the item sequence. It can efficiently find the predecessor or successor (in the collection) of a given item.
- An IIndexedSorted<T> is an indexed and sorted collection. It can efficiently determine how many items are greater than or equal to a given item x.
- An IPersistentSorted<T> is a sorted collection of which one can efficiently make a snapshot—that is, a read-only copy that will remain unaffected by updates to the original collection.
- An IQueue<T> is a first-in-first-out (FIFO) queue that in addition supports indexing.
- An IStack<T> is a last-in-first-out (LIFO) stack.
- An IList<T> is an indexed and therefore sequenced collection, where item order is determined by insertions and deletions. It derives from SCG.IList<T>.
In terms of the implementations, C5 has quite a few, including circular queues; array-backed as well as linked-list-backed lists, but also hashed-array lists and hashed-linked lists; wrapped arrays; sorted arrays; tree-based sets and bags; and more.
C5 Coding Style
Fortunately, C5 doesn’t require a significant shift in coding style—and, even better, it supports all of the LINQ operations (because it builds on top of the SCG interfaces, of which the LINQ extension methods key off), so in some cases you can drop in a C5 collection at construction time without changing any of the code around it. See Figure 1 for an example of this.
Figure 1 Getting Started with C5
// These are C5 IList and ArrayList, not SCG
IList<String> names = new ArrayList<String>();
names.AddAll(new String[] { "Hoover", "Roosevelt", "Truman",
"Eisenhower", "Kennedy" });
// Print list:
Assert.AreEqual("[ 0:Hoover, 1:Roosevelt, 2:Truman, 3:Eisenhower," +
" 4:Kennedy ]", names.ToString());
// Print item 1 ("Roosevelt") in the list
Assert.AreEqual("Roosevelt", names[1]);
Console.WriteLine(names[1]);
// Create a list view comprising post-WW2 presidents
IList<String> postWWII = names.View(2, 3);
// Print item 2 ("Kennedy") in the view
Assert.AreEqual("Kennedy", postWWII[2]);
// Enumerate and print the list view in reverse chronological order
Assert.AreEqual("{ Kennedy, Eisenhower, Truman }",
postWWII.Backwards().ToString());
Even without having ever looked at the C5 documentation, it’s pretty easy to figure out what’s happening in these examples.
C5 vs. SCG Collection Implementations
This is just the tip of the iceberg with respect to C5. In my next column I’ll look at some practical examples of using C5 instead of the SCG collection implementations—and some of the benefits that doing so gives us. I encourage you not to wait, though: Do the NuGet thing, pull C5 down and start exploring on your own—there’s plenty there for you to entertain yourself in the meantime.
Happy coding!
Ted Neward is an architectural consultant with Neudesic LLC. He has written more than 100 articles and is an author and coauthor of a dozen books, including “Professional F# 2.0” (Wrox, 2010). He’s an F# MVP and noted Java expert, and speaks at both Java and .NET conferences around the world. He consults and mentors regularly—reach him at ted@tedneward.com or Ted.Neward@neudesic.com if you’re interested in having him come work with your team. He blogs at blogs.tedneward.com and can be followed on Twitter at twitter.com/tedneward.
Thanks to the following technical expert for reviewing this article: Immo Landwerth