Immutability in C# Part Six: A Simple Binary Tree

OK, we've gotten pretty good at this by now. A straightforward implementation of an immutable generic binary tree requires little comment on the basics. A binary tree is either empty, or a value, left tree and right tree:

public interface IBinaryTree<V>
{
bool IsEmpty { get; }
V Value { get; }
IBinaryTree<V> Left { get; }
IBinaryTree<V> Right { get; }
}

And our implementation has, as always, a singleton empty tree. One minor point is that unlike previous data structures, where the structure itself knew how to build more of the same, this one does not. There's no obvious "insert" operation in a binary tree, so we'll just leave it up to the implementer. (A binary tree is essentially defined by its shape, not by the operations one can perform on it.) In this case, we'll just make a constructor:

public sealed class BinaryTree<V> : IBinaryTree<V>
{
private sealed class EmptyBinaryTree : IBinaryTree<V>
{
public bool IsEmpty { get { return true; } }
public IBinaryTree<V> Left { get { throw new Exception("Empty tree"); } }
public IBinaryTree<V> Right { get { throw new Exception("Empty tree"); } }
public V Value { get { throw new Exception("Empty tree"); } }
}
private static readonly EmptyBinaryTree empty = new EmptyBinaryTree();
public static IBinaryTree<V> Empty { get { return empty; } }

private readonly V value;
private readonly IBinaryTree<V> left;
private readonly IBinaryTree<V> right;

public bool IsEmpty { get { return false; } }
public V Value { get { return value; } }
public IBinaryTree<V> Left { get { return left; } }
public IBinaryTree<V> Right { get { return right; } }
public BinaryTree(V value, IBinaryTree<V> left, IBinaryTree<V> right)
{
this.value = value;
this.left = left ?? Empty;
this.right = right ?? Empty;
}
}

Note that another nice feature of immutable data structures is that it is impossible to accidentally (or deliberately!) create a tree which contains a cycle. In a mutable tree you could do something stupid like making the root a child of one of the leaves (or, for that matter, a child of itself!) Then you have to either live with the possibility that a user will create a cyclic "tree" and hope for the best (or, in other words, crash and/or hang) or you have to build yourself a cycle detector. Cycle detectors have potentially serious performance implications. It may be best to simply prevent the situation from arising in the first place.

A feature conspicuous by its absence from the above implementation is enumeration. Unlike a stack or a queue, there's no obvious best order in which to enumerate the members a binary tree. First off, there's the question of whether to enumerate in depth-first or breadth-first order, and then the matter of whether the parent comes before, after or between the children.

We could create extension methods which did various different enumerations on a binary tree. Here is a bad implementation of an "in order" traversal (depth first, parent comes between the children -- so-called because in a binary search tree, this enumerates the nodes in sorted order.)

public static IEnumerable<V> InOrder<V>(this IBinaryTree<V> tree)
{
// BAD IMPLEMENTATION, DO NOT DO THIS
if (tree.IsEmpty) yield break;
foreach(V v in tree.Left.InOrder()) yield return v;
yield return tree.Value;
foreach(V v in tree.Right.InOrder()) yield return v;
}

This seems perfectly straightforward and sensible, but this implementation is actually quite flawed. We can do a lot better. What is wrong with this implementation of enumeration, and how would you fix it? (Hint: start your analysis with the assumption that the tree has maximum height h and total number of nodes n.)

Next time: a better implementation of traversal, and then after that, some more complex kinds of immutable binary tree.

Comments

  • Anonymous
    December 18, 2007
    PingBack from http://geeklectures.info/2007/12/18/immutability-in-c-part-six-a-simple-binary-tree/

  • Anonymous
    December 18, 2007
    Thanks for looking at immutable trees, Eric. My very first concern looking at the immutable tree class as you've implemented it is that it seems like it must be built starting at the leaves and working back up to the root.  It seems to me like this would be very inconvenient in some cases, and is exactly the reason I asked about immutable trees in the comments to your post on the immutable stack. My feeling looking at InOrder is that the problem is if your tree has height h and nodes n, you're generating an iterator object for every node, and potentially will at some points have h "nested" iterators in memory at once.  While the way you have it written here is very simple, code-wise, a better solution might be to not use recursion and instead use a stack or something to keep track of your traversal path into the tree.

  • Anonymous
    December 18, 2007
    We'll be looking at ways to construct immutable trees not from the leaves first in later posts. Your intuition about InOrder is correct, as is your proposed sketch of a solution.

  • Anonymous
    December 18, 2007
    Ok.. first off... great series! Can't tell you how much I enjoy it... I am, however, having a really hard time understanding why an immutable tree cannot have any cycles. I can see how it happens to a mutable tree but why or how is this prevented in an immutable one? Man... I am lost... and I thought I was doing so well too...

  • Anonymous
    December 18, 2007
    meisinger: Because you have to pass the children in as arguments to the constructor, it's physically impossible to pass a parent in as a child because the parent has not been constructed yet... if you get my meaning.

  • Anonymous
    December 19, 2007
    This immutable tree implementation can trivially have cycles, because it accepts arbitrary IBinaryTree<V> instances and that's a public interface; I can provide my own (immutable) implementation of it that has cycles. One can't have cycles containing only BinaryTree instances, though.  It's possible in functional languages, too, like Haskell: "let ones = BinaryTree ones 1 ones"

  • Anonymous
    December 19, 2007
    Good point.  I meant "only using this implementation".

  • Anonymous
    December 19, 2007
    Okay, I started to reply to this thread, and it wound up too big for a comment so I wrote my own blog post instead. To summarize, I wanted to say that cycles CAN exist even using just this implementation, but only in the presence of unusual threading behavior. I don't know C# well enough to explain it there, so I write an explanation in Java (maybe someone here who understands C# better can comment on how well it applies). I wrote <a href="http://mcherm.com/blog/permalinks/1/immutable-trees-and-threading-evil-part-1">one blog entry</a> just describing the same thing built in Java (and simplified somewhat) and <a href="http://mcherm.com/blog/permalinks/1/immutable-trees-and-threading-evil-part-2">another blog entry</a> describing how you can get a cycle. I also explained the special trick you can use to avoid the problem in Java -- there's a good chance that it applies to the Eric's C# code as well, but I'm not expert enough to be sure.

  • Anonymous
    January 18, 2008
    For some reason, there's been a lot of buzz lately around immutability in C#. If you're interested in

  • Anonymous
    January 24, 2008
    It may not be possible to create a cycle in this implementation (notwithstanding Michael's posts; I haven't had time to go through them), but it is certainly possible to create an instance with a non-tree shape: BinaryTree<int> child = new BinaryTree<int>(0, BinaryTree<int>.Empty, BinaryTree<int>.Empty); BinaryTree<int> parent = new BinaryTree<int>(1, child, child); or, slightly less trivial, BinaryTree<int> child = new BinaryTree<int>(0, BinaryTree<int>.Empty, BinaryTree<int>.Empty); BinaryTree<int> left = new BinaryTree<int>(1, BinaryTree<int>.Empty, child); BinaryTree<int> right = new BinaryTree<int>(1, child, BinaryTree<int>.Empty); BinaryTree<int> root = new BinaryTree<int>(2, left, right); These examples also point out how cumbersome it is to refer to static fields in generic classes, and why you really should consider using null as a proxy for Empty during construction; the above code would be much easier to read. Also, a constructor for leaf nodes which only takes a value would reduce the burden somewhat.

  • Anonymous
    January 24, 2008
    Um... why are either of those not binary trees?  They certainly look like binary trees to me. They obey all of the rules of binaries trees. Just because we've been clever enough to compress two identical subtrees to have the same representation doesn't make this "not a tree shape".

  • Anonymous
    March 16, 2008
    Excuse my ignorance, but: Why it is immutable ? Because there is no insert ? I didnt catch. Please. MIB

  • Anonymous
    August 27, 2010
    @Eric, so you're saying that even though "child" is a single data structure in memory, multiple references to it in the tree data structure actually just mean "two identical subtrees"? Hmm, I guess if they're immutable, there's no way to tell the difference between that, and the two subtrees being the same subtree... Weird! So you really have to divorce the tree data structure from the abstract tree.