Megosztás a következőn keresztül:


The Visitor pattern and multiple dispatch

Today I'll talk about multiple dispatch, a programming language feature that increases flexibility of method calls and eliminates the need for awkward pattern constructions, at the cost of some additional complexity in the dispatch algorithm.

The Visitor pattern

Those who have read the Gang of Four's seminal Design Patterns may recall the Visitor pattern. Its purpose is to provide new operations on a class or set of related classes without actually altering the classes involved. Design Patterns provides a real-world example in the document-view paradigm, but for simplicity I'll provide a simpler example here based on some notes by Corky Cartwright.

Say you wish to represent an arithmetic expression using integers and addition such as (1 + 5) + 2. In an object-oriented language, a natural way to do this is to create an abstract base class "Expression", and then create some related subclasses:

  • ConstantExpression: Represents a constant number, like 2.
  • SumExpression: Represents the result of adding two Expression objects.

The objects form a tree, where the leaves are ConstantExpression objects. To evaluate an expression represented in such a form, it suffices to create a virtual Evaluate() method which each subclass implements appropriately:

 abstract class Expression {
    public abstract int Evaluate();
}

class ConstantExpression : Expression {
    int constant;
    public override int Evaluate() {
        return constant;
    }
}

class SumExpression : Expression {
    Expression left, right;
    public override int Evaluate() {
        return left.Evaluate() + right.Evaluate();
    }
}

This may solve the problem if all you want to do is evaluate expressions, but there are all sorts of operations you can do on arithmetic expressions. Do we really want to add a virtual method for every one of them? We certainly don't want clients of the class to be forced to extend the classes just to add application-specific operations. The Visitor pattern solves this problem by replacing the Evaluate() method with an Accept() method that takes as a parameter a visitor object. The visitor object has a method for each type of expression, and Accept() invokes the appropriate one. Here's how we would evaluate expressions using the Visitor pattern:

 abstract class Expression {
    public abstract object Accept(Visitor v);
}

class ConstantExpression : Expression {
    public int constant;
    public override object Accept(Visitor v) {
        return v.VisitConstant(this);
    }
}

class SumExpression : Expression {
    public Expression left, right;
    public override object Accept(Visitor v) {
        return v.VisitSum(this);
    }
}

interface Visitor {
    object VisitConstant(ConstantExpression e);
    object VisitSum(SumExpression e);
}

class EvaluateVisitor : Visitor {
    public object VisitConstant(ConstantExpression e) {
        return e.constant;
    }
    public object VisitSum(SumExpression e) {
        return (int)e.left.Accept(this) + (int)e.right.Accept(this);
    }
}

We evaluate an expression e using (int)e.Accept(new EvaluateVisitor()), and adding new operations is easy; we just create new implementors of Visitor. Clients can create application-specific algorithms without touching the original classes. As an added benefit, the visitor class can keep state internal to the algorithm; for example, this visitor counts and stores the number of constants in an expression:

 class CountConstantsVisitor : Visitor {
    public int Count;
    public object VisitConstant(ConstantExpression e) {
        Count++;
        return null;
    }
    public object VisitSum(SumExpression e) {
        e.left.Accept(this);
        e.right.Accept(this);
        return null;
    }
}

But the wary designer will note that there is a maintainance problem here: if we add a new subclass of Expression, like ProductExpression, the Visitor interface and all classes implementing it have to be updated. If the Expression class rarely changes, this is fine, and may even be good: it forces the visitor classes to consider the new cases. If it's rapidly changing, however, we definitely have a problem.

Double dispatch

At this point we take a slight detour and consider another simple example. Polymorphism is the critical ability of an object-oriented method call to invoke different methods at runtime based on the type of the object, like Evaluate() in our first example. If you've ever dealt with operator overloading in C++ or C#, though, you may come up against the limits of polymorphism. If I write an operator+ for a class Complex for complex numbers, subclasses can override it in a polymorphic way - but only if the complex number is on the left. If the Complex object is on the right-hand side, there is no way to ensure the correct method is invoked polymorphically short of testing the type and invoking it yourself.

How cruel we are to the unprivileged right-hand sides! The built-in C operator + produces different types depending on both the type of its left and right argument; both a double plus a float and a float plus a double produce doubles. We want something similar for our own operators, the ability to polymorphically choose a method based on both the type of its left argument and its right argument. Then we could write code like this:

 Complex operator+(Complex c, int i) { ... }
Complex operator+(ComplexSubclass c, int i) { ... }
Complex operator+(int i, Complex c) { ... }
Complex operator+(int i, ComplexSubclass c) { ... }

The right method will be chosen at runtime based on the types of both sides; the most specific (most derived) type is always chosen. This is called double dispatch, because we select the method ("dispatch" the message) based on the type of two arguments.

Unfortunately, this new flexibility also introduces some new problems. Suppose we have these two methods:

 Complex operator+(ComplexSubclass s, Complex c) { ... }
Complex operator+(Complex c, ComplexSubclass s) { ... }

Now, suppose we try to add two objects of type ComplexSubclass. Neither of the above looks more appropriate than the other. There are various ways of dealing with this; for example, we can give the first argument precedence over the others, we can throw an error at runtime, or we can make the compiler require us to implement a method taking two ComplexSubclass objects. For this article, we'll sidestep the issue by saying we won't write any code that will invoke this dilemma.

Now, I'm sure you're wondering what any of this has to do with the Visitor pattern. The answer is that this feature allows us to drastically simplify our Visitor pattern example while retaining all of its benefits:

 abstract class Expression { }

class ConstantExpression : Expression {
    public int constant;
}

class SumExpression : Expression {
    public Expression left, right;
}

class EvaluateVisitor {
    public int Visit(ConstantExpression e) {
        return e.constant;
    }
    public int Visit(SumExpression e) {
        return Visit(e.left) + Visit(e.right);
    }
}

Now, this isn't valid C# of course, since C# doesn't support double dispatch, but you can see the effect it would have if it did: the Visitor interface is gone, the Accept() method is gone, and the calls in the last Visit() method are simplified. Best of all, it solves our dilemma of a rapidly changing Expression class hierarchy, because we can add a "default case" method that deals with any expression types we're not familiar with:

     public int Visit(Expression e) {
        throw new Exception("Unsupported type of expression"); // or whatever
    }

One handy trick you can do in Java and .NET languages is to have a method that examines the type of its argument and then uses reflection to pass it on to the correct overload, simulating polymorphism. This is much slower than native double dispatch support, but achieves much of the same effect.

If you've studied how polymorphic methods are dispatched, you're probably most familiar with the "vptr" and "vtable" model, where each object includes a reference to a type descriptor, which in turn contains a table of function pointers giving the right function to invoke for each method for an object of that type. With double dispatch this becomes trickier. One of the simplest strategies, and an effective one in practice, is to have the initial vtable dispatch the method to a "stub" method, whose job is to perform another vtable dispatch on the second argument. Because most methods don't require double dispatch, this avoids imposing a penalty on ordinary single dispatch methods.

Multiple dispatch

But why stop at two arguments? Why not choose the method based on all the arguments? Although increasingly less useful, there are legitimate applications of methods that dispatch on 3, 4, or more arguments. This form of dispatch, called multiple dispatch, is directly supported by languages such as Dylan, Cecil, and CLOS (the Common Lisp Object System). Not only does this feature ease design of an initial system, but it also greatly increase the ability to extend it in multiple directions in an object-oriented way. Try experimenting with some of them and going through their tutorials; even in languages that don't directly support multiple dispatch, it helps to be able to recognise a situation where it would apply and simulate it.

Thanks for reading, everyone.

Comments

  • Anonymous
    August 30, 2005
    So why stop there? Can you show the example of the Expression.Visit method that does the right thing, such as returning something ;) Isn't it possible to have Visit return an Expression and have Expression defined using partial classes in CLR 2.0 allowing you to extend the operator set as you add expression types?

  • Anonymous
    August 30, 2005
    Oops, in that last example Visit() was supposed to return an int. I fixed that. I'm afraid I'm not totally clear on what you're asking. You can certainly stick to virtual methods for each operation, and use partial classes to add each one, but this is just poor design, because it doesn't separate the object from application-specific operations on that object effectively. In particular, it doesn't provide a way of having data which is private to a particular operation, or of processing two or more different type hierarchies using a single algorithm. Thanks for your comments, Marc.

  • Anonymous
    September 02, 2005
    Hi Derrick

    I'm working on exactly this kind of problem (building an expression system in C#) and so I ran into this exact problem. So I'm very interested in your thoughts on this.

    Now I probably understood it wrong but I don't quite see how this double dispatch method actually solves the problem. In the end you still have to create a method for each new expression that is created so I'm wondering what is the benefit (in this case) of the double dispatch?

    Anyways thanks a lot for your post (this one and all the others) I'm learning heaps here.

    Regards Petrik

  • Anonymous
    September 04, 2005
    Hi Petrik. That's a good question. With respect to the problem of creating new expression types, the advantage of the double dispatch is that it can match the second argument to an abstract base class, in this case Expression. Using this, a particular operation can pay attention to only certain types of expressions, and do something generic with all other types. With the Visitor pattern, it is compelled to implement a method for all subclasses. For example, my "CountConstantsVisitor", if written with double dispatch, would only require one method taking a ConstantExpression and one method taking a generic Expression and doing nothing. I hope this is a bit clearer, sorry about that.

  • Anonymous
    September 05, 2005
    Hi Derrick

    Thanks for your explanation, I didn't think about it in that way.

    Unfortunately C# doesn't allow for double dispatch (yet) so I guess I'll just have to make do without :-(.

    Anyway thanks heaps for your articles they are all very informative.

    Regards Petrik

  • Anonymous
    September 07, 2005
    Actually, Petrik, there is a way to implement double dispatch in C# using reflection, as described at http://www.dotnetdevs.com/articles/ReflectionVisitor.aspx . The only catch is that it's inefficient and requires a caching scheme to perform effectively. Whether to use this or the Visitor pattern is a tradeoff and somewhat a matter of personal taste.

  • Anonymous
    January 19, 2006
    smells more like overloading to me. in general if your passing a bunch of params around you have mucked something up.

  • Anonymous
    January 20, 2006
    Me: Thanks for commenting. Actually multiple dispatch, like the polymorphism that it generalizes, is not at all like overloading. Overloading is based solely on static compile-time types, while polymorphism and multiple dispatch is based on dynamic run-time types. This makes it much more powerful - none of the above examples would work with simple overloading. As for passing a bunch of parameters around, well, you're already implicitly passing a "this" reference to all of your methods in C++ or Java - whether arguments are passed implicitly or explicitly does not affect any complexity other than visual complexity, and the visual complexity of switching on the type of an argument is far worse.

  • Anonymous
    May 14, 2007
    A language design question was posted to the Microsoft internal C# discussion group this morning: " Why

  • Anonymous
    May 17, 2007
    me: multimethods AKA multipled dispatch is ad-hoc polymorphism AKA overloading but overload resolution is dynamic where as overloading as you may know it from statically typed languages such as C# and C++ overload resolution is static. Also note that method overriding is just a special case of overloading.

  • Anonymous
    May 27, 2007
    Hi, you might want to check out http://www.codeproject.com/cs/design/DynamicActionDispatcher.asp Without knowing about this article of yours here, I wrote that implementation of double dispatch some time ago. Now I stumbled across your article and if you could see me, you'd see a big fat smile ;-) Let me know what you think about the implementation. Cheers, Max Hajek Vienna, Austra

  • Anonymous
    January 09, 2008
    Had a look at your ReflectionVisitor example. Quite slick! However, you could achieve the same functionality with delegates and not use reflection at all. Have you tried this approach?

  • Anonymous
    January 31, 2008
    Jeremy Miller recently had a post that mentioned " double dispatch ", and I had to delve down into the

  • Anonymous
    June 09, 2009
    PingBack from http://quickdietsite.info/story.php?id=6415