July 2009
Volume 24 Number 07
CLR Inside Out - Building Tuple
By Matt Ellis | July 2009
This column is based on a prerelease version of the Microsoft .NET Framework 4. Details are subject to change.
Contents
Existing Tuple Types
Interoperability
Using Tuple from C#
Reference or Value Type?
Tuples of Arbitrary Length
Interfaces Implemented
Structural Equality, Comparison, and Equivalence
IStructualEquatable and IStructuralComparable
Partial Equivalence Relations
Worth the Effort
The upcoming 4.0 release of Microsoft .NET Framework introduces a new type called System.Tuple. System.Tuple is a fixed-size collection of heterogeneously typed data. Like an array, a tuple has a fixed size that can't be changed once it has been created. Unlike an array, each element in a tuple may be a different type, and a tuple is able to guarantee strong typing for each element. Consider trying to use an array to store a string and integer value together, as in the following:
static void Main(string[] args)
{
object[] t = new object[2];
t[0] = "Hello";
t[1] = 4;
PrintStringAndInt((string)t[0], (int)t[1]);
}
static void PrintStringAndInt(string s, int i)
{
Console.WriteLine("{0} {1}", s, i);
}
The above code is both awkward to read and dangerous from a type safety point of view. We have no guarantee that the elements of the tuple are the correct type when we go to use them. What if, instead of an integer, we put a string in the second element in our tuple, as follows:
static void Main(string[] args)
{
object[] t = new object[2];
t[0] = "Hello";
t[1] = "World";
PrintStringAndInt((string)t[0], (int)t[1]);
}
static void PrintStringAndInt(string s, int i)
{
Console.WriteLine("{0} {1}", s, i);
}
This code would compile, but at run time, the call to PrintStringAndInt would fail with an InvalidCastException when we tried to cast the string "World" to an integer. Let's now take a look at what the code might look like with a tuple type:
static void Main(string[] args)
{
Tuple<string, int> t = new Tuple<string, int>("Hello", 4);
PrintStringAndInt(t.Item1, t.Item2);
}
static void PrintStringAndInt(string s, int i)
{
Console.WriteLine("{0} {1}", s, i);
}
With the tuple type, we've been able to remove the cast operators from our code, giving us better error checking at compile time. When constructing and using elements from the tuple, the compiler will ensure our types are correct.
As this example shows, it's obvious that a tuple is a useful type to have in the Microsoft .NET Framework. At first, it might seem like a very easy enhancement. However, we'll discuss some of the interesting design challenges that the team faced while designing System.Tuple. We'll also explore more about what we added to the Microsoft .NET Framework and why these additions were made.
Existing Tuple Types
There is already one example of a tuple floating around the Microsoft .NET Framework, in the System.Collections.Generic namespace: KeyValuePair. While KeyValuePair<TKey, TValue> can be thought of as the same as Tuple<T1, T2>, since they are both types that hold two things, KeyValuePair feels different from Tuple because it evokes a relationship between the two values it stores (and with good reason, as it supports the Dictionary class). Furthermore, tuples can be arbitrarily sized, whereas KeyValuePair holds only two things: a key and a value.
Within the framework, many teams created, each for its own use, a private version of a tuple but didn't share their versions across teams. The Base Class Libraries (BCL) team looked at all of these uses when designing their version of tuple. After other teams heard that the BCL team would be adding a tuple in Microsoft .NET Framework 4, other teams felt a strong desire to start using tuples in their code, too. The Managed Extensibility Framework (MEF) team even took a look at the draft of one of BCL's specs before implementing a version of tuple in their code. They later released this version as part of a consumer technology preview, with a supporting comment that it was a temporary implementation until tuples were added to the Microsoft .NET Framework!
Interoperability
One of the biggest set of customers inside Microsoft for tuple was the language teams themselves. While C# and VB.NET languages do not have the concept of a tuple as part of the core language, it is a common feature of many functional languages. When targeting such languages to the Micrososft.NET Framework, language developers had to define a managed representation of a tuple, which leads to unnecessary duplication. One language suffering that problem was F#, which previously had defined its own tuple type in FSharp.Core.dll but will now use the tuple added in Microsoft .NET Framework 4.
In addition to enabling the removal of duplicate types, having a common type ensures that it is easy to call functions across language boundaries. Consider what would happen if C# had added a tuple type (as well as new syntax to support it) to the language, but didn't use the same managed representation as F#. If this were the case, any time you wanted to call a method from an F# assembly that took a tuple as an argument, you would be unable to use the normal C# syntax for a tuple or pass an existing C# tuple. Instead, you would be forced to convert your C# tuple to an F# tuple, and then call the method. It's our hope that by providing a common tuple type, we'll make interoperability between managed languages with tuples that much easier.
Using Tuple from C#
While some languages like F# have special syntax for tuples, you can use the new common tuple type from any language. Revisiting the first example, we can see that while useful, tuples can be overly verbose in languages without syntax for a tuple:
class Program
{
static void Main(string[] args)
{
Tuple<string, int> t = new Tuple<string, int>("Hello", 4);
PrintStringAndInt(t.Item1, t.Item2);
}
static void PrintStringAndInt(string s, int i)
{
Console.WriteLine("{0} {1}", s, i);
}
}
Using the var keyword from C# 3.0, we can remove the type signature on the tuple variable, which allows for somewhat more readable code.
var t = new Tuple<string, int>("Hello", 4);
We've also added some factory methods to a static Tuple class which makes it easier to build tuples in a language that supports type inference, like C#.
var t = Tuple.Create("Hello", 4);
Don't be fooled by the apparent lack of types here. The type of t is still Tuple<string, int> and the compiler won't allow you to treat the elements as different types.
Now that we've seen how the tuple type works, we can take a look at the design behind the type itself.
Reference or Value Type?
At first glance, there isn't much to the tuple type, and it seems like something that could be designed and implemented over a long weekend. However, looks are often deceiving, and there were some very interesting design decisions that needed to be made during its development.
The first major decision was whether to treat tuples either as a reference or value type. Since they are immutable any time you want to change the values of a tuple, you have to create a new one. If they are reference types, this means there can be lots of garbage generated if you are changing elements in a tuple in a tight loop. F# tuples were reference types, but there was a feeling from the team that they could realize a performance improvement if two, and perhaps three, element tuples were value types instead. Some teams that had created internal tuples had used value instead of reference types, because their scenarios were very sensitive to creating lots of managed objects. They found that using a value type gave them better performance. In our first draft of the tuple specification, we kept the two-, three-, and four-element tuples as value types, with the rest being reference types. However, during a design meeting that included representatives from other languages it was decided that this "split" design would be confusing, due to the slightly different semantics between the two types. Consistency in behavior and design was determined to be of higher priority than potential performance increases. Based on this input, we changed the design so that all tuples are reference types, although we asked the F# team to do some performance investigation to see if it experienced a speedup when using a value type for some sizes of tuples. It had a good way to test this, since its compiler, written in F#, was a good example of a large program that used tuples in a variety of scenarios. In the end, the F# team found that it did not get a performance improvement when some tuples were value types instead of reference types. This made us feel better about our decision to use reference types for tuple.
Tuples of Arbitrary Length
Theoretically, a tuple can contain as many elements as needed, but we were able to create only a finite number of classes to represent tuples. This raised two major questions: how many generic parameters should the largest tuple have, and how should the larger tuples be encoded? Deciding on how many generic parameters to have in the largest tuple ended up being somewhat arbitrary. Since Microsoft .NET Framework 4 introduces new versions of the Action and Func delegates that take up to eight generic parameters, we chose to have System.Tuple work the same way. One nice feature of this design is that there's a correspondence between these two types. We could add an Apply method to each tuple that takes a correspondingly sized Action or Func of the same type and pass each element from the tuple as an argument to the delegate. While we ultimately didn't add this in order to keep the surface area of the type clean, it's something that could be added in a later release or with extension methods. There is certainly a nice symmetry in the number of generic parameters that belong to Tuple, Action, and Func. An aside: After System.Tuple was finished being designed, the owners of Action and Func created more instances of each type; they went up to 16 generic parameters. Ultimately, we decided not to follow suit: the decision on sizing was somewhat arbitrary to begin with, and we didn't feel the time we would spend adding eight more tuple types would be worth it.
Once we answered our first question, we still had to figure out a way to represent tuples with more than eight elements. We decided that the last element of an eight-element tuple would be called "Rest" and we would require it to be a tuple. The elements of this tuple would be the eighth, ninth, and so on, elements of the overall tuple. Therefore, users who wanted an eight-element tuple would create two instances of the tuple classes. The first would be the eight-element tuple, which would contain the first seven items of the tuple, and the second would be a one-element tuple that held the last tuple. In C#, it might look like this:
Tuple.Create(1, 2, 3, 4, 5, 6, 7, Tuple.Create(8));
In languages like F# that already have concrete syntax for tuples, the compiler handles this encoding for you.
For languages that provide syntax for interacting with tuples, the names of the properties used to access each element aren't very interesting, since they are shielded from the developer. This is a very important question for languages like C#, where you have to interact with the type directly. We started with the idea of using Item1, Item2, and so on as the property names for the elements, but we received some interesting feedback from our framework design reviewers. They said that while these property names make sense when someone thinks about them at first glance, it gives the feeling that the type was auto-generated, instead of designed. They suggested that we name the properties First, Second, and so on, which felt more accessible. Ultimately, we rejected this feedback for a few reasons: first, we liked the experience of being able to change the element you access by changing one character of the property name; second, we felt that the English names were difficult to use (typing Fourth or Sixth is more work than Item4 or Item6) and would lead to a very weird IntelliSense experience, since property names would be alphabetized instead of showing up in numerical order.
Interfaces Implemented
System.Tuple implements very few interfaces and no generic interfaces. The F# team was very concerned about tuple being a very lightweight type, because it uses so many different generic instantiations of the type across its product. If Tuple were to implement a large number of generic interfaces, they felt it would have an impact on their working set and NGen image size. In light of this argument, we were unable to find compelling reasons for Tuple to implement interfaces like IEquatable<T> and IComparable<T>, even though it overrides Equals and implements IComparable.
Structural Equality, Comparison, and Equivalence
The most interesting challenge we faced when designing Tuple was to figure out how to support the following:
- structural equality
- structural comparison
- partial equivalence relations
We did as much design work around these concepts as we did for Tuple itself. Structural equality and comparison relate to what Equals means for a type like Tuple that simply holds other data.
In F#, equality over tuples and arrays is structural. This means that two arrays or tuples are equal if all their elements are the same. This differs from C#. By default, the contents of arrays and tuples don't matter for equality. It is the location in memory that is important.
Since this idea of structural equality and comparison is already part of the F# specification, the F# team had already come up with a partial solution to the problem. However, it applied only to the types that the team created. Since it also needed structural equality over arrays, the compiler generated special code to test if it was doing equality comparison on an array. If so, it would do a structural comparison instead of just calling the Equals method on Array. The design team iterated on a few different designs on how to solve these sorts of structural equality and comparison problems. It settled on creating a few interfaces that structural types needed to implement.
IStructualEquatable and IStructuralComparable
IStructualEquatable and IStructuralComparable interfaces provide a way for a type to opt in to structural equality or comparison. They also provide a way for a comparer to be used for each element in the object. Using these interfaces, as well as a specially defined comparer, we can have deep equality over tuples and arrays when it is required, but we don't have to force the semantics on all users of the type. The design is deceptively simple:
public interface IStructuralComparable
{
Int32 CompareTo(Object other, IComparer comparer);
}
public interface IStructuralEquatable
{
Boolean Equals(Object other, IEqualityComparer comparer);
Int32 GetHashCode(IEqualityComparer comparer);
}
These interfaces work by separating the act of iterating over elements in the structural type, for which the implementer of the interface is responsible, from the actual comparison, for which the caller is responsible. Comparers can decide if they want to provide a deep structural equality (by recursively calling the IStructualEquatable or IStructualComparable methods, if the elements implement them), a shallow one (by not doing so), or something completely different. Both Tuple and Array now implement both of these interfaces explicitly.
Partial Equivalence Relations
Tuple also needed to support the semantics of partial equivalence relations. An example of a partial equivalence relation in the Microsoft .NET Framework is the relationship between NaN and other floating-point numbers. For example: NaN<NaN is false, but the same holds for NaN>NaN and NaN == NaN and NaN != NaN. This is due to NaN being fundamentally incomparable, since it does not represent any number. While we are able to encode this sort of relationship with operators, the same does not hold for the CompareTo method on IComparable. This is because there is no value that CompareTo can return that signals the two values are incomparable.
F# requires that the structural equality on tuples also works with partial equivalence relationships. Therefore in F#, [NaN, NaN] == [NaN, NaN] is false, but so is [NaN, NaN] != [NaN, NaN].
Our first tentative solution was to have overloaded operators on Tuple. This worked by using operators on the underlying types, if they existed, and by falling back to Equals or CompareTo, if they did not. A second option was to create a new interface like IComparable but with a different return type so that it could signal cases where things were not comparable. Ultimately, we decided that we would hold off on building something like this until we saw more examples of partial equivalence being needed across the Microsoft .NET Framework. Instead, we recommended that F# implement this sort of logic in the IComparer and IEqualityComparer methods that they passed into the IStructrual variants of CompareTo and Equals methods by detecting these cases and having some sort of out-of-band signaling mechanism when it encountered NaNs.
Worth the Effort
Though it took a great deal more design iteration than anyone expected, we were able to create a tuple type that we feel is flexible enough for use in a variety of languages, regardless of syntactic support for tuples. At the same time, we've created interfaces that help describe the important concepts of structural equality and comparison, which have value in the Microsoft .NET Framework outside of tuple itself.
Over the past few months, the F# team has updated its compiler to use System.Tuple as the underlying type for all F# tuples. This ensures that we can start building toward a common tuple type across the Microsoft .NET ecosystem. In addition to this exciting development, tuple was demoed at this year's Professional Developers Conference as a new feature for Microsoft .NET Framework 4 and received much applause from the crowd. Watching the video and seeing how excited developers are to start using tuples has made all the time spent on this deceptively simple feature all the more worthwhile.
Post your questions and comments on the CLR Team Blog at https://blogs.msdn.com/clrteam/archive/tags/CLR+Inside+Out/default.aspx.
Matt Ellis is a Software Design Engineer on the Base Class Libraries team responsible for diagnostics, isolated storage, and other little features like Tuple. When he's not thinking about frameworks or programming language design, he spends his time with his wife Victoria and their two dogs, Nibbler and Snoopy.