Non-nullable types

If you write programs in C, C++, Java, or C#, you've gotten used to having the null value around. The null value is a special reserved reference (or pointer) value indicating that a reference does not refer to any object. It's useful for constructing a variety of data structures, but it's also a notoriously common source of bugs, because occasionally values that "should never" be null unexpectedly turn out to be. On modern machines this type of runtime error can be detected quickly if it occurs, typically by using the value zero for null and unmapping the first page of memory, but it would be much nicer to prevent this type of error at compile time.

Impossible, you say? Not at all. In fact, typed functional languages such as Standard ML, OCaml, and Haskell have always made a distinction between nullable types (types that can have a reserved null value, such as "int option", or "Maybe Int" in Haskell) and non-nullable types (such as "int"). There's no reason for every type to include, by default, a feature you may not use. There's nothing magic about this; it doesn't make a program any more difficult to type check if you say some values can be null and other values can't.
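For value types, C# 2.0 already draws this kind of line: an "int" can never be null, while "int?" (shorthand for Nullable<int>) can. The following is only a minimal sketch of how that split looks in practice; the class and method names are hypothetical and chosen purely for illustration.

```csharp
using System;

public static class NullableDemo
{
    // Takes a plain int, so callers cannot pass null at all.
    public static int Twice(int x)
    {
        return x * 2;
    }

    // Takes int?, so the "no value" case must be handled
    // before the underlying int can be used.
    public static int TwiceOrZero(int? x)
    {
        return x.HasValue ? Twice(x.Value) : 0;
    }

    public static void Main()
    {
        Console.WriteLine(TwiceOrZero(21));   // prints 42
        Console.WriteLine(TwiceOrZero(null)); // prints 0
        // Twice(null) would not compile: null is not an int.
    }
}
```

The type checker does the work here: the only way from "int?" to "int" is through an explicit check or an explicit ".Value" access.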

We could make a similar distinction for reference types in a language such as C#. In fact, someone already has. One of the most exciting research projects at Microsoft is an extension of C# called Spec#. Spec# has a lot of truly cutting-edge features that I won't get into right now, but the one simple feature that I do want to mention is non-nullable types. These allow you to affix an exclamation mark (!) onto any reference type, yielding a new type that cannot be null but is otherwise the same. For example, consider this simple string comparison method:

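        public int StringCompare(string left, string right)
        {
            // Compare character by character over the common prefix.
            int common = Math.Min(left.Length, right.Length);
            for (int i = 0; i < common; i++)
            {
                if (left[i] != right[i])
                {
                    return left[i] - right[i];
                }
            }
            // One string is a prefix of the other: the shorter one sorts first.
            return left.Length - right.Length;
        }

        public int Foo(string name)
        {
            return StringCompare(name, name + "bar");
        }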

If I enter this into a Spec# buffer, it underlines the references to "left" and "right" with red lines to indicate that I may be dereferencing a null value, which is a compile-time error (not just a warning). The solution is to change the signature of StringCompare to the following, indicating that it only accepts non-null strings:

        public int StringCompare(string! left, string! right)

If I do this, however, I get an error in Foo() that it's passing a possibly-null string where a non-null string is expected (a type error). Supposing I want Foo to accept null values, I can add the appropriate null check and the error goes away:

        public int Foo(string name)
        {
            if (name == null) return 0;
            return StringCompare(name, name + "bar");
        }

Even more importantly, Spec# ships with improved signatures for a large number of .NET Framework methods. You no longer have to worry about NullReferenceExceptions occurring when you pass null to a Framework method that doesn't accept it - you find out at compile time. They generated most of these signatures automatically by reflecting on the CLR and looking for an argument being tested for null followed by a throw of ArgumentNullException, but ideally the CLR team would include the proper types from the beginning.
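The test-then-throw pattern mentioned above is what plain C# code has to write by hand today. Here is a rough sketch of it, assuming a hypothetical GuardDemo class; it moves the non-null requirement on StringCompare from the type checker to run time, and it delegates the comparison itself to string.CompareOrdinal to keep the example short.

```csharp
using System;

public static class GuardDemo
{
    // Hypothetical plain-C# counterpart of StringCompare(string!, string!):
    // the non-null requirement is enforced by run-time guards instead of types.
    public static int StringCompare(string left, string right)
    {
        if (left == null) throw new ArgumentNullException("left");
        if (right == null) throw new ArgumentNullException("right");
        return string.CompareOrdinal(left, right);
    }

    public static void Main()
    {
        Console.WriteLine(StringCompare("abc", "abc")); // prints 0
        try
        {
            StringCompare(null, "abc");
        }
        catch (ArgumentNullException e)
        {
            // The mistake is only discovered when this call actually runs.
            Console.WriteLine("rejected at run time: " + e.ParamName);
        }
    }
}
```

With a non-nullable parameter type, the second call would be rejected by the compiler and both guard lines could go away.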

Non-nullable types are also present in other extension languages and tools. Cyclone, the safe C dialect, adds non-nullable pointers to C (along with several other types of pointers, see Pointers). ESC/Java adds non-nullable pointers, along with many other extensions, to Java (see non-null pragma). Microsoft has a well-known internal static checking tool called PREfix which allows the specification of non-null types in C++ code using the SAL_notnull SpecStrings annotation. This feature is so easy to add and so useful that just about every new language adds it.

So why have we continued to battle null values for all these years when there's an easy-to-implement, well-known compile-time solution? Got me, but I wouldn't mind seeing this feature in C# 3.0.

Comments

  • Anonymous
    October 10, 2005
    See Cyrus's blog entries from a few months back on why this is a good idea, but also why it's hard.

    My personal opinion is that it's impossible to do in C# because you can't do it right without breaking backward compatibility.

    Consider the simple example:

    public int GetLength(string s) {
        return s.Length;
    }

    This is valid C# code today; backward compatibility dictates that it must continue to be valid and do the same thing it does today. But it instantly defeats the whole point of compile-time null checking because you have a possibly-null object being dereferenced right in front of your face.

    It's possible to arrange that you can't pass a null value to a function that specifically requests a string!-typed argument, but that doesn't help much if every single "." in your program is still a potential landmine...
  • Anonymous
    October 10, 2005
    I think this would be a great addition to C#. Parameter modifiers that affect the actual type signature already exist in the "ref", "out" and "params" keywords. That last one is probably the closest in nature to the non-null type modifier because it requires compiler support to implement properly. Something like the following would be in keeping with C#'s nature:

    public void Method( required string name ) {
    }

    The compiler could then use static checks to determine if Method or some other overload (that doesn't include the "required" constraint) should be called. I definitely like putting basic constraints like this in the method signature itself. It also saves a lot of checking against null. Since the constraint modifies the signature of the method it would be easy to add these to current classes, while maintaining back-compat. Would also be cool if you could call:

    obj.Method( required param )

    and have the compiler perform the null checks for you. Makes calling the constrained method from non-constrained overloads very easy.


    -Lonnie
  • Anonymous
    October 10, 2005
    Thanks for your comments, Stuart. Backward compatibility is a difficult issue. The way Spec# in particular deals with it is to allow you to compile old code as C# but create a separate interface with updated signatures for these methods that is used by new code. They call these out-of-band specifications, and this is how they added non-null argument types to Framework methods.

    One alternative solution more suitable for direct inclusion in C# is to create three types of references: non-nullable, nullable, and unchecked. The original syntax would produce unchecked references, which are nullable and can also be implicitly narrowed to non-nullable values. I'm sure there are other solutions available as well.
  • Anonymous
    October 10, 2005
    Better yet, the "required" constraint could just force the compiler to insert checks against null to all the required parameters, throwing ArgumentNullException as appropriate. The "required" constraint would be visible in the editor through syntax assist, but would not really modify the type signature of the method because the null checks are performed by the method itself.
  • Anonymous
    October 13, 2005
    The idea of supporting three types of reference is an interesting one; presumably they'd be string! (non-nullable), string? (explicitly nullable), and just plain string (unchecked).

    You could then arrange for any use of an unchecked type to produce a warning, which developers could choose to configure as an error instead using existing warn-as-error compiler settings.

    For value types the "!" would be redundant, but it seems to me it would be a good idea to allow it for consistency, and also support producing a warning if it's not used. This would actually make the current inconsistent behavior of nullable value types more consistent - currently you can't call methods on the underlying type directly on a nullable, but that would be the case on an explicitly-nullable reference too.