The Astonishing S"Literal" String Type

One of the astonishing infelicities of the original language design was the unflagged overhead incurred by the seemingly trivial failing of not placing an S in front of a string literal targeted to a managed reference object. For example, given the following two System::String declarations,

 

 

      String *ps1 = "hello";

      String *ps2 = S"goodbye";

 

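here is the MSIL representation of these two String declarations, as seen through ildasm. Notice the astonishing difference: the first constructs a new String object at run time from a static character array, while the second is a single ldstr instruction.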

 

// String *ps1 = "hello";

ldsflda valuetype $ArrayType$0xd61117dd

     modopt([Microsoft.VisualC]Microsoft.VisualC.IsConstModifier)

     '?A0xbdde7aca.unnamed-global-0'

newobj instance void [mscorlib]System.String::.ctor(int8*)

stloc.0

 

// String *ps2 = S"goodbye";

ldstr "goodbye"

stloc.0

That’s a pretty remarkable savings for just remembering [or learning] to prefix a literal string with an S; or, to look at it another way, that’s a durn stern penalty for not doing so. [In addition, if S"goodbye" occurs five times, the occurrences are collapsed into a single shared instance.] And ignorance is no defense! Using the default Visual Studio settings for a project, this compiles without any warning, as the following illustrates:

 

nettest - 0 error(s), 0 warning(s)

 

What’s perhaps equally remarkable is that in another common corner of the language, implicit boxing of value types was deliberately not supported, because it was felt that it would give a false sense of security to programmers who did not realize its run-time overhead. For example,

 

      int ival;

      Object *po = ival; // error

      Object *po = __box( ival ); // ok

Of course, these two design corners are not at all the same – in fact, they seem to illustrate opposite design philosophies. In the one case, a trivial, context-sensitive detail silently causes a truly astonishing inflation of the run-time program. In the other case, the explicit __box operator brings no underlying gain or loss in the behavior of the program – only in the behavior of the programmer. It is a pedagogical design intended to teach the programmer about the nature of the CLR’s unified type system.

 

The solution in both cases is to make the behavior transparent. A reference type assigned from or initialized with a value type results in a boxing operation. This is as fundamental to the unified type system of the CLR as the copy constructor and copy assignment operator are to native C++. Ignore them at your peril. If you assign a string literal in a context where an S prefix would be required, the S is implicitly present.

 

What about cases in which we need to explicitly direct the compiler to one interpretation or another, as in the case of an overloaded pair of functions?

 

void f(char*);

void f(String^);

f("ABC"); // calls f(char*)

The decision of the language design team is to drop the S prefix and instead require the user to explicitly cast the string literal, as in

f(( String^ )"ABC");
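
The mechanism at work here, an explicit cast steering overload resolution, can be sketched in ISO C++ so that it compiles without a /clr toolchain. This is only an analogy and not C++/CLI: std::string is a hypothetical stand-in for String^, the return values exist solely to report which overload was chosen, and demo() is an illustrative helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: std::string plays the role of String^ here.
// Each overload reports which one was selected.
std::string f(const char*)        { return "f(const char*)"; }
std::string f(const std::string&) { return "f(std::string)"; }

// Exercises both call forms and checks which overload each one picks.
void demo() {
    // An uncast literal decays to const char*, which ranks as an exact
    // match, so the pointer overload is chosen.
    assert(f("ABC") == "f(const char*)");

    // An explicit cast changes the argument's type before overload
    // resolution runs, so the string overload is chosen.
    assert(f((std::string)"ABC") == "f(std::string)");
}
```

As in the managed case described above, nothing about the literal itself changes; the cast simply states at the call site which interpretation the programmer wants.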