March 2011

Volume 26 Number 03

The Working Programmer - Multiparadigmatic .NET, Part 6: Reflective Metaprogramming

By Ted Neward | March 2011

The marathon continues. By this point the ideas of commonality and variability should be taking on a familiar feel, and I’m now full-on talking about metaprogramming—the idea of programs writing programs.

Last month I examined one of the more familiar approaches to metaprogramming: automatic metaprogramming, more commonly known as code generation. In an automatic metaprogramming scenario, developers write programs that describe the “things” to be generated. Usually this is done with the aid of command-line parameters or other inputs such as relational database schemas, XSD files or even Web Services Description Language (WSDL) documents.

Because code generation is essentially “just as if” the code were written by human hands, variability can come at any point inside the code. Data types, methods, inheritance ... all of it can be varied according to need. The drawback, of course, is twofold. First, too much variability can render the generated code (and, more often than not, the templates from which the code is generated) difficult to understand. Second, the generated artifacts are essentially uneditable unless they’re somehow partitioned away from hand-written code (through the use of partial classes, for example) or the generation is performed once and then abandoned.

Fortunately, C# and Visual Basic offer more metaprogrammatic tactics than just automatic metaprogramming, and have done so since the earliest days of the Microsoft .NET Framework.

Persistent Problems

A recurring problem in the object-oriented environment is that of object-to-relational persistence—also known as object-to-XML conversion or, in the more modern Web 2.0 world, object-to-JSON transformation. Despite developers’ best efforts, it seems inevitable that object models need to escape the CLR somehow and either move across the network or move onto disk and back again. And herein lies the problem: the previous modes of design—procedural and object-oriented—don’t offer good solutions for this dilemma.

Consider a canonical, simplified representation of a human being modeled in code:

class Person {
  public string FirstName { get; set; }
  public string LastName { get; set; }
  public int Age { get; set; }
  public Person(string fn, string ln, int a) {
    FirstName = fn; LastName = ln; Age = a;
  }
}

Persisting instances of this object, as it turns out, is not difficult, particularly because the properties (in their simplest form) correspond on a one-to-one basis with the columns of a relational table and the properties are publicly accessible. You could write a procedure to take instances of Person, extract the bits of data, inject those bits into SQL statements, and send the resulting statement to the database:

class DB {
  public static bool Insert(Person p) {
    // Obtain connection (not shown)
    // Construct SQL
    string SQL = "INSERT INTO person VALUES (" +
      "'" + p.FirstName + "', " +
      "'" + p.LastName + "', " +
      p.Age + ")";
    // Send resulting SQL to database (not shown)
    // Return success or fail
    return true;
  }
}

The drawback to a procedural approach rears its ugly head fairly quickly: new types (Pet, Instructor, Student and so on) that also want to be inserted will require new methods of mostly similar code. Worse, if properties exposed to the public API don’t correspond one-to-one with columns or internal fields, things can get complicated quickly—developers writing the SQL routines will need to know which fields need persisting and which don’t, a pretty clear violation of encapsulation.

From a design perspective, the object-relational problem wants to capture the SQL-esque parts of persisting the data as commonality, so that managing database connections and transactions is handled in one place, while still allowing for variability in the actual structure of the things being persisted (or retrieved).

Recall, from our earlier investigations, that procedural approaches capture algorithmic commonality, and that inheritance captures structural commonality while allowing for (positive) variability—but neither of these does exactly what’s needed. The inheritance approach—putting the commonality into a base class—will require that developers working in derived classes specify the SQL string and handle much of the in/out bookkeeping (essentially pushing commonality back down into derived classes). The procedural approach will need some kind of variability inside the procedure (extracting and building the SQL to execute) to be specified from outside the procedure, which turns out to be relatively difficult to achieve.

Enter Metaprogramming

One frequently cited solution to the object-relational persistence problem is automatic metaprogramming: using the database schema, create classes that know how to persist themselves to and from the database.

Unfortunately, this has all the traditional problems of code generation, particularly when classes want to change the object representation to be something easier to work with than what the physical database schema implies. For example, a VARCHAR(2000) column would be much easier to work with if it were a .NET Framework System.String and not a char[2000].

Other code-generation techniques started from the class definitions and created a database schema alongside the persistent class definitions … but that meant that somehow now the object hierarchy was duplicated into two different models, one solely for persistence and one for everything else. (Note that as soon as it becomes necessary to transform the object into XML, another hierarchy springs into being that you have to handle, and yet another one to handle JSON. Quickly this approach grows intractable.)

Fortunately, reflective metaprogramming offers potential relief. Part of the .NET Framework since version 1.0, System.Reflection allows developers to examine the structure of objects at run time, which in this case gives the persistence-minded infrastructure the opportunity to examine the structure of the object being persisted and generate the SQL required from there. System.Reflection is well-documented, both in the MSDN documentation at msdn.microsoft.com/library/f7ykdhsy(v=VS.400) and in the MSDN Magazine articles “Use Reflection to Discover and Assess the Most Common Types in the .NET Framework” (msdn.microsoft.com/magazine/cc188926) and “CLR Inside Out: Reflections on Reflection” (msdn.microsoft.com/magazine/cc163408), so I won’t cover the basics here.

Reflection permits commonality of algorithm, while allowing for variability of structure manipulated by that algorithm, all while continuing to preserve the appearance of encapsulation. Because reflection (in an appropriately configured security context) has access to private members of objects, internal data can still be manipulated without forcing those data members to be made public. Positive variability—the ability to vary by adding things—is, as always, easy to work with, as the number of fields is largely irrelevant to most reflection-based code. Negative variability—the ability to vary by removing things—doesn’t seem to fit at all, however. After all, a class without fields doesn’t really need to be persisted, does it? And a reflection-based infrastructure looping through private fields won’t have much of a problem not looping at all, as nonsensical as that may seem.
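
To make that concrete, here’s a minimal sketch of what a reflection-based Insert might look like (assuming using directives for System, System.Reflection and System.Text). It assumes the table name matches the type name and the columns match the public property names, and it writes every value as a literal into the SQL text; ReflectiveDB and BuildInsert are illustrative names, not part of any framework:

class ReflectiveDB {
  // One algorithm for any type: the object's structure is discovered at run time
  public static string BuildInsert(object target) {
    Type t = target.GetType();
    PropertyInfo[] props =
      t.GetProperties(BindingFlags.Public | BindingFlags.Instance);
    StringBuilder cols = new StringBuilder();
    StringBuilder vals = new StringBuilder();
    foreach (PropertyInfo prop in props) {
      if (cols.Length > 0) { cols.Append(", "); vals.Append(", "); }
      cols.Append(prop.Name);
      object value = prop.GetValue(target, null);
      if (value is string)
        vals.Append("'").Append(value).Append("'");
      else
        vals.Append(value);
    }
    // Table name assumed to match the type name (a simplification)
    return "INSERT INTO " + t.Name.ToLower() +
      " (" + cols + ") VALUES (" + vals + ")";
  }
}

This one method now serves Person, Pet, Instructor and Student alike, because the list of columns is read from the type at run time rather than written by hand for each class.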

However, negative variability here is slightly different than just not having fields. In certain scenarios, the Person class will have internal fields that don’t want to be persisted at all. Or, more strikingly, the Person class will have fields it wants persisted in a different data format than its CLR-hosted representation. Person.Birthdate wants to be stored as a String, perhaps, or even across three columns (day, month, year) rather than in a single column. In other words, negative variability in a reflective metaprogrammatic sense is not about the lack of fields, but about doing something different to certain instances of types that would otherwise be handled in a standard way (persisting a string to a VARCHAR column being the standard, for example, but for one or more particular fields, persisting a string to a BLOB column).

The .NET Framework makes use of custom attributes to convey this negative variability. Developers use attributes to tag elements within the class to convey the desire for that custom handling, such as the NonSerialized attribute in the case of object serialization. It’s important to note, however, that the attribute itself does nothing—it’s merely a flag to the code looking for that attribute. By itself, then, the attribute provides no negative variability, but merely makes it easier to denote when that negative variability should kick in.
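
As a sketch of how that flag might look in the persistence scenario, imagine a hypothetical NotPersisted attribute (not part of the .NET Framework) applied to a property the reflective loop should skip; PersistenceRules and ShouldPersist are likewise illustrative names:

// Hypothetical marker attribute; it carries no behavior of its own
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
class NotPersistedAttribute : Attribute { }

class Person {
  public string FirstName { get; set; }
  public string LastName { get; set; }
  [NotPersisted]                 // Cached display value; don't store it
  public string DisplayName { get; set; }
}

class PersistenceRules {
  public static bool ShouldPersist(PropertyInfo prop) {
    // The metaprogram supplies the negative variability by honoring the flag
    return !prop.IsDefined(typeof(NotPersistedAttribute), false);
  }
}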

Attributes can also be used to convey positive variability. One example is how the .NET Framework uses attributes to convey transactional handling, assuming that the lack of an attribute on a method indicates no transactional affinity whatsoever.

Mirror, Mirror, on the Wall

Even without attributes, reflective metaprogramming establishes an entirely new kind of variability. Names can now be used to refer to elements within the program (rather than going through compiler symbols), and at a much later time (run time) than the compiler traditionally permits. For example, early builds of the NUnit unit-testing framework, like its cousin JUnit in the Java space, used reflection to discover methods whose names began with “test” and assumed they were test methods to execute as part of a test suite.

The name-based approach takes elements traditionally reserved for human eyes—the names of things—and requires them to follow strict conventions, such as the “test” prefix for NUnit methods. The use of custom attributes relaxes that naming convention (at the expense of requiring additional code constructs in the classes in question), essentially creating an opt-in mechanism that developers must accept in order to receive the benefits of the metaprogram.
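
A rough sketch of that name-based discovery, in the spirit of (but not taken from) those early NUnit builds, might look like this, again assuming a using directive for System.Reflection:

class ConventionRunner {
  // Invoke every public, parameterless method whose name starts with "Test"
  public static void RunTests(object fixture) {
    foreach (MethodInfo method in fixture.GetType().GetMethods()) {
      if (method.Name.StartsWith("Test", StringComparison.OrdinalIgnoreCase) &&
          method.GetParameters().Length == 0) {
        Console.WriteLine("Running " + method.Name);
        method.Invoke(fixture, null);
      }
    }
  }
}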

Attributes also provide the ability to tag arbitrary data along with the attribute, providing a much more fine-grained parameterization to the metaprogrammatic behavior. This is something not typically possible with automatic metaprogramming, particularly not when the client wants different behavior for structurally similar constructs (such as the strings-to-BLOBs-instead-of-VARCHAR-columns example from earlier).
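
For instance, a hypothetical Column attribute could carry the target column name and database type, letting the same reflective loop store one string property in a BLOB while every other string still maps to a VARCHAR; none of these names come from the .NET Framework:

// Hypothetical attribute whose data parameterizes the metaprogram per member
[AttributeUsage(AttributeTargets.Property)]
class ColumnAttribute : Attribute {
  public ColumnAttribute(string name) { Name = name; }
  public string Name { get; private set; }
  public string DbType { get; set; }       // e.g. "VARCHAR(100)" or "BLOB"
}

class Person {
  [Column("FIRST_NAME")]
  public string FirstName { get; set; }
  [Column("BIOGRAPHY", DbType = "BLOB")]   // Same CLR type, different storage
  public string Biography { get; set; }
}

The persistence code would read these values back through GetCustomAttributes and adjust the generated SQL accordingly.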

Owing to its runtime-bound nature, however, reflection frequently imposes a performance hit on code that uses it extensively. In addition, reflection doesn’t offer solutions to the problems cited in the automatic metaprogramming scenario from last month—the proliferation of classes, for example. Another metaprogrammatic solution is available, but that will have to wait for next month.

Happy coding!


Ted Neward is a principal with Neward & Associates, an independent firm specializing in enterprise .NET Framework and Java platform systems. He’s written more than 100 articles, is a C# MVP and INETA speaker, and has authored and coauthored a dozen books, including “Professional F# 2.0” (Wrox, 2010). He also consults and mentors regularly. Reach him at ted@tedneward.com with questions or consulting requests, and read his blog at blogs.tedneward.com.

Thanks to the following technical expert for reviewing this article: Anthony Green