June 2009

Volume 24 Number 06

The Polyglot Programmer - Reaping The Benefits Of Cobra

By Ted Neward | June 2009

Contents

Beginnings
Trust But Verify
"It's a dessert topping; it's a floor wax. It's both!"
Summing Up

In the first installment of this column, I discussed the importance of understanding more than one programming language, and in particular, more than one kind of programming language; just knowing C# and/or Visual Basic isn't enough to stand out against the mob anymore. The more tools in your toolbox, the better the carpenter you become. (Up to a point, anyway—'tis the poor carpenter who owns every tool ever made and yet still doesn't know how to use any of them.)

In this installment, I'll examine a language that's not too far removed from the object-oriented languages you're familiar with, but has a few new features that will not only offer some new ideas, but potentially kick-start some new ways of thinking about your current tool set.

Allow me to introduce you to Cobra, a descendant of Python that, among other things, offers a combined dynamic and statically typed programming model, built-in unit test facilities, scripting capabilities, and some design-by-contract declarations.

Beginnings

As Lewis Carroll is famous for saying, let us "begin at the beginning." Cobra is a .NET language that can be obtained via your Web browser, at cobra-language.org, or via a Subversion client at the same domain. If building from source, use the install-from-workspace.bat script in the source directory to build the Cobra compiler and install the necessary assemblies into the global assembly cache (GAC). While building from source isn't necessary—prebuilt binaries are also available—it's often interesting to look at the source of Cobra's compiler, which itself is written in Cobra (a significant milestone in any language), as a way of seeing many, if not all, of the language features at work in a nontrivial program.

Once your compiler is built, the first step is to test out the Cobra language via everybody's favorite "test-the-compiler" (or is it "test-the-programmer"?) program, the ubiquitous Hello World:

class Program

    def main is shared
        print 'Hello, world.'

Cobra offers two different ways to run its source code: the code can be executed directly, script-style, by running "cobra hello.cobra" from the command line, or it can be compiled first ("cobra -compile hello.cobra") and then executed as you would any .NET executable, in the traditional manner of C#, Visual Basic, and C++. Note that the "cobra" executable is the same in both cases: the one built in the Cobra\Source directory. (Observant readers who examine the contents of the directory after "scripting" the .cobra file will notice that Cobra's "scripting" mode is really just a two-pass compile-then-execute sequence, leaving the compiled "hello.exe" behind after it has finished executing.)
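The compile-then-execute sequence behind the scripting mode can be sketched in a few lines. What follows is a Python analog, not Cobra's implementation: it relies on Python's built-in compile and exec functions, and the helper name run_script and the file name hello.py are made up for illustration.

```python
import contextlib
import io

SOURCE = "print('Hello, world.')\n"


def run_script(source, filename="hello.py"):
    """Compile the source first, then execute the compiled result."""
    # Pass 1: turn the source text into a code object.
    # Syntax errors are reported here, before anything runs.
    code = compile(source, filename, "exec")

    # Pass 2: execute the compiled code, capturing whatever it prints.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(code, {"__name__": "__main__"})
    return captured.getvalue()


if __name__ == "__main__":
    print(run_script(SOURCE), end="")
```

The one difference worth noting is that this sketch keeps the compiled code in memory, whereas Cobra, as described above, leaves the compiled executable on disk.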

Looking at the Cobra source, it becomes apparent that while there are some distinct similarities to the C# and Visual Basic languages, there are also some distinct differences. In fact, readers familiar with Python (or its CLR equivalent, IronPython) will quickly notice that Cobra's syntax is derived primarily from Python's: Cobra's inventor, Chuck Esterbrook, found the Python syntax appealing and used it as the base from which to derive his language.

For those who've never used Python before, however, one difference immediately stands out: instead of using keywords (like "begin" and "end") or tokens (like "{" and "}") to set blocks of code apart from the surrounding code, Cobra—like Python—uses significant white space, or, as Cobra's inventor prefers, indented blocks, meaning that blocks of code are set apart by lexical indentation. In other words, the end of a decision-making block (an "If" block) is signaled not by a closing brace, but by the fact that the next line is one level of indentation removed from the previous line. Thus, in Figure 1, Cobra knows that the final "print" line is executed regardless of the results of the "if" test, because that line is indented equally to the "if" statement that made the decision.

Figure 1 Indentation Structure in Cobra Code

class Program

    def main is shared
        if 1==1
            print "Oh, goody, math works!"
            print "I was beginning to get worried there."
        print "Math test complete"

Because of the problems that emerge when tabs and spaces are mixed in a single file to control indentation, Cobra requires that a file use one or the other, never both, and generates a compiler error where it sees a mix of the two.
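Python 3 applies a comparable rule, which makes it a convenient way to see the idea in action. The sketch below is a Python analog rather than Cobra code; the helper name indentation_is_consistent and the three sample sources are made up for illustration.

```python
# Three versions of the same two-statement block, differing only in indentation.
TABS_ONLY = "if True:\n\tx = 1\n\ty = 2\n"
SPACES_ONLY = "if True:\n    x = 1\n    y = 2\n"
MIXED = "if True:\n\tx = 1\n        y = 2\n"  # a tab, then eight spaces


def indentation_is_consistent(source):
    """Return True if Python accepts the indentation, False on a tab/space mix."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        # Python 3 refuses to guess how wide a tab is supposed to be.
        return False
    return True


if __name__ == "__main__":
    for label, source in [("tabs", TABS_ONLY), ("spaces", SPACES_ONLY), ("mixed", MIXED)]:
        print(label, indentation_is_consistent(source))
```

The mixed block looks perfectly aligned in an editor that renders tabs eight columns wide, which is exactly why a hard error at compile time is kinder than a silent guess.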

Furthermore, Cobra has codified ritualistic tendencies in the name of code clarity:

  • Types, such as classes, structs, interfaces, and enums, must be capitalized and cannot start with an underscore, similar to the patterns established by Microsoft in the Base Class Library: String, Control, AppDomain, and so on.
  • Arguments and local variables must begin with a lowercase letter and cannot begin with an underscore. Examples include x, y, count, and index.
  • In expressions, access to object or class members is always done explicitly with a period (.foo) or implicitly if the member starts with an underscore ("_foo").
  • Although not enforced, object variables are typically prefixed by an underscore, which also implies protected visibility. Examples include _serialNum, _color, and _parent.

Thus, for any given fragment of Cobra code, a Cobra programmer can easily distinguish the elements involved in the fragment or snippet. For example, in the following Cobra statement

e = Element(alphabet[_random.next(alphabet.length)])

we can tell that

  • e and alphabet are local variables or arguments
  • _random is an object member/variable, of (probably) protected visibility
  • Element is a type

Cobra thus follows the basic premise of Python: that for any given scenario, there's only one way to do it, including code naming and formatting conventions.
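Because the conventions are purely lexical, the reasoning applied to the statement above can be mechanized. The following Python sketch is only an illustration of the naming rules as this article states them; it is not part of Cobra, and the function name classify and the category labels are invented for the example.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(statement):
    """Map each identifier in a statement to the role its spelling implies."""
    kinds = {}
    for match in IDENTIFIER.finditer(statement):
        name = match.group()
        follows_dot = match.start() > 0 and statement[match.start() - 1] == "."
        if follows_dot:
            kind = "member (explicit dot access)"
        elif name.startswith("_"):
            kind = "object variable (implicit member access)"
        elif name[0].isupper():
            kind = "type"
        else:
            kind = "local variable or argument"
        kinds[name] = kind
    return kinds


if __name__ == "__main__":
    statement = "e = Element(alphabet[_random.next(alphabet.length)])"
    for name, kind in classify(statement).items():
        print(f"{name}: {kind}")
```

No symbol table is needed: the first character of each name, plus whether a period precedes it, is enough to make the call.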

Trust But Verify

Trust but verify. Those words, famously uttered by former U.S. President Ronald Reagan (in the context of arms reduction negotiations), hold just as strongly for programming as they did for nuclear arms reduction. Although it's easy to believe—and trust—that fellow job-site programmers would always know better than to pass a null reference into a method that clearly documents the fact that doing so will generate a NullReferenceException, it's generally been proven over and over again (particularly during demos before the customer or the Big Boss) that the safer course is to make those assumptions explicit. In fact, had the former President been alive today and taken up programming, it's entirely possible that he would amend his words to read, "Trust, but document, insist, and verify."

In the programming world, that means two basic things: to insist is to program defensively, often using assert statements or methods to ensure (at the risk of a runtime exception) that the values passed in match certain criteria; to verify is to write unit tests against the method to ensure that its implementation handles the pathological programmer who insists on violating those restrictions.

Consider the (relatively simple) task of creating a class that represents a person within the system. Persons generally have a first name, a last name, and an age, and can be married to other Persons. In general, when writing these kinds of classes, you need to establish some invariants—basic principles that will hold for any use of the class. So, for example, a Person class may decide that a last name cannot be null, something that is generally enforced by the developer writing Person and usually done in the property-set construct of the class, as you see in the C# code in Figure 2.

Figure 2 Invariants Enforce Constraints

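public class Person
{
    public Person(string fn, string ln, int a)
    {
        this.firstName = fn;
        this.lastName = ln;
        this.age = a;
    }

    public string FirstName
    {
        get { return firstName; }
        set { firstName = value; }
    }

    public string LastName
    {
        get { return lastName; }
        set
        {
            // Accept only non-null, non-empty values; anything else is ignored.
            if ((value != null) && value.Length > 0)
                lastName = value;
        }
    }

    public int Age
    {
        get { return age; }
        set { age = value; }
    }

    public override string ToString()
    {
        string ret = String.Format("Person: {0} {1}, {2} years",
            firstName, lastName, age);
        return ret;
    }

    // Backing fields for the properties above.
    private string firstName;
    private string lastName;
    private int age;
}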

The problem with this is that the invariant is enforced only when the property-setter is used; if another developer comes around later and puts a new method on Person that directly references the backing store (the field encapsulated by the property), the invariant is thus silently and subtly broken, and no one will notice it until it is too late. (This is the heart of the debate around whether classes should use properties internally or directly manipulate the fields.) This question, of course, applies equally to both properties and methods. Although it's not often that a property touches more than one backing field, it's not prohibited by the language, and it does happen periodically. Of even greater concern, each and every invariant (first name cannot be null or empty, last name cannot be null or empty, age cannot be negative) will need to be enforced, in each and every method, property, or constructor that could potentially modify the internal state.
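The failure mode is easy to reproduce. The sketch below restates the problem in Python rather than C#, under two assumptions made for illustration: the setter raises instead of silently ignoring a bad value, and the anonymize method stands in for whatever a later developer might add.

```python
class Person:
    def __init__(self, last_name):
        self.last_name = last_name  # routed through the setter below

    @property
    def last_name(self):
        return self._last_name

    @last_name.setter
    def last_name(self, value):
        # The invariant is enforced here and nowhere else.
        if not value:
            raise ValueError("last name must be non-empty")
        self._last_name = value

    def anonymize(self):
        # Hypothetical later addition: it writes the backing field directly,
        # so the check in the setter never runs.
        self._last_name = None


if __name__ == "__main__":
    p = Person("Ford")
    p.anonymize()
    print(p.last_name)  # the "cannot be null" rule is now silently broken
```

Nothing in the class complains when anonymize runs; the broken state surfaces only later, wherever some other code assumes the name is present.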

Cobra offers a basic design-by-contract system, in that a class or method can declare certain requirements about the class, and the Cobra compiler will silently add verification statements to each and every member (or property or method) that could potentially modify the data in question. In this way, it is guaranteed that last name is never null, regardless of how the class is used and without having to explicitly code it in every method that could possibly change the contents of the instance (see Figure 3).

Figure 3 Verification Statements

class Person

    invariant
        .firstName <> ""
        .firstName.length > 0
        .lastName <> ""
        .lastName.length > 0
        .age >= 0

    var _firstName as String
    var _lastName as String
    var _age as number

    pro firstName from var
    pro lastName from var
    pro age from var

    cue init(first as String, last as String, age as number)
        ensure
            first.length > 0
            last.length > 0
            age > -1
        body
            _firstName = first
            _lastName = last
            _age = age

    def toString as String is override
        return 'Person: [.firstName] [.lastName], [.age.toString] years'

It's relatively easy to verify exactly how far and how often the Cobra language inserts these validation checks, by firing up the IL disassembler (ILDasm.exe) and examining the resulting Person class. Doing so reveals the beauty of the Cobra approach—each and every "modifying" method is appended with the code to verify the invariants, making them class-wide. (There is a compiler option to change how the invariants are generated—not at all, or only in methods, or, the default, appended to all of those operations.)
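For readers who would rather not open a disassembler, the general shape of "append the invariant check to every modifying member" can be approximated by hand. The sketch below is a Python analog built on a class decorator; it is not what the Cobra compiler emits, and the names with_invariant, _checked, and person_invariant are invented for the example.

```python
import functools


def with_invariant(check):
    """Class decorator: re-run check(self) after __init__ and every public method."""
    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            is_public = not name.startswith("_")
            if callable(attr) and (name == "__init__" or is_public):
                setattr(cls, name, _checked(attr, check))
        return cls
    return decorate


def _checked(method, check):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        check(self)  # appended after the original body
        return result
    return wrapper


def person_invariant(p):
    if not p.first_name:
        raise ValueError("first name must be non-empty")
    if not p.last_name:
        raise ValueError("last name must be non-empty")
    if p.age < 0:
        raise ValueError("age must be non-negative")


@with_invariant(person_invariant)
class Person:
    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

    def rename(self, last_name):
        # Writes the field directly; the wrapper still re-checks the invariant.
        self.last_name = last_name

    def birthday(self):
        self.age += 1
```

Note that a check appended after the method body detects a violation rather than preventing it: by the time the exception is raised, the field has already been changed. The benefit is that the rule is stated once and enforced everywhere, including in methods written long after the invariant was.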

In and of itself, the design-by-contract feature of Cobra is nice, but it's still not sufficient to guarantee error-free programs; it's almost always necessary to also write a series of unit tests against the code to ensure that it behaves as desired when both good and bad data are passed in. Because this is so critical to the software development process, Cobra makes it a first-class construct of the language, the "test" construct, so that Cobra can execute the tests as part of the compilation process. A simple suite of unit tests for the Person class might look like Figure 4.

Figure 4 Unit Tests for Person

class Person

    invariant
        .firstName <> ""
        .firstName.length > 0
        .lastName <> ""
        .lastName.length > 0
        .age >= 0

    test
        p = Person('Neal', 'Ford', 29)
        assert p.firstName == 'Neal'
        assert p.lastName == 'Ford'
        assert p.age == 29

    var _firstName as String
    var _lastName as String
    var _age as number

    # ... everything else as-is ...

When run with the "-test" option at the command line, Cobra will execute the unit tests contained in the class and, assuming they all work as expected, report that the tests pass as well as compile the code into an executable. (Libraries are compiled using the "-t:lib" command-line switch.)
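Python's standard doctest module offers a rough counterpart to tests that live inside the class they exercise. The sketch below is a Python analog and not Cobra's test runner; the helper name run_embedded_tests is invented, and the snake_case attribute names are a Python-side assumption.

```python
import doctest


class Person:
    """A person with a first name, a last name, and an age.

    >>> p = Person('Neal', 'Ford', 29)
    >>> p.first_name
    'Neal'
    >>> p.last_name
    'Ford'
    >>> p.age
    29
    """

    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age


def run_embedded_tests(cls):
    """Run the examples in the class docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(cls.__doc__, {cls.__name__: cls}, cls.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = run_embedded_tests(Person)
    print(f"{attempted - failed} of {attempted} embedded tests passed")
```

As with the Cobra "test" block, anyone editing the class has the tests directly in front of them, so a change that breaks an example is hard to overlook.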

This is a different way of building unit tests, but it holds an important advantage over the traditional NUnit-based approach: the tests are kept close to the code rather than in a separate file, where they can drift too far away from the developers who are maintaining or updating the code. Anything that makes it easier to write (and maintain) the unit tests is generally considered a good thing, and having the unit tests right inside the class to be tested is a trick long ago learned in the Java camp.

Unfortunately, with this approach to tests comes code bloat; if the tests are sufficiently comprehensive (and they should be), including the contractual verification and unit tests as part of the deployed code can double, triple, or even quadruple the size of the code, which can certainly put a crimp in your plans for easy deployment of the finished product. Fortunately, the Cobra language provides a solution to this as well, via the "-turbo" option, which not only turns on every optimization, but also strips out the design-by-contract and unit test code, and is generally intended as a just-before-deployment optimization step.
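Stripping checks at build time is not unique to Cobra. As a Python analog (not a description of what "-turbo" does internally), the built-in compile function removes assert statements when asked to optimize. The helper name load and the sample function checked_age are made up for this sketch.

```python
SOURCE = (
    "def checked_age(age):\n"
    "    assert age >= 0, 'age must be non-negative'\n"
    "    return age\n"
)


def load(optimize):
    """Compile SOURCE at the given optimization level and return checked_age."""
    namespace = {}
    # optimize=0 keeps assert statements; optimize=1 removes them.
    exec(compile(SOURCE, "<person>", "exec", optimize=optimize), namespace)
    return namespace["checked_age"]


if __name__ == "__main__":
    development = load(0)
    deployment = load(1)
    try:
        development(-1)
    except AssertionError as error:
        print("development build rejected the value:", error)
    print("deployment build returned:", deployment(-1))
```

The trade-off is the same one the "-turbo" option implies: the deployed code is leaner, but it no longer defends itself, so the checks must have done their job before the switch is flipped.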

"It's a dessert topping; it's a floor wax. It's both!"

Ages ago, a Saturday Night Live skit talked about the versatility of a product that was both a dessert topping and a floor wax. "It's both!" was the tag line, and at the time, it was pretty funny. Hey, it was the 70s.

Recently, however, a new flavor of argument has emerged within the programming language community, that of static versus dynamic typing—in other words, we're back to the same arguments that used to be leveled against (classic) Visual Basic and its runtime (IDispatch-based) binding and VARIANTS by the community of developers using C++ and its compile-time template facilities. This time, however, it's the strongly/statically typed community on the defensive, arguing the merits of safety in strongly/statically typed languages, against the productivity benefits cited by users of Ruby and Python (or, if you prefer, IronRuby and IronPython). As with most debates of this nature, there's lots of rhetoric, lots of unsubstantiated claims, and lots of cited statistics (most of which are made up on the spot).

In truth, the argument is a bit spurious and often masks a deeper difference, that of early or late binding of method calls. In "traditional" compiled languages (like C++/CLI or C#), the compiler decides at compile time whether a method call is acceptable based on the presence of a matching method. If one is found on the target class, it emits the necessary instructions (in CIL, this is a "callvirt" instruction with the metadata token of the method in question) into the generated assembly. In contrast, a late-bound call isn't actually resolved until run time, usually via a metadata-based API like Reflection, at which point the run time looks up the method, and if present, invokes it.
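The late-bound half of that description, looking a method up by name at run time and invoking it only if it exists, takes only a few lines to sketch. The example below uses Python's built-in getattr as a stand-in for a metadata-based API such as Reflection; the helper name call_late_bound is invented for the illustration.

```python
def call_late_bound(target, method_name, *args):
    """Resolve a method by name at run time, then invoke it."""
    method = getattr(target, method_name, None)
    if not callable(method):
        # The lookup failed, and nothing detected that before this moment.
        raise AttributeError(
            f"{type(target).__name__} has no method named {method_name!r}"
        )
    return method(*args)


if __name__ == "__main__":
    print(call_late_bound("hello", "upper"))
    try:
        call_late_bound(42, "upper")
    except AttributeError as error:
        print("failed at run time:", error)
```

The second call is the interesting one: the mistake is just as real as a misspelled method name in an early-bound language, but it is reported only when that line actually executes.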

The differences here aren't too difficult to spot—in the first case, the method has to be visible at compile time, forcing the programmer, in some cases, to engage in some type navigation to keep the compiler happy. (This is the ubiquitous "if obj is Person" followed by a downcast and method call idiom.) This particular sequence of language statements is often frustrating, particularly when the programmer "knows," by virtue of a wider awareness of the context of the situation, that the object in question is exactly the type it needs to be to make the call succeed. Yet the compiler can't see that far, and forces the developer through a series of steps simply to appease the compilation process.

In the second case, however, the programmer will often discover that what she "knows" doesn't exactly jibe with reality, usually because a run time exception is thrown at the most embarrassing moment possible—during the big demo, the product launch, or at run time on the day of the IPO. Maybe another developer accidentally broke that assumed invariant, maybe it's a code path that was never quite covered during unit tests, but regardless of how it happens, it still leaves the program terminated, and the programmer mortified.

Thus the argument persists: is it better to take the productivity hit to get compiler-assured promises of code correctness, or is it better to trust the programmers' unit test suite and tell the compiler to go hang?

Cobra neatly dodges this argument completely, by being both static and dynamic. In other words, Cobra takes the position that when it can do so, it will bind method calls early, in the same way that C# chooses to, but when it can't, for whatever reason, it will choose instead to use late-bound semantics and resolve the method call at run time. And, while C# 4.0 promises a similar kind of capability (as, by the way, has Visual Basic since it became a .NET language, via the Option Explicit and Option Strict flags), Cobra requires no additional syntactic help from the programmer to decide when to be early-bound and when to be late-bound; it simply makes the decision as it compiles, neatly avoiding the need for the developer to make that call ahead of time, as in Figure 5.

Figure 5 Dynamic Binding

class Person

    get name as String
        return 'Blaise'

class Car

    get name as String
        return 'Saleen S7'

class Program

    shared

        def main
            assert .add(2, 3) == 5
            assert .add('Hi ', 'there.') == 'Hi there.'
            .printName(Person())
            .printName(Car())

        def add(a, b) as dynamic
            return a + b

        def printName(x)
            print x.name   # dynamic binding

As you can see, Cobra supports both the "dynamic" modifier, to indicate that a particular type or method should treat its arguments and types as dynamic, as well as the "dynamic inference" that is used in the "printName" method, where Cobra simply recognizes that name can't be bound statically, and instead seeks to resolve it at run time.
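A loose Python counterpart to Figure 5 is sketched below, with one important caveat: Python itself always binds at run time, so the static half of the picture exists only for an external type checker that reads the annotations. The functions greet and name_of are invented for the example, and the use of typing.Any as a stand-in for Cobra's "dynamic" is an analogy, not an equivalence.

```python
from typing import Any


class Person:
    @property
    def name(self) -> str:
        return "Blaise"


class Car:
    @property
    def name(self) -> str:
        return "Saleen S7"


def greet(person: Person) -> str:
    # Annotated parameter: a static checker can confirm Person has .name.
    return "Hello, " + person.name


def add(a: Any, b: Any) -> Any:
    # Any opts out of static checking; + is resolved from the run-time types.
    return a + b


def name_of(x: Any) -> str:
    # Works for any object that happens to expose a .name attribute.
    return x.name


if __name__ == "__main__":
    assert add(2, 3) == 5
    assert add("Hi ", "there.") == "Hi there."
    print(name_of(Person()))
    print(name_of(Car()))
    print(greet(Person()))
```

Where Python asks the programmer to choose between a precise annotation and Any, Cobra, as described above, makes the early-versus-late decision itself while compiling.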

Summing Up

Cobra's unique feature set marks it as an interesting "sum of choices" language. The silent switch between early and late binding makes it an ideal language for working with .NET components that make heavy use of late-bound APIs, such as the Office Automation model, while still preserving the safety and performance benefits of early binding where possible. This alone makes Cobra worth considering, particularly for Office (and other COM-based) Automation programming.

When combined with Cobra's ability to act as both scripting and compiled language tool, however, Cobra's advantages really begin to shine.

Good luck with your Polyglot experimentation!

Send your questions and comments for Ted to polyglot@microsoft.com.

Ted Neward is a Principal Consultant with ThoughtWorks, an international consultancy specializing in reliable, agile enterprise systems. He has written numerous books, is a Microsoft MVP Architect, INETA speaker, and PluralSight instructor. Reach Ted at ted@tedneward.com, or read his blog at blogs.tedneward.com.