About the fundamental notion of software languages

The call for papers for the 1st International Conference on Software Language Engineering (SLE) is now public. I am happy that we did all the work to start this new conference. I am 100% confident in the SLE conference; it’s backed up by a strong and self-confident community that has something to say about generic language technology, programming languages (incl. parsing, semantics, type checking), domain-specific languages, program transformation, software re- and reverse engineering, and model-driven development; it’s also backed up by the ample relevance of this area for the progress of IT, and by the insight that the fundamental notion of languages cannot be discussed sufficiently within the scope of “technical spaces” alone (read as grammars, XML, UML, graph transformation, etc.). Have a look at the call. SLE 2008 will be co-located with MODELS 2008, September 29-30, in Toulouse, France.

Disclaimer: A subjective perspective on SLE follows. Opinions expressed below are neither necessarily shared by my wife, nor by other SLE organizers and combatants, nor by my current/previous/future employers. Also, don’t take me too seriously. Remember, this is a blog, I am a Cobol+Haskell nerd, and I don’t use smilies, as a matter of style. As one of my SLE friends (namely, JMF) tends to say: “May the fun be with you.” Indeed, SLE 2008 isn’t just an ambitious conference. We also want to have fun studying languages, being language nerds, comparing artificial and natural languages, engineering denotational semantics, hooking into popular IDEs, hacking parsers and type systems, and bashing XSD. Did I mention, BTW, that we meet up in Toulouse? Toulouse is not Bordeaux (“city of wine”), and it’s not Lyon (“city of gastronomy”), but it still means quality French wine and food; if you are stationed elsewhere in this world, then get some rest for a few days, and submit an unrejectable paper, or attend anyhow. In fact, I am just reading in a message from JMF: “Toulouse, a ‘haut lieu de la gastronomie française’” – a hotspot of French gastronomy, that is. Watch your weight.

The IT Tower of Babel

Just as mankind has suffered from the construction of the (Biblical) Tower of Babel and the subsequent confusion of languages (in fact, tongues), mankind (say, IT) is now suffering from the IT Tower of Babel – say, the continuing emergence of ever new (artificial) languages without much reliable ability to mediate between, integrate, evolve, or replace these languages; without much shared understanding of the need for all these languages; overall, without much ability to deal with the increasing complexity of the pool of relevant languages in any given IT context.

Is the Tower of Babel a good metaphor? Admittedly, the (Biblical) Tower of Babel was simply an overambitious construction project that justly annoyed the chief architect (not of the building but of the universe) to the point of punishing the constructors with the confusion of languages; other punishments could have been chosen – such as blocking access to further construction material. In contrast, I use the term “IT Tower of Babel” to refer to the plethora of artificial languages as such. This plethora was not caused by Microsoft’s CSA or any other major chief architect; it is pretty much a self-punishment of the IT community as a whole. We somehow ended up with all these artificial languages. Technology providers (say, of compilers and platforms) certainly had a part in this: variation of language (read as programming language, or any sort of tool-dependent file format) is often used for differentiation. Generally, applied computer science (let’s say “IT computer science”) is not quite an engineering science; it is very much affected by individuals, industry, fashion, and fiction. Finally, the emergence of artificial languages is subject to entropy; it seems we are just getting more – no matter what.

Life was easy when there was only the Adamic language; IT life was easy when languages were in the hands of computer scientists (as opposed to language amateurs who do not care about semantics and program algebra); when there were just a few important programming languages: Cobol, C, Fortran, and not much more; when systems didn’t have to interoperate by means of C/S, SOA, or WebServices; when there were just a few APIs per platform; when system owners were mostly happy to stay with their (evolving) platform for 1-3 decades. In fact, there were quite a few platforms (think of the mainframe age and the various Cobols; ask your grandpa), but that wasn’t too much of a fundamental problem; it merely provided job security for a small crowd of “interpreters” – specialists who could connect systems on different platforms in those relatively few cases when it was inevitable. (Read “few” in comparison to today’s standards.)

The confusion of natural languages happened instantly, and it was fatal to the further construction of the (Biblical) Tower of Babel; suddenly we had 70-72 dialects – a number that is amazingly small by IT standards. In contrast, the confusion of artificial languages went through several stages, and it is still ongoing.

Stage I: “Many programming languages”

The problem with the confusion of natural languages was basically that people lacked translational power for the 70-72 dialects. Lacking this power, the offending building had to be abandoned. There are probably a thousand programming languages in use – certainly if we count dialects and versions, and even if we narrow down the number to business-critical languages, say, languages used for business-critical software systems, as opposed to, say, languages for making rhymes about beer. Such a multitude of languages is a serious and largely unsolved problem in IT by itself (and not made easier by the continuing increase in languages); think of mass maintenance or language conversion. The problem can be economically fatal for a system owner; lack of transformational and conversional power makes it impossible to keep up with inevitable evolution pressure.

Now, you might say: “Lack of transformational and conversional power as in mass maintenance? This sounds like a Cobol problem to me. Cobol is for wimps; I am using Java or .NET – look at my beautiful refactoring support, look at how I can patch this or that compiler to give me (low-level) access to the parse tree of a program.”

Then try this: develop a converter from Java 1.1 to C# 3.0 applications; please send me an email, every now and then, on how it goes. If you are among the top 10 computer scientists worldwide, or an extraterrestrial hacker, you might get this done before Isabelle enters school, and you will sell it for more money than is needed to stop global warming locally in your county. That’s the point: shouldn’t we all work on making these things easy and repeatable and doable by folks with just one PhD? Let’s have 10 years of the new SLE conference, and see whether we can get closer to this goal!

Want a different challenge? Then try this one: go and patch Squeak or any other major Smalltalk dialect so that it gets a decent amount of static checking – in fact, up to the degree that you would get in an OO functional language with type inference, modulo the necessary compromises to account for Smalltalk’s breathtaking expressiveness, most likely on the grounds of soft typing. Providing static typing for Smalltalk has been attempted a number of times (see Strongtalk, for example), but it hasn’t been successful so far. Smalltalkers would absolutely appreciate additional checks, as long as they don’t get in the way, as long as they do not remove expressiveness. I think that this is an extremely laudable attitude, and we simply must become better software-language engineers to solve this problem, ultimately. In more general terms, Erik Meijer uses the slogan “Static Typing Where Possible, Dynamic Typing When Needed”, and, of course, the assumption is that we extend our horizon as to what we call a type, beyond the perhaps old-fashioned division into static types, dynamically checked assertions, and offline-proven program properties.
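
To make the slogan a bit more tangible, here is a minimal Haskell sketch (my own toy example; all names are invented for this post) of confining dynamic typing to a single boundary while the rest of the program stays statically checked – the spirit, if not the letter, of “static typing where possible, dynamic typing when needed”:

import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- A heterogeneous "inbox", as it might arrive from an untyped outside world.
inbox :: [Dynamic]
inbox = [toDyn (42 :: Int), toDyn "reboot", toDyn (3.14 :: Double)]

-- The dynamic checks are confined to this one function; the rest stays static.
handle :: Dynamic -> String
handle d =
  case (fromDynamic d :: Maybe Int, fromDynamic d :: Maybe String) of
    (Just n, _) -> "number: " ++ show n
    (_, Just s) -> "command: " ++ s
    _           -> "unhandled payload"

main :: IO ()
main = mapM_ (putStrLn . handle) inbox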

Stage II: “Complex platforms”

Even if you are a wizard of the C#, Java, or VB syntax and semantics, you cannot write much more than “hello world” programs and the factorial function. In fact, “hello world” already requires some “library” knowledge. More generally, nobody can get real work done without profound knowledge of a platform. I want to make the point here that a platform is actually an overweight cocktail of languages: API signatures, API usage protocols, make/build/configuration descriptions, ontologies for semantic annotation (“custom attributes”), …, you name it. Further, the use of a typical platform is tied to the use of different data models in the average application: POJO/POCO, not-so-plain objects, a bit of relational model, and a bit of XML. (Defining schemas according to different data models, mapping between specific schemas, and mapping generically between different data models – these activities all count as software language engineering, in case you didn’t notice.) Moreover, as a “platform user”, you use different type systems: Java 1.1, Java 1.5, C# ?.?, VB.NET ?.?, SQL, XSD, … – type systems are languages, too. Finally, you also use systems of design patterns – again, these are languages.
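
To illustrate the cocktail, here is a minimal Haskell sketch (names invented for this post) of one and the same “customer” notion living in an object view, a relational row, and an XML rendering at once; keeping such views in sync is exactly the kind of mapping meant above:

import Data.List (intercalate)

-- The "object" view of a customer; all names are invented for this post.
data Customer = Customer { custId :: Int, custName :: String }

-- The relational view: a flat row; column order is part of an implicit schema.
toRow :: Customer -> [String]
toRow c = [show (custId c), custName c]

-- The XML view: element and attribute names form yet another little language.
toXml :: Customer -> String
toXml c = "<customer id=\"" ++ show (custId c) ++ "\"><name>"
          ++ custName c ++ "</name></customer>"

main :: IO ()
main = do
  let c = Customer 1 "Ada"
  putStrLn (intercalate " | " (toRow c))
  putStrLn (toXml c)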

Compare this to good old Cobol times, when you had an all-inclusive language (“hello world” works with the DISPLAY statement; data access works with the CRUD statements), and some pretty lean (and optional) tools for forms, reports, and structured programming. The data model of hierarchical, indexed, and sequential files was deeply embedded into Cobol. Can’t we just fix Cobol? It already has OO. We know how to do aspects in Cobol, even though the extension got accidentally obsoleted some decades ago. Let’s re-enable aspects, add a few more things, and then convert back to Cobol, worldwide. I’ll try to get the Haskell community aboard.

If we do not convert back to Cobol, we need to deeply understand platforms as cocktails of languages. You might say: “I don’t care much that these are languages; I just have to handle this complexity – call it language, call it software development reality. Again, I don’t care.” That’s the point: nobody cares, and the various languages are disintegrated; some of them are obfuscated; many of them are compromised. Also, they deadlock each other’s evolution and clean-up. Indeed, backwards compatibility deadlocks everything. I have a dream: I imagine an (IT) world where the notion of backwards compatibility has stopped frightening us; where evolution is just so effective that we can always refactor our way to a system and a platform that represent the optimum in design conceivable at the time. (Perhaps the Smalltalk folks can do it? Come and talk at SLE 2008…!)

Again, we should be able to effectively refactor platforms. First example: concepts that previously lived in the build rules suddenly get deeply embedded into the language, or disappear from the programming surface because new kinds of compilers automate them. Second example: how many XML APIs are there for Java? My bet is 42. Do we need all of them? Shouldn’t we be able to retire half of them or more? Should a Java program 5 years from now even care about the dichotomy “in-memory” vs. “push/pull”? Third example: if XSD is suddenly found to be too complicated, can we just replace it all over our apps by, say, RELAX NG? (Some XML bashers may even want to stretch this scenario and transform away all XML in their apps. Now, let’s not overdo it, Ok?) Fourth example: whenever a major language extension makes it into our favorite programming language, we would like to retire the design patterns that mimicked the same concepts previously, and we would like to do mining so that lower-level program structures are semi-automatically lifted to the full, extended language. We have seen research on instances of this idea, such as aspect mining or refactoring for generics (a tiny illustration of the lifting idea follows below). How can we make this idea more repeatable? How can we define the higher-level language concepts in such a way in the first place that their relationships to design patterns or lower-level program structures follow from the language definition, and do not require heartbreaking proofs after the fact? Overall, shouldn’t we limit the entropy in IT? Otherwise, 50 percent of all human beings may suddenly end up working as programmers. Mark my words!
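
Here, then, is that tiny Haskell sketch of the lifting idea (illustrative only, not a worked-out tool; the analogue of refactoring cast-heavy Java code to generics): a hand-coded recursion is lifted to the concept the language already offers; making such lifts semi-automatic and repeatable is the actual challenge:

-- Hand-written recursion, as one might find it scattered through a code base.
sumLengths :: [String] -> Int
sumLengths []       = 0
sumLengths (s : ss) = length s + sumLengths ss

-- The same computation, lifted to the concept the language already provides.
sumLengths' :: [String] -> Int
sumLengths' = foldr (\s acc -> length s + acc) 0

-- A language-aware tool would perform this lift semi-automatically, and prove
-- (or at least test) that both versions agree.
main :: IO ()
main = print (sumLengths ws == sumLengths' ws)
  where ws = ["software", "language", "engineering"]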

Stage III: “Complex interfaces”

Even if you are a wizard of the .NET or Java platform, and have a Master’s in, say, financial services or “foundations” of eCommerce, you will still not be able to build a “MyAmazon” or “MyEBay”, because this sort of application must interface with a good deal of other systems in quite complicated ways. First, the sheer size and the sheer number of interfaces that such an advanced web application must cope with are challenging. (For instance, have a look at FpML – a sizable (markup) language for financial products.) Second, the interfaces (or services) come with fundamentally different protocols and kinds of contracts (e.g., synchronous, asynchronous, reliable, occasionally connected, self-persisting, encrypted, session-based, etc. – sigh!). Third, evolution is the straw that breaks the camel’s back. Software evolution, in the presence of complex platforms and complex interfaces, is currently infeasible in theory, even though it occasionally works in practice – often by not admitting evolution. Whether it works or not, software evolution triggers interesting additional artificial languages, e.g., transformation languages, and models/metamodels for reverse engineering.

I guess WebServices with WSDL and friends were a first attempt to solve this problem. Does it work? Not for me! Whenever I see a .wsdl file, I am overwhelmed; I look somewhere else. I guess SOA is the current hype way of addressing the problem. In both cases, the foremost emphasis is on service description (based on some sort of signatures and protocols – again, languages). In both cases, some efforts address the discovery of services. In neither case am I aware of a comprehensive approach to service evolution with all the encompassing problems of co-evolution and versioning. I really hope that the WebService/SOA community feels somewhat represented by and interested in SLE. I am sure that SLE will attract experts in programming languages, program transformation, language design, and programming environments; this sort of expertise should go well with the challenges in WebService/SOA. Let’s benefit from each other’s interests and strengths! Let’s meet at SLE 2008 in Toulouse.

Stage IV: “Driving models”

Model-driven engineering/development is partially meant as a solution to some of the problems mentioned in the descriptions of the earlier stages. We should ignore the fact that MDE/MDD was initially touted as “vacuous fluff” on the OhMyGosh site, aka MDA. In fact, the idea of generating implementational artifacts from higher-level specifications or “models” is laudable and pretty established in CS, so MDA didn’t quite start the salvation of mankind; here are some distant references, [1], [2], [3], [4], [5], but I am not always sure what “model-driven” means – so these related-work pointers are given without any underlying principle of what’s in and what’s out. Nevertheless, I mention them here for the benefit of some of the disoriented PhD students that I occasionally encounter at conferences.

Perhaps my stupid problem with model-driven foo/bar is the following: Yes, sure, if … [fun_level++] … I start from a language setup with only brain-dead abstraction mechanisms, don’t care about the fundamental notions of refinement, translation, composition, and partial evaluation, drive my efforts by the ideas of boxes, arrows, and stereotypes, add an amazingly complex constraint mechanism (as if I were adding the wings of an air shuttle to, say, a Trabant), invent a dozen notations for state machines of different kinds, throw hundreds of people at standardizing this Godzilla, … [keep on rambling for some time] …, then, yes, I can deliver very expressionistic pieces of art (read as UML diagrams) meant as solutions for problems of even the smallest size, differing only in the degree of required masochistic orientation. [fun_level--] (I apologize for this imperative idiom, but it’s just too convenient.)

Anyway, if we just think of it as program refinement, translation, composition, and partial evaluation, then it all makes sense. So we just call it model-driven engineering, and try not to get stuck in bad memories of early MDA. No matter what, model-driven engineering is a software-language engineering challenge par excellence. We get additional languages (e.g., the language that is refined to be able to express distribution behavior after refactoring or refinement; think of a POPL-powered explanation of the GWT); we get additional operations on languages (other than the obvious modes of compilation and interpretation). This creates (interesting) consistency challenges of some magnitude. Languages that arise from well-defined refinement or translation are hard to argue away; so here I take the position that we had better cope with them, make it easy to endorse them, and fully support their life cycle.
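
To make the “it is just translation” point concrete, here is a minimal Haskell sketch (the model and all names are invented for this post) of a tiny model – a finite state machine given as data – that is translated into executable behavior; nothing more mysterious than refinement and translation is going on:

type State = String
type Event = String

-- The "model": a list of transitions; a little language in its own right.
model :: [((State, Event), State)]
model =
  [ (("Open",   "close"),  "Closed")
  , (("Closed", "open"),   "Open")
  , (("Closed", "lock"),   "Locked")
  , (("Locked", "unlock"), "Closed")
  ]

-- The "generator": translates the model into an executable step function;
-- unknown events leave the state unchanged.
step :: State -> Event -> State
step s e = maybe s id (lookup (s, e) model)

main :: IO ()
main = print (foldl step "Open" ["close", "lock", "unlock", "open"])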

Stage V: “Languages as underwear”

Toddlers develop their own natural language, and parents are motivated to catch up, understand them, and sometimes imitate them. When friends come for dinner, parents even volunteer as interpreters, and everyone is happy and enjoys the individual language. It also makes sense because a full-blown dictionary and grammar is way too complex and inappropriate for the toddler’s ability and mind and focus. Also, parents do parenting quite a bit for the entertainment value that toddlers provide, and those individual languages are part of that entertainment value, no? Also, the time frame for using these languages isn’t completely insane; the toddler’s diapers get changed way more often than a new baby language gets invented.

In the emerging age of domain-specific languages, or should I say, in the current, more intensive wave of the long-standing DSL movement, we are suddenly able to change our artificial languages like underwear: introduce a new one every day; use one per component of the application! To me, it is not strikingly obvious that this works, unless we impose appropriate constraints on this ability. (Come to SLE 2008, and tell us how it works!) Taken to an extreme, “languages as underwear” would correspond to an adult who invents a new (natural) language every day or so, while expecting his colleagues, friends, and partner to catch up and learn this language pretty quickly. Perhaps the adult is actually teaming up with a few buddies to justify the creation of the new language. Still, imagine everyone being so creative continuously. Let me emphasize that the creation of a special-purpose dictionary, or, let’s say, an ontology (to impose more structure on a dictionary), is Ok. I wonder … what’s fundamentally wrong with, say, English or German, given these languages’ expressiveness? Getting back to artificial languages, what’s wrong with, say, Haskell 98+, provided we get dependent typing, clever macros, and a few more gadgets? Then we will be “DSL complete” (as opposed to Turing complete).

Here is a pretty non-controversial definition of the term “domain-specific language”: it’s a language with domain-specific support for notation, abstraction mechanisms, checks, and behaviors (e.g., efficient behaviors based on domain-specific transformations). This sounds good. Sure, we do want the capability to define domain-specific languages. However, the dominating discussion of DSLs lacks some related, very important aspects. Suppose we are close to the point where it is easy to define a DSL: don’t we also need the ability to evolve DSLs, integrate them, retire them, objectively measure the impact of introducing them, and analyze their usage to provide feedback for their evolution? Here I note that not even the mere definition of DSLs is a done deal, unless you are a fan of attribute grammars, dependent typing, and a few more weapons of math-based distraction. It is relatively established to provide domain-specific notations, e.g., visual frontends on top of an object model; it is still relatively difficult to provide domain-specific checks and behaviors, and to support DSL evolution – again, unless you are a cutting-edge language nerd; to be clear and serious, I don’t consider myself to be nearly such an expert.
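
To ground the definition, here is a minimal Haskell sketch (a toy embedded DSL; the domain and names are invented for this post) that shows domain-specific abstract syntax, one domain-specific check, and one domain-specific behavior – while evolution, integration, and retirement of even this toy DSL remain exactly the open issues raised above:

import Data.List (intercalate)

-- Abstract syntax of a toy query DSL; the domain is invented for this post.
data Query
  = Select [String] String   -- column names and a table name
  | Where Query String       -- a query restricted by a textual predicate

-- A domain-specific check: every query must select at least one column.
wellFormed :: Query -> Bool
wellFormed (Select cols _) = not (null cols)
wellFormed (Where q _)     = wellFormed q

-- A domain-specific behavior: rendering to concrete syntax.
render :: Query -> String
render (Select cols tbl) = "SELECT " ++ intercalate ", " cols ++ " FROM " ++ tbl
render (Where q p)       = render q ++ " WHERE " ++ p

main :: IO ()
main = do
  let q = Where (Select ["name", "city"] "customers") "id = 1"
  print (wellFormed q)
  putStrLn (render q)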

Conclusion

When the confusion of natural languages happened, it was at least clear that everyone was still talking in some language, even though people would no longer understand each other – too bad. Again, there wasn’t much dispute as to what a language is. In CS, we have taken the confusion of languages to a new level. We are so overwhelmed by language-like entities that we do not simply lack the ability to understand the various languages at hand; worse than that – we do not even realize when “someone is talking”; we do not even “see” a language when it’s there. Also, we are caught up in details of rendering – comparable to someone who cannot deal with even the slightest dialect. It may also be that we are too openly endorsing slang. By this I mean that the SLE-related areas of CS and IT could perhaps make better use of (and constructively challenge) the fundamental insights accumulated in parsing, formal semantics, programming languages, program transformation, testing, verification, literate programming, and general automated software engineering.

We will inevitably see more languages (programming languages, APIs, DSLs, metalevel/transformation languages, ontologies, …). Understanding the software life cycle of all these languages is a central challenge for the CS discipline. We must improve our abilities to retire languages when they are no longer needed, to objectively determine when a domain-specific language is worth designing, to provide higher abstraction levels in programming, to validate and verify language definitions and implementations, to modularize them, to think beyond the single-language frame, and to endorse language integration and composition … [please check out the call for papers for SLE 2008].

Ralf Lämmel

Wannabe Software Language Engineer