Lessons from the Component Wars: An XML Manifesto

 

Don Box
DevelopMentor

September 1999

Editor's note: As of March 2004, this article has been reviewed, and the content remains valid.

Summary: This article discusses how component technology evolved and how eXtensible Markup Language (XML) will become widely used as a way for software components to interoperate, in essence acting as a gateway between autonomous, heterogeneous systems. (10 printed pages)

Contents

  • Lesson 1: My Glue Is Better Than Your Glue
  • Lesson 2: I Can't Believe You Program In That Language
  • Why XML?
  • Component Integration Technologies
  • Summary
  • Additional Resources

The primary goal of component software is to enable collaboration and cooperation amongst software development organizations. However, over the past decade, component technology has been the cause of many arguments, disagreements, and debates. This component-induced friction can be traced to two primary factors:

  • Different organizations and corporations want to be the de facto provider of component infrastructure.
  • Component technology provides more opportunities for different programming cultures to interact.

There are many lessons to be learned from examining these two factors closely. In this article, we will examine how component technology has evolved to XML. Before looking at XML, we need to examine what came before it.

Lesson 1: My Glue Is Better Than Your Glue

At the end of the day, the primary function of a component technology is to act as glue between multiple pieces of software. This is true of the Component Object Model (COM), as well as Java (the technology). This is also true of Common Object Request Broker Architecture (CORBA). All three technologies provide infrastructure for integrating software components written by autonomous organizations. From the 10,000-foot view, these three technologies are more or less the same. Up close, however, each technology uses radically different techniques and programming styles to achieve its goals.

Dissecting Component Technology

Component technology is about interoperation. It is interesting to look at the degree to which each component technology tries to enable interoperation. One can view these degrees of interoperation according to the following layering model:

Figure 1. Four degrees of component interoperation

In-Memory Interoperation

Mixing multiple components in memory is easily the most intimate degree of interoperation possible. By standardizing an in-memory representation that all components must adhere to, a component technology can offer excellent performance. Additionally, having a standardized in-memory representation allows the supporting run time to offer a wider array of component management services at a substantially lower performance cost than would otherwise be possible.

COM standardizes the in-memory representation of object references based on simple C-style virtual function tables. This makes in-process COM very easy to support on any platform. (Netscape did exactly this for their cross-platform Web browser.) Java standardizes the representation of component code and each virtual machine defines its own in-memory representation for objects. The advantage of this approach is that each virtual-machine implementor is free to innovate while still building on a common component format. The disadvantage to this approach is that components must run in the same virtual machine to interoperate, which in the presence of versioning is not always possible. The CORBA specification punts on in-memory representation, as the original goal of CORBA was to provide an object-based remote procedure call (RPC) system.

Source Code Interoperation

Component technologies often require the developer to explicitly program against an application programming interface (API) of some sort. By standardizing an API for accessing component services, a technology can enable source-level interoperation, allowing component source code to be recompiled against another vendor's implementation of the technology (modulo OS-specific system calling sequences like fork() or CreateFile()).

COM exposes its services via the COM library and the Co APIs (for example, CoCreateInstance or CoInitializeEx). A significant subset of the Co APIs is consistent across platforms (for example, Windows NT, Windows 95, Solaris, and Linux) and allows COM source code to be recompiled on multiple platforms. The CORBA specification defines a set of standard interfaces that an ORB vendor must support in order to be considered CORBA compliant. This set of interfaces is considered a bare minimum, and most ORB vendors augment the standard CORBA API with proprietary extensions. Most Java-based component services are simply integrated into the language and don't necessarily have an explicit API. This makes the component aspects of Java fairly transparent. However, Java critics often point out that you must port all of your software to the Java programming language, in essence tying your entire source-code base to Java technology.

Type Information Interoperation

Components need to be described to programmers who will consume the component and to the underlying component system in order to ensure proper integration. All three component technologies provide a standardized way of representing type information for consumption by developers and the supporting infrastructure.

CORBA provides a text-based interface definition language (IDL) that allows objects to be described in a programming language-neutral manner. By defining all publicly accessible data types in IDL, it is possible to access a CORBA object from any programming language that has ORB support. CORBA IDL is required to integrate with most CORBA products. COM also has a text-based IDL that is more or less equivalent to CORBA IDL (COM IDL supports more data types, while CORBA IDL is easier to author and parse). As Microsoft is quick to point out, COM IDL is technically optional; however, it is used extensively on most COM-based development projects. Both COM and CORBA IDL tend to be good for authoring but not as good for interoperation and interchange. Because they are text-based, IDL-aware tools and infrastructure must parse a fairly rich language that has some tricky grammar rules, as well as dependencies on the C preprocessor.

To address this problem, COM also provides a binary form of type information called type libraries. Type libraries contain most (but not all) of the information in a COM IDL file in a representation that is easily parsed using the system-provided type library parser (see LoadTypeLib for more information). Because all Java components adhere to a standard self-describing class file format, no additional type information support is needed. With Java 1.1, it is possible to traverse a component's public interface using intrinsic reflection services exposed by all Java virtual machines.

Wire Interoperation

The holy grail of component technology is distributed computing. Components are often viewed as the enabling technology that will make building distributed applications easy. In an attempt to satisfy this goal, component technologies often define new network protocols to allow components to communicate across host machines.

Due to Windows NT's heavy orientation towards the Open Software Foundation's Distributed Computing Environment (DCE) RPC mechanism, COM leverages the DCE RPC protocol for framing and transport and uses the Network Data Representation (NDR) for parameter encoding. The Distributed COM (DCOM) protocol simply defines a handful of DCE RPC interfaces that are used for object activation, type coercion, and life cycle management. In essence, DCOM is just another DCE RPC application.

CORBA supports a variety of protocols, with Internet Inter-ORB Protocol (IIOP) being the most common protocol for interoperation. IIOP layers simple framing and conversation management over TCP and uses the common data representation (CDR) for parameter encoding. Java supports both IIOP/CDR and the native remote method invocation (RMI) protocol, JRMP. JRMP is based loosely on the Java serialization format and can work over ordinary TCP or HTTP.

Component Technology and World Domination

It should be clear from the previous discussion that there are many valid approaches to integrating software components. Each of the three technologies discussed has loyal and dedicated followers who have committed considerable resources to their technology of choice. For this reason alone, it is unlikely that any of these three technologies will disappear from the development landscape anytime soon.

However, it is also unlikely that any of these three technologies will dominate the Internet. The network protocols used by these three technologies tend to require a non-trivial amount of run-time support to function properly. Ironically, while Microsoft and the Object Management Group (OMG) were arguing over whether the Internet would be run on DCOM or CORBA, the Hypertext Transfer Protocol (HTTP) took over as the dominant Internet protocol. Like many other successful Internet protocols, HTTP is simple, text-based, and requires very little run-time support to work properly. Additionally, many corporate firewalls block DCOM and CORBA traffic, while happily allowing HTTP packets into their (mostly) guarded networks. Finally, when you consider the amount of engineering effort dedicated to making HTTP servers (for example, Internet Information Server (IIS) and Apache) scalable, reliable and easy to administer, it becomes harder to justify not exposing your software components using HTTP technology.

Lesson 2: I Can't Believe You Program In That Language

Programmers rarely self-identify based on component technology or operating system. Similarly, programmers rarely self-identify based on problem domain. Rather, programmers have classically self-identified based on programming language. While it is not unheard of for someone to state "I am a UNIX programmer," it is far more common to hear someone say "I am a C programmer" or "I am a VB programmer." At the end of the day, programmers get extremely intimate with their programming language of choice, often at the expense of building expertise in their problem domain or in supporting technologies.

As someone who has spent the last six years as "a COM programmer," I've been exposed to a variety of programmers, each of whom has adopted COM as their component model. One would think that standardizing on a language-neutral component model like COM would make a programmer more open-minded about other programming environments. In my experience, the exact opposite usually occurs.

Because COM opens up more possibilities for collaboration or cooperation with other programmers, it also presents more opportunities for disagreement. Many C++ developers are horrified by the "don't just stand there, ship something" ethic that many Visual Basic (VB) shops live by. In contrast, many VB developers are amused by the syntactic preoccupation that many C++ programmers exhibit as they write yet another template-based generic wrapper to a three-function API. The culture clash is only exacerbated by the varying degrees of COM support each programming environment offers. C++ programmers often feel constricted by VB's lack of support for arbitrary pointers and arrays. VB programmers have their own list of gripes in this area related to versioning, globally unique identifier (GUID) management, and support for interfaces and events.

One of the most interesting collisions I have noticed over the years is the tension between strongly-typed systems and weakly-typed systems. C++ developers are trained early on to prefer strong typing. This can be traced back to the mantra of Bjarne Stroustrup (the inventor of C++): "prefer compile-time errors to run-time errors." VB programmers tend to prefer weakly-typed systems. Perhaps this can be traced back to the BASIC language's lack of support for typed variables (manifested in today's Variant data type). Or perhaps this preference is due to the fact that Visual Basic projects typically last for a short duration and there is often little time (or need) for strong typing.

It is easy to take the puritanical software engineering stance that strong typing is superior to weak typing and condemn the VB programmer for his or her transgressions. However, many of the applications that VB programmers build are highly suited to weak typing. This is especially true for code that is written to be disposable or transient. Many applications developed by internal corporate development departments (the primary bastion of VB) are not meant to be multi-man or multi-year endeavors. Rather, some code needs to be written that satisfies an immediate business need that is often transient or volatile. A majority of today's Web development also falls into this category, as most commercial Web sites change from month to month to hold consumer attention, as well as to take advantage of newer Web technologies (for example, Dynamic HTML (or DHTML)).

Nowhere have VB programmers expressed their preference for weak typing more clearly than in their utilization of (and affection for) the ActiveX Data Objects (ADO) Recordset. The ADO Recordset is a generic, extensible, self-describing data structure that most VB programmers have come to depend on as much as VB itself. While the ADO Recordset was originally designed to present an API to databases, it has evolved into a generic data structure that is useful even when no databases are in use. It is extremely common for VB programmers to define their component interfaces largely in terms of the Recordset. Because Recordsets can easily marshal by value, they are more efficient for data transfer than most other object-based solutions available to a VB programmer. Also, because the schema used by a Recordset is defined at run time, not compile time, Recordsets offer a "backdoor" for interface evolution that does not require COM-style interface versioning. This style of evolution is not without its downside, as changing a Recordset schema requires that version negotiation be done manually at the application level. However, in some deployment scenarios, this is not a problem.

Why XML?

Many view XML as a fourth component integration technology. While originally designed as a solution for adding extensions to HTML, XML is rapidly becoming the technology of choice for integrating heterogeneous component-based systems. Here's why.

XML Is a Minimal Component Standard

Recall the four degrees of interoperation discussed earlier. XML is fundamentally about defining a minimal wire representation for data and message interchange, as shown above in Figure 1. This is the minimal level of standardization needed to ensure that components can communicate. The core XML specification is extremely simple, as it only lays down the syntactic ground rules for forming valid XML messages. While the World Wide Web Consortium (W3C) is rapidly layering additional standards on top of XML (for example, XLink and XML Schemas), the base XML syntax has been fairly stable. It has also proven to be quite flexible and adaptable to many applications, and despite its hierarchical nature, XML lends itself reasonably well to non-hierarchical data types.

XML does not mandate a type information representation per se. XML provides a standard mechanism for describing XML data streams known as document type definitions, or DTDs. While DTDs can be useful for building validating XML parsers, they have suffered from several problems. For one, DTDs are extremely difficult to read or author, because the syntax of a DTD is not XML, but rather a DTD-specific grammar that is similar to (but still different than) XML. Because DTDs are not valid XML themselves, infrastructure and tool vendors need to develop two sets of parsers, editors, and APIs—one set for XML, and one set for DTDs. Worse yet, DTDs have a hard time dealing with scoping and namespaces, which makes them unusable in many interesting scenarios. Currently, many XML-based systems simply define their own type information representations as XML vocabularies. The XML Data specification submitted by Microsoft and others is one example of such a vocabulary.

Component Integration Technologies

|   | XML | COM | Java | CORBA |
|---|---|---|---|---|
| In-Memory Interoperation | W3C DOM (recommendation only), Simple API for XML (SAX), etc. | The COM API | The Java Programming Language | The POA and ORB object interfaces |
| Text-based Type Information Interoperation | DTDs (legacy); XML Schemas/XML Data (future) | COM IDL | The Java Programming Language | OMG IDL |
| Binary Type Information Interoperation | Same as text-based type info | Type Libraries | .class files | None |
| API-level Type Information Interoperation | None for DTDs; DTD replacement will just be XML, so any XML parser will work | LoadTypeLib, ITypeLib, et al. | java.lang.reflect | Interface Repository |
| Wire Interoperation | XML (over HTTP, raw TCP, or message-based protocols) | DCOM (DCE based) over raw TCP, SPX, etc. | RMI/JRMP or RMI/IIOP or RMI/HTTP | IIOP over raw TCP |

XML makes no attempt to address in-memory component or object representations (other than the fact that XML data streams can be read into memory prior to parsing). While the W3C is currently working on an API recommendation known as the Document Object Model (or DOM), this API is only a recommendation and not required to host XML in an application or system.

XML Is Platform, Language, and Vendor Agnostic

Despite the hopes of platform vendors or open-source zealots, the computing world will always be composed of different programming languages, operating systems, and computing hardware. As XML is only a wire representation, it has no particular affinity to one operating system, programming language, or hardware architecture. As long as two systems can exchange XML messages, they can potentially interoperate despite their differences. Because XML does not mandate an API or in-memory representation, it is fairly simple to host XML in an application. There are XML parsers freely available for most (if not all) programming languages. While there are several standardized programmatic interfaces for parsing XML (for example, the W3C DOM and SAX), there is no mandate that one must support those APIs in order to interoperate with other XML-based systems.

XML Is Accessible

XML is incredibly easy to understand. It's easy to read and easy to author. This accessibility has been key to XML's rapid acceptance. Unlike messages in binary wire protocols such as DCOM, CORBA, or Java/RMI, XML messages can easily be created using a simple text editor or scripting language. While many XML parsers provide facilities for generating well-formed XML, it is also possible to generate XML using standard string manipulation facilities in your programming language of choice. The simple text-based nature of XML also makes it easier to debug and monitor distributed applications, as all component-to-component messages are readable when viewed with a network monitoring tool.
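
As a rough illustration of generating XML with nothing more than string handling, the following Python sketch assembles a small order message and then hands it to a parser to confirm that it is well formed. The function name build_order and the order layout are assumptions made for this example, and Python is used only for convenience; any language with string formatting would do.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_order(orderno, custno, itemnos):
    """Assemble an order message using ordinary string operations.

    build_order is a hypothetical helper for this sketch. quoteattr
    escapes each value and wraps it in quotes so that unusual
    characters cannot break the markup.
    """
    lines = [f"<order orderno={quoteattr(str(orderno))}>"]
    lines.append(f"    <customer custno={quoteattr(str(custno))} />")
    for itemno in itemnos:
        lines.append(f"    <item itemno={quoteattr(str(itemno))} />")
    lines.append("</order>")
    return "\n".join(lines)


message = build_order(33512, 4462, [3352, 1829])
print(message)

# Parsing the string raises an error if it is not well formed.
order = ET.fromstring(message)
item_numbers = [item.get("itemno") for item in order.findall("item")]
```

Because the message is plain text, the same string could just as easily be typed into an editor or inspected in a network trace.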

XML Is Extensible

A system that is not extensible is doomed to failure at worst or random hackery at best. XML provides a fairly elegant mechanism for allowing arbitrary parties to extend a given XML data stream. XML namespaces leverage the Uniform Resource Identifier (URI) namespace to allow arbitrary attributes and elements to be added to an existing XML vocabulary. For example, consider the following simple XML fragment:

<order orderno="33512">
    <customer custno="4462" />
    <item itemno="3352" />
    <item itemno="1829" />
</order>

This fragment indicates that customer number 4462 is ordering items 3352 and 1829. Assuming both the sender and receiver of this fragment understand what this means, everything is great. However, what if the sender of the message wanted to annotate this message with additional information (for example, adding an identifier to the order that associates it with a larger financial transaction)? One could imagine the sender simply adding the attribute as follows:

<order orderno="33512" transid="55291">
    <customer custno="4462" />
    <item itemno="3352" />
    <item itemno="1829" />
</order>

However, because the receiver of the message may have been developed independently from the sending application, there are several potential problems. For one, the receiver may or may not allow additional attributes or elements to be added to a message. If the receiver interprets the presence of this attribute as a parsing error, then the request will fail. To deal with this problem, newer XML description technologies (such as Microsoft XML Data) allow XML vocabularies to be defined as either open or closed. A closed vocabulary cannot be extended beyond what is described in the base vocabulary schema. An open vocabulary can be extended, with the receiving application determining how to interpret extended elements and attributes. Depending on the application, unrecognized extensions to a vocabulary can often be ignored.
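
The difference between the two policies can be sketched in a few lines of receiver code. This is only an application-level illustration, not how XML Data or any schema processor enforces open or closed vocabularies; the names read_order and KNOWN_ORDER_ATTRIBUTES are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes defined by the receiver's base vocabulary (illustrative only).
KNOWN_ORDER_ATTRIBUTES = {"orderno"}


def read_order(xml_text, closed=False):
    """Return the order number and the list of unrecognized attributes.

    closed=True  treats any unrecognized attribute as an error.
    closed=False reports unrecognized attributes and otherwise ignores them.
    """
    order = ET.fromstring(xml_text)
    unknown = sorted(set(order.attrib) - KNOWN_ORDER_ATTRIBUTES)
    if closed and unknown:
        raise ValueError(f"unexpected attributes: {unknown}")
    return order.get("orderno"), unknown


extended = """<order orderno="33512" transid="55291">
    <customer custno="4462" />
    <item itemno="3352" />
    <item itemno="1829" />
</order>"""

# Open policy: the extension is tolerated.
orderno, ignored = read_order(extended)

# Closed policy: the same message is rejected.
try:
    read_order(extended, closed=True)
    rejected = False
except ValueError:
    rejected = True
```

With the open policy, the sender's extra attribute costs the receiver nothing; with the closed policy, the request fails exactly as described above.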

Assuming that the order message shown earlier is part of an open XML vocabulary, it should be safe to add the transid attribute. However, what if the receiver of the request also wanted to extend the vocabulary? In particular, what if the receiver had defined a new attribute for associating orders with low-level database transactions? If the receiver mistakenly chose transid as the attribute name, then the sender's financial transaction ID would be misinterpreted as a low-level database transaction ID. To solve this problem, the W3C added namespaces to XML.

XML namespaces allow attributes and elements to be scoped by a URI. The following XML fragment illustrates how XML namespaces can be used to unambiguously add the transid attribute to the order request:

<order orderno="33512"
       xmlns:fin="https://money.org/FinancialXML/ns"
       fin:transid="55291" >
    <customer custno="4462" />
    <item itemno="3352" />
    <item itemno="1829" />
</order>

When a receiver parses this XML fragment, it can detect that the transid attribute is scoped by the namespace https://money.org/FinancialXML/ns and is not the same as the transid attribute used to represent database transactions (which would have a different namespace URI). In fact, XML namespaces allow both transid attributes to appear in the same request unambiguously:

<order orderno="33512"
       xmlns:fin="https://money.org/FinancialXML/ns"
       xmlns:db="urn:xmltpc:XMLTransactions"
       fin:transid="55291"
       db:transid="46722" >
    <customer custno="4462" />
    <item itemno="3352" />
    <item itemno="1829" />
</order>

Despite the current energy being dedicated to XML-based type description, XML namespaces are arguably the most enabling enhancement to base XML that has come out of the W3C to date.
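
To see how a receiver might tell the two transid attributes apart, here is a small sketch using Python's standard xml.etree.ElementTree module, which exposes namespace-qualified names in the form {namespace-uri}localname. The variable names are illustrative; the message is the two-namespace fragment shown above.

```python
import xml.etree.ElementTree as ET

FIN_NS = "https://money.org/FinancialXML/ns"
DB_NS = "urn:xmltpc:XMLTransactions"

message = """<order orderno="33512"
       xmlns:fin="https://money.org/FinancialXML/ns"
       xmlns:db="urn:xmltpc:XMLTransactions"
       fin:transid="55291"
       db:transid="46722" >
    <customer custno="4462" />
    <item itemno="3352" />
    <item itemno="1829" />
</order>"""

order = ET.fromstring(message)

# Each attribute is looked up by namespace URI plus local name, so the
# prefixes chosen by the sender ("fin", "db") do not matter to the receiver.
financial_id = order.get(f"{{{FIN_NS}}}transid")
database_id = order.get(f"{{{DB_NS}}}transid")

# There is no unqualified transid attribute in this message.
unqualified_id = order.get("transid")
```

A receiver that knows only the database namespace simply never asks for the financial attribute, and vice versa.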

XML Is As Strongly Typed As You Want It To Be

Due to the use of open vocabularies and namespaces, XML can support weakly-typed communications. While strong typing has many benefits (and is supported by XML using DTDs or their equivalents), it is extremely easy to build weakly-typed systems using XML. This makes XML extremely adaptable to generic application frameworks, data-driven applications, and rapid development scenarios (for example, disposable or transient Web-based applications). Along these lines, many ADO Recordset aficionados are using the Microsoft XML parser (MSXML) to replace the Recordset as a data transfer mechanism, both for its cross-platform benefits as well as its superior support for non-tabular data.
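
A weakly-typed consumer in the spirit of the Recordset can be sketched as follows. The code discovers the "schema" of each element at run time by reading whatever attributes happen to be present. The helper name rows_from_xml and the qty attribute are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET


def rows_from_xml(xml_text):
    """Describe each child element by whatever attributes it carries."""
    root = ET.fromstring(xml_text)
    return [{"element": child.tag, **child.attrib} for child in root]


# The qty attribute is a hypothetical extension; the code above needs no
# change to carry it along.
order = """<order orderno="33512">
    <customer custno="4462" />
    <item itemno="3352" qty="2" />
    <item itemno="1829" />
</order>"""

rows = rows_from_xml(order)
for row in rows:
    print(row)
```

As with the Recordset, the flexibility comes at a price: nothing checks that a consumer and a producer agree on what qty means, so that agreement has to be handled at the application level.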

XML Can Solve Its Own Interoperability Problem

Simply adopting XML as a component integration technology does not completely solve the interoperability problem. In particular, even though much of the industry is embracing XML as an interoperability technology, this only pushes the interoperability problem up one level of abstraction. Even if the entire industry were to shift to XML overnight, this alone would not help, as different organizations are likely to use different XML vocabularies to represent the exact same information. Granted, there are currently industry-wide initiatives to standardize domain-specific XML vocabularies (for example, BizTalk, FinXML, and OASIS); however, it is not known whether any of these efforts will achieve 100 percent penetration in a particular application domain.

Fortunately, the lack of standardized vocabularies can be solved using XML technology. In particular, in the presence of two competing vocabularies, it is likely that application-level gateways will transform requests from vocabulary "A" into requests in vocabulary "B." An even more promising solution lies in XML transforms. XML transforms allow one XML vocabulary to be transformed into another by specifying the transformation rules (in XML of course). XML transforms were originally devised to map XML to HTML, but are currently being applied in a variety of much more interesting scenarios.
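
The application-level gateway idea can be sketched as a simple renaming pass over a parsed message. Note that this is hand-written gateway code rather than an XML transform: Python's standard library does not include a transform processor, so the mapping table here stands in for the rules a transform would declare. Vocabulary "B" (purchase, buyer, line, sku) is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from vocabulary "A" (the order messages used earlier)
# to an invented vocabulary "B":
#   element name -> (new element name, {old attribute: new attribute})
A_TO_B = {
    "order": ("purchase", {"orderno": "id"}),
    "customer": ("buyer", {"custno": "id"}),
    "item": ("line", {"itemno": "sku"}),
}


def translate(element):
    """Recursively rebuild an element tree using vocabulary "B" names.

    An element or attribute missing from A_TO_B raises KeyError, which
    keeps the sketch short; a real gateway would decide how to handle it.
    """
    new_tag, attr_map = A_TO_B[element.tag]
    translated = ET.Element(new_tag)
    for name, value in element.attrib.items():
        translated.set(attr_map[name], value)
    for child in element:
        translated.append(translate(child))
    return translated


source = ET.fromstring(
    '<order orderno="33512">'
    '<customer custno="4462" />'
    '<item itemno="3352" />'
    '<item itemno="1829" />'
    '</order>'
)
result = ET.tostring(translate(source), encoding="unicode")
print(result)
```

The appeal of a declarative transform over code like this is that the mapping itself becomes data, expressed in XML, that can be exchanged and versioned like any other message.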

Summary

Each year or so, the computer industry anoints a new technology as the "holy grail" of software development. The trade press happily bangs the drum, encouraging upper-management to hand down edicts outlining grand technology visions according to the pundit du jour. XML is bound to fall prey to this nonsense.

Despite the hype, XML will not solve all of your problems. XML may or may not help you ship software faster. XML will never replace programming languages such as C or Java. XML will probably never replace programming technologies such as COM or Java either. XML will, however, become widely used as a way for software components to interoperate, in essence acting as a gateway between autonomous, heterogeneous systems. It is in this role that XML really excels.

Additional Resources