From the February 2002 issue of MSDN Magazine


The Continuing Challenges of XML Web Services

Don Box

In spending time with literally thousands of developers over the past eight years, I have noticed that there are two types of developers out in the wild: those who prefer to work at start-ups and those who prefer to work at large and established companies. Having just spent eight years building a start-up into an established company, I can't honestly say which type of organization I prefer. I can state wholeheartedly that in terms of the technologies I work with, I definitely prefer "start-up" technologies to those technologies that are considered to be well established.
      Technologically speaking, I am a textbook entrepreneur. Once a technology stabilizes, I want to move on to the next frontier. Ultimately, that's why I stopped doing serious COM work. That's also why the Common Language Runtime (CLR) doesn't do it for me. Don't get me wrong—I love using the CLR. After exploring almost every nook and cranny I could find, I still think that the CLR is a fantastic piece of technology and recommend it to all my friends and colleagues. That stated, once Visual Studio® .NET ships, hundreds of thousands of developers will come to depend on the CLR, which means that changes and innovation will be more evolutionary than revolutionary.
      In contrast to COM and the CLR, XML is still a fairly broad canvas upon which developers get to repaint the world every six months. To some, this marginalizes XML as a trendy technology fad that will implode under the weight of its own hype and/or instability. To others, this represents an opportunity to recast a problem domain using broadly accessible tools and techniques. Needless to say, I fall firmly into the latter camp.
      Working with XML and Web Services is not without its challenges. To that end, this month's column outlines what I believe to be the biggest challenges facing XML Web Service practitioners and plumbers at the beginning of 2002.

Specification Convergence (or lack thereof)

      Because XML is a vendor-neutral technology, it is largely specification-driven, with the W3C acting as the home of all core XML specifications. When all of the specifications are stable and complete, this is a good thing, as every vendor can build plumbing against a common set of protocols, knowing that cross-vendor interoperability is at least attainable—if not certain. However, when all of the requisite specifications are not complete, developers are forced to choose between hand-rolling implementations of specifications still in progress or waiting until their platform vendor catches up. Fortunately, most (but not all) of the more complex XML technologies are fully specified, which means that implementing an occasional specification by hand is not terribly difficult.
      This column was written as 2001 came to a close. It was actually a very good year for XML specifications. By the end of 2001, the three specifications that comprise the base data model for XML had achieved full recommendation status (this is the end of the line for a W3C specification and a critical step towards stability and interoperability). These specifications are XML 1.0 second edition, the XML Information Set, and XML Schemas.
      The second edition of XML 1.0 (see https://www.w3.org/TR/REC-xml) was an editorial exercise that incorporated fixes for roughly three years worth of errata reported against the 1998 version of the specification.
      The XML Information Set (Infoset) specification (see https://www.w3.org/TR/xml-infoset/) was finally advanced to full recommendation status in 2001 after roughly two years in W3C purgatory. The Infoset describes the abstract data model that "is" an XML document. The Infoset is important because it identifies what is and is not important in an XML document. For example, the Infoset indicates that there is no semantic difference whatsoever between the following three elements:

  <customer></customer>
  <customer/>
  <customer ></customer>

      Most newer XML technologies are defined in terms of the Infoset, not the underlying XML 1.0 syntax. That means that very few XML technologies retain the differences between these three representations of essentially the same element.
      By far the most important of the three data model specifications is XML Schema (see https://www.w3.org/TR/xmlschema-1/). XML Schema advanced to full recommendation status in May 2001 (see Figure 1) and vendor support has been getting better each month since then. XML Schema provides a type system for the XML Infoset, making XML much more suitable for integrating software components and applications that rely on strong typing.

Figure 1 XML Schema

      XML Schema provides three fundamental features, all of which are heavily used by Web Service technology. First, XML Schema defines a type system for simple value types such as double, boolean, and so on. Because XML is inherently a text-based data representation, this type system distinguishes between the abstract value space of a type and Unicode-based lexical representations. Furthermore, XML Schema defines a strongly typed way to look at XML-based information called the post-schema-validation Infoset (PSVI). The PSVI is a type-aware version of an XML Infoset in which the type affiliation of each element and attribute is available to higher-layer technologies and applications. The following elements show why these features are important:

  <age>0000200</age>
  <age>200.00000</age>
  <age>2e2</age>

      At the basic Infoset level, these three elements are different since each of the elements has a different number of character children (seven, nine, and three characters, respectively) and few of the children of one element match the children of another. Assuming the XML Schema type system and PSVI, the three elements may be considered equivalent provided that the age element is affiliated with the built-in type double.
      Despite the differences in the previous three elements, most humans would intuitively feel they are equivalent. This is because humans can often infer type based on lexical patterns. To allow XML-based software to infer type, XML Schema provides a schema definition language (often referred to as XSD) for indicating the type affiliation of elements and attributes. The schema definition language is where most developers start focusing; however, the type system and PSVI are at least as important to modern XML-based architectures.
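      For instance, a minimal schema along these lines is all it would take to affiliate the age element from the previous example with the built-in double type (the lack of a target namespace here is simply to keep the example short):

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="age" type="xsd:double" />
  </xsd:schema>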
      Though work on the XML Schema specification is now finished, by no means is the W3C done churning out new specifications. In fact, the presence of XML Schema actually makes at least one existing technology obsolete: XPath.
      XPath was defined prior to either the Infoset or XML Schema. To avoid dependencies on XML syntax, XPath defined its own data model that was similar to the Infoset. Moreover, XPath defined a very rudimentary type system of its own that had exactly four types: boolean, number, string, and node-set. Unfortunately, since XPath was defined prior to XML Schema, the type of a given attribute or simple element was always string. To treat attribute or element values as boolean or number, you needed to explicitly cast. For example, consider the following XML document:

  <add>
    <a>200</a>
    <b>200.0</b>
  </add>

      Because XPath considers all element and attribute values to be of type string, the following XPath expression would result in the boolean value false.

  /add/a = /add/b

To perform numeric comparison, an explicit conversion is required:

  number(/add/a) = number(/add/b)

If XPath had been written against the PSVI, the conversion would not be necessary. To address this, version 2.0 of XPath is now being defined in terms of the PSVI. Because XSLT is a heavy user of XPath, XSLT 2.0 will also gain type-awareness. Unfortunately, XPath 2.0 and XSLT 2.0 won't be complete until well into 2002 at the earliest.
      XSLT is one way to process XML-based information. However, it is by no means the only way. One emerging technology that provides an alternative to XSLT is XML Query. XML Query has become the new hot technology in core XML, and the XML Query working group is producing working drafts faster than most developers can read them. Like XSLT, XML Query allows you to specify literal result elements that are annotated with processing hints. However, XML Query uses more familiar SQL-like constructs rather than XPath expressions. For example, the following XSLT fragment

  <xsl:template name="bob">
    <bib>
      <xsl:for-each
        select='document("books.xml")/bib/book[pub="Wiley"]'>
        <book year="{@year}">
          <xsl:copy-of select="title" />
        </book>
      </xsl:for-each>
    </bib>
  </xsl:template>

could be rewritten in XML Query as follows:

  <bib>
    {
      FOR $b IN document("books.xml")/bib/book
      WHERE $b/pub = "Wiley"
      RETURN
        <book year={ $b/@year }>
          { $b/title }
        </book>
    }
  </bib>

Note how the XML Query processing model resembles a SQL SELECT statement.
      While some Web Services will be written exclusively in XSLT or XML Query (especially once the specifications are finished), many existing implementations use alternative, ad hoc implementation techniques. This is a feature, as having one true way to build XML-based applications is still far off. However, for these implementations to interoperate, some common metadata format is needed to describe the operations that can be performed on a Web Service. It's WSDL to the rescue.
      WSDL, or Web Services Description Language, augments the XML Schema language by defining operations as typed message exchanges. Describing the format of the messages to be exchanged is a solved problem, since XML Schema gives you a perfectly reasonable way to describe XML-based information. WSDL's contribution to XML Schema is that it defines a type system for typed endpoints or services. These endpoints/services implement one or more port types. A port type consists of one or more operations. An operation is a named pair of messages, one containing the input to the operation and the other containing the results. A WSDL operation corresponds roughly to a CLR method; a WSDL port type corresponds roughly to a CLR interface. A WSDL service is not the equivalent of a CLR class—rather, the closest analogue in the CLR is a named singleton.
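      To make the taxonomy concrete, here is a minimal WSDL 1.1 fragment describing a one-operation port type (the names and the urn:example:calc namespace are invented for illustration):

  <definitions name="Calculator"
      targetNamespace="urn:example:calc"
      xmlns:tns="urn:example:calc"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns="http://schemas.xmlsoap.org/wsdl/">
    <message name="AddRequest">
      <part name="a" type="xsd:double" />
      <part name="b" type="xsd:double" />
    </message>
    <message name="AddResponse">
      <part name="sum" type="xsd:double" />
    </message>
    <portType name="CalculatorPortType">
      <operation name="Add">
        <input message="tns:AddRequest" />
        <output message="tns:AddResponse" />
      </operation>
    </portType>
  </definitions>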
      Conceptually, WSDL is a more than reasonable solution for describing Web Services. However, at the time of this writing, the current version of WSDL is version 1.1. WSDL/1.1 is simply a W3C note and has yet to endure the W3C process. Additionally, field experience with WSDL has revealed a number of issues that need to be addressed in order to achieve compliance with other technologies, most notably XML Schema. Again, these two factors illustrate how specification churn is far from over.
      The technologies just discussed are all at some stage in the W3C process, which means that you can track their progress and participate in their final form. Beyond these technologies, there are hosts of vendor-specific technologies that are being churned out by vendors of all sizes and shapes, including Microsoft. While some of these technologies may eventually be submitted to the W3C or other standards bodies, they are far from final, which means that adapting to the current state-of-the-art may require that you write significant parts of the plumbing yourself until the vendors can completely hide the details in their platforms.

Battle for World (Type System) Domination

      The Simple Object Access Protocol (SOAP) was defined prior to the advent of XML Schemas. The original goal of SOAP was to build a Web-friendly network protocol for integrating components over the Internet. XML was becoming the data format of choice for Web-based applications, so SOAP used XML as a marshaling format for in-memory object graphs and C-style data structures that would act as request or response messages between the various components.
      Because XML Schemas did not exist when SOAP was first developed, the original SOAP authors defined a type system for representing strongly typed information in XML. That type system evolved considerably over the three years of SOAP's life prior to W3C, eventually becoming known as section 5 due to its location in the SOAP/1.1 specification (https://www.w3.org/TR/SOAP/). SOAP section 5 had one simple charter: codify how to serialize strongly typed data structures into XML. The strongly typed data structures the authors had in mind were primarily object graphs such as those found in the CLR or the Java VM. If all you care about is serializing CLR or Java object graphs into XML, SOAP section 5 provides a concise format that retains full fidelity with both the CLR and Java type systems.
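      The multi-reference accessors of section 5 are what preserve object identity. The following sketch, modeled on the examples in the SOAP/1.1 specification (the e: namespace is invented), shows how a referenced object is serialized once as an independent element and pointed to via href:

  <e:Book xmlns:e="urn:example:books">
    <title>My Life and Work</title>
    <author href="#Person-1" />
  </e:Book>
  <e:Person xmlns:e="urn:example:books" id="Person-1">
    <name>Henry Ford</name>
  </e:Person>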
      SOAP section 5 assumed an object-centric view of the world. In particular, it assumed that developers couldn't care less about the underlying XML format being used. Rather, SOAP section 5 assumed that most implementations would derive the XML format based solely on the type definitions used by the "local" programming technology (such as CLR or Java). This world view reduces XML to second-class status, and may or may not be the dominant world view in two to three years. In fact, there are a growing number of developers who believe that XML, not objects, will be the dominant type system for Web Services. This world view deprecates objects as an implementation detail and elevates XML Schema as the dominant type system for all messages. This XML-centric approach has little use for SOAP section 5, instead choosing the broader XML Schema data model for use in all message exchanges.
      Nowhere is the tension between the two worlds more evident than in the Microsoft® .NET Framework. The .NET Framework includes two XML serialization engines: System.Runtime.Serialization and System.Xml.Serialization. The former assumes that the CLR type system is the dominant system, as shown in Figure 2. System.Runtime.Serialization is arguably the most complete and correct implementation of SOAP section 5 on the planet; however, it cannot cope with arbitrary XML Schema types.

Figure 2 System.Runtime.Serialization

      In contrast, System.Xml.Serialization assumes that the XML Schema type system is the dominant system, as shown in Figure 3. System.Xml.Serialization can handle a much broader range of XML Schema constructs than System.Runtime.Serialization; however, it cannot handle arbitrary CLR objects, especially those containing object references.

Figure 3 System.Xml.Serialization

      The two serializers exist to service the two world views just described. Developers with an object-centric viewpoint will gravitate toward System.Runtime.Serialization; developers with an XML-centric view will prefer System.Xml.Serialization. XML purists, however, are likely to eschew any serialization engine, since any such engine is bound to lose fidelity due to the inherent differences between the CLR type system and the PSVI.

Messages versus RPC

      This debate is often confused with the previous discussion. To be clear, Web Services are always built based on the exchange of messages. The type system of those messages may be object-based (a la SOAP section 5) or XML Schema-based. Once the type system of the messages is decided, the next battle is whether message exchanges and remote procedure calls (RPCs) are the same.
      The messaging versus RPC debate has gone on for decades and shows no sign of stopping anytime soon. The arguments on both sides can be reduced to distinguishing the atoms from the molecules. The messaging view of the world defines the message as the atom and elevates messages to first-class status. In a messaging-oriented world, RPC is simply a particular message exchange pattern in which a request message triggers the generation of a response message destined for the sender of the original request. Message purists argue that this is but one message exchange pattern and that viewing the world strictly in terms of RPC makes the other patterns harder to see, or even impossible to express.
      The RPC-centric view acknowledges the existence of messages, but tends to view the overall operation as the atom. To support some of the flexibility of messaging, most modern RPC systems support asynchronous invocation and one-way operations. Asynchronous invocation is largely a local language binding issue and has little to no impact on the messages exchanged on the wire. In contrast, one-way operations have no response message. RPC purists argue that any system designed around a messaging paradigm could be just as easily designed around one-way RPC calls.
      An interesting technology that lives at the boundaries of this debate is the CLR's transparent proxy. The transparent proxy exists for one purpose: to convert a method call into a message exchange. The transparent proxy is an object in memory that supports a particular CLR type. When a method is invoked against the transparent proxy, the CLR converts the call stack into a request message. This request message is then dispatched to a buddy object (known as the real proxy). The real proxy is expected to return a second message containing the results of the operation. When the real proxy returns the result message, the CLR ensures that the results are reflected on the call stack presented to the transparent proxy. As shown in Figure 4, the CLR provides a similar facility for converting a request message back into a stack frame in order to dispatch the call generically to a target object. The transparent proxy facility is used by at least one Web Service implementation that ships with the .NET Framework.

Figure 4 Transparent Proxy

      SOAP and Web Services have simply resuscitated the messaging versus RPC debate. Whether or not the debate will be resolved remains to be seen. However, the fact that the underlying message format is XML allows a larger number of developers to experiment with both approaches and come to their own conclusions.

Libraries versus Languages

      Making XML accessible to programmers is actually a very big challenge. To date, there have been two fundamental approaches: libraries and languages. Library-based approaches are based on a class library or framework that is written in a traditional programming language for use from a traditional programming language. Examples of library-based approaches include SAX, Apache's Xerces and Cocoon, and the .NET Framework's System.Xml library. Language-based approaches typically coin a new programming language designed to process XML-based data. XSLT and XML Query are examples of the language approach.
      Both library and language-based approaches have their pitfalls. Library-based approaches typically need to address the mismatch between the XML type system and the type system of the underlying language or programming environment. Since many Web Services are expected to use other Web Services, this approach can result in a significant amount of type conversion as the response from one service is converted from XML into local types only to then emit XML as the upstream response. Additionally, because library-based approaches typically have little integration with the type system of the host programming language, many errors that could be caught at compile time become runtime errors.
      Proponents of language-based approaches believe that by coupling the programming language to the underlying XML type system, not only can errors be discovered at compile time, but also more information is available to produce better code. The downsides of a language-based approach are numerous. For one, programming languages are far more personal than libraries; every developer has their own sense of aesthetics that need to be addressed. Additionally, producing a new language implies the development of a new compiler and debugger, both of which are fairly significant tasks.
      The language versus library debate is analogous to the object versus XML debate. Developers who view the world as object-centric are likely to prefer library-based solutions. Developers who have an XML-centric view are likely to prefer a language-based solution. Once XML Query and XSLT 2.0 are finalized, developers will have a clear choice between the two.

Poor Support for Streaming Architectures

      The DOM has held back XML development more than any other technology I know of. The DOM is an in-memory cache for XML documents. The DOM is intuitive. The DOM is easy to use. The DOM lets you forget that the underlying XML is likely coming from somewhere else, and that is its fundamental problem.
      XML-based applications such as Web Services are very I/O bound. Web Service requests and responses are I/O. Calls to databases or subordinate Web Services are also I/O. The DOM ignores all of this. To DOM-based applications, I/O is an afterthought that happens before and/or after the interesting processing takes place.
      The argument against the DOM is similar to the argument against static cursors in ADO. In both cases, an in-memory cache must be populated in order to work with the underlying data. In both cases, the entire image winds up consuming memory by the time the I/O is complete, which can limit the scalability of server-side applications. In both cases, there is an asynchronous read/load operation, but few developers take advantage of it, largely due to COM threading weirdness.
      The trend in the last two years has been to forgo the DOM in favor of streaming interfaces. SAX is the canonical example of a streaming interface. The .NET Framework's XmlReader is a variation on SAX that also is based on streaming. Streaming interfaces are analogous to forward-only/read-only firehose mode data access techniques. In both cases, the underlying plumbing only retains as much information as is needed to satisfy the next read request. Once a portion of the data has been read, the underlying plumbing can freely discard the buffered information. This can have a tremendous impact on resource consumption, especially for large chunks of data.
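      As a sketch of the streaming style in C (the document's shape and element name here are hypothetical), James Clark's Expat parser delivers start-element events through callbacks, retaining only the current event rather than the whole document:

  #include <string.h>
  #include <expat.h>

  // Count <book> elements without building an in-memory tree.
  static void OnStartElement(void *pvUserData, const XML_Char *pszName,
                             const XML_Char **ppszAtts) {
      int *pcBooks = (int *)pvUserData;
      if (strcmp(pszName, "book") == 0)
          ++*pcBooks;
  }

  int CountBooks(const char *pszDoc) {
      int cBooks = 0;
      XML_Parser parser = XML_ParserCreate(NULL);
      XML_SetUserData(parser, &cBooks);
      XML_SetStartElementHandler(parser, OnStartElement);
      XML_Parse(parser, pszDoc, (int)strlen(pszDoc), 1 /* isFinal */);
      XML_ParserFree(parser);
      return cBooks;
  }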
      Despite the popularity of SAX and XmlReader, the entire world hasn't jumped on the streaming bandwagon. In particular, technologies such as XML Query and XPath (and, by inference, XSLT) support backward traversal, which makes using these technologies in streaming contexts difficult. It remains to be seen if the W3C has the cycles or the will to develop a streaming-friendly processing model and language in the mold of XSLT or XML Query. Until that happens, developers working with large volumes of XML need to do a significant amount of manual labor to avoid killing performance.
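      To see why, consider an expression like the following (against a hypothetical bibliography document). The value of last() cannot be known until every book element has been seen, so a streaming processor would be forced to buffer:

  /bib/book[last()]/title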

Three Views of Discovery

      First, let's set the record straight. XML is not self-describing. If an arbitrary XML document falls from the sky into your lap, there is very little you can do with it. The fact that the element and attribute names are part of the serialized form does not make XML self-describing. If it did, then I challenge you to tell me what the following document means:

  <draw />

Without context, element and attribute names are meaningless.
      A lot of attention is paid to discovery. Unfortunately, the term itself is vague. In terms of Web Services, there are ultimately three forms of discovery: type discovery, endpoint discovery, and semantic discovery. Let's look at each of these individually.
      Type discovery is a development time phenomenon. As a developer is writing a program, the types that will be used by the program need to be accessible to the developer as well as to the underlying build environment. Discovering these types means finding the metadata for the types. In the world of Web Services, WSDL and XSD are the metadata formats of choice. Web Service-aware build environments can consume WSDL and/or XSD and make the underlying types available to your program.
      Type discovery focuses on where the WSDL/XSD can be found at development time. Finding WSDL/XSD is like finding C header files or COM type libraries. If someone gives you a file and says "here's what my stuff looks like—party on," then you are set. Of course, with WSDL/XSD, the file is likely to be a URL that points to a program that generates the metadata on the fly, but ultimately, as far as your build environment is concerned, it's just another file.
      There are various conventions for finding WSDL/XSD in cases where you do not already have a URL to a Web Service's metadata. Web Services written using the .NET Framework all support the ?WSDL query string, which tells the plumbing to emit the WSDL for the Web Service. Other specifications such as DISCO and WS-Inspection allow you to send a well-known request to the root URL of a Web server and get a list of all WSDL documents available from that server. For the lion's share of type discovery applications, this is more than sufficient.
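      For example, retrieving the following (hypothetical) URL from a .NET Framework-based endpoint causes the plumbing to generate and return the service's WSDL on the fly:

  http://example.org/calc.asmx?WSDL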
      The second form of discovery that bears discussion is endpoint discovery. In general, it is dangerous to bake the endpoint address of a Web Service into your application. Between dot-com failures, DNS hijacking, and general deployment flexibility, placing a level of indirection between your program and the Web Service address is generally a good thing. This is the role of endpoint discovery.
      Endpoint discovery typically happens at deployment time, when an application is installed. Endpoint discovery can also happen at runtime, either as the application initializes or, in the face of Web Service failure, when a different server machine must be selected. It typically focuses on finding new implementations of known port types. This is similar to finding classes that support a given interface. WSDL allows multiple services (endpoints) to implement a given port type, which certainly makes this sort of polymorphism possible. However, the WSDL type system does not distinguish between endpoint address and implementation type, which means that not all implementations of a port type are semantically interchangeable. Still, in constrained scenarios, WSDL-based solutions can be made to work in this context.
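      As a sketch of the indirection involved (the names, binding, and addresses are invented, and soap:address comes from the standard WSDL SOAP binding extension), a WSDL service simply attaches concrete addresses to a binding of a port type, so the same port type can surface at more than one location:

  <service name="CalculatorService">
    <port name="PrimaryPort" binding="tns:CalculatorBinding">
      <soap:address location="http://example.org/calc" />
    </port>
    <port name="BackupPort" binding="tns:CalculatorBinding">
      <soap:address location="http://backup.example.org/calc" />
    </port>
  </service>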
      In a WSDL-based world, endpoint discovery is largely a matter of finding implementations of a known port type. At the time of this writing, both WS-Inspection and UDDI version 2 had rudimentary support for this style of discovery, although that style of discovery is not the focus of either technology.
      Finally, the third form of discovery to be discussed is semantic discovery. Conveying meaning in machine-readable form is an open research area that has yet to produce technically and commercially viable solutions. At the time of this writing, semantics are conveyed in human-readable form (that is, documentation) and programmers are paid to understand the semantics of a given piece of code or type and use those semantics in their own programs. It is important that type discovery not be confused with semantic discovery. Types do not convey semantics—rather, they give us a name that can be used in both programs and documentation. Both WSDL and XSD provide well-known locations for semantic descriptions; however, in both technologies, these descriptions are simply human-readable prose not meant for machine interpretation. For the near-term future, that is as close to semantic discovery as we are likely to get.

Parsers

      XML's greatest strength is often its greatest weakness. The fact that XML is a text-based data representation that can be easily manipulated using text editors such as Notepad or Emacs makes it immediate and accessible. Unfortunately, to make XML authorable by a wide variety of individuals and tools, the XML 1.0 specification is filled with details and special-case exceptions that make the efficient parsing of XML challenging, to say the least. To keep parser writers honest, OASIS publishes a test suite of XML documents that are used for conformance testing that contain many of these special-case exceptions. The art of parser development is passing the OASIS test suite without totally killing performance.
      Frustrated with the complexity of parser development, Don Park and Simon St. Laurent proposed a subset of XML called Simple Markup Language (SML). SML was a proper subset of XML—that is, every SML document was legal XML, but not every XML document was legal SML. SML was developed to make parser implementation approachable to mere mortals. SML tried to identify the subset of XML that was needed for software-generated XML, not human-generated XML.
      SML was an ad hoc effort that began on the XML-DEV mailing list. Canonical XML (see https://www.w3.org/TR/xml-c14n) is a more formal effort that originated from the W3C. Canonical XML is a subset that was designed to eliminate many of the redundancies in XML 1.0. Many (but not all) of the aspects of SML are present in Canonical XML. Canonical XML's primary advantage is that it is "blessed" by the W3C and is likely to gain widespread support.
      A side effect of subsetting XML with Canonical XML is that parsers (and processing in general) can go considerably faster when certain XML "features" are assumed to be absent. Consider finding an element with an attribute named moniker whose value is Don. You could use simple string matching via strstr:

  #include <string.h>

  // Naive approach: find the attribute pattern with strstr, then back
  // up to the '<' that opens the containing element.
  const char *FindDon(const char *pszDoc) {
      const char *pszPattern = "moniker=\"Don\"";
      const char *psz = strstr(pszDoc, pszPattern);
      if (psz) {
          while (psz > pszDoc && *psz != '<')
              --psz;    // scan backward to the element's start tag
      }
      return psz;
  }

Unfortunately, this technique would miss the following elements

  <dude moniker='Don' />
  <dude moniker="D&#x6F;n" />

both of which satisfy the condition of having a moniker attribute whose value is Don. Had the data been encoded as Canonical XML, neither of these two elements would have been legal, as their Canonical XML encoding would have looked like this:

  <dude moniker="Don"></dude>

      It remains to be seen whether XML parsers will be tuned to Canonical XML, and if they are, how much performance benefit could be gained by the simplification of the underlying format.
      In many respects, Canonical XML is similar to the Infoset in that both identify what is and is not important in an XML document. The difference is that Canonical XML is defined simply by subsetting XML 1.0; that is, Canonical XML is itself a machine-readable format, not simply an abstract data model.
      A related area of research is alternate encodings for XML-based information that are not text-based. The first such effort was WBXML, which is used by WAP to send XML-based information to cellular phones. WBXML suffers from coming too early in the development of XML, and is not broadly used outside of the wireless world. Again, discussions on the XML-DEV mailing list recently steered towards defining a new binary format for XML-based information. Whether or not the W3C will pick up such an effort remains to be seen. At the very least, this technique is used extensively today within controlled environments, such as when XML-based information is passed from one component to another in memory. In these scenarios, XML-based information is usually passed via a SAX pipeline or a DOM rather than as an array of octets that must be reparsed using an XML parser. Evolving this approach to an in-memory data representation is not that far-fetched and would likely yield performance benefits by eliminating the massive number of virtual method calls inherent in SAX and DOM-based systems.

Lack of Unified Authentication

      No discussion of the challenges of Web Services would be complete without a discussion of authentication. At the time of this writing, there is no single authentication technique that is supported by every major vendor. Microsoft has one solution, Passport, which is likely to be supported across all Microsoft-based Web Services and plumbing. However, there are numerous other solutions being pitched by competing vendors and open source advocates. Unfortunately, in the absence of a universal and ubiquitous authentication technique, developers fall back on proprietary ad hoc solutions. Many hand-rolled authentication techniques are insecure and easily hacked, making the need for a universal authentication mechanism all the more pressing.
Send questions and comments for Don to housews@microsoft.com.

Don Box spends most of his time working with component technologies, lately focusing on XML and Web Services. Don's latest book, Essential .NET, is due out this year from Addison-Wesley.