XML Glossary

2011-02-21

This glossary defines terms that pertain to XML standards.

A

attribute
An XML structural construct. A name-value pair, separated by an equals sign, included inside a tagged element that modifies certain features of the element. All attribute values, including things like size and width, are in fact text strings and not numbers. For XML, all values must be enclosed in quotation marks.

You can declare attributes for an XML element type using an attribute list declaration.
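
As a minimal sketch of the point that attribute values are text strings, the following Python fragment parses a hypothetical `<img>` element with the standard-library `xml.etree.ElementTree` module. The element and attribute names are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; in XML every attribute value must be quoted.
element = ET.fromstring('<img width="640" height="480"/>')

# The parser hands back attribute values as strings, never as numbers.
width = element.get("width")
print(type(width).__name__, width)

# Converting to a number is the application's responsibility.
area = int(width) * int(element.get("height"))
print(area)
```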

C

Cascading Style Sheets (CSS)
Formatting descriptions that provide augmented control over presentation and layout of HTML and XML elements. CSS can be used for describing the formatting behavior of simply structured XML documents, but does not provide a display structure that deviates from the structure of the source data. See also Extensible Stylesheet Language.

CDF
See Channel Definition Format (CDF).

Channel Definition Format (CDF)
An XML-based data format used in Microsoft® Internet Explorer 4.0 and later to describe Active Channel™ content and desktop components.

CDF permits a Web publisher to offer frequently updated collections of information, or channels, enabling automatic delivery to compatible Web clients. The user needs to choose the channel only once; after that, scheduled updates of the channel information are delivered to the client without further intervention.

character data
All the textual content of an element or attribute that is not markup. XML differentiates this plain text from binary data. In the XML OM, character data is stored in text nodes, which are implemented as DOM text objects.

complex data type
An element that can contain other elements or attributes. Also known as complex type. Appears as <complexType> in XML documents.

CSS
See Cascading Style Sheets (CSS).

D

data island
An XML document that exists within an HTML page, delimited by <XML> or <SCRIPT language="XML"> tags. It allows you to script against the XML document without having to load it through script or through the <OBJECT> tag. Almost anything that can be in a well-formed XML document can be inside a data island.

HTML is used as the primary document or display format, and XML is used to embed data within the document.

Data Source Object
Provides a way to bind HTML controls directly to an XML data island. It assists developers in connecting to structured XML data and supplying it to an HTML page by using the data-binding facility of dynamic HTML.

XML Data Source Object allows you to work with data one node at a time, but you can also work with multiple nodes at a time, without having to walk the document tree. It binds the data to specific controls on the page and the controls are automatically populated with data from the Data Source Object.

data types
The parts and subparts of an XML schema that are used as the basis of all the larger components in schema.

definition
A description used to create simple and complex data types.

document element
The element in an XML document that contains all other elements. It is the top-level element of an XML document and must be the first element in the document. There is exactly one document element, no part of which appears in the content of any other element. The document element represents the document as a whole; every other element represents a component of the document.

The terms root element and document element are interchangeable.

document entity
The starting point for an XML parser. Unlike other entities, the document entity has no name and cannot be referenced. It is the entity in which the XML declaration and document type declaration can occur.

Document Object Model (DOM)
A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model provides a standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them. Vendors can support the DOM as an interface to their proprietary data structures and APIs, and content authors can write to the standard DOM interfaces rather than product-specific APIs, thus increasing interoperability on the Web.
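
The following sketch shows one way a program might access and update a document through DOM interfaces, here using Python's standard-library `xml.dom.minidom`. The `catalog` and `part` names are hypothetical and appear only for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document used only for illustration.
document = parseString("<catalog><part>bolt</part><part>nut</part></catalog>")

# Access: the document element and its descendants are objects.
root = document.documentElement
names = [part.firstChild.data for part in root.getElementsByTagName("part")]

# Update: new nodes are created and attached through the same interface.
new_part = document.createElement("part")
new_part.appendChild(document.createTextNode("washer"))
root.appendChild(new_part)

part_count = len(root.getElementsByTagName("part"))
print(root.tagName, names, part_count)
```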

document type declaration
An XML structural construct. Consists of markup code that indicates the grammar rules, or Document Type Definition (DTD), for the particular class of document. The document type declaration can also point to an external file that contains all or part of the DTD. It must appear following the XML declaration and preceding the document element. The syntax of the document type declaration is <!DOCTYPE content>.
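
As an illustration of where the document type declaration sits, the sketch below builds a small hypothetical document (XML declaration, then the declaration with an internal DTD subset, then the document element) and reads the declared name back with Python's `xml.dom.minidom`. The `note` vocabulary is an assumption made for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document: the document type declaration follows the XML
# declaration and precedes the document element.
source = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>'
    '<note>Remember the milk</note>'
)

document = parseString(source)
doctype_name = document.doctype.name            # name given after <!DOCTYPE
root_name = document.documentElement.tagName    # name of the document element
print(doctype_name, root_name)
```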

Document Type Definition (DTD)
Can accompany a document, essentially defining the rules of the document, such as which elements are present and the structural relationship between the elements. It defines what tags can go in your document, what tags can contain other tags, the number and sequence of the tags, the attributes your tags can have, and optionally, the values those attributes can have.

DTDs help to validate the data when the receiving application does not have a built-in description of the incoming data. The DTD is declared within the document type declaration production of the XML file. With XML, however, DTDs are optional.

See also schema.

DOM
See Document Object Model.

DTD
See Document Type Definition.

E

EDI
See Electronic Data Interchange.

Electronic Data Interchange (EDI)
An existing format used to exchange data and support transactions. EDI transactions can be conducted only between sites that have been specifically set up with compatible systems. Proprietary EDI formats are more difficult to write than XML, and unlike XML, cannot be transmitted over HTTP.

element
An XML structural construct. An XML element consists of a start tag, an end tag, and the information between the tags, which is often referred to as the contents. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value. An instance of an element is declared using <element> tags.

Elements used in an XML file are described by a DTD or schema, either of which can provide a description of the structure of the data.

entity
An XML structural construct. A file, database record, or another item that contains data. The primary purpose of an entity is to hold content — not structure, rules, or grammar. Each entity is identified by a unique name and contains its own content, from a single character inside the document to a large file that exists outside the document. The function of an XML entity is similar to that of a macro definition.

The entity can be referred to by an entity reference to insert the entity's contents into the tree at that point. Entity declarations occur in the DTD.

entity reference
An XML structural construct. Acts as a placeholder for the content author, and the XML parser places the actual content at each reference site. To include an entity reference, you first insert an ampersand (&) and then enter the entity name followed by a semicolon (;), as follows: &YourEntityName;. Then, when the line is processed, the entity will be replaced with the entity's content.

It is used in much the same way as a macro.
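
The macro-like substitution can be sketched with Python's standard-library `xml.etree.ElementTree`, which expands entities declared in the internal DTD subset. The entity name `company` and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity is declared in the DTD and then
# referenced in the content as &company;
source = (
    '<!DOCTYPE memo [<!ENTITY company "Contoso Ltd.">]>'
    '<memo>Welcome to &company;!</memo>'
)

memo = ET.fromstring(source)

# The parser has replaced the reference with the entity's content.
print(memo.text)
```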

Extensible Markup Language (XML)
A subset of SGML that is optimized for delivery over the Web, XML provides a uniform method for describing and exchanging structured data that is independent of applications or vendors.

The key is that with XML, the information is in the document, while the rendering instructions are elsewhere. In other words, content and presentation are separate. XML is the Web's language for data interchange and HTML is the Web's language for rendering.

At the time of this writing, XML 1.0 is a World Wide Web Consortium Recommendation, which means that it is in the final stage of the approval process.

Extensible Stylesheet Language (XSL)
A language used to transform XML-based data into HTML or other presentation formats, for display in a Web browser. The transformation of XML into formats, such as HTML, is done in a declarative way, making it often easier and more accessible than through scripting. In addition, XSL uses XML as its syntax, freeing XML authors from having to learn another markup language.

In contrast to CSS, which "decorates" the XML tree with formatting properties, XSL transforms the XML tree into a new tree (the HTML), allowing extensive reordering, generated text, and calculations — all without modification to the XML source. The source can be maintained from the perspective of "pure content" and can simultaneously be delivered to different channels or target audiences by just switching style sheets.

XSL consists of two parts, a vocabulary for transformation and the XSL Formatting Objects.

F

facet
A restriction on a data type. A single defining aspect of value space. There are two types of facets: fundamental and constraining.

I

infoset
See XML information set.

invalid document
Documents that do not follow the XML tag rules. If a document has a DTD or schema, and it doesn't follow the rules defined in its DTD or schema, that document is invalid as well.

M

mixed content
Element types with mixed content are allowed to hold either character data alone or character data interspersed with child elements. In this case, the types of the child elements can be constrained, but not their order or their number of occurrences.
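
A short sketch of mixed content, assuming a hypothetical `<p>` element that interleaves character data with `<key>` child elements. In Python's `xml.etree.ElementTree`, the character data before the first child is exposed as `text`, and the data following each child as that child's `tail`.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with character data interspersed with children.
para = ET.fromstring("<p>Press <key>Ctrl</key> and <key>C</key> to copy.</p>")

# Walk the content in document order: text, then each child and its tail.
pieces = [para.text]
for child in para:
    pieces.append(child.text)
    pieces.append(child.tail)

full_text = "".join(para.itertext())
print(pieces)
print(full_text)
```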

N

namespace
A mechanism that allows developers to uniquely qualify the element names and relationships and to make these names recognizable. By doing so, they can avoid name collisions on elements that have the same name but are defined in different vocabularies. They allow tags from multiple namespaces to be mixed, which is essential if data is coming from multiple sources. Namespaces ensure that element names do not conflict, and clarify who defined which term.

A namespace identifies an XML vocabulary defined within a URN. An attribute on an element, attribute, or entity reference associates a short name with the URN that defines the namespace; that short name is then used as a prefix to the element, attribute, or entity reference name to uniquely identify the namespace. Namespace references have scope. All child nodes beneath the node that specifies the namespace inherit that namespace. This allows nonqualified names to use the default namespace. See also RDF namespace.
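
The scoping rules can be sketched as follows, assuming two hypothetical URNs (`urn:example:orders` and `urn:example:inventory`). Both vocabularies define an `item` element, and Python's `xml.etree.ElementTree` keeps them apart by expanding each name to `{namespace}local-name`.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies that both define an element called "item".
source = (
    '<order xmlns="urn:example:orders" xmlns:inv="urn:example:inventory">'
    '<item><inv:item>bolt</inv:item></item>'
    '</order>'
)

root = ET.fromstring(source)

# The unprefixed child inherits the default namespace declared on <order>;
# the prefixed child resolves to the namespace bound to "inv".
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```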

NCName
An XML name that does not contain a colon (:). An NCName begins with either a letter or an underscore (_) character, followed by any combination of letters, digits, accents, diacritical marks, periods (.), hyphens (-), and underscores (_) permitted in the XML specification. The following list shows some example NCNames:

x

_aaabbb.ccc

catalog

part-number

_-._-...
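
A rough way to test names like these is sketched below in Python. It is an ASCII-only approximation: the XML specification also permits many non-ASCII letters, digits, and combining characters, which this hypothetical helper deliberately ignores.

```python
import re

# ASCII-only approximation of the NCName rule: a letter or underscore,
# then letters, digits, periods, hyphens, or underscores. No colon.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_ascii_ncname(name: str) -> bool:
    """Return True if name matches the simplified ASCII NCName pattern."""
    return NCNAME_ASCII.fullmatch(name) is not None

examples = ["x", "_aaabbb.ccc", "catalog", "part-number", "_-._-..."]
results = [is_ascii_ncname(name) for name in examples]
print(results)

# A colon or a leading digit makes the name fail the check.
print(is_ascii_ncname("aw:Root"), is_ascii_ncname("9lives"))
```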

notation
Tells the parser what type of object is being referenced. Usually refers to a data format of non-XML data, such as BMP. A notation identifies by name the format of unparsed entities, the format of elements that bear a notation attribute, or the application to which a processing instruction is addressed.

notation declaration
Tells the parser how to deal with a specific binary file type, as well as provides a name and an external identifier for a notation.

The notation declaration gives an internal name to an existing notation so that it can be referred to in attribute list declarations, unparsed entity declarations, and processing instructions.

The external identifier is used for the notation, which can allow an XML parser or its client application to locate a helper application capable of processing data in the given notation.

P

parsed entity
An entity that has content that is parsed and replaced with actual literal values. The result is called the replacement text. Parsed entities can only contain character data or XML markup.

processing instruction
An XML construct that conveys information to the application processing the XML. A processing instruction is a mechanism for embedding information in a file that is intended for proprietary applications. The application processing the XML can take specific action based on processing instructions. No entities are expanded within a processing instruction.

The following is a processing instruction that indicates that the XML file is a Microsoft Word XML document:

<?mso-application progid="Word.Document"?>
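
An application can read such an instruction as a target plus an uninterpreted data string. The sketch below uses Python's `xml.dom.minidom`; the `<root/>` element is a hypothetical stand-in for the rest of the document.

```python
from xml.dom.minidom import Node, parseString

# The processing instruction from above, followed by a placeholder element.
document = parseString('<?mso-application progid="Word.Document"?><root/>')

pi = document.firstChild
is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE

# The parser does not interpret the data; that is left to the application.
print(is_pi, pi.target)
print(pi.data)
```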

Q

QName
A representation of an XML qualified name. A QName consists of a namespace, represented by a namespace prefix, and a local name. For a QName to be valid, a namespace declaration must be in scope for the context in which the QName is used. For example, if a namespace declaration, such as xmlns:aw="www.adventure-works.com", is in scope, then an element can be declared, <aw:Root/>. For this element, aw:Root is the QName.
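
The requirement that a namespace declaration be in scope can be sketched with Python's `xml.etree.ElementTree`, which resolves the prefix to its namespace when parsing and rejects a prefix that has no declaration.

```python
import xml.etree.ElementTree as ET

# With the declaration in scope, the QName aw:Root resolves to a
# namespace plus a local name.
root = ET.fromstring('<aw:Root xmlns:aw="www.adventure-works.com"/>')
namespace, local_name = root.tag[1:].split("}")
print(namespace, local_name)

# Without a declaration for the "aw" prefix, parsing fails.
try:
    ET.fromstring("<aw:Root/>")
    unbound_prefix_rejected = False
except ET.ParseError:
    unbound_prefix_rejected = True
print(unbound_prefix_rejected)
```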

R

reference node
The reference node for a search context is the node that is the immediate parent of all nodes in the search context. Every search context has an associated reference node.

replacement text
The content of parsed entities, after replacement of character references and parameter-entity references.

S

SAX
See Simple API for XML.

schema
A formal specification of element names that indicates which elements are allowed in an XML document, and in what combinations. It also defines the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements. It defines whether an element is empty or can include text. The schema can also define default values for attributes.

A schema is functionally equivalent to a DTD, but is written in XML. A schema also provides for extended functionality such as data typing, inheritance, and presentation rules. Consequently, the new schema languages are far more powerful than DTDs.

schema structures
The compounds that can be constructed from data types and are used to describe the element, attribute, and validation structure of a document type.

SGML
See Standard Generalized Markup Language.

Simple API for XML (SAX)
An XML API that allows developers to take advantage of event-driven XML parsing. Unlike the DOM specification, SAX doesn't require the entire XML file to be loaded into memory. SAX notifies you when certain events happen as it parses your document. When you respond to an event, any data you don't specifically store is discarded. If your document is very large, using SAX will save significant amounts of memory when compared to using DOM. This is especially true if you only need a few elements in a large document.
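
The event-driven style can be sketched with Python's standard-library `xml.sax` module. The handler below is hypothetical: it keeps only a running count of `part` elements, and everything else the parser reports is discarded as it goes by.

```python
import xml.sax

class PartCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts <part> start-tag events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called by the parser each time a start tag is encountered.
        if name == "part":
            self.count += 1

handler = PartCounter()
xml.sax.parseString(b"<catalog><part/><part/><bolt/></catalog>", handler)
print(handler.count)
```

Because the handler stores only an integer, memory use does not grow with the size of the document.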

simple data type
An element that contains only text. Also known as simple type. Appears as <simpleType> in XML documents. Attributes are considered simple types because they contain only text.

Simple Object Access Protocol (SOAP)
An open, extensible way for applications to communicate using XML-based messages over the Web, regardless of what operating system, object model, or language they use. SOAP provides a way to use the existing Internet infrastructure to enable applications to communicate directly with each other without being unintentionally blocked by firewalls.

SOAP
See Simple Object Access Protocol.

Standard Generalized Markup Language (SGML)
The international standard for defining descriptions of structure and content of electronic documents. Despite its name, SGML is not a language in itself, but a way of defining languages that are developed along its general principles. SGML defines the way that a markup language is built by specifying the syntax and definitions for the elements and attributes that compose it.

XML is a subset of SGML designed to deliver SGML-type information over the Web, while HTML is an application of SGML.

T

template
The basis of the XML style sheet is the template rule, which makes a template that allows a user agent to construct a styled Result node from a Source node. The template has two parts:

The matching part identifies the source (XML) node to which the processing action is to be applied. The matching information is contained in the match attribute.

The processing part defines how the children are to be processed and what styling is to be applied to them. The processing information is contained in the template's child elements.

tokenized attribute type
In a tokenized type, the parser will normalize all white space to a single space character and will eliminate leading and trailing white space altogether. It will also validate the contents based on the declared type.

Seven attribute types are characterized as tokenized types because each value represents either a single token (ID, IDREF, ENTITY, NMTOKEN) or a list of tokens (IDREFS, ENTITIES, and NMTOKENS).
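
The white-space normalization described above can be illustrated with a small Python helper. This is a sketch of the rule only, not the code of any particular parser, and the function name and sample value are hypothetical.

```python
import re

def normalize_tokenized(value: str) -> str:
    """Collapse runs of XML white space to one space and trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

# Hypothetical IDREFS value as it might appear in a source document.
raw = "  id1 \t id2\n id3  "
normalized = normalize_tokenized(raw)

# After normalization, a list-valued type splits cleanly into tokens.
tokens = normalized.split(" ")
print(repr(normalized), tokens)
```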

U

Uniform Resource Identifier (URI)
A superclass that includes both URNs and URLs. Presently, URI means URL in nearly all cases when discussing XML, although it is expected that URNs will become more numerous in the future. The URI supplies a universally unique number or name that can identify an element or attribute in a universally unique way.

URIs are a slightly more general scheme for locating resources on the Internet that focuses more on the resource and less on the location. In theory, a URI could find the closest copy of a mirrored document or locate a document moved from one site to another.

Uniform Resource Locator (URL)
The set of URI schemes that have explicit instructions on how to access the resource on the Internet.

URLs are uniform in that they have the same basic syntax no matter what specific type of resource (Web page, newsgroup) is being addressed or what mechanism is described to fetch it.

Uniform Resource Name (URN)
Identifies a persistent Internet resource. A URN can provide a mechanism for locating and retrieving a schema file that defines a particular namespace. While an ordinary URL could provide similar functionality, a URN is more robust and easier to manage for this purpose because a URN can refer to more than one URL.

Unlike URLs, URNs are not location-dependent.

unparsed entity
Any block of non-XML data, sometimes referred to as a binary entity because its content is often a binary file (such as an image) that is not directly interpreted by the XML parser. An unparsed entity could contain plain text, so the term binary is a bit misleading.

Unlike a parsed entity, an unparsed entity requires a notation, which identifies the format or type of resource to which the entity is declared. Beyond a requirement that an XML parser make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.

URI
See Uniform Resource Identifier.

URL
See Uniform Resource Locator.

URN
See Uniform Resource Name.

V

valid XML
XML that conforms to the rules defined in the XML specification, as well as the rules defined in the DTD or schema.

The parser must understand the validity constraints of the XML specification and check the document for possible violations. If the parser finds any errors, it must report them to the XML application. The parser must also read the DTD, validate the document against it, and again report any violations to the XML application.

Because all of this parsing and checking can take time and because validation might not always be necessary, XML supports the notion of the well-formed document.

vocabulary
See XML vocabulary.

W

W3C
See World Wide Web Consortium.

well-formed XML
XML that follows the XML tag rules listed in the W3C Recommendation for XML 1.0, but doesn't have a DTD or schema. A well-formed XML document contains one or more elements; it has a single document element, with any other elements properly nested under it; and each of the parsed entities referenced directly or indirectly within the document is well formed.

Well-formed XML documents are easy to create because they don't require the additional work of creating a DTD. Well-formed XML can save download time because the client does not need to download the DTD, and it can save processing time because the XML parser doesn't need to process the DTD.
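
A well-formedness check needs no DTD, as the sketch below suggests. It relies on the non-validating parser behind Python's `xml.etree.ElementTree`; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b/></a>"))   # single document element, proper nesting
print(is_well_formed("<a><b></a>"))    # <b> is never closed
print(is_well_formed("<a/><b/>"))      # two top-level elements
```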

World Wide Web Consortium (W3C)
A standards body located at MIT that sets standards for XML, HTML, XSL, and many other Web technologies.

X

XDR
See XML-Data Reduced.

XML
See Extensible Markup Language.

XML-Data Reduced (XDR)
An early language used to create a schema, which identifies the structure and constraints of a particular XML document. XML-Data Reduced refers to the subset of the XML-Data schema specification that was made available in MSXML 3.0 and later. It carries out the same basic tasks as DTD, but with more power and flexibility. Unlike DTD, which requires its own language and syntax, XML-Data Reduced uses XML syntax for its language. Unlike XSD, which has only recently been recommended as a standard, XML-Data Reduced was implemented and made available by Microsoft well ahead of the existence of XSD as a recommended standard by the W3C XML Schema Working Group.

XML declaration
The first line of an XML file can optionally contain the "xml" processing instruction, which is known as the XML declaration. The XML declaration can contain pseudo-attributes to indicate the XML language version, the character set, and whether the document can be used as a standalone entity.

The following is an example of an XML declaration:
```
<?xml version="1.0" standalone="yes" ?>
```

XML document
A document object that is well formed, according to the XML recommendation, and that might (or might not) be valid. The XML document has a logical structure (composed of declarations, elements, comments, character references, and processing instructions) and a physical structure (composed of entities, starting with the root, or document entity).

XML engine
Software that supports XML functionality on the client; Internet Explorer 4.0 and later include XML engines. Its components include the XML parser, the XSL processor, and schema support.

XML information set
A description of the information available in a well-formed XML document.

XML Object Model
An API that defines a standard way in which developers can interact with the elements of the XML structured tree. The XML object model exposes properties, methods, and the actual content (data) contained in an object. It controls how users communicate with trees, and exposes all tree elements as objects, which can be accessed without any return trips to the server. The XML OM uses the W3C standard Document Object Model.

XML parser
A software module used to read XML documents and provide access to their content and structure. The XML parser generates a hierarchically structured tree, then hands off data to viewers and other applications for processing, and finally returns the results to the browser. A validating XML parser also checks the XML syntax and reports errors.

XPath
The result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations (XSLT) and XPointer. The primary purpose of XPath is to address parts of an XML document. It also provides basic facilities for manipulation of strings, numbers, and Booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath gets its name from its use of a path notation as used in URLs for navigating through the hierarchical structure of an XML document.
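
The path notation can be sketched with Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The `catalog` document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
catalog = ET.fromstring(
    "<catalog>"
    "<part id='p1'><name>bolt</name></part>"
    "<part id='p2'><name>nut</name></part>"
    "</catalog>"
)

# A path addresses nodes by walking the element hierarchy.
names = [node.text for node in catalog.findall("./part/name")]

# A predicate narrows the selection by attribute value.
second = catalog.find("./part[@id='p2']/name").text
print(names, second)
```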

XML Pointer Language (XPointer)
A W3C initiative that specifies constructs for addressing the internal structures of XML documents. In particular, it provides for specific reference to elements, character strings, and other parts of XML documents, whether or not they bear an explicit ID attribute.

An XPointer consists of a series of location terms, each of which specifies a location, usually relative to the location specified by the prior location term. Each location term has a keyword (such as id, child, ancestor, and so on) and can have arguments, such as an instance number, element type, or attribute. For example, the XPointer:
```
child(2,precocious)
```
refers to the second child element whose type is precocious.

XML Query Language (XQL)
A set of extensions to XSL Patterns proposed to the W3C.

XQL is an extension to the capabilities of XSL that will provide for searching into, and data retrieval from, XML documents. It provides ways to manipulate XML in order to create new documents, to control the content of existing documents, and to manage the ordering and presentation of these documents along with XSL.

XML Schema Definition (XSD)
A language proposed by the W3C XML Schema Working Group for use in defining schemas. Schemas are useful for enforcing structure and/or constraining the types of data that can be used validly within other XML documents. XML Schema Definition refers to the fully specified and currently recommended standard for use in authoring XML schemas. Because the XSD specification was only recently finalized, support for it was only made available with the release of MSXML 4.0. It carries out the same basic tasks as DTD, but with more power and flexibility. Unlike DTD, which requires its own language and syntax, XML Schema Definition uses XML syntax for its language. XSD closely resembles and extends the capabilities of XDR. Unlike XDR, which was implemented and made available by Microsoft in MSXML 2.0 and later releases, the W3C now recommends the use of XSD as a standard for defining XML schemas.

See also schema.

XML vocabulary
A set of actual elements and the structure for a specific document type used in particular data formats. Vocabularies, along with the structural relationships between the elements, are defined in a DTD that serves as the rulebook for that vocabulary.

One of the first and probably best-known vocabularies is the Channel Definition Format, used to define Web pages that are designed to be sent automatically, or "pushed," to client users.

XPointer
See XML Pointer Language.

XQL
See XML Query Language.

XSD
See XML Schema Definition.

XSL
See Extensible Stylesheet Language.

XSL formatting objects
A set of formatting semantics expressed as an XML vocabulary.

Conceptually, these objects form a tree. The formatting objects denote typographic elements such as page, paragraph, rule, and so forth. Finer control over the presentation of these elements is provided by a set of formatting properties, such as indents; word- and letter-spacing; and widow, orphan, and hyphenation control. The formatting objects and formatting properties provide the vocabulary for expressing presentation intent.

XSL Patterns
A declarative, non-procedural selection language implemented in MSXML versions 3.0 and earlier. For MSXML 4.0 and later, XSL Patterns is not supported. For more information about XSL Patterns, download the MSXML 2.5 SDK from MSDN® at msdn.microsoft.com/downloads/.

XSL Transformations (XSLT)
Makes use of the expression language defined by XPath for selecting elements for conditional processing and for generating text.

XSLT provides two "hooks" for extending the language, one hook for extending the set of instruction elements used in templates and one hook for extending the set of functions used in XPath expressions. These hooks are both based on XML namespaces.
