An Introduction to Topic Maps
Kal Ahmed and Graham Moore
Summary: This article introduces the ISO international standard Topic Maps. The topic maps paradigm describes a way in which complex relationships between abstract concepts and real-world resources can be described and interchanged using a standard XML syntax. (15 printed pages)
Topic Map History
Topic maps were originally developed in the late 1990s as a way to represent back-of-the-book index structures so that multiple indexes from different sources could be merged. However, the developers quickly realized that with a little additional generalization, they could create a meta-model with potentially far wider application. The result of that work was published in 1999 as ISO/IEC 13250, Topic Navigation Maps.
In addition to describing the basic model of topic maps and the requirements for a topic map processor, the first edition of ISO 13250 included an interchange syntax based on SGML and the hypermedia linking language known as HyTime. The second edition, published in 2002, added an interchange syntax based on XML and XLink. This is the syntax with the widest support in topic map processing products to date, and is the syntax that we will describe in this article.
Today there are a number of implementations of the standard, both open-source and proprietary, for a number of languages and platforms including the .NET platform.
Topic Map Fundamentals
The core of topic maps can be summarized very succinctly: a topic map consists of a collection of topics, each of which represents some concept. Topics are related to each other by associations, which are typed n-ary combinations of topics. A topic may also be related to any number of resources by its occurrences.
Figure 1 shows the three fundamentals of topic maps. It also shows how the distinction between topic-to-topic and topic-to-resource relationships enables a partitioning of the model into a topic space that contains only topics and associations between topics and a resource space that contains the resources related to topics. This partitioning is interesting because it allows a topic map developed for one set of resources to be repurposed to index a different set of resources. In this way the topic map can be considered to be a portable form of knowledge.
Unlike domain-specific models, the topic map model has no predefined set of types. Instead, individual topic map authors or groups of authors in a community of practice can define the model for their domain of interest and share those models with other authors from other domains.
Figure 1. Topics, associations, and occurrences
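The partitioning into a topic space and a resource space can be illustrated with a minimal sketch. The class names and the `repurpose` helper are assumptions made for this illustration and are not part of the standard; the example.org URIs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)  # URIs of related resources


@dataclass
class Association:
    type: Topic     # the association type is itself a topic
    members: tuple  # n-ary: any number of participating topics


@dataclass
class TopicMap:
    topics: list = field(default_factory=list)
    associations: list = field(default_factory=list)

    def repurpose(self, resources):
        """Re-point the unchanged topic space at a different resource space.

        `resources` maps a topic name to the resource URIs that become
        that topic's occurrences (hypothetical helper for illustration).
        """
        for topic in self.topics:
            topic.occurrences = list(resources.get(topic.name, []))


employment = Topic("Employment")
john = Topic("John Smith", ["http://example.org/hr/john-smith.html"])
abc = Topic("ABC Limited", ["http://example.org/hr/abc-limited.html"])
topic_map = TopicMap([employment, john, abc],
                     [Association(employment, (john, abc))])

# Index a different set of resources; topics and associations stay as they are.
topic_map.repurpose({"ABC Limited": ["http://example.org/press/abc-results.pdf"]})
print(abc.occurrences)
print(len(topic_map.associations))
```

Only the occurrences change in this sketch, which is what makes the topic space portable across resource sets.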
We believe that for many end-users, a good topic maps application will conceal much, if not all, of the topic maps mechanism, allowing users to instead concentrate on the domain model(s) that they work with. However, the topic maps model and the Topic Maps standard do provide a number of benefits that can be surfaced in applications and can be unique selling points.
Simple Organizational Metaphor
The core topic maps metaphor of topics, occurrences, and associations strikes a balance between being compact and easy to understand and providing enough basic infrastructure to allow users to translate their mental model of a domain into a topic map model. Other forms of data and information organization such as RDF and the relational model may have a simpler model still, but then require the user to create infrastructure for common procedures such as labeling an item with some names, defining a class structure, or creating n-ary relationships between items.
As already described above, the topic maps model has a clear distinction between the domain model, expressed as topics and associations between topics, and the indexed resources, expressed as occurrences that link topics to resources. Three major benefits can be derived from this structure:
- The topic map can act as a high-level overview of the domain knowledge contained in a set of resources. In this way the topic map can serve not only as a guide to locating resources for the expert, but also as a way for experts to model their knowledge in a structured way. This allows non-experts to grasp the basic concepts and their relationships before diving down into the resources that provide more detail.
- A topic map can be easily partitioned depending on the resources to be made available. Some publishers use a topic map-based index of large sets of resources, and then dynamically create the appropriate index when they publish a subset of those resources. With some thoughtful modeling it is even possible to create different layers of detail in a topic map, and to differentiate between information products based on the indexing and navigation features that they provide as well as the informational content of the products.
- Topic maps that index different resource sets can be easily combined. This feature can be used to allow organizations to "import" third-party data and indexes and seamlessly integrate their own data and indexes.
No Fixed Ontology
As already noted, the Topic Maps standard does not come with a predefined ontology [1]. There is no restriction on the domains to which topic maps can be applied and relatively few constraints even on the modeling approach taken. We have seen topic maps used to model temporal relationships between events, relationships between abstract concepts and their depictions, and forms of first-order logic, as well as more traditional structures such as thesauruses, controlled vocabularies, and business information.
XML Interchange Syntax
For many users, the fact that topic maps can be interchanged using a standard XML-based syntax provides a strong benefit in improving the portability of their data between applications and platforms. In addition, the XML interchange syntax allows easy integration of topic map information exchange within the Web services architecture.
There are three principal benefits that system architects and developers can gain from the application of the topic map paradigm, and they can be summed up as "Flexibility, Flexibility, and Flexibility."
Topic maps provide the meta-model on which a completely flexible application model can be built. Creating new types of business objects can be achieved by adding data to the ontology that constructs the topic map. Because the ontology is itself expressed as topics and associations between topics, extension of the ontology becomes an issue of adding data, not an issue of redesigning the underlying storage schema. This makes it possible to modify the data model used by an application without the need to upgrade deployed persistent stores.
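A minimal sketch of "extending the ontology by adding data" follows. `TopicStore` and its methods are hypothetical names for a toy in-memory store; they do not describe the API of any real topic map engine.

```python
class TopicStore:
    """Toy store in which types are ordinary topics, not schema tables."""

    def __init__(self):
        self._types = {}  # topic id -> set of type topic ids

    def add_topic(self, topic_id, *type_ids):
        # A type must already exist as a topic before it can be used.
        for type_id in type_ids:
            if type_id not in self._types:
                raise KeyError(f"unknown type topic: {type_id}")
        self._types.setdefault(topic_id, set()).update(type_ids)

    def instances_of(self, type_id):
        return sorted(t for t, types in self._types.items() if type_id in types)


store = TopicStore()
store.add_topic("person")
store.add_topic("john-smith", "person")

# A new kind of business object is introduced by adding one more topic;
# the storage layout (a single dictionary here) does not change.
store.add_topic("contractor")
store.add_topic("jane-doe", "person", "contractor")

print(store.instances_of("contractor"))
```

Because the type "contractor" is stored in the same way as any other topic, no schema migration is involved in this sketch.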
Flexible Application Structure
With the application data stored in a standardized and extensible meta-model, the path is open to enable much simpler third-party application integration and extension. This model would allow a third-party developer to define his or her own data and code extensions to an application without relying on the core application's schema supporting the extension-specific data structures.
In addition to allowing third-party extensions to the application schema, the flexibility of the topic map structure can be used to allow users to create their own extensions. This has two effects:
- It enables applications to be highly customizable, allowing much easier integration of customer-specific data systems, for example. We see this as being the key to vertical applications that can be more easily deployed for multiple customers. For example, one successful topic map application was produced by a publisher of legal information for the financial services market. The unique selling point of their topic maps-based product is that they can integrate their customers' marketing and procedures documentation with the legal information they provide.
- It enables development of horizontal applications that can be integrated more easily into existing environments.
Topic Maps in Detail
A topic is a machine-processable representation of a concept. The Topic Maps standard does not restrict the set of concepts that can be represented as topics in any way. Typically topics are used to represent electronic resources (such as documents, Web pages, Web services) and non-electronic resources (such as people or places). Topics can equally be used to represent things that have no tangible form at all, such as companies, events, and abstract concepts like "Pensions" or "Insurance."
Figure 2. Subject locator and subject identifier
Topics have four principal forms of identity. A topic can have zero or more of each of these forms of identity, and thus can be identified within a topic map system in a number of different ways:
- Identity as a topic resource in a serialized topic map. When a topic map is represented in a serialized form for interchange, each topic is assigned a URI identifier that is unique across that topic map. These URIs are used principally for deserializing references between topics. Such identifiers are referred to as source locators.
- Identity as a human-readable label. A topic can have any number of topic names. Names act as labels for human consumption and can be either text or a reference to some non-textual representation (for example, an icon, a sound clip, an animation clip, and so on). The scope mechanism (described later) allows for the case of homonyms (where a single word is used to refer to two or more different concepts).
- Identity by reference. When a topic is used to represent a resource that already has its own unique URI, that URI can be used as part of the identity of the topic. This is simply a way of telling the processing agent that "This topic stands for that resource." In the topic map standard, this form of identifier is known as a subject locator.
- Identity by description. Topics can be used to represent a concept that does not have its own unique URI. Many of the things that a topic can represent could never have a unique URI because they are not things that a computer can resolve a reference to. For example, a person may have any number of database records about himself or online biographies or pictures, but none of those addressable resources are the person—they are merely some form of descriptor for the person. In the topic map standard, this form of identifier is known as a subject identifier, and the resource that the subject identifier resolves to is known as a subject indicator. Topic maps allow the use of URI references to such descriptive resources as a form of identity. Obviously it is important that the topic map author chooses unambiguous descriptive resources for this purpose, and this is an issue that we will return to later.
The distinction between the latter two of these forms of identity can be confusing. Consider the URL https://www.networkedplanet.com/about/index.html. This is a Web page that describes the company NetworkedPlanet. So, this URL could be used as the subject identifier for a topic named "The company NetworkedPlanet," because it resolves to a resource which describes the concept of the company. However, if we wanted to talk about the concept "The 'About' page on the website www.networkedplanet.com," we actually want a topic whose subject really is the resource at the address https://www.networkedplanet.com/about/index.html and so we would then use the same URI as a subject locator.
The key difference between a subject identifier and a subject locator is that a subject identifier requires human interpretation of a resource to determine the concept that a topic represents, whereas a subject locator simply points to the concept that the topic represents. This is shown in Figure 2. The wide solid arrow shows the use of a resource as a subject locator. The thin solid arrow shows the use of the same resource as a subject identifier. The thin dashed arrows show the role of the human being in the interpretation of a subject identifier.
Although a single topic can have many forms of identity, it is important to note that each separate identifier can resolve to only one topic. The merging rules of topic maps (described later) enforce this one-to-many relationship between topics and their identifiers.
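The URI-based forms of identity, and the rule that each identifier resolves to only one topic, can be sketched as follows. `Topic` and `IdentityIndex` are hypothetical names for this illustration. The sketch rejects a conflicting registration with an error, which is a simplification: a conformant processor would merge the two topics instead.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Topic:
    label: str
    source_locators: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)


class IdentityIndex:
    """Maps each (kind of identity, URI) pair to exactly one topic."""

    KINDS = ("source_locators", "subject_locators", "subject_identifiers")

    def __init__(self):
        self._by_identifier = {}

    def register(self, topic):
        for kind in self.KINDS:
            for uri in getattr(topic, kind):
                owner = self._by_identifier.setdefault((kind, uri), topic)
                if owner is not topic:
                    # Simplification: a real processor would merge here.
                    raise ValueError(f"{uri} already identifies {owner.label!r}")

    def lookup(self, kind, uri):
        return self._by_identifier.get((kind, uri))


ABOUT = "https://www.networkedplanet.com/about/index.html"

# The same URI plays two different roles, so two distinct topics can coexist.
company = Topic("The company NetworkedPlanet", subject_identifiers={ABOUT})
page = Topic("The 'About' page", subject_locators={ABOUT})

index = IdentityIndex()
index.register(company)
index.register(page)
print(index.lookup("subject_identifiers", ABOUT).label)
print(index.lookup("subject_locators", ABOUT).label)
```

Keying the index on the kind of identity as well as the URI is what keeps the company topic and the page topic apart.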
In addition to these forms of identity, a topic can also have any number of types and any number of names.
The types of a topic define the class (or classes) of concept that the concept represented by the topic belongs to. Types are treated in topic maps as concepts in their own right; hence every type is represented by a topic. The type of a topic is specified simply by a privileged form of relationship between the topic that represents the instance and the topic that represents the type.
The names of a topic define a set of labels for a topic. Every name has a hierarchical structure. At the root is the base name, which has a string representation. It is the base name string value that is used to determine topic identity by label. A base name is also a container for any number of alternate forms (known as variant names). The alternate forms of a name may be either string values or references to resources, allowing representations such as icons or sound clips to be referenced as variant names. Base names and variant names can be given a context (or scope) in which they are valid, allowing a topic map-aware application to select the best name for presentation to a user in a given situation. We will cover scope later.
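The name structure described above might be modeled as in the sketch below. The class names, the `best_name` selection policy (prefer the most specifically scoped name that fits), and the example.org image URI are assumptions for illustration; the standard does not prescribe how an application chooses among names.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    value: str                    # string value, or a URI if is_resource is set
    scope: frozenset = frozenset()
    is_resource: bool = False


@dataclass
class BaseName:
    string: str                   # the value used for identity by label
    scope: frozenset = frozenset()
    variants: list = field(default_factory=list)


def best_name(names, context):
    """Pick the most specifically scoped base name that fits the context."""
    fitting = [n for n in names if n.scope <= context]
    best = max(fitting, key=lambda n: len(n.scope), default=None)
    return best.string if best else None


names = [
    BaseName("Joe Strummer", scope=frozenset({"stage-name"}),
             variants=[Variant("http://example.org/img/strummer.png",
                               is_resource=True)]),
    BaseName("Joseph Mellor"),  # unconstrained scope: always valid
]
print(best_name(names, {"stage-name"}))
print(best_name(names, set()))
```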
Associations are the general form for the representation of relationships between topics in a topic map. An association can be thought of as an n-ary aggregate of topics. That is, an association is a grouping of topics with no implied direction or order, and there is no restriction on the number of topics that can be grouped together.
An association can be assigned a type (again defined by a topic) that specifies the nature of the relationship represented by the association. In addition, each topic that participates in the association plays a typed role that specifies the way in which the topic participates.
For example, to describe the relationship between a person, "John Smith," and the company he works for, "ABC Limited," we would create an association typed by the topic "Employment" and with role types "Employee" (for the role played by "John Smith") and "Employer" (for the role played by "ABC Limited").
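The employment example can be written down as a small data structure. This is a sketch under the assumption that an association is an unordered set of typed roles; the class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    name: str


@dataclass(frozen=True)
class Role:
    type: Topic    # role type, itself a topic
    player: Topic  # the topic playing that role


@dataclass(frozen=True)
class Association:
    type: Topic
    roles: frozenset  # a set: no implied direction or order

    def players(self, role_type):
        return {role.player for role in self.roles if role.type == role_type}


employment = Topic("Employment")
employee = Topic("Employee")
employer = Topic("Employer")
john = Topic("John Smith")
abc = Topic("ABC Limited")

works_for = Association(
    employment,
    frozenset({Role(employee, john), Role(employer, abc)}),
)
print(works_for.players(employer))
```

Using a set for the roles reflects the point that an association groups topics without direction; the role types carry the meaning that an ordered pair would otherwise have to imply.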
Like names, an association can be assigned a scope in which it is valid, and which may be used by a topic map-aware application to determine whether or not to display the information represented by the association to a user in a given situation.
Occurrences are used to represent or refer to information about a concept represented by a topic. Occurrences can be used either to store string data within the topic map, or to reference any kind of Web-addressable resource external to the topic map. No restriction is placed on what type of resource is addressed by an occurrence. It may be a static HTML page, an HTML page generated by ASP, a Web service or any other type of resource. Neither are occurrences restricted to the HTTP protocol—any address encoded as a URI can be used to address an external resource. Once again, occurrences can be typed, using a topic to express the occurrence type, and a scope of validity can also be assigned to an occurrence.
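As a sketch, an occurrence can be modeled as holding either inline string data or a reference to an external resource. The field names and the example.org addresses are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Occurrence:
    type: str                    # id of the topic that types the occurrence
    value: Optional[str] = None  # string data stored inside the topic map
    href: Optional[str] = None   # URI of a resource outside the topic map
    scope: frozenset = frozenset()

    def __post_init__(self):
        # Exactly one of the two forms must be present.
        if (self.value is None) == (self.href is None):
            raise ValueError("an occurrence holds either a value or an href")

    @property
    def scheme(self):
        """URI scheme of an external occurrence, or None for inline data."""
        return urlparse(self.href).scheme if self.href else None


founded = Occurrence("founded", value="1998")
website = Occurrence("website", href="http://example.org/abc/")
archive = Occurrence("archive", href="ftp://example.org/abc/reports.zip")

for occurrence in (founded, website, archive):
    print(occurrence.type, occurrence.scheme)
```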
Scope is the term used in the topic map standard to refer to a constraint or a context in which something is said about a topic. The way in which such statements about topics are made is by adding a name to the topic; specifying an occurrence for a topic; or creating an association between topics (in which case the statement applies to all of the topics in the association).
Many statements are not universally true but depend on a context. For example, we make statements such as "ABC Limited was top vendor of widgets in Q2 2004," or "Fred says that ABC Limited is a good investment." The context is temporal in the first statement ("in Q2 2004") and a quotation in the second ("Fred says that"). More prosaically, context is often used to facilitate multilingual interfaces, so the concept "Dog" may have the label "dog" in the context of the English language, "le chien" in French, and "der Hund" in German.
In a topic map, scope is defined by a collection of topics that can be assigned to a name, an occurrence, or an association. The default scope (where no set is assigned) is known as the unconstrained scope and simply means that the name, occurrence, or association is always valid.
When a topic map-aware application encounters a name, occurrence, or association that has a scope assigned to it, the application should make use of information it has about the current operating context and compare that information against the scope information contained in the topic map to determine if the construct is valid and whether or not it should be presented to the user.
In the current edition of ISO 13250, the mechanics for processing scope against an application context are not constrained by the standard, and for many topic map developers this is seen as a shortcoming as it can make it more difficult to exchange topic maps that use scope. The next revision to the standard will recommend that a scope that consists of multiple topics should be processed such that the scoped construct is valid only if the application determines that all of the topics in the scope apply to the current application context.
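The "all topics in the scope must apply" rule amounts to a subset test, as the sketch below shows. The function names and the string identifiers standing in for scope topics are assumptions for illustration.

```python
def is_valid(scope, context):
    """A scoped construct is valid if every scope topic applies in context.

    The unconstrained scope is the empty set, which is a subset of any
    context and therefore always valid.
    """
    return frozenset(scope) <= frozenset(context)


def visible(items, context):
    """Keep the (text, scope) items that are valid in the given context."""
    return [text for text, scope in items if is_valid(scope, context)]


labels = [("dog", {"English"}), ("le chien", {"French"})]
print(visible(labels, {"French"}))

# A scope with several topics is valid only when all of them apply.
claim_scope = {"Q2 2004", "Fred"}
print(is_valid(claim_scope, {"Q2 2004"}))
print(is_valid(claim_scope, {"Q2 2004", "Fred", "English"}))
```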
Figure 3. The Structure of an Association
Automatic topic merging is a key feature of topic maps and one that brings many benefits to topic map development and to applications that make use of topic maps for managing and exchanging data.
The principle behind topic merging is that in any given topic map, each subject described by the topic map must be represented by one and only one topic in the topic map. This means that it is the responsibility of the topic map processor to attempt to identify the situation in which two topics represent the same subject and to process them so that only one topic remains. This is the process of merging.
Identifying when two topics represent the same subject is achieved by applying heuristics. The topic maps standard defines a set of basic heuristics:
- If two topics share the same source locator, then they have been parsed from the same topic map source and must be considered to represent the same concept.
- If two topics have the same subject locator, then they both identify the same network resource as being the thing that they represent.
- If two topics have the same subject indicator, then they are both using the same resource to describe the concept that they represent and must be considered to represent the same concept.
- If two topics each have a base name with the same string representation and the scopes of the base names are the same set of topics, then the topics must be considered to represent the same concept.
- Finally, a topic map application may make use of any domain-specific information it has to determine that two topics represent the same concept.
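The heuristics listed above can be sketched as a single predicate. The data layout (sets of URIs, base names as string and scope pairs) and the `domain_rule` callback are assumptions for illustration, not an API defined by the standard.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    source_locators: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)
    # Each base name is a (string, frozenset of scope topic ids) pair.
    base_names: set = field(default_factory=set)


def same_subject(a, b, domain_rule=None):
    """Return True if any of the listed heuristics says a and b should merge."""
    if a.source_locators & b.source_locators:
        return True
    if a.subject_locators & b.subject_locators:
        return True
    if a.subject_indicators & b.subject_indicators:
        return True
    if a.base_names & b.base_names:  # same string in the same scope
        return True
    return bool(domain_rule and domain_rule(a, b))


# Homonyms in different scopes do not trigger a merge.
paris_city = Topic(base_names={("Paris", frozenset({"city"}))})
paris_myth = Topic(base_names={("Paris", frozenset({"mythology"}))})
print(same_subject(paris_city, paris_myth))

# An application-specific rule: two names known to denote the same actor.
ALIASES = [frozenset({"The Duke", "John Wayne"})]


def known_alias(a, b):
    names_a = {string for string, _ in a.base_names}
    names_b = {string for string, _ in b.base_names}
    return any(alias & names_a and alias & names_b for alias in ALIASES)


duke = Topic(base_names={("The Duke", frozenset())})
wayne = Topic(base_names={("John Wayne", frozenset())})
print(same_subject(duke, wayne))
print(same_subject(duke, wayne, domain_rule=known_alias))
```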
Item (3) in the list above raises the importance of selecting a good resource as the description for a concept. If the description is somehow ambiguous or if the resource addressed is not well-defined enough, it is possible that two different topic map authors might use the same resource as a descriptor for different concepts, leading to undesired merging. In our experience, good resources for subject descriptors are ones created specifically to describe a single subject—the pages at wikipedia.org, for example, or pages created by the topic map author(s) or by a community of practitioners to define a controlled vocabulary.
Figure 4. Topic Merging
Item (4) has proven to be controversial in the topic map community as it relies on what many consider to be a relatively weak form of identity: the name for a concept in some language. The mapping of words in a language to concepts is a complex affair, complicated by single words that have several different meanings (homonyms), not to mention localization challenges. In the next version of the ISO standard, the restrictions on name-based identity will be tightened still further to require an author to explicitly flag a topic name as being one that should be used to confer an identity (the default being that a name shall not confer identity to its topic).
Item (5) allows for applications to extend the Topic Maps standard's set of merging criteria with application-specific criteria. These could include criteria based on more than a straightforward string or URI comparison. For example, an application might know that "The Duke" and "John Wayne" are names for the same actor and merge two topics on that basis.
Having identified the topics to be merged, the merging process defines the process of replacing those two (or more) topics with a single topic. The single topic that results from the merging process has all of the identifiers, names (including variant names), and occurrences of the topics that are merged. In addition, the result topic replaces the merged topics wherever they are referenced (that is, in any associations, scopes, or types that they appear in). This process is shown schematically in Figure 4.
The merging of two (or more) topic maps is simply the process of combining their sets of topics and associations and then applying the topic merging rules to the result.
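The sketch below combines two topic maps and then merges topics until no duplicates remain. To stay short it assumes a single merging criterion (a shared subject identifier) and a simplified association layout; all names and the example.org URIs are hypothetical. Note that the input topics are modified in place.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Topic:
    identifiers: set = field(default_factory=set)  # subject identifiers only
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)


@dataclass
class TopicMap:
    topics: list = field(default_factory=list)
    # Each association is (type topic, tuple of (role type, player) pairs).
    associations: list = field(default_factory=list)


def merge_topics(topic_map, keep, drop):
    """Fold `drop` into `keep` and redirect every reference to `drop`."""
    keep.identifiers |= drop.identifiers
    keep.names |= drop.names
    keep.occurrences |= drop.occurrences

    def swap(topic):
        return keep if topic is drop else topic

    topic_map.associations = [
        (swap(a_type), tuple((swap(role), swap(player)) for role, player in roles))
        for a_type, roles in topic_map.associations
    ]
    topic_map.topics = [t for t in topic_map.topics if t is not drop]


def merge_maps(first, second):
    """Combine two maps, then merge topic pairs until none share an identifier."""
    merged = TopicMap(first.topics + second.topics,
                      first.associations + second.associations)
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(merged.topics):
            for b in merged.topics[i + 1:]:
                if a.identifiers & b.identifiers:
                    merge_topics(merged, a, b)
                    changed = True
                    break
            if changed:
                break  # the topic list changed, so restart the scan
    return merged


WAYNE = "http://example.org/people/john-wayne"
actor_a = Topic({WAYNE}, {"John Wayne"}, {"http://example.org/bio.html"})
actor_b = Topic({WAYNE}, {"The Duke"}, {"http://example.org/films.html"})
film = Topic({"http://example.org/films/true-grit"}, {"True Grit"})
starred_in = Topic(names={"Starred in"})
star = Topic(names={"Star"})
movie = Topic(names={"Movie"})

map_a = TopicMap([actor_a])
map_b = TopicMap([actor_b, film, starred_in, star, movie],
                 [(starred_in, ((star, actor_b), (movie, film)))])

merged = merge_maps(map_a, map_b)
print(len(merged.topics), sorted(actor_a.names))
```

The surviving topic carries the names and occurrences of both originals, and the association that pointed at the dropped topic now points at the survivor.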
Interchange and the XTM Syntax
As already noted above, the ISO Topic Maps standard defines two standard interchange syntaxes, one SGML-based and the other XML-based. The XML syntax defines a topicMap element which contains any number of topic and association elements. A simple example of a topic map in XTM syntax is shown below.
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="band">
    <baseName>
      <baseNameString>Band</baseNameString>
    </baseName>
  </topic>
  <topic id="person">
    <baseName>
      <baseNameString>Person</baseNameString>
    </baseName>
  </topic>
  <!-- Similarly for membership, group, singer and guitarist -->

  <!-- The Clash is a Band -->
  <topic id="clash">
    <instanceOf>
      <topicRef xlink:href="#band"/>
    </instanceOf>
    <baseName>
      <baseNameString>The Clash</baseNameString>
    </baseName>
  </topic>

  <!-- Joe Strummer is a Person (note multiple names) -->
  <topic id="joe-strummer">
    <instanceOf>
      <topicRef xlink:href="#person"/>
    </instanceOf>
    <baseName>
      <scope>
        <topicRef xlink:href="#stage-name"/>
      </scope>
      <baseNameString>Joe Strummer</baseNameString>
    </baseName>
    <baseName>
      <baseNameString>Joseph Mellor</baseNameString>
    </baseName>
  </topic>

  <!-- Joe Strummer is a member of The Clash.
       Note separate member elements used for the different roles played -->
  <association>
    <instanceOf>
      <topicRef xlink:href="#membership"/>
    </instanceOf>
    <member>
      <roleSpec>
        <topicRef xlink:href="#group"/>
      </roleSpec>
      <topicRef xlink:href="#clash"/>
    </member>
    <member>
      <roleSpec>
        <topicRef xlink:href="#singer"/>
      </roleSpec>
      <topicRef xlink:href="#joe-strummer"/>
    </member>
    <member>
      <roleSpec>
        <topicRef xlink:href="#guitarist"/>
      </roleSpec>
      <topicRef xlink:href="#joe-strummer"/>
    </member>
  </association>
</topicMap>
We will not go into the details of the syntax here. The interested reader is referred to the original XML Topic Maps specification produced by TopicMaps.org (which was subsequently adopted by ISO).
It should be noted that the XTM syntax does not impose the merging restrictions that are required of a topic map processor. This allows XTM to be created easily, but requires that any processor that reads an XTM file must detect topics that must be merged and apply merging rules as the XTM file is parsed. When an XTM file is known to be "fully merged" (that is, it does not contain topic elements representing topics that should be merged), the topic map model that it contains can be easily accessed using standard XML processing tools such as XSLT and XQuery. However, it is not the case that standard XML processing tools can be easily applied to XTM files where merging is required.
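For a file that is known to be fully merged, ordinary XML tooling is enough to read the model. The sketch below uses Python's standard `xml.etree.ElementTree` module on a small XTM 1.0 fragment; the `topic_summaries` helper and the shape of its result are assumptions for illustration, and no merging is attempted.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"  # XTM 1.0 namespace
XLINK = "http://www.w3.org/1999/xlink"     # XLink namespace
NS = {"xtm": XTM}

DOC = f"""
<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="band">
    <baseName><baseNameString>Band</baseNameString></baseName>
  </topic>
  <topic id="clash">
    <instanceOf><topicRef xlink:href="#band"/></instanceOf>
    <baseName><baseNameString>The Clash</baseNameString></baseName>
  </topic>
</topicMap>
"""


def topic_summaries(xtm_text):
    """Map each topic id to its base name strings and the ids of its types."""
    root = ET.fromstring(xtm_text.strip())
    summaries = {}
    for topic in root.findall("xtm:topic", NS):
        names = [n.text for n in
                 topic.findall("xtm:baseName/xtm:baseNameString", NS)]
        types = [ref.get(f"{{{XLINK}}}href").lstrip("#")
                 for ref in topic.findall("xtm:instanceOf/xtm:topicRef", NS)]
        summaries[topic.get("id")] = {"names": names, "types": types}
    return summaries


for topic_id, info in topic_summaries(DOC).items():
    print(topic_id, info)
```

If the same subject were represented by two topic elements, this approach would report them as two separate entries, which is why it is only suitable for fully merged files.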
Despite the issues with merging, the XTM syntax serves the basic need of allowing interchange between conformant topic map processing applications. In addition, the syntax and merging rules together are sufficiently flexible to even allow parts of a topic map to be serialized as separate XTM documents and later recombined through merging.
Topic Map Patterns
As we have hopefully demonstrated up to this point, the Topic Maps standard provides a very flexible base architecture for a wide variety of information and knowledge management applications. This flexibility can lead to confusion and constant reinvention of basic modeling approaches. To address this issue, we advocate the development and use of patterns within topic map applications. We divide patterns into two broad categories: Topic Map Design Patterns that are patterns for modeling topic map data; and Topic Map Application Patterns that are architectural patterns for the use of topic map processing systems.
Topic Map Design Patterns
The basic concept of a Topic Map Design Pattern borrows heavily from design patterns in software engineering. A Topic Map Design Pattern provides a focused and reusable ontology that addresses a single issue. There are a couple of interesting differences, however.
- A Topic Map Design Pattern can be more prescriptive than a software design pattern, as it should specify the subject identifier URIs for the key topics used by the pattern. In this way, every implementation of a particular pattern in a topic map can be instantly recognized by the presence of topics with the URIs specified by the pattern.
- As a topic map is purely data, behaviors related to a Topic Map Design Pattern are implemented not in the topic map itself but in the processing software that makes use of the topic map data. Some design patterns may prescribe a particular set of behaviors for processing applications; others may describe only the data model and leave the way in which the application processes the data model open.
Some basic patterns taken from Library Science have been defined by one of the authors and are supported by a number of topic map processing applications. These patterns include patterns for hierarchical and faceted classification. An example of one such pattern is shown here.
The Hierarchical Classification Pattern makes use of a very useful modeling property of topic maps, namely that every topic type, association type, and occurrence type is itself a topic. This feature allows the ontology of a topic map application to be annotated using the same structure as is used to populate the ontology itself, and can be used to great effect in design patterns, by allowing an existing topic map ontology to be annotated using the pattern "metaontology" rather than modified.
This pattern enables an application to process a set of associations between topics as representing a hierarchy. For example, it may display the topics arranged into a tree view.
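A sketch of how an application might act on such annotations follows. The marker names (`hierarchical-relation-type` and the two role markers), the dictionary-based data layout, and the sample topics are all assumptions for illustration; the pattern itself identifies its key topics by subject identifier URIs that are not reproduced here.

```python
# Hypothetical marker topics that annotate an existing ontology.
HIERARCHICAL = "hierarchical-relation-type"
PARENT_ROLE = "superordinate-role-type"
CHILD_ROLE = "subordinate-role-type"

# Ontology topics and the types (annotations) attached to them.
types = {
    "broader-narrower": {HIERARCHICAL},
    "broader": {PARENT_ROLE},
    "narrower": {CHILD_ROLE},
    "employment": set(),
}

# Associations as (association type, {role type: player}).
associations = [
    ("broader-narrower", {"broader": "Finance", "narrower": "Pensions"}),
    ("broader-narrower", {"broader": "Finance", "narrower": "Insurance"}),
    ("broader-narrower", {"broader": "Insurance", "narrower": "Life Insurance"}),
    ("employment", {"employer": "ABC Limited", "employee": "John Smith"}),
]


def build_tree(types, associations):
    """Collect parent -> children links from hierarchically typed associations."""
    children = {}
    for assoc_type, roles in associations:
        if HIERARCHICAL not in types.get(assoc_type, set()):
            continue  # not annotated as hierarchical, so ignore it
        parent = next(p for r, p in roles.items()
                      if PARENT_ROLE in types.get(r, set()))
        child = next(p for r, p in roles.items()
                     if CHILD_ROLE in types.get(r, set()))
        children.setdefault(parent, []).append(child)
    return children


def render(children, node, depth=0):
    """Return the subtree under `node` as indented lines."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(children, child, depth + 1))
    return lines


tree = build_tree(types, associations)
print("\n".join(render(tree, "Finance")))
```

The association and role types keep their own names; only the added annotations tell the application to treat them as a hierarchy.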
Topic Map Application Patterns
Topic Map Application Patterns provide high-level architectural patterns and principally concentrate on the integration of a topic map processing system with other data systems and applications. These patterns include patterns for representing information from external data systems as topic map data; patterns for the import of information from external data systems; and patterns for the export and display of topic map data.
We will discuss topic map applications in more detail in a following paper.
At the time of writing more work is being done within ISO both on the Topic Maps standard itself and on a suite of companion standards.
Although ISO/IEC 13250 has been through a revision, the core of the standard has remained unchanged since 1999—a fair degree of stability in comparison to many Internet standards. However, the ISO committee has decided that the next version of the standard will be a significant overhaul in the way the standard is presented and a minor overhaul of the standard itself.
Figure 5. Hierarchical Classification Pattern
The ISO/IEC 13250 standard is to be divided into a number of separate parts: a non-normative introduction; a formal description of the underlying data model of topic maps; an XML/XLink-based interchange syntax with a description of the process of deserializing the syntax into an instance of the data model and serializing the data model into a document conforming to the interchange syntax; and a canonicalization algorithm for the data model that can be used in topic map processor conformance testing. It is hoped that this organization will make the standard more reader friendly and will add features that were originally missing and were felt to be important for future developments (specifically the formal model specification and the canonicalization algorithm).
Changes to the standard include the ability to apply data-types to occurrence values, including the ability to embed XML; the ability to declare a subset of the names of a topic as names to be used for determining topic identity; a clearer model of scope; and a definition of the interchange syntax in W3C XML Schema and RELAX NG, as well as XML DTD.
In addition to the changes to ISO/IEC 13250, the committee has also commenced work on two companion standards. ISO/IEC 18048: Topic Maps Query Language (TMQL) will define a language for querying the topic map data model, allowing the selection of both topic map constructs (such as topics and associations) and of the data carried by them (topic name or occurrence values, for example).
ISO/IEC 19756: Topic Maps Constraint Language (TMCL) defines a schema language for topic maps that would allow the schema author to constrain the constructs that can appear in a topic map and how they must relate to one another. As with XML, a schema language for topic maps enables both validation and also smarter, schema-driven editing applications.
Both of these standards are currently at an early stage of development: requirements have been defined and, in the case of TMQL, an initial proposal for the language has been created.
Work on the core standard and on the query and constraint languages can be followed on the ISO Topic Maps website.
This article introduced the topic maps paradigm in the context of the ISO standard. We presented the principal components of the topic map model, showing how the standard processing components of scope and topic merging give additional power to this model.
In a forthcoming article we will present some concrete use-cases for topic maps and show how the topic map model can be used to address many of the information organization and interchange needs of a modern business environment.
- The word "ontology" in this context means the system of types of topics, occurrences, and associations that together define the classes of things and relationships between things that are documented by a topic map.
- Biezunski M., Newcomb S., Pepper S. (ed.). ISO/IEC 13250:2002, Topic Maps [online]. ISO. PDF format.
- Moore G., Pepper S. (ed.), XML Topic Maps (XTM) 1.0 [online], TopicMaps.Org. HTML format.
- Ahmed K., TMShare—Topic Map Fragment Exchange in a Peer-To-Peer Application. HTML format.
- Ahmed K., Topic Map Design Patterns for Information Architecture. HTML format.
About the authors
Kal has worked in SGML and XML information management for 10 years, in both software development and consultancy. He is well-known in the topic map community for his work on the open source Java topic map toolkit, TM4J, and for his contributions to development of the ISO standard. Kal has published many articles on topic maps and topic map-related themes and is a frequent conference speaker.
Kal is now co-founder of Networked Planet Limited, a company developing topic map tools and topic maps-based applications for the .NET platform.
Graham has worked for eight years in the areas of information, content and knowledge management as a developer, researcher and consultant. He has held leading roles as CTO of STEP, Vice President Research & Development at empolis GmbH and Chief Scientist at Ontopia AS. He has been responsible for the development of knowledge management products including K42 Topic Map Engine, X2X Link Management Engine and e:kms knowledge suite. Graham is co-editor of the XTM 1.0 XML Topic Maps standard and ISO 13250-1 and -2 (Topic Map Data Model and Syntax); he is also co-editor of TMCL (Topic Map Constraint Language).
Graham is currently co-founder of Networked Planet Limited.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.