Domain-Specific Modeling
by Steve Cook
Summary: Domain-specific languages (DSLs) are special-purpose languages designed to solve a particular range of problems. DSLs are nothing new. Common examples are HTML, designed for representing the layout of Web pages; SQL, designed for querying and updating databases; and regular expressions, designed for finding and extracting particular patterns in textual strings. The essence of a DSL is that it makes a large problem smaller. Without HTML, the problem of rendering Web pages with more or less equivalent appearance on millions of screens connected to different kinds of PCs would be insurmountable. Without SQL, the problem of allowing multiple concurrent users to establish, query, and combine large lists of data would be a massive programming task. Without regular expressions, searching for substrings within text would involve writing a complicated program. There is a pattern here: to turn large problems into small problems, identify a language that efficiently expresses that class of problems, and apply the pattern to connect expressions in the language into the final environment in which the problem is to be solved. Given that we are confronted daily with large problems to solve, let's look at how we can harness this idea in practice.
Contents
Kinds of DSL
Integrating the DSL
Validation and Error Handling
Forward and Reverse Generation
Using DSL Tools to Build a DSL
Designing a DSL
DSLs and Software Factories
About the Author
Resources
The DSL pattern has three primary components (see Figure 1). Firstly, there is the model that the DSL user creates to represent the problem. This model might be a textual expression, as in the cases cited earlier in the article summary, or it might be an annotated diagram. Secondly, there is a platform that will be used to execute the solution to the problem at hand. In the case of HTML, this platform will be a Web browser; in the case of SQL, a database; and in the case of a regular expression, a text editor or programming environment. Thirdly, there is a method to integrate the language expression into the platform to configure it to the problem at hand.
Figure 1. The DSL pattern
There are two primary means of integration: interpretation and generation. With interpretation, part of the platform itself is dedicated to recognizing expressions in the DSL and executing their intent. With generation, a separate procedure is used to convert the DSL expression into something that the platform recognizes natively. You can also see the use of hand-crafted techniques (see Figure 1). A particular model will inevitably only represent one aspect of the problem to be solved, and other techniques must be used to solve the rest of it.
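As a minimal sketch of the two integration styles, assume a hypothetical toy DSL whose model is a list of filter rules. The rule format and the names `interpret`, `generate`, and `matches` are invented for illustration and do not come from any real tool. The interpreter walks the model each time it is asked a question; the generator converts the same model into source text that the platform (here, the Python runtime) executes natively.

```python
import operator

# Hypothetical toy model: each rule is (field, operator symbol, value).
MODEL = [("age", ">=", 18), ("country", "==", "UK")]

OPS = {">=": operator.ge, "==": operator.eq, "<": operator.lt}


def interpret(model, record):
    """Interpretation: the platform walks the model at run time."""
    return all(OPS[op](record[field], value) for field, op, value in model)


def generate(model):
    """Generation: convert the model into source the platform runs natively."""
    conditions = " and ".join(
        f"record[{field!r}] {op} {value!r}" for field, op, value in model
    )
    return f"def matches(record):\n    return {conditions or 'True'}\n"


# Compile the generated source and pull out the resulting function.
namespace = {}
exec(generate(MODEL), namespace)
matches = namespace["matches"]

record = {"age": 30, "country": "UK"}
print(interpret(MODEL, record), matches(record))
```

Both routes give the same answer for the same model; they differ in when the model is consulted and in how easily the result can be inspected, debugged, and customized.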
In recent decades model-driven techniques have been proposed widely as a means to increase the efficiency of software development. Under names such as structured or object-oriented analysis and design, the idea is to draw a diagram that represents an aspect of the system under construction, and to use that diagram directly to help generate or implement that system. Such was the vision behind Computer Aided Software Engineering (CASE) tools, promoted by many vendors during the 1980s, and more recently model-driven architecture, promoted by the Object Management Group.
On closer inspection, we can see that model-driven development is exactly a case of the DSL pattern. The model is an expression in the domain-specific language, this time a modeling language; the platform is the execution platform for the system under construction; and the integration step is a code generator that transforms the model into code that executes on the platform.
Although it is definitely an application of the DSL pattern, CASE was not a great success. If we try to analyze why, we can identify two primary reasons. Firstly, the models were not a particularly pleasant or convenient expression of the problem at hand. Working programmers would not necessarily recognize that the alternative expression of the problem manifested in the diagrammatic models was any better than simply writing the code in a general-purpose language, and therefore would resist the introduction of these techniques. Secondly, a lot of code was generated to bridge the abstraction gap between the models and the execution platform. Any mistakes or inefficiencies in this generation step would tend to be corrected not by fixing the generator but by fixing the generated code, thereby breaking the link between the model and the solution and rendering the pattern inoperative.
Figure 2. The customization pit
We may derive some important conclusions from this analysis. Firstly, it's important for the meaning of the models to be readily apparent to people familiar with the domain. The language must be designed carefully to fit the intended purpose. We will return to this topic later. Secondly, to successfully deploy such approaches it is very important to win the hearts and minds of working developers, so they can see that the pattern or tool will help them to get their work done and will adopt it. Finally, the generation process must be efficient, and it must be straightforward to remedy errors in it and to customize it.
An increasingly important motivation for considering the use of DSLs is the sheer diversity and interconnectedness of today's systems. Like it or not, any system of significant size involves a combination of many different kinds of technologies and representations: programming languages, scripting languages, data definition and representation languages, control and configuration languages, and so on. Given a particular feature in the requirements of the system to be built and deployed, it is quite unavoidable for different aspects of that feature to be scattered across all of these different technologies and representations. This problem cannot possibly be solved by individual improvements in any of these different technologies; it must be addressed in a holistic way, by finding a level of representation that spans all of the implementation components and technologies. The DSL pattern provides a means to do this.
The benefits of DSLs can be considerable. DSLs can enable much better communication with stakeholders than lower-level technologies. Changes in requirements can be represented by changes in the model, and thereby implemented rapidly. Changes in the technological platform can be incorporated by manipulating the integration step of the pattern, leaving the modeled representation unchanged. The volume of code to maintain is smaller, and bugs in the generated code can be fixed by fixing the code generator.
These benefits do not come for nothing. To achieve them requires tooling up for the pattern. Implementing a DSL from scratch, whether textual or graphical, is a major enterprise and not to be undertaken lightly. To alleviate these costs is the objective of an emerging category of tools called language workbenches. A language workbench is a set of tools that are targeted specifically at the creation and deployment of new DSLs into existing development environments. Language development is itself a domain that is highly amenable to the application of the DSL pattern, and so a crucial aspect of a language workbench is that it is bootstrapped, that is, built using itself. An example language workbench is the DSL Tools, part of the Visual Studio SDK, which enables the rapid development of graphical DSLs integrated into Visual Studio 2005. We'll take a more detailed look at the DSL Tools later.
Kinds of DSL
A domain is a subject area or area of concern to a particular set of stakeholders in the system. Domains might be horizontal, technical domains, such as user-interface, data persistence, communication, or authentication. Or they might be vertical, business domains, such as insurance, telephony, or retail. Domains can overlap. Domains can be parts of other domains. A domain is bounded by the concerns of its stakeholders, and as these concerns evolve, so does the domain. Hence, domains are dynamic. When stakeholders see expressions or models in the language, they must immediately recognize them as directly expressive, useful, relevant, and empowering.
Figure 3. The customization staircase
Languages may be textual, diagrammatic, or a combination. Modern high-performance personal computers with bitmapped displays are well equipped to implement diagrammatic languages, which are often much more expressive in relatively nontechnical domains than textual languages. As they say, "a picture is worth a thousand words."
Whether textual, diagrammatic, or a combination, a DSL must be implemented to make it useful. Implementing a DSL means building a tool that allows users to edit expressions or models in the language. Such a tool would not normally stand alone. Because a DSL typically addresses only a portion of the entire problem at hand, the DSL tool must be tightly integrated into the development environment.
Integrating the DSL
Figure 1 shows that the language must be integrated into the platform. One aspect of this integration is that the expressions in the language—models—must be converted into a form that is executable by the platform. This aspect is straightforward if the platform is designed to directly interpret the models. More commonly, however, it is necessary to transform the models into a form that can be interpreted. Typically, this transformation involves the generation of code that can be compiled and linked into an executable that runs against the platform.
An important advantage of interpreting models directly is that no compilation step is required, which makes it straightforward to deploy new models and to remove old ones, even in the context of a running system. Code generation on the other hand has advantages, especially early in the evolution of a DSL:
- It is simple to implement.
- Existing mechanisms for compiling, linking, and debugging the code can be used.
- It is straightforward to customize the generated code, and thus extend the scope of the DSL.
Figure 4. A starting solution for component models
It is important that a DSL should be customizable. Figure 2 illustrates what can happen with a noncustomizable DSL. The scope of solutions that can be addressed by using the DSL forms an area called the customization pit. Although it is simple to use the language to solve problems in this area, as soon as it is necessary to step outside of this area, users encounter an insurmountable cliff because they would have to modify the platform itself, which is not often feasible. This restriction might be acceptable with a mature DSL in a mature domain, but in other cases it can be a major obstacle to success.
With a code-generation approach, it is straightforward and highly desirable to turn this cliff into a staircase, which is eased greatly by the abstraction facilities offered by modern general-purpose programming languages, especially static type checking, inheritance, virtual functions, and partial classes (see Figure 3). Partial classes are a particularly useful feature of the C# language that allows the definition of a class to be spread across multiple files, which are compiled and linked together into a single class definition. This feature makes it simple to generate part of a class and write the rest by hand, and if subsequent regeneration is needed, then the handwritten part is preserved without any difficulty.
Note that such facilities were not widely available in mainstream languages during the 1980s, which meant that it was much harder to engineer this kind of customizability into CASE tools.
The first step of the staircase can be enabled by inserting explicit customization hooks into the DSL itself. When one of these hooks is enabled, then instead of generating complete code a stub is generated that requires the user to handwrite some code that completes the program. If they do this incorrectly, the compiler's error messages will tell them how to correct what they did.
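The idea of a customization hook can be sketched as follows, assuming a hypothetical event-handler DSL; the names `generate_handler` and `custom_<name>` are illustrative only. When the hook is enabled, the generator emits a call to a function that the user must write by hand. Note one difference from the compiled setting described above: in Python a missing handwritten function surfaces as a `NameError` at run time rather than as a compiler error.

```python
def generate_handler(name, expression, custom=False):
    """Return source for one handler function of a hypothetical event DSL."""
    if custom:
        # Hook enabled: delegate to a function the user must write by hand.
        body = f"custom_{name}(event)"
    else:
        # Hook disabled: the generator emits the complete body itself.
        body = expression
    return f"def {name}(event):\n    return {body}\n"


generated = generate_handler("on_open", "event.upper()") + generate_handler(
    "on_close", None, custom=True
)

# Handwritten part, kept apart from the generated text.
handwritten = "def custom_on_close(event):\n    return event + ' closed'\n"

namespace = {}
exec(generated + handwritten, namespace)
print(namespace["on_open"]("door"), "/", namespace["on_close"]("door"))
```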
The second step of the staircase can be enabled by generating code in a "double-derived" pattern. Instead of a single class, a pattern of two classes is generated. The base class contains all of the generated method definitions as virtual functions; the derived class contains no method definitions but is the one that is instantiated, which allows the user, in a partial class, to override any of the generated functions with their own version. Of course the use of virtual functions incurs a run-time penalty, but usually the benefit of customizability outweighs this cost.
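A rough Python approximation of the double-derived pattern is shown below. Python has no partial classes, so this sketch assumes a separate handwritten class (`ComponentCustom`, a hypothetical name) that is placed ahead of the generated base class in the inheritance list; it plays the role that the handwritten half of the partial class plays in C#. All class and method names are invented for illustration.

```python
# --- generated code (hypothetical output of a generator) ---
class ComponentBase:
    """All generated behavior lives here, open to overriding."""

    def __init__(self, name):
        self.name = name

    def display_name(self):
        return self.name

    def describe(self):
        return f"Component {self.display_name()}"


# --- handwritten code, kept in a separate file in practice ---
class ComponentCustom:
    """Overrides supplied by the user; stands in for the C# partial class."""

    def display_name(self):
        return self.name.upper()


# --- generated code: the class that is actually instantiated ---
class Component(ComponentCustom, ComponentBase):
    """Defines no methods itself; overrides win through method resolution order."""


print(Component("billing").describe())
```

Because the instantiated class defines no methods of its own, any generated method can be replaced without editing generated text, and regeneration leaves the handwritten class untouched.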
The third step of the staircase is enabled by making the code generators themselves available for substitution. This availability might be used when retargeting the language onto a new platform, or adding major feature areas, or fixing bugs in the generators.
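One way to picture substitutable generators is a toolchain that looks its generators up by name, so that any of them can be replaced. This is only a sketch under assumed names (`Toolchain`, `generate_classes`, `generate_tables`); it does not describe how any particular tool registers its generators.

```python
def generate_classes(model):
    """Default generator: one Python class per component."""
    return "".join(f"class {name}:\n    pass\n" for name in model)


def generate_tables(model):
    """Replacement generator that retargets the same model to SQL."""
    return "".join(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY);\n" for name in model
    )


class Toolchain:
    """Holds the generators by name so that any of them can be swapped."""

    def __init__(self):
        self.generators = {"components": generate_classes}

    def substitute(self, key, generator):
        self.generators[key] = generator

    def build(self, model):
        return {key: gen(model) for key, gen in self.generators.items()}


toolchain = Toolchain()
model = ["Orders", "Billing"]
before = toolchain.build(model)

# Retarget the language without touching the model.
toolchain.substitute("components", generate_tables)
after = toolchain.build(model)
print(after["components"])
```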
The final step of the staircase is to modify the platform itself, which as already noted is not often a feasible option.
Referring back to the DSL pattern introduced in Figure 1, there may be certain kinds of platform configuration that do not require the full power of a DSL. For example, there may be just a few configuration options that can be input using a simple form or wizard. Or perhaps the configuration of the solution requires the selection of features from a list, much as the installation of a software package often involves the selection of which features the user wants to have. In such simple cases the full power of a DSL is unnecessary. DSLs come into their own to configure aspects of the solution that involve significant complexity and validation in their own right, in which case the features described in this article can be used to create the DSLs cheaply and effectively.
Validation and Error Handling
A key advantage of the DSL pattern is that the model can be validated before integrating it into the platform. Constraints on how design elements may be connected and named can be enforced at the level of the model, which can catch many kinds of errors much earlier than would otherwise be the case. For example, architectural design rules such as layering, avoiding circularity of dependencies, consistency of user interaction, or ensuring that the design matches the limitations of the implementation can be enforced.
Validation can be either hard or soft. Hard validation means that the user, creating a model, is simply unable to create an invalid model by the way that the modeling tool responds to interactions. Soft validation is run in a batch, often when the model is saved or prior to code generation, or on explicit request, and will report errors to the user and refer them to the source of the error. Soft validations are typically cheaper to implement, but hard validations can add significantly to the productivity of the user experience.
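The difference between the two styles can be sketched with a small in-memory model. The class and rule choices here are assumptions made for illustration: duplicate names are treated as a hard rule (the edit is refused outright), while "every component needs a port" is treated as a soft rule (checked in a batch and reported).

```python
class Model:
    """Hypothetical component model with one hard and one soft rule."""

    def __init__(self):
        self.components = {}   # component name -> list of port names

    def add_component(self, name):
        # Hard validation: the edit itself is refused, so the model
        # can never contain two components with the same name.
        if name in self.components:
            raise ValueError(f"duplicate component name: {name}")
        self.components[name] = []

    def add_port(self, component, port):
        self.components[component].append(port)

    def validate(self):
        # Soft validation: run as a batch and report every problem found,
        # each message naming the element at fault.
        return [
            f"component '{name}' has no ports"
            for name, ports in self.components.items()
            if not ports
        ]


model = Model()
model.add_component("Orders")
model.add_component("Billing")
model.add_port("Orders", "submit")
print(model.validate())
```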
Forward and Reverse Generation
It is frequently proposed that a model should be automatically generated from a solution, that is, that code generation should also work in reverse. This proposal is often called "reverse engineering" and is a feature claimed by several CASE tools.
In reality, you can only extract a model from code if the model is little more than a diagram of the code. That extraction can be useful (it's what sequence diagrams, for example, were invented for), but it is not the approach we are addressing with DSLs. A good DSL is close to the domain model and comprehensible in the terms of the domain. The templates that transform the model to software are created by the developers in that domain, capturing their expertise in developing software in that domain. This approach is very different from the typical "round-trip" tool, where there is only one way, or a very few ways, of mapping a model to code.
Extracting just one design aspect from the code of a system cannot be done unless that aspect has been kept carefully separate, and the code is marked in such a way that the aspect can be extracted. In practice, facilities of this kind are normally intended to allow model-driven development and handwritten development to be seamlessly mixed, so that when code has been generated, developers can modify this code by hand, and the result can be transformed back into the model to keep the model synchronized with the code.
Figure 5. Testing the component DSL
By far the easiest solution to this problem is to keep the generated and handwritten code physically separate by using facilities such as partial classes, as already explained. It can also be convenient to generate "starter" code intended to be customized by hand; in this case the code generator must avoid overwriting such code in later generation steps. If elements of the model have been changed sufficiently to cause the hand-customizations to be rendered syntactically incorrect, then this will be picked up by the compiler; and since this is by far the most common case, caused by events such as changing names and structures, there is a lot of mileage in this simple approach.
Instead of using partial classes, the generated code can be marked using comments that can be read by the code generator and that delimit generated code from handwritten code. This approach separates the two areas of code effectively at the cost of including machine-readable comments that can reduce the human readability of code.
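A minimal sketch of the marker-comment approach follows. The marker syntax (`# BEGIN CUSTOM name` and `# END CUSTOM`) and the function names are assumptions chosen for this example; real tools define their own conventions. On regeneration, the handwritten text inside each named region of the existing file is copied into the freshly generated output.

```python
import re

# Hypothetical marker syntax; real tools define their own.
REGION = re.compile(r"# BEGIN CUSTOM (\w+)\n(.*?)# END CUSTOM\n", re.DOTALL)


def extract_custom(source):
    """Collect the handwritten text of each named region."""
    return {name: body for name, body in REGION.findall(source)}


def regenerate(fresh, existing):
    """Copy handwritten regions from the existing file into fresh output."""
    saved = extract_custom(existing)

    def restore(match):
        name = match.group(1)
        # Keep the fresh default body when no handwritten version exists.
        body = saved.get(name, match.group(2))
        return f"# BEGIN CUSTOM {name}\n{body}# END CUSTOM\n"

    return REGION.sub(restore, fresh)


existing = (
    "def area(w, h):\n"
    "    # BEGIN CUSTOM area\n"
    "    return w * h\n"
    "    # END CUSTOM\n"
)
fresh = (
    "def area(w, h):\n"
    '    """Regenerated from the model."""\n'
    "    # BEGIN CUSTOM area\n"
    "    raise NotImplementedError\n"
    "    # END CUSTOM\n"
)
print(regenerate(fresh, existing))
```

The cost mentioned above is visible in the output: the markers stay in the file and must not be edited by hand.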
A more ambitious approach relies on the structure of the code and the model being extremely similar, either because the model is a very direct representation of the code, or because the code is structured by a very specific and constrained pattern. In such cases a model can be extracted by parsing the code, and the handwritten and generated areas distinguished by the basic structure. Even in these cases, though, there can be considerable ambiguity about what is intended at the level of the model when particular changes are made to the code. Making reverse synchronization work effectively for such cases typically incurs a much higher implementation cost for the tooling than the simpler forward-only approach.
Using DSL Tools to Build a DSL
The DSL Tools constitute a component of the Visual Studio 2005 SDK that makes it straightforward to implement a DSL, or in other words to build a domain-specific modeling tool. With the DSL Tools, the DSL author defines the concepts of the language, the shapes used to render those concepts on the diagrammatic editing surface, and the various ancillary components that are used to load and save the model, to integrate the new tool into Visual Studio, and to generate code and other artifacts from the models created using the tool.
To start creating a DSL, the author creates a new Visual Studio project, selecting the Visual Studio template Domain Specific Language Designer. A wizard offers a choice of starting points that provide complete working languages for authors to modify to their needs, rather than having to start from scratch. After selecting the Component Models starting point and defining a few basic parameters, the author is placed into a Visual Studio solution (see Figure 4).
The central area of the screen consists of a diagram divided into two parts, labeled Classes and Relationships and Diagram Elements. Contained in these areas are shapes that represent the constituents of the definition of a DSL. The Classes and Relationships define the concepts that the DSL represents, which in this case are Components, Ports, and Connections. The Diagram Elements define the shapes that will appear in the diagram of the resulting modeling tool.
The other parts of the screen are arranged as follows: On the left is the Toolbox, containing elements that can be dragged onto the diagram to create new classes, relationships, diagram elements, and so on. At the top right is the DSL Explorer, which offers a tree-structured view of the complete language definition. This view shares an area of the screen with other Visual Studio windows including the Solution Explorer, Class View, and Team Explorer. At the lower right is the Properties browser, which offers a detailed drill-in to the properties of the element currently selected on the diagram or in the DSL Explorer. Within Visual Studio, users can arrange all of these windows to their taste, including undocking them and distributing them across multiple displays.
The solution shown in Figure 4, which you will recall was offered as a starting-point for the DSL author, defines a complete working language. Two steps are required to see it working. The first step is to generate all of the code and configuration files needed to make the tool, which is done by clicking Transform All Templates in the Solution Explorer. The second step is to press the F5 key, which builds and registers the solution and launches a second copy of Visual Studio to test the DSL. Figure 5 shows the result of opening a file called Test.comp and using the Toolbox to add a couple of components to the diagram.
At this point, authors have many options for how to modify and extend the language. They can delete unwanted parts of the language definition. They can add new domain classes and relationships or add properties to existing domain classes and relationships, to represent additional concepts. They can add new shapes and connectors to extend and alter the way that the language's concepts are displayed diagrammatically to its users. They can create code generators that transform a model built using the language into code or configuration data. They can create new validation rules to represent the domain's constraints. They can customize the language, through its extension points, to offer different kinds of user-interface options for the model builder such as forms, wizards, or text editors.
Figure 6. Swim lanes added to the DSL definition
Let us look at how to create a simple extension to the component modeling tool (see Figure 6). We will add "swim lanes" to the tool, to represent architectural layers. These layers can be used to represent design constraints, such as restricting a component to communicate only with components in adjacent layers.
Firstly, we add the concept of a layer to the domain model. This concept is involved in two relationships: A single ComponentModel contains any number of layers, and a Component refers to the layer with which it is associated. The domain class Layer is made a subclass of NamedElement, so that it acquires a name property. Then, we drag a swim lane off the Toolbox into the diagram area, call it LayerSwimlane, and give it a NameDecorator in which to display the name of the layer. We use the DSL Details tool to declare that a ComponentShape has a layer as its parent and specify how to merge a component into the model when it is dropped onto a layer. Finally, we associate the Layer domain class with the LayerSwimlane shape using the Diagram Element Map tool. These concepts and relationships are shown in Figure 6.
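The domain model just described can be pictured as plain data classes. This is only an illustrative sketch in Python, not the code that the DSL Tools generate; the method `drop_component` is a hypothetical stand-in for the merge behavior that associates a component with the layer it is dropped onto.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NamedElement:
    name: str


@dataclass
class Layer(NamedElement):
    """Acquires its name property by subclassing NamedElement."""


@dataclass
class Component(NamedElement):
    layer: Optional[Layer] = None          # reference relationship to a layer


@dataclass
class ComponentModel:
    layers: list = field(default_factory=list)       # containment relationship
    components: list = field(default_factory=list)

    def add_layer(self, name):
        layer = Layer(name)
        self.layers.append(layer)
        return layer

    def drop_component(self, name, layer):
        # Dropping a component onto a swim lane associates it with that layer.
        component = Component(name, layer)
        self.components.append(component)
        return component


model = ComponentModel()
logic = model.add_layer("Logic")
orders = model.drop_component("Orders", logic)
print(orders.layer.name)
```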
At this point, the designer can be regenerated and tested. Once again, click Transform All Templates, followed by pressing the F5 key, which launches a second copy of Visual Studio. Now, the component modeling language has swim lanes, and when components are placed in a swim lane, they get associated automatically with the corresponding layer (see Figure 7).
Figure 7. Component DSL with swim lanes and validation
At this point, the language author can define new validation constraints that will ensure, for example, that each component only communicates with a component in an adjacent layer. This can be done by means of validation methods defined on partial classes for the relevant domain classes (Component, in the example). The author writes only the validation logic; all of the calling and error-reporting logic is implemented by the language framework. The warning resulting from implementing such a validation can also be seen in Figure 7.
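The logic of such a constraint is small. The sketch below expresses it in Python rather than as a DSL Tools validation method, and it makes two assumptions that are not in the article: layers are numbered by position, and connections within the same layer are allowed. The names `Component` and `check_adjacent_layers` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    layer: int                                   # position of its swim lane
    targets: list = field(default_factory=list)  # names of components it calls


def check_adjacent_layers(components):
    """Return a warning for every connection that skips a layer."""
    by_name = {c.name: c for c in components}
    warnings = []
    for source in components:
        for target_name in source.targets:
            target = by_name[target_name]
            if abs(source.layer - target.layer) > 1:
                warnings.append(
                    f"{source.name} (layer {source.layer}) must not call "
                    f"{target.name} (layer {target.layer})"
                )
    return warnings


model = [
    Component("UI", 0, ["Logic", "Store"]),
    Component("Logic", 1, ["Store"]),
    Component("Store", 2),
]
print(check_adjacent_layers(model))
```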
Designing a DSL
Domain-specific modeling has been applied successfully in numerous domains, including mobile telephony, automotive-embedded devices, software-defined radio, financial applications, industrial automation, workflow, Web applications, and others. Several interesting case studies from various vendors can be found at the DSM Forum Web site (see Resources).
To design a DSL, it is crucial to have the involvement of the domain experts because very often the basic inspiration for a DSL will be found on their whiteboards. They will sketch out the way that they think about the important problems in their domain, often using diagrams consisting of shapes connected by lines of various kinds. Working with the DSL developers, these ideas can be translated into domain models mapped to shape models that can be implemented using the DSL Tools.
A fundamentally important aspect of the design of a domain model is its set of validation constraints. For example, it is likely to be necessary that various names are unique within their context, so that elements can be identified uniquely. Also there might be existence constraints; for example, if part of the model represents a mapping or transformation, there must be things at either end. There might be topological constraints, such as the presence or absence of cycles. These constraints must be identified and implemented because to generate code or otherwise implement the model into its environment, it must be valid. As noted earlier, constraints can be implemented either as hard validations that are implicit in the tool's user interface or as soft validations that are run as a batch when the model is opened or saved, when code generation is attempted, or on explicit request.
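Two of these constraint kinds, uniqueness and the absence of cycles, can be sketched in a few lines. The function names and the dictionary representation of dependencies are assumptions made for the example.

```python
from collections import Counter


def duplicate_names(names):
    """Uniqueness constraint: return names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


def has_cycle(edges):
    """Topological constraint: detect a cycle in a directed graph.

    `edges` maps each node to the nodes it depends on.
    """
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True          # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in edges)


print(duplicate_names(["Orders", "Billing", "Orders"]))
print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```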
To create the code generators, it is first necessary to have a complete, working, tested implementation of the desired target code. This code must be analyzed to determine which parts of it can be derived from elements in the model and what kinds of patterns must be applied to do this derivation. Sometimes, this will require the target code to be refactored to simplify or clarify these patterns. Note that refactoring is a transformation of the code that preserves its behavior, while restructuring it to make it easier to generate or modify. For refactoring to be successful it is necessary to have a suite of tests, which the code must pass before and after refactoring.
Also note that the generators need to operate over only valid models. It should not be necessary to implement the generators defensively so that they also handle invalid models because only valid models should be used as a source for code generation.
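A sketch of this division of labor: validation runs first, and the template then assumes a valid model instead of defending against bad input. The template text, the model format, and the function names are hypothetical; `string.Template` is from the Python standard library.

```python
from string import Template

CLASS_TEMPLATE = Template(
    "class $name:\n"
    '    """Generated from the model; do not edit."""\n'
    "    ports = $ports\n"
)


def validate(model):
    """Return a list of errors; an empty list means the model is valid."""
    return [
        f"invalid name: {c['name']!r}"
        for c in model
        if not c["name"].isidentifier()
    ]


def generate(model):
    """Generate code, refusing any model that has not passed validation."""
    errors = validate(model)
    if errors:
        raise ValueError("; ".join(errors))
    # From here on the template may assume every name is a legal identifier.
    return "\n".join(
        CLASS_TEMPLATE.substitute(name=c["name"], ports=repr(c["ports"]))
        for c in model
    )


print(generate([{"name": "Orders", "ports": ["submit"]}]))
```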
At this stage it is also important to consider the customization options that will be offered for the language. If there are places in the code where the user will be required to handwrite their own logic, these places must be identified, and suitable techniques must be used to make it easy for the user to complete their coding task, such as generating a call to a nonexistent function. It may also be useful to generate starting stubs for the user to fill in. It is also often useful to provide more general customization techniques for allowing unforeseen modifications, such as the double-derived technique discussed earlier.
When designing the diagrammatic structure of a DSL, it can be useful to take inspiration from the conventions established by the Unified Modeling Language (UML). For example, if there is an inheritance-like concept in the DSL, it would probably be perverse to represent it other than using an open triangle pointing at the more general element. For this reason, the starting languages offered by the DSL Tools are based diagrammatically on those of UML, although they use simpler underlying domain models. In our experience, the popularity of UML lies primarily in the fact that it offers a standard set of diagrammatic conventions, and not in the details of how these conventions are implemented.
Another approach to designing a DSL, for users who already have a UML tool, is to use UML itself as a starting point. By decorating the UML elements with stereotypes and tagged values, their meanings can be modified to correspond more directly to the desired domain. This approach can be successful in domains that are close to the intended meaning of the UML elements, but does make the creation of code generators and validation tools considerably more complicated than the simpler approach of designing a purpose-built domain model for the desired language.
DSLs and Software Factories
Although DSLs can be useful as a stand-alone tool, especially in very constrained domains, their use is most compelling as part of a complete software factory. You may think of a DSL as a software power tool, which is put together with other tools, guidance, and automation to constitute a complete factory. The software factories vision is explained by other articles in this issue, and in the popular book by Greenfield and Short (see Resources).
When a DSL is deployed as part of a factory, it must be integrated deeply with the other factory components. For example, menus and other UI gestures within the language may launch tools and actions associated with other parts of the factory. Conversely, menus and gestures within other parts of the factory might launch, or otherwise interact with, the DSL. In consequence, the entire factory should appear to its users as a seamless whole, intended for solving the user's problem, rather than a ragbag of loosely integrated tools.
To enable these integrations, it is important for the DSL to offer powerful, dynamic integration points, such as the ability to run (and undo) commands that act on the model and its associated artifacts, the generation of simple APIs for interacting with the models, and the serialization of models in XML files that enable processing by readily available tools. All of these factors have been taken into account in designing the DSL Tools, which are part of the overall software factory platform and authoring environment.
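To illustrate why XML serialization helps integration, the sketch below reads a model with nothing more than a standard XML parser. The element and attribute names are invented for this example and do not reflect the actual file format used by the DSL Tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialized model; element and attribute names are invented.
MODEL_XML = """<componentModel>
  <component name="Orders" layer="Logic">
    <port name="submit"/>
  </component>
  <component name="Store" layer="Data"/>
</componentModel>
"""


def components_by_layer(xml_text):
    """Group component names by layer using only a standard XML parser."""
    root = ET.fromstring(xml_text)
    layers = {}
    for component in root.iter("component"):
        layer = component.get("layer")
        layers.setdefault(layer, []).append(component.get("name"))
    return layers


print(components_by_layer(MODEL_XML))
```

Any other factory component that can parse XML could consume the same file without depending on the modeling tool itself.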
About the Author
Steve Cook is a software architect in the enterprise frameworks and tools group at Microsoft Corporation. He is one of the designers of the Domain-Specific Language Tools in the Visual Studio SDK. Previously he was a distinguished engineer at IBM and represented them in the specification of UML 2.0. He has worked in the IT industry for more than 30 years, as architect, programmer, consultant, author, researcher, and teacher. He is a member of the editorial board of the Software and Systems Modeling journal, a fellow of the British Computer Society, and holds an honorary doctor of science degree from De Montfort University.
Resources
Domain-Specific Modeling (DSM) Forum
Greenfield, Jack, Keith Short, Steve Cook, and Stuart Kent. Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools. Indianapolis, IN: Wiley, 2004.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.