2015-06-29

January 2014

Volume 29 Number 1

WPF : Build Fault-Tolerant Composite Applications

There’s a widespread need for composite applications, but fault-tolerance requirements vary. In some scenarios, it might be OK for a single failing plug-in to bring down the whole application. In other scenarios, this isn’t acceptable. In this article, I describe an architecture for a fault-tolerant composite desktop application. This proposed architecture will provide a high level of isolation by running each plug-in in its own Windows process. I built it with the following design goals in mind:

Strong isolation between the host and plug-ins
Complete visual integration of plug-in controls into the host window
Easy development of new plug-ins
Reasonably easy conversion of existing applications to plug-ins
Ability for plug-ins to use services provided by the host, and vice versa
Reasonably easy addition of new services and interfaces

The accompanying source code (msdn.microsoft.com/magazine/msdnmag0114) contains two Visual Studio 2012 solutions: WpfHost.sln and Plugins.sln. Compile the host first, and then compile the plug-ins. The main executable file is WpfHost.exe. The plug-in assemblies are loaded on demand. Figure 1 shows the completed application.

Figure 1 The Host Window Seamlessly Integrates with the Out-of-Process Plug-Ins

Architectural Overview

The host displays a tab control and a “+” button in the top-left corner that shows a list of available plug-ins. The list of plug-ins is read from the XML file named plugins.xml, but alternative catalog implementations are possible. Each plug-in is executed in its own process, and no plug-in assemblies are loaded into the host. A high-level view of the architecture is shown in Figure 2.

Figure 2 A High-Level View of the Application Architecture

Internally, the plug-in host is a regular Windows Presentation Foundation (WPF) application that follows the Model-View-ViewModel (MVVM) paradigm. The model part is represented by the PluginController class, which holds a collection of loaded plug-ins. Each loaded plug-in is represented by an instance of the Plugin class, which holds one plug-in control and talks to one plug-in process.

The hosting system consists of four assemblies, organized as shown in Figure 3.

Figure 3 The Assemblies of the Hosting System

WpfHost.exe is the host application. PluginProcess.exe is the plug-in process. One instance of this process loads one plug-in. WpfHost.Interfaces.dll contains common interfaces used by the host, the plug-in process and the plug-ins. PluginHosting.dll contains types used by the host and the plug-in process for plug-in hosting.

Loading a plug-in involves some calls that must be executed on the UI thread, and some calls that can be executed on any thread. To make the application responsive, I only block the UI thread when strictly necessary. Hence, the programming interface for the Plugin class is broken into two methods, Load and CreateView:

class Plugin
{
  public FrameworkElement View { get; private set; }
  public void Load(PluginInfo info); // Can be executed on any thread
  public void CreateView();          // Must execute on UI thread
}

The Plugin.Load method starts a plug-in process and creates the infrastructure on the plug-in process side. It’s executed on a worker thread. The Plugin.CreateView method connects the local view to the remote FrameworkElement. You’ll need to execute this on the UI thread to avoid exceptions such as an InvalidOperationException.
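
As a rough sketch of how a caller might drive the two methods, the helper below runs the slow step on a thread-pool thread and the view step back on the caller's context. PluginStarter is a hypothetical name that doesn't come from the code download, and the two delegates stand in for Plugin.Load and Plugin.CreateView so the fragment compiles on its own. It assumes StartAsync is called from the UI thread:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the article's code download.
public static class PluginStarter
{
    // Call from the UI thread, for example from a command handler.
    public static async Task StartAsync(Action load, Action createView)
    {
        if (load == null) throw new ArgumentNullException(nameof(load));
        if (createView == null) throw new ArgumentNullException(nameof(createView));

        // Slow part: starting the plug-in process runs on a thread-pool thread.
        await Task.Run(load);

        // Execution resumes on the captured SynchronizationContext; in a WPF
        // application that is the dispatcher (UI) thread.
        createView();
    }
}
```

A call might look like await PluginStarter.StartAsync(() => plugin.Load(info), plugin.CreateView). If StartAsync is called from a thread without a synchronization context, the continuation runs on a thread-pool thread, and CreateView would then fail.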

The Plugin class ultimately calls a user-defined plug-in class inside the plug-in process. The only requirement for that user class is that it implements the IPlugin interface from the WpfHost.Interfaces assembly:

public interface IPlugin : IServiceProvider, IDisposable
{
  FrameworkElement CreateControl();
}

The framework element returned from the plug-in may be of arbitrary complexity. It might be a single text box or an elaborate user control that implements some line-of-business (LOB) application.

The Need for Composite Applications

Over the last few years, a number of my clients have expressed the same business need: desktop applications that can load external plug-ins, thus combining several LOB applications under one “roof.” The underlying reason for this requirement can vary. Multiple teams might develop different parts of the application on different schedules. Different business users might require different sets of features. Or maybe the clients want to ensure the stability of the “core” application, while at the same time maintaining flexibility. One way or another, the requirement to host third-party plug-ins has come up more than once within different organizations.

There are several traditional solutions for this problem: the classic Composite Application Block (CAB), the Managed Add-In Framework (MAF), the Managed Extensibility Framework (MEF) and Prism. Another solution was published in the August 2013 issue of MSDN by my former colleagues Gennady Slobodsky and Levi Haskell (see the article, “Architecture for Hosting Third-Party .NET Plug-Ins,” at msdn.microsoft.com/magazine/dn342875). These solutions are all of great value, and many useful applications have been created using them. I’m an active user of these frameworks as well, but there’s one problem that kept haunting me for quite some time: stability.

Applications crash. That’s a fact of life. Null references, unhandled exceptions, locked files and corrupted databases aren’t going to disappear anytime soon. A good host application must be able to survive a plug-in crash and move on. A faulty plug-in must not be allowed to take down the host or other plug-ins. This protection need not be bulletproof; I’m not trying to prevent malicious hacking attempts. However, simple mistakes such as an unhandled exception in a worker thread shouldn’t bring the host down.

Isolation Levels

Microsoft .NET Framework applications can handle third-party plug-ins in at least three different ways:

No isolation: Run host and all plug-ins in a single process with a single AppDomain.
Medium isolation: Load each plug-in in its own AppDomain.
Strong isolation: Load each plug-in in its own process.

No isolation entails the least protection and least control. All data is globally accessible, there’s no fault protection and there’s no way to unload the offending code. The most typical cause of an application crash is an unhandled exception in a worker thread created by a plug-in.

You can try to protect host threads with try/catch blocks, but when it comes to plug-in-created threads, all bets are off. Starting with the .NET Framework 2.0, an unhandled exception in any thread terminates the process, and you can’t prevent this. There’s a good reason for such seeming cruelty: An unhandled exception means the application probably has become unstable, and letting it continue is dangerous.

Medium isolation provides more control over a plug-in’s security and configuration. You can also unload plug-ins, at least when things are going well and no threads are busy executing unmanaged code. However, the host process still isn’t protected from plug-in crashes, as demonstrated in my article, “AppDomains Won’t Protect Host from a Failing Plug-In” (bit.ly/1fO7spO). Designing a reliable error-handling strategy is difficult, if not impossible, and the unloading of the failing AppDomain isn’t guaranteed.

AppDomains were invented for hosting ASP.NET applications as lightweight alternatives to processes. See Chris Brumme’s 2003 blog post, “AppDomains (“application domains”),” at bit.ly/PoIX1r. ASP.NET applies a relatively hands-off approach to fault tolerance. A crashing Web application can easily bring down the whole worker process with multiple applications. In this case, ASP.NET simply restarts the worker process and reissues any pending Web requests. This is a reasonable design decision for a server process with no user-facing windows of its own, but it might not work as well for a desktop application.

Strong isolation provides the ultimate level of protection against failures. Because each plug-in runs in its own process, the plug-ins can’t crash the host, and they can be terminated at will. At the same time, this solution requires a rather complex design. The application has to deal with a lot of inter-process communication and synchronization. It also must marshal WPF controls across process boundaries, which isn’t trivial.

As with other things in software development, choosing an isolation level is a trade-off. Stronger isolation gives you more control and more flexibility, but you pay for it with increased application complexity and slower performance.

Some frameworks choose to ignore fault tolerance and work at the “no isolation” level. MEF and Prism are good examples of that approach. In cases where fault tolerance and fine-tuning plug-in configuration aren’t issues, this is the simplest solution that works and is therefore the correct one to use.

Many plug-in architectures, including the one proposed by Slobodsky and Haskell, use medium isolation. They achieve isolation via AppDomains. AppDomains give host developers a significant degree of control over plug-in configuration and security. I personally built a number of AppDomain-based solutions over the past several years. If the application requires unloading code, sandboxing and configuration control—and if fault tolerance isn’t an issue—then AppDomains are definitely the way to go.

MAF stands out among add-in frameworks because it lets host developers choose any of the three isolation levels. It can run an add-in in its own process using the AddInProcess class. Unfortunately, AddInProcess doesn’t work for visual components out of the box. It might be possible to extend MAF to marshal visual components across processes, but this would mean adding another layer to an already complex framework. Creating MAF add-ins isn’t easy, and with another layer on top of MAF, the complexity is likely to become unmanageable.

My proposed architecture aims to fill the void and provide a robust hosting solution that loads plug-ins in their own processes and provides visual integration between the plug-ins and the host.

Strong Isolation of Visual Components

When a plug-in load is requested, the host process spawns a new child process. The child process then loads a user plug-in class that creates a FrameworkElement displayed in the host (see Figure 4).

Figure 4 Marshaling a FrameworkElement Between the Plug-In Process and the Host Process

The FrameworkElement can’t be marshaled directly between processes. It doesn’t inherit from MarshalByRefObject, nor is it marked as [Serializable], so .NET remoting won’t marshal it. It isn’t marked with the [ServiceContract] attribute, so Windows Communication Foundation (WCF) won’t marshal it, either. To overcome this problem, I use the System.AddIn.Pipeline.FrameworkElementAdapters class from the System.Windows.Presentation assembly that’s part of MAF. This class defines two methods:

The ViewToContractAdapter method converts a FrameworkElement to an INativeHandleContract interface, which can be marshaled with .NET remoting. This method is called inside the plug-in process.
The ContractToViewAdapter method converts an INativeHandleContract instance back to FrameworkElement. This method is called inside the host process.

Unfortunately, the simple combination of these two methods doesn’t work well out of the box. Apparently, MAF was designed to marshal WPF components between AppDomains and not between processes. The ContractToViewAdapter method fails on the client side with the following error:

System.Runtime.Remoting.RemotingException:
Permission denied: cannot call non-public or static methods remotely

The root cause is that the ContractToViewAdapter method calls the constructor of the class, MS.Internal.Controls.AddInHost, which attempts to cast the INativeHandleContract remoting proxy to type AddInHwndSourceWrapper. If the cast succeeds, then it calls the internal method RegisterKeyboardInputSite on the remoting proxy. Calling internal methods on cross-process proxies isn’t allowed. Here’s what’s happening inside the AddInHost class constructor:

// From Reflector
_addInHwndSourceWrapper = contract as AddInHwndSourceWrapper;
if (_addInHwndSourceWrapper != null)
{
  _addInHwndSourceWrapper.RegisterKeyboardInputSite(
    new AddInHostSite(this)); // Internal method call!
}

To eliminate this error, I created the NativeHandleContractInsulator class. This class lives on the server (plug-in) side. It implements the INativeHandleContract interface by forwarding all calls to the original INativeHandleContract returned from the ViewToContractAdapter method. However, unlike the original implementation, it can’t be cast to AddInHwndSourceWrapper. Thus, the cast on the client (host) side isn’t successful and the forbidden internal method call doesn’t occur.
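
The insulation idea can be shown with stand-in types. In this sketch, IHandleContract, HwndSourceWrapper and ContractInsulator are hypothetical simplifications of INativeHandleContract, AddInHwndSourceWrapper and the insulator class; the real contract interface has more members, all of which would be forwarded the same way:

```csharp
using System;

// Stand-in for INativeHandleContract (illustration only).
public interface IHandleContract
{
    IntPtr GetHandle();
}

// Stand-in for the framework's internal wrapper type.
public sealed class HwndSourceWrapper : MarshalByRefObject, IHandleContract
{
    private readonly IntPtr _handle;

    public HwndSourceWrapper(IntPtr handle) { _handle = handle; }

    public IntPtr GetHandle() { return _handle; }

    // The kind of internal member that cannot be called through a
    // cross-process remoting proxy.
    internal void RegisterKeyboardInputSite(object site) { }
}

// Forwards interface calls but is a different concrete type, so a cast
// to HwndSourceWrapper on the receiving side yields null.
public sealed class ContractInsulator : MarshalByRefObject, IHandleContract
{
    private readonly IHandleContract _source;

    public ContractInsulator(IHandleContract source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _source = source;
    }

    public IntPtr GetHandle() { return _source.GetHandle(); }
}
```

Wrapping a HwndSourceWrapper in a ContractInsulator keeps GetHandle working, while the expression (insulated as HwndSourceWrapper) evaluates to null, so the branch that makes the internal call is skipped.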

Examining the Plug-In Architecture in More Detail

The Plugin.Load and Plugin.CreateView methods create all necessary moving parts for plug-in integration.

Figure 5 shows the resulting object graph. It’s somewhat complicated, but each part is responsible for a particular role. Together, they ensure seamless and robust operation of the host plug-in system.

Figure 5 Object Diagram of a Loaded Plug-In

The Plugin class denotes a single plug-in instance in the host. It holds the View property, which is the plug-in’s visual representation inside the host process. The Plugin class creates an instance of PluginProcessProxy and retrieves from it an IRemotePlugin. IRemotePlugin contains a remote plug-in control in the form of INativeHandleContract. The Plugin class then takes that contract and converts it to FrameworkElement as shown here (with some code elided for brevity):

public interface IRemotePlugin : IServiceProvider, IDisposable
{
  INativeHandleContract Contract { get; }
}

class Plugin
{
  public void CreateView()
  {
    View = FrameworkElementAdapters.ContractToViewAdapter(
      _remoteProcess.RemotePlugin.Contract);
  }
}

The PluginProcessProxy class controls the plug-in process lifecycle from within the host. It’s responsible for starting the plug-in process, creating a remoting channel and monitoring plug-in process health. It also engages the PluginLoader service and from that retrieves an IRemotePlugin.

The PluginLoader class runs inside the plug-in process and implements the plug-in process lifecycle. It establishes a remoting channel, starts a WPF message dispatcher, loads a user plug-in, creates a RemotePlugin instance and hands this instance to the PluginProcessProxy on the host side.

The RemotePlugin class makes the user plug-in control marshalable across process boundaries. It converts the user’s FrameworkElement to INativeHandleContract and then wraps this contract with a NativeHandleContractInsulator to work around illegal method call issues described earlier.

Finally, the user’s plug-in class implements the IPlugin interface. Its main job is to create a plug-in control inside the plug-in process. Typically this will be a WPF UserControl, but it can be any FrameworkElement.

When a plug-in load is requested, the PluginProcessProxy class spawns a new child process. The child process executable is either PluginProcess.exe or PluginProcess64.exe, depending on whether the plug-in is 32-bit or 64-bit. Each plug-in process receives a unique GUID in the command line, as well as the plug-in base directory:

PluginProcess.exe
  PluginProcess.0DAA530F-DCE4-4351-8D0F-36B0E334FF18
  c:\plug-in\assembly.dll

The plug-in process sets up a remoting service of type IPluginLoader and raises a named “ready” event, in this case, PluginProcess.0DAA530F-DCE4-4351-8D0F-36B0E334FF18.Ready. The host then can use IPluginLoader methods to load the plug-in.
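
A minimal sketch of such a handshake, using a named EventWaitHandle, follows. The ReadyEvent class and its method names are hypothetical, and the sketch assumes the host creates the event before it starts the child process; the code download may organize this differently. Named events are a Windows feature:

```csharp
using System;
using System.Threading;

// Hypothetical helper illustrating the named "ready" event handshake.
public static class ReadyEvent
{
    // "PluginProcess.<guid>" becomes "PluginProcess.<guid>.Ready".
    public static string NameFor(string processName)
    {
        return processName + ".Ready";
    }

    // Host side: create the event before starting the plug-in process.
    public static EventWaitHandle Create(string processName)
    {
        return new EventWaitHandle(false, EventResetMode.ManualReset,
            NameFor(processName));
    }

    // Plug-in process side: opens the event by name (or creates it if it
    // does not exist yet) and signals that the remoting service is up.
    public static void Signal(string processName)
    {
        using (var ready = new EventWaitHandle(false,
            EventResetMode.ManualReset, NameFor(processName)))
        {
            ready.Set();
        }
    }

    // Host side: false means the plug-in process did not report in time.
    public static bool Wait(EventWaitHandle ready, TimeSpan timeout)
    {
        return ready.WaitOne(timeout);
    }
}
```

Because the host does the waiting, a timeout or a dead child process surfaces in the host, where it can be logged and shown to the user.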

An alternative solution would be to have the plug-in process call into the host once it’s ready. This would eliminate the need for the ready event, but it would make error handling much more complicated. If the “load plug-in” operation originates from the plug-in process, error information is also retained in the plug-in process. If something goes wrong, the host might never find out about it. Therefore, I chose the design with the ready event.

Another design issue was whether to accommodate plug-ins not deployed under the WPF host directory. On one hand, in the .NET Framework, loading assemblies not located inside the application directory causes certain difficulties. On the other hand, I recognize that plug-ins might have their own deployment concerns, and it might not always be possible to deploy a plug-in under the WPF host directory. Moreover, some complex applications don’t behave properly when they aren’t run from their base directories.

Because of these concerns, the WPF host allows loading plug-ins from anywhere on the local file system. To achieve that, the plug-in process performs virtually all operations in a secondary AppDomain whose application base directory is set to the plug-in’s base directory. This creates the problem of loading WPF host assemblies in that AppDomain. This could be achieved in at least four ways:

Put WPF host assemblies in the Global Assembly Cache (GAC).
Use assembly redirects in the app.config file of the plug-in process.
Load WPF host assemblies using one of the LoadFrom/CreateInstanceFrom overloads.
Use the unmanaged hosting API to start the CLR in the plug-in process with the desired configuration.

Each of these solutions has pros and cons. Putting WPF host assemblies in the GAC requires administrative access. While the GAC is a clean solution, requiring administrative rights for installation can be a big headache in a corporate environment, so I tried to avoid that. Assembly redirects are also attractive, but configuration files will then depend on the location of the WPF host. This makes an xcopy install impossible. Creating an unmanaged hosting project seemed to be a big maintenance risk.

So I went with the LoadFrom approach. The big downside of this approach is WPF host assemblies end up in the LoadFrom context (see the blog post, “Choosing a Binding Context,” by Suzanne Cook at bit.ly/cZmVuz). To avoid any binding issues, I needed to handle the AssemblyResolve event in the plug-in AppDomain, so the plug-in’s code can find WPF host assemblies more easily.
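
One way an AssemblyResolve handler of this kind could look is sketched below. HostAssemblyResolver is a hypothetical name, and the lookup rule (simple assembly name plus ".dll" in the host directory) is an assumption made for illustration rather than the logic used in the download:

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical resolver that looks for host assemblies by simple name.
public sealed class HostAssemblyResolver
{
    private readonly string _hostDirectory;

    public HostAssemblyResolver(string hostDirectory)
    {
        if (hostDirectory == null) throw new ArgumentNullException(nameof(hostDirectory));
        _hostDirectory = hostDirectory;
    }

    // Call from code already running inside the plug-in AppDomain.
    public void Attach()
    {
        AppDomain.CurrentDomain.AssemblyResolve +=
            (sender, args) => Resolve(args.Name);
    }

    // Returns null when the file is not in the host directory, which lets
    // the runtime report its usual load failure.
    public Assembly Resolve(string fullAssemblyName)
    {
        string simpleName = new AssemblyName(fullAssemblyName).Name;
        string candidate = Path.Combine(_hostDirectory, simpleName + ".dll");
        return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
    }
}
```

The event fires only after normal binding has failed, so assemblies the plug-in ships in its own base directory still take precedence.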

Developing Plug-Ins

You can implement a plug-in as a class library (DLL) or an executable (EXE). In the DLL scenario, the steps are as follows:

Create a new class library project.
Reference the WPF assemblies PresentationCore, PresentationFramework, System.Xaml and WindowsBase.
Add a reference to the WpfHost.Interfaces assembly. Make sure “copy local” is set to false.
Create a new WPF user control, such as MainUserControl.
Create a class named Plugin that derives from IKriv.WpfHost.Interfaces.PluginBase.
Add an entry for your plug-in to the plugins.xml file of the host.
Compile your plug-in and run the host.

A minimal plug-in class looks like this:

public class Plugin : PluginBase
{
  public override FrameworkElement CreateControl()
  {
    return new MainUserControl();
  }
}

Alternatively, a plug-in can be implemented as an executable. In this case, the steps are:

Create a WPF application.
Create a WPF user control, for example, MainUserControl.
Add MainUserControl to the application’s main window.
Add a reference to the WpfHost.Interfaces assembly. Make sure “copy local” is set to false.
Create a class named Plugin that derives from IKriv.WpfHost.Interfaces.PluginBase.
Add an entry for your plug-in to the plugins.xml file of the host.

Your plug-in class would look exactly like the preceding example, and your main window XAML should contain nothing but a reference to MainUserControl:

<Window x:Class="MyPlugin.MainWindow"
  xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
  xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
  xmlns:local="clr-namespace:MyProject"
  Title="My Plugin" Height="600" Width="766" >
  <Grid>
    <local:MainUserControl />
  </Grid>
</Window>

A plug-in implemented like this can run as a standalone application or within the host. This simplifies debugging plug-in code not related to host integration. The class diagram for such a “dual-head” plug-in is shown in Figure 6.

Figure 6 The Class Diagram for a Dual-Head Plug-In

This technique also provides an avenue for quick conversion of existing applications to plug-ins. The only thing you need to do is convert the application’s main window into a user control. Then instantiate that user control in a plug-in class as demonstrated earlier. The Solar System plug-in in the accompanying code download is an example of such conversion. The whole conversion process took less than an hour.

Because the plug-in isn’t an independent application, but instead is launched by the host, debugging might not be straightforward. You can start debugging the host, but Visual Studio can’t yet attach to child processes automatically. You can either manually attach the debugger to the plug-in process once it’s running or have the plug-in process break into the debugger on startup by changing line 4 of the PluginProcess app.config to:

<add key="BreakIntoDebugger" value="True" />

Another alternative is to create your plug-in as a standalone application as described earlier. You can then debug most of the plug-in as a standalone application, only periodically checking that integration with the WPF host works properly.

If the plug-in process breaks into the debugger on startup, you’ll want to increase the ready event timeout by changing line 4 of the WpfHost app.config file, as follows:

<add key="PluginProcess.ReadyTimeoutMs" value="500000" />

A list of example plug-ins available in the accompanying code download and descriptions of what they do is shown in Figure 7.

Figure 7 Example Plug-Ins Available in the Accompanying Code Download

Plug-In Project	What It Does
BitnessCheck	Demonstrates how a plug-in can run as 32-bit or 64-bit
SolarSystem	Demonstrates an old WPF demo application converted to a plug-in
TestExceptions	Demonstrates exception handling for user thread and worker thread exceptions
UseLogService	Demonstrates use of host services and plug-in services

Host Services and Plug-In Services

In the real world, plug-ins often need to use services provided by the host. I demonstrate this scenario in the UseLogService plug-in in the code download. A plug-in class might have a default constructor or a constructor that takes one parameter of type IWpfHost. In the latter case, the plug-in loader will pass an instance of WPF host to the plug-in. Interface IWpfHost is defined as follows:

public interface IWpfHost : IServiceProvider
{
  void ReportFatalError(string userMessage,
     string fullExceptionText);
  int HostProcessId { get; }
}

I use the IServiceProvider part in my plug-in. IServiceProvider is a standard .NET Framework interface defined in mscorlib.dll:

public interface IServiceProvider
{
  object GetService(Type serviceType);
}

I’ll use it in my plug-in to obtain the ILog service from the host:

class Plugin : PluginBase
{
  private readonly ILog _log;
  private MainUserControl _control;
  public Plugin(IWpfHost host)
  {
    _log = host.GetService<ILog>();
  }
  public override FrameworkElement CreateControl()
  {
    // Keep a reference so the plug-in can query the control later
    _control = new MainUserControl { Log = _log };
    return _control;
  }
}

The control can then use the ILog host service to write to the host’s log file.
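
IServiceProvider itself declares only the non-generic GetService(Type) method, so the generic call in the constructor presumably relies on a helper. One plausible shape for such a helper is an extension method; the class name below is hypothetical and isn't taken from the code download:

```csharp
using System;

// Hypothetical helper that adds a typed lookup to IServiceProvider.
public static class ServiceProviderExtensions
{
    // Returns the requested service, or null when the provider
    // does not offer it.
    public static T GetService<T>(this IServiceProvider provider)
        where T : class
    {
        if (provider == null) throw new ArgumentNullException(nameof(provider));
        return provider.GetService(typeof(T)) as T;
    }
}
```

Returning null for a missing service lets a plug-in degrade gracefully when it runs under an older host that lacks the service.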

The host can also use services provided by plug-ins. I defined one such service called IUnsavedData, which proved to be useful in real life. By implementing this interface, a plug-in may define a list of unsaved work items. If the plug-in or the whole host application is closed, the host will ask the user whether to abandon unsaved data, as shown in Figure 8.

Figure 8 Using the IUnsavedData Service

The IUnsavedData interface is defined as follows:

public interface IUnsavedData
{
  string[] GetNamesOfUnsavedItems();
}

A plug-in author doesn’t need to implement the IServiceProvider interface explicitly. It’s enough to implement the IUnsavedData interface in the plug-in. The PluginBase.GetService method will take care of returning it to the host. My UseLogService project in the code download provides a sample IUnsavedData implementation, with relevant code shown here:

class Plugin : PluginBase, IUnsavedData
{
  private MainUserControl _control;
  public string[] GetNamesOfUnsavedItems()
  {
    if (_control == null) return null;
    return _control.GetNamesOfUnsavedItems();
  }
}
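
A base class can offer this convenience by returning itself whenever it implements the requested service type. The sketch below shows that idea with a hypothetical ServiceAwareBase class; the real PluginBase may be implemented differently:

```csharp
using System;

// Hypothetical base class; derives from MarshalByRefObject so that
// instances can be called across the remoting boundary.
public abstract class ServiceAwareBase : MarshalByRefObject, IServiceProvider
{
    // Returns this object when it implements the requested service,
    // otherwise null.
    public virtual object GetService(Type serviceType)
    {
        if (serviceType == null) throw new ArgumentNullException(nameof(serviceType));
        return serviceType.IsInstanceOfType(this) ? this : null;
    }
}

public interface IUnsavedData
{
    string[] GetNamesOfUnsavedItems();
}

// A plug-in only has to implement the service interface.
public sealed class SamplePlugin : ServiceAwareBase, IUnsavedData
{
    public string[] GetNamesOfUnsavedItems()
    {
        return new[] { "Draft 1" };
    }
}
```

With this arrangement, a GetService(typeof(IUnsavedData)) call on a SamplePlugin returns the plug-in itself, and a request for any interface it doesn't implement returns null.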

Logging and Error Handling

WPF host and plug-in processes create logs in the %TMP%\WpfHost directory. The WPF host writes to WpfHost.log and each plug-in host process writes to PluginProcess.Guid.log (the “Guid” isn’t part of the literal name, but is expanded to the actual Guid value). The log service is custom-built. I avoided using popular logging services such as log4net or NLog to make the sample self-contained.

A plug-in process also writes results to its console window, which you can show by changing line 3 of the WpfHost app.config to:

<add key="PluginProcess.ShowConsole" value="True" />

I took great care to report all errors to the host and handle them gracefully. The host monitors plug-in processes and will close the plug-in window if a plug-in process dies. Similarly, a plug-in process monitors its host and will close if the host dies. All errors are logged, so examining log files helps tremendously with troubleshooting.
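
The plug-in side of this mutual monitoring can be approximated with the Process class. The following is a sketch under assumptions rather than the download's implementation: HostWatcher is a hypothetical name, and the process ID is the kind of value IWpfHost.HostProcessId exposes:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that reacts when the host process ends.
public static class HostWatcher
{
    // Invokes onHostExited once the process with the given ID exits.
    // Keep the returned Process referenced while monitoring is needed.
    // Process.GetProcessById throws ArgumentException if no such
    // process is running.
    public static Process Watch(int hostProcessId, Action onHostExited)
    {
        if (onHostExited == null) throw new ArgumentNullException(nameof(onHostExited));

        Process host = Process.GetProcessById(hostProcessId);
        host.Exited += (sender, e) => onHostExited();
        host.EnableRaisingEvents = true; // Exited is raised only when enabled
        return host;
    }
}
```

A plug-in process could call HostWatcher.Watch(hostProcessId, () => Environment.Exit(1)) during startup; the host can watch each plug-in process the same way and close the corresponding tab in the callback.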

It’s important to remember that everything passed between the host and the plug-ins must be either [Serializable] or of type derived from MarshalByRefObject. Otherwise, .NET remoting won’t be able to marshal the object between the parties. The types and interfaces must be known to both parties, so typically only built-in types and types from WpfHost.Interfaces or PluginHosting assemblies are safe for marshaling.

Versioning

WpfHost.exe, PluginProcess.exe and PluginHosting.dll are tightly coupled and should be released simultaneously. Fortunately, plug-in code doesn’t depend on any of these three assemblies and therefore they can be modified in almost any way. For example, you can easily change the synchronization mechanism or the name of the ready event without affecting the plug-ins.

The WpfHost.Interfaces.dll component must be versioned with extreme care. It should be referenced, but not included with the plug-in code (CopyLocal=false), so the binary for this assembly always comes only from the host. I didn’t give this assembly a strong name because I specifically don’t want side-by-side execution. Only one version of WpfHost.Interfaces.dll should be present in the entire system.

Generally, you should regard plug-ins as third-party code not under control of the host authors. Modifying or even recompiling all plug-ins at once might be difficult or impossible. Therefore, new versions of the interface assembly must be binary-compatible with previous versions, with the number of breaking changes held to an absolute minimum.

Adding new types and interfaces to the assembly is generally safe. Any other modifications, including adding new methods to interfaces or new values to enums, can potentially break binary compatibility and should be avoided.
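
To illustrate the rule, suppose a later version of the host wants plug-ins to save their own data on shutdown. Instead of adding a method to an existing interface, the capability can be introduced as a new interface and discovered through GetService. ISaveAll and HostShutdown are hypothetical names used only for this sketch:

```csharp
using System;

// Hypothetical later addition to the interface assembly:
// a new interface rather than a new member on an existing one.
public interface ISaveAll
{
    void SaveAll();
}

public static class HostShutdown
{
    // Returns true when the plug-in supports ISaveAll and saved its data.
    public static bool TrySaveAll(IServiceProvider plugin)
    {
        if (plugin == null) throw new ArgumentNullException(nameof(plugin));

        var saver = plugin.GetService(typeof(ISaveAll)) as ISaveAll;
        if (saver == null)
        {
            return false; // older plug-in: fall back to existing behavior
        }
        saver.SaveAll();
        return true;
    }
}
```

Plug-ins compiled against the earlier interface assembly keep loading unchanged, because none of the types they reference were modified.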

Even though the hosting assemblies don’t have strong names, it’s important to increment the version numbers after any change, however small, so no two assemblies with the same version number have different code.

A Good Starting Point

My reference architecture provided here isn’t a production-quality framework for plug-in-host integration, but it comes quite close and can serve as a valuable starting point for your application.

The architecture takes care of boilerplate yet difficult considerations such as the plug-in process lifecycle, marshaling plug-in controls across processes, and a mechanism for the exchange and discovery of services between the host and the plug-ins, among others. Most design solutions and workarounds aren’t arbitrary. They’re based on actual experience in building composite applications for WPF.

You’ll most likely want to modify the visual appearance of the host, replace the logging mechanism with the standard one used in your enterprise, add new services and possibly change the way plug-ins are discovered. Many other modifications and improvements are possible.

Even if you don’t create composite applications for WPF, you might still enjoy examining this architecture as a demonstration of how powerful and flexible the .NET Framework can be and how you can combine familiar components in an interesting, unexpected and productive way.

Ivan Krivyakov is a technical lead at Thomson Reuters. He’s a hands-on developer and architect who specializes in building and improving complex line-of-business (LOB) Windows Presentation Foundation applications.

Thanks to the following Microsoft technical experts for reviewing this article: Dr. James McCaffrey, Daniel Plaisted and Kevin Ransom
Kevin Ransom has worked at Microsoft for 14 years on a number of projects, including: Common Language Runtime, Microsoft Business Framework, Windows Vista and Windows 7, Managed Extensibility Framework and the Base Class Libraries. He currently is working in Managed Languages on Visual FSharp.

Dr. James McCaffrey works for Microsoft at the Redmond, Wash., campus. He has worked on several Microsoft products including Internet Explorer and MSN Search. He’s the author of “.NET Test Automation Recipes” (Apress, 2006), and can be reached at jammc@microsoft.com.

Since joining Microsoft in 2008, Daniel Plaisted has worked on the Managed Extensibility Framework (MEF), Portable Class Libraries (PCL) and the Microsoft .NET Framework for Windows Store apps. He has presented at MS TechEd, BUILD and various local groups, code camps and conferences. In his free time, he enjoys computer games, reading, hiking, juggling and footbagging (hackey-sack). His blog can be found at blogs.msdn.com/b/dsplaisted/ and he can be reached at daplaist@microsoft.com.
