Implement Coroutines for .NET by Wrapping the Unmanaged Fiber API

2019-10-22

[Editor's Update - 9/16/2005: The solution described in this article relies on undocumented functionality that is not supported by Microsoft at this time. This article is intended for informational purposes only, and its contents should not be used to create production code.]

Coroutines

Implementing Coroutines for .NET by Wrapping the Unmanaged Fiber API

Ajai Shankar

Code download available at:CoroutinesinNET.exe(135 KB)

This article assumes you're familiar with Managed C++ and .NET

Level of Difficulty123

SUMMARY

Coroutines are a powerful feature of many programming languages including CLU, Scheme, Python, Ruby, and ICON. Coroutines can save processor overhead and reduce redundancy because they allow you to stop execution of a procedure midstream, return a value, and resume exactly where the procedure left off.

This article shows how coroutines can be implemented for the .NET Framework by using the Fiber API and Managed Extensions for C++, and how they can be easily used with other .NET-compliant languages. This article also shows a sophisticated use of the runtime host for running multiple managed threads on a single OS thread.

Contents

Coroutines and the .NET Framework
Fibers and Threads
Implementation Details
A Generator and Tree Matching Example
Conclusion

Fibers—lightweight threading objects that can be used from 32-bit versions of Windows®—are useful in many scenarios. Since threads are precious resources, sometimes you don't want to dedicate an entire OS thread to a simple task. Fibers allow you control the scheduling of tasks more closely than threads because the OS isn't in charge of them, you are. Since they have less overhead, they're also faster when you switch contexts. In addition, because you control fibers, you can often track down synchronization problems more easily with them.

Coroutines are a code-based concept, similar to fibers, that have been implemented in several modern languages like CLU, Scheme, Python, Ruby, and ICON. Simply put, a coroutine is a method that can stop in mid-execution and provide a value to the caller without returning program flow. Since coroutines and fibers are parallel concepts, the two should complement each other.

Coroutines are not currently implemented natively under the Microsoft® .NET Framework. (A C# version is planned for future release.) However, with a bit of clever coding involving the Win32® Fibers API and Managed Extensions for C++, you can create your own coroutine libraries today and take full advantage of this advanced programming concept.

Coroutines can be thought of as methods which, instead of returning to the caller, stop dead in the middle of processing and yield a value to the caller. The next time the coroutine is called, it resumes where it left off until it yields another value. For example, here is a tiny program, written without coroutines, that iterates over the elements of an array:

class Iter { public object Next() { return array[ndx++]; } object[] array = new object[] {1, 2, 3, 4}; int ndx; }

The array this code iterates over and the current index into the array are fields of the Iter class. The Next method returns the next element of the array. If a language supports coroutines, the same program could be written as:

class CorIter { public void Next() { object[] array = new object[] {1, 2, 3, 4}; for(int ndx = 0; true; ++ndx) Yield(arr[ndx]); } }

The important thing to note here is that each time the Next method is called, rather than returning, it saves its current execution state (locals like array, ndx, and the next intermediate language instruction) and calls Yield to provide the next element of the array to the caller. The first time the method is called, the index is 0 and the function yields the first element of the array. As the method is called subsequently, it resumes where it left off, incrementing the index and yielding the next element.

Coroutines don't really accomplish anything in a trivial example like this. But what if you need to write a recursive method that does an in-order traversal of a binary tree in order to return the nodes one at a time? Without coroutines this would be rather difficult to implement. Either you would have to remove the recursion manually and write an iterator similar to the one in the C++ Standard Template Library, or resort to the use of callbacks as each node is encountered. Both options put a burden on the programmer. In addition, the first option creates more work for the coder who implements the iterator while the callback option has its own problems for the user.

The presence of coroutines in a language is a convenient alternative to both of these options because they let you yield the nodes as they are encountered. The State Threads library from Netscape (https://state-threads.sourceforge.net) implements coroutines in C using setjmp and longjmp calls.

These two coroutine examples also show the concept of Python generators, which yield the next value of an array or binary tree as they are called. Using coroutines, you can enumerate the nodes of a binary tree from within C# with a simple foreach statement:

foreach(Node node in tree) { // process the node }

One of the real advantages of coroutines is that they can be aborted if the rest of the computation is deemed unnecessary. For example, to see if two trees contain the leaves in the same order (fringe-matching), I can create a coroutine that yields the leaves of the first tree and another that yields the leaves of the second tree. The moment I find a point where the leaves don't match, I can abort both of the coroutines and avoid walking the rest of the trees.

Coroutines and the .NET Framework

I made a number of major design and implementation decisions when implementing coroutines in the .NET Framework. On the presentation level, the interface presented to the user should be very clean, similar to those in other languages. Users should always have the ability to abort coroutines. A coroutine should never cause an application to abort, even if it generates an uncaught exception.

On the implementation level I used the Win32 Fiber API to implement coroutines, since the common language runtime (CLR) does not provide a way to create separate execution paths or to save the current execution state and resume it later. Because of the additional resource overhead, I decided against using threads to implement coroutines. I also wanted to ensure that switching between coroutines would not cause any problems within the CLR and that there would be proper cleanup of internal CLR states.

I used Managed Extensions for C++ for the core implementation since it allows you to switch easily between managed and unmanaged code. Also, as I will explain later, I had to hook in the CLR host from unmanaged code to notify the runtime of the presence of fibers.

Fibers and Threads

Fibers in Windows NT® can be thought of as lightweight threads that have to be manually scheduled. When a fiber is created, it is passed a fiber-start function. The OS then assigns it a separate stack and sets up execution to begin at this fiber-start function. To schedule this fiber, you need to switch to it manually. The OS saves the execution state of the currently running fiber, then starts executing the new fiber. When this fiber switches to another one, its execution state is similarly saved so that it can be resumed later. Many fibers can be scheduled on the same thread, and a fiber can also move between threads. If the currently running fiber exits, then the thread running that fiber also exits. The Platform SDK documentation explains fibers in more detail (see Fibers).

Each fiber has a separate stack and a set of registers. If a fiber is running managed code, then it is possible to save the managed execution state of a fiber, pass program control to another managed fiber that called it (a yield), and resume this state later. This is how coroutines can be implemented as a fiber in the .NET Framework.

I created a class called Fiber that contains an abstract method named Run and the methods shown in Figure 1. The Fiber constructor sets up execution to begin at a function which calls the abstract method Run. A call to the Resume method switches to this fiber, which will in turn call Run. A coroutine inherits from the Fiber class and provides the implementation of the Run method. Within the Run method and other methods called by it, the coroutine yields to the previous fiber.

Figure 1 The Fiber Class

enum FiberStateEnum { FiberCreated, FiberRunning, FiberStopPending, FiberStopped }; __gc __abstract public class Fiber : public System::IDisposable { public: Fiber() {} System::Object* Resume() {} void Dispose() {} protected: virtual void Run() = 0; void Yield(System::Object *obj) {} private: void *fiber; // this fiber void *previousfiber; // the fiber that scheduled this fiber FiberStateEnum state; // whether the fiber is running or has stopped. System::Object *retval; // the value that Resume returns };

To add some syntactic sugar to this class, I created a delegate called Coroutine, which wraps the Resume method of a Fiber object. I also added an implicit conversion operator in the Fiber class:

public __delegate System::Object* Coroutine(); static Coroutine* op_Implicit(Fiber *obj) { return new Coroutine(obj, &Fiber::Resume); }

Using the Fiber class and this Coroutine delegate, the code to iterate over an array can now be written as:

class CorIter : Fiber { protected override void Run() { object[] array = new object[] {1, 2, 3, 4}; for(int ndx = 0; true; ++ndx) Yield(arr[ndx]); } } Coroutine next = new CorIter(); Object o = next();

Each time the next coroutine is called, the fiber is resumed and execution continues at the state where it left off. Once the fiber yields the next array element, that value is then passed back to the calling fiber.

Implementation Details

Before I explain the code for the Fiber class in detail, let's review the interaction between fibers and the CLR. At first I had a very tough time trying to use fibers directly from within the .NET Framework, especially with exception handling. Then I discovered that the Platform Abstraction Layer (PAL) and the runtime make heavy use of thread local storage (TLS), as does the .NET exception-handling mechanism. The list of exception handlers for a managed thread is stored in a TLS slot. For more information on the PAL, see Rotor: Shared Source CLI Provides Source Code for a FreeBSD Implementation of .NET.

Since fibers running managed code all exist on the same thread, switching between them causes problems for the exception handlers on a "managed" fiber. To avoid this I hook in the CLR host from unmanaged code, and using four undocumented APIs I inform the runtime about the presence of fibers, and notify it just before switching between fibers. Here is the static constructor that binds to the CLR host:

#pragma unmanaged ICorRuntimeHost *corhost; void initialize_corhost() { CorBindToCurrentRuntime(0, CLSID_CorRuntimeHost, IID_ICorRuntimeHost, (void**) &corhost); } #pragma managed static Fiber() { initialize_corhost(); }

The following shows the fiber constructor and the unmanaged fiber-start function:

Fiber() : retval(0), state(FiberCreated) { void *objptr = (void*) GCHandle::op_Explicit(GCHandle::Alloc(this)); fiber = CreateFiber(0, unmanaged_fiberproc, objptr); }

The constructor for the Fiber object creates a new fiber, which once scheduled begins executing the unmanaged_fiberproc function. When a fiber is created, Windows allows you to pass context information to the fiber procedure. When the fiber is scheduled, it can access the information that was passed to it. The constructor passes a pointer to the Fiber object to the unmanaged function. Since it is not possible to pass managed pointers directly to unmanaged code, I used the GCHandle class in the System:: Runtime::InteropServices namespace to get an unmanaged pointer to the managed Fiber object (see Figure 2).

Figure 2 Passing a Pointer

#pragma unmanaged VOID CALLBACK unmanaged_fiberproc(PVOID objptr) { corhost->CreateLogicalThreadState(); void *previousfiber = fibermain(objptr); corhost->DeleteLogicalThreadState(); SwitchToFiber(previousfiber); } #pragma managed void* fibermain(void* objptr) { System::IntPtr ptr = (System::IntPtr) objptr; GCHandle g = GCHandle::op_Explicit(ptr); Fiber *fiber = static_cast<Fiber*>(g.Target); g.Free(); return fiber->main(); }

Now comes the tricky part. Once I switched to the fiber and the unmanaged code started executing, I needed to warn the runtime about the newly created fiber. The undocumented function ICorRuntimeHost::CreateLogicalThread tells the runtime to set up a managed thread, complete with its own exception list and other data that is internal to the CLR for this fiber. This means that there is a managed thread corresponding to each fiber. Since many fibers can be scheduled on the same OS thread, it implies that multiple managed threads can be run on the same OS thread. Also, as a fiber can be scheduled later on another OS thread, it is possible for a managed thread to move between different OS threads. The sample application, threads.cs, which is available with the code download for this article, demonstrates this concept (see the link at the top of this article).

The unmanaged function then calls the managed fibermain function, passing it the pointer to the managed Fiber object. Once fibermain returns, the CLR is out of the picture. Control returns to the unmanaged code, where I deleted the managed thread state that I created for this fiber. I then switch back to the fiber that last scheduled this fiber.

The managed fibermain function simply gets the Fiber object and calls its private main method:

void* main() { state = FiberRunning; try { Run(); } catch(System::Object *x) { System::Console::Error->WriteLine( S"\nFIBERS.DLL: main Caught {0}", x); } state = FiberStopped; retval = 0; return previousfiber; }

The main method sets the state of the fiber to "running," and sets up an exception handler to catch any exceptions. Unlike in C#, Managed C++ lets me catch exceptions that are not derived from the Exception class. I will put this feature to good use when I implement the Dispose method. Run could return normally or throw an exception. In either case, fibermain gets control, sets the state to stopped, and returns the fiber that last scheduled this fiber.

From managed code, the user would switch between fibers using the Resume and Yield methods. As shown earlier, the CLR needs to be informed before switching between fibers. You have to tell the CLR to save the managed thread state for the currently running fiber before doing an OS-level SwitchToFiber. The unmanaged function CorSwitchToFiber does this, as you can see:

#pragma unmanaged void CorSwitchToFiber(void *fiber) { DWORD *cookie; corhost->SwitchOutLogicalThreadState(&cookie); SwitchToFiber(fiber); corhost->SwitchInLogicalThreadState(cookie); }

CorSwitchToFiber saves the thread state in a local variable using the ICorRuntimeHost::SwitchOutLogicalThreadState method, which can be found in the Shared Source CLI(remember, each fiber has its own stack). After saving the state, it does an OS-level fiber switch. Later, when this fiber is switched in, either through a Yield or a Resume, the CLR thread state is restored:

System::Object* Resume() { if(!fiber || state == FiberStopped) return 0; previousfiber = GetCurrentFiber(); CorSwitchToFiber(fiber); return retval; }

The Resume method first checks to see if the fiber has already stopped; otherwise it sets the previousfiber field to the currently running fiber and does a CorSwitchToFiber.

Yield can be called either from Run (which the coroutine implements) or from a method that Run calls. Yield switches to the fiber that scheduled the current fiber, passing it the value to be yielded, as shown in the following code:

__gc private struct StopFiber {}; void Yield(System::Object *obj) { retval = obj; CorSwitchToFiber(previousfiber); if(state == FiberStopPending) throw new StopFiber; }

Yield sets the retval field and switches to the previous fiber (which was paused in the call to Resume), ready to execute the return retval statement. Note that there is no need for synchronization primitives, as fibers are cooperatively scheduled and run on the same thread. If Run simply returns without yielding a value, or if there is an exception, a null value is returned from fibermain.

The only pieces remaining in this implementation of Yield are the check for FiberStopPending and the throw of StopFiber. One of my design considerations was that a coroutine could be aborted and its remaining computation abandoned. It is not possible to simply delete the fiber because the CLR will still hold some thread and execution state info for the fiber. Because I needed the unmanaged fiber start procedure to get control (so that it could delete the CLR thread state created for the fiber), I implemented the IDisposable::Dispose method:

void Dispose() { if(state == FiberRunning) { previousfiber = GetCurrentFiber(); state = FiberStopPending; CorSwitchToFiber(fiber); } DeleteFiber(fiber); }

Dispose first checks to see if the fiber has run to completion. If not, it sets the state to FiberStopPending and switches to the fiber. Because this fiber was running earlier, it has obviously yielded. So Yield is all set to execute the statement:

if(state == FiberStopPending) throw new StopFiber;

Since this exception is not derived from the Exception class, it cannot be caught from C#, and so gets propagated to the main method that called run. As shown earlier, the main method sets the fiber state to stopped and also clears out the retval. In this way, control comes back cleanly to the unmanaged fiber function, which deletes the CLR thread state for this fiber and switches back to the fiber that called Dispose.

Since the CLR is no longer in the picture when control comes back to the unmanaged fiber-start function, I can be sure that the internal CLR state for this fiber has been properly cleaned up. But the use of an exception (derived from System.Object) to ensure proper cleanup imposes a small restriction on the use of this class. If this implementation is used from a .NET Framework-compliant language that supports catching an exception not derived from the Exception class, then the overridden Run should not have a generic catch clause.

A Generator and Tree Matching Example

Now that all the hard stuff has been done in Managed Extensions for C++, I can use the power of coroutines from other languages. I used the tree-matching example from "Teach Yourself Scheme in Fixnum Days" by Dorai Sitaram and implemented it in C#. As you will see, it is also easy to write a generator that enumerates the leaves of the tree. The two trees have matching fringes, regardless of depth, if two leaves are in the same order, as shown in Figure 3.

Figure 3 Two Trees

Figure 3** Two Trees **

To see if the fringes match without using coroutines, I would have to flatten the trees by traversing them and putting their leaves in a list or an array. After that, I could loop through the lists in order to see if they matched. Even if the first leaves didn't match, I've already incurred the processing overhead of walking each tree completely to flatten it.

A coroutine that walks the first tree could yield each leaf as it is encountered. A second coroutine could yield the leaves of the second tree. At the first instance where the two leaves do not match, you are finished with the computation and both coroutines are aborted. The following code, which uses the Visitor design pattern, illustrates this point:

interface TreeVisitor { void Visit(Branch branch); void Visit(Leaf leaf); } interface Tree { void Accept(TreeVisitor visitor); }

This uses an interface called Tree with a single method which accepts a TreeVisitor. The Branch and Leaf classes implement the Tree interface. A Branch is a recursive data type that has a left and right node of type Tree, either of which could be another Branch or a Leaf, as shown here:

class Branch : Tree { public Branch(Tree l, Tree r) { left = l; right = r; } public void Accept(TreeVisitor visitor) { visitor.Visit(this); } public Tree left, right; } class Leaf : Tree { public Leaf(Object val) { value = val; } public void Accept(TreeVisitor visitor) { visitor.Visit(this); } public Object value; }

As C# does not support multiple dispatch, I simulated it by making both Branch and Leaf override the Accept method and called the Visit(Branch) or Visit(Leaf) methods of the TreeVisitor.

The tree visitor that I have is a coroutine. For a branch it recursively visits the left and right subtree, while for a leaf it just yields its value, as shown in Figure 4.

Figure 4 Traversing a Tree

class LeafGenerator : Fiber, TreeVisitor, IEnumerator { public enum Order { Left, Right }; public LeafGenerator(Tree _tree, Order _order) { tree = _tree; order = _order; } public void Visit(Branch branch) { if(order == Order.Left) { branch.left.Accept(this); branch.right.Accept(this); } else { branch.right.Accept(this); branch.left.Accept(this); } } public void Visit(Leaf leaf) { Yield(leaf.value); } protected override void Run() { tree.Accept(this); } private Tree tree; private Object curobj; private Order order; }

The Visit(Branch) method in the code I just showed recursively visits the left and right nodes in the order specified by the user. If the left or right node happens to be a Leaf, then its value is yielded. If the node is a Branch, the Visit(Branch) method will be recursively called again. The recursion stops once a Leaf is encountered.

The Run method that the coroutine implements kicks off the recursive process of walking the tree by asking the tree to accept itself as the visitor. This coroutine can be used as a generator in a statement such as this:

foreach(Leaf leaf in tree) { }

I also implemented the System::Collections::IEnumerator interface:

public Object Current { get { return curobj; } } public bool MoveNext() { curobj = Resume(); return curobj != null; }

MoveNext calls Resume, which yields either the next leaf or null if the tree is exhausted. This value is stored in the field curobj, and is returned when the Current property is called.

Compilers automatically generate code to call the IDisposable::Dispose method if the enumerator supports it. As the Fiber class implements the disposable interface, proper cleanup is made even if the user breaks out of the enumeration.

The tree-matching code is very simple. If I have two coroutines (leaf generators) that yield the leaves of the first and the second tree, I call each coroutine in succession, then check the values that they yield. If both return null (indicating that all the nodes of the trees are exhausted), then the trees match. If one coroutine yields null and another yields a Leaf it means that the first tree has been fully traversed and the second tree still has branches or leaves to be visited, so the trees do not match. If the current leaves are the same, then I recursively call the procedure to check the next leaves. If they are not the same, I immediately return a nonsuccessful match, as shown here:

static bool MatchFringe(Coroutine nexta, Coroutine nextb) { Object a = nexta(); Object b = nextb(); if(a == null && b == null) return true; if(a == null || b == null) return false; if(a.Equals(b)) return MatchFringe(nexta, nextb); else return false; }

If I find that the trees do not match, I can abort both fibers by calling the Dispose method. The complete code for the tree matching example can be found in the code download for this article in the file tree.cs.

Conclusion

The implementation of fibers and coroutines I have demonstrated in this article could easily be used to port event-driven applications to a multithreaded model that more closely matches the underlying OS, but without the associated overhead of extra threads. For example, a complex application that implements asynchronous callbacks (like an Internet download app) could be transformed into a simpler coroutine-based application. This implementation could also be used internally by a compiler that targets the CLR as the runtime for a language that provides coroutine support.

I have not touched upon all possible angles of fibers as used with the .NET Framework. There are potential issues when fibers are used with serviced components, and the interaction between fibers and .NET exceptions can be tricky. In unmanaged structured exception handling, a fiber switch within a catch or finally block could cause problems because these blocks are run on a separate shared stack.

One other issue I've found is that coroutine switches can deadlock from within the garbage collector thread. The CLR freezes all managed threads when garbage collection is running, so I did not implement a Finalize method to automatically delete the fiber. If a fiber is still running, the burden falls on the programmer to call Dispose once a coroutine is no longer needed. Since each fiber has a corresponding managed thread, there's no way to switch to the fiber and schedule it from within the garbage collection thread. I did not implement a Finalize method that calls Dispose because Dispose switches to the fiber to stop it. Finally, note that this technique uses internal calls that may change in the future. If you use these techniques, you're on your own!

For background information see:
Teach Yourself Scheme in Fixnum Days
Threads and Threading

Ajai Shankar is a consultant for IRIS Software Inc. in NJ and has designed and implemented projects for Wal-Mart, Ford, and MasterCard in a variety of programming languages. He can be reached at ajai_shankar@hotmail.com.

Share via

Additional resources