Testing the Windows Subsystem for Linux
Following the release of Windows Subsystem for Linux, the development team has created and released a series of technical deep dive blog posts. For a better understanding of the foundations of WSL check out the WSL Overview post.
Thank you to Ben Hillis, Vikas Agrawal, and Sunil Muthuswamy for providing content and review.
The goal of Windows Subsystem for Linux (WSL) is to provide a command line developer environment identical to those on *nix systems without requiring a virtual machine. This primarily consists of access to command line utilities, native ELF64 binary execution, and support common developer frameworks like MEAN and LAMP web server stacks. Additionally, to make the system feel as native as possible, WSL exposes both the Windows file system and networking stack automatically. All of this is accomplished via a kernel level emulation layer that translates system calls from Linux to their corresponding NT counterpart. Where there is no direct NT counterpart WSL fills those gaps.
WSL was initially released one year ago at //Build 2016 to an enthusiastic group of users. Since then, our active user count has continued to climb. At time of writing, WSL is used by over half a million people each month. With a community of this size we wanted to be transparent about what sort of quality should be expected. Shipping WSL with a beta tag was a no-brainer for us. We knew that the subsystem wasn't in a place where it would be suited for a production environment and wanted to be clear about what user's expectations should be. To us, the beta tag conveyed the message, "Some things will break and there will be gaps."
As development progressed, we have looked for ways to quantify WSL's shortcomings and lay out an implementation road map for the missing features needed to reach parity with bare metal Linux. These efforts have focused on four main pillars: the Linux Test Project, scenario based testing, fuzzing, and feedback from our community.
An important tool in improving our compatibility has been the Linux Test Project (LTP). LTP is a jointly created repository of unit tests designed for validating the system call and file system interfaces of the Linux kernel. LTP has been invaluable in helping us verify the robustness and stability of our implementation by providing a standardized regression check as well as a road map for future development. LTP has told us a great deal about the completeness of WSL's system call implementation. We also publish the results of LTP on our Github.
LTP in the Anniversary Update
When the Anniversary Update shipped, our LTP pass rate was decent, but not great. We started using LTP relatively late in the development cycle and that didn’t leave much time to tune our system call implementations to improve the pass rate. The results below are using LTP version 20150420.
|Pass % (passing/(total - skipped))||69.45%|
LTP in the Creators Update
In the Creators Update WSL reports Linux kernel version 4.4. We have bumped our LTP version to 20160510 to match. In the below table, you’ll notice there’s an additional row not present in the LTP results from the Anniversary Update. The unimplemented designation means that the test is failing, but the system call is not implemented in WSL. This distinction is important to us because we implement system calls as-needed based on user input and telemetry. Many of the system calls that we have not implemented are not used or are used by a very narrow set of applications. Also included in the unimplemented set are 33 tests that use the /dev/loop device which is not something that we support yet.
We’ve also re-categorized what it means to be a skipped test. Now every test in the skipped category is also skipped on native 64 bit Ubuntu. Most these tests are for the 16 bit versions of the system calls.
The below results are using LTP version 20160510.
|Pass % (not including unimplemented)||88.88%|
|Pass % (including unimplemented)||73.81%|
Of the failing filesystem tests the majority are due to missing support for the rt_sigqueueinfo system call.
Unfortunately, LTP is not a one stop shop. Even if we could achieve 100% pass rate for applicable tests it wouldn't tell us everything. There are many system calls (primarily newer ones) that do not have coverage in LTP. Most LTP tests are rudimentary and don’t test interesting race conditions or the full range of arguments a system call supports. A very large percentage of LTP tests are negative variation tests that do little more than ensure that a given system call returns the correct error code when invalid input is provided. In these cases, the WSL team covers test gaps by writing our own unit tests. At time of writing the team has written over 150,000 lines of unit test code for systems calls and virtual files (/proc, /sys).
Importantly, 100% LTP pass rate is not the gold standard. Many system calls that LTP covers are rarely used in practice and won’t affect the majority of users. Which brings us to scenario testing.
To improve the coverage of our testing we needed something else that stressed areas used by our scenarios of focus. For this, we turned to the unit tests that various open source projects use to validate their own product. We found that many of the frameworks we care about have extensive suites of unit tests. Interestingly, there are even a small number of tests that pass on WSL that did not pass on native Ubuntu 16.04.
|Test Name||Version||WSL Failing Tests||WSL Pass %||Native||Native Pass %||Total Tests||Notes|
|PIP (python 2.7)||Master||3||99.57%||3||99.57%||700||11 skipped on both|
|Bower||V1.8(master)||0||100.00%||0||100.00%||539||17 skipped on both|
|Rake||Master||0||100.00%||0||100.00%||573||1 skipped on WSL|
|JUNIT||Master||0||100.00%||0||100.00%||985||4 skipped on both|
While these tests give us a good idea of what's working they don't tell the whole story. Just because a framework’s unit tests pass at 100% does not guarantee that a user will never hit a bug in that framework.
In the future, we will continue to build out the infrastructure for running these tests in a more automated fashion. Additionally, we will add more tests to this list as we identify top frameworks and tools. If you don't see your favorite on here, don't worry! We'll get there.
Stress Testing and Fuzzing
Another important part of the WSL testing story is stress and fuzz testing. We have leveraged Trinity for system call fuzz testing and have incorporated the test into our automated test process. For stress testing, we have relied on stress-ng. Both tools have helped us find critical bugs in hard-to-reach areas of our driver.
The final piece of our approach to improving WSL has been our GitHub and our incredible Windows Insider users. The tens of thousands of users previewing the Creators Update over the last 6 months have generated nearly 2000 issues that have been filed on everything from network connection problems to requests for more color support in the console. We look at every issue that is filed and make sure that we have a plan to address those that are actionable and applicable. This has helped us stay closely in sync with our community occasionally responding with fixes in a matter of weeks. If you are interested in receiving the latest WSL bits months before they are publicly available be sure to join the Windows Insider program.
We are thrilled with the progress that WSL has made in the past year. We also see a path forward to equally impactful features and fixes that will increase our compatibility and make WSL an even more compelling development tool. But we can’t do it without you! Continue to give us feedback on where you would like us to focus our efforts. Try your development stack on WSL. Let us know when you hit problems and help us to understand your scenarios. Join the Windows Insider program and file issues on our Github.
Thank you for supporting us with your usage and your bugs. We’re excited about the future of WSL. Stay tuned!