Testing the Windows Subsystem for Linux
Following the release of the Windows Subsystem for Linux, the development team has published a series of technical deep-dive blog posts. For a better understanding of the foundations of WSL, check out the WSL Overview post.
Thank you to Ben Hillis, Vikas Agrawal, and Sunil Muthuswamy for providing content and review.
Overview
The goal of the Windows Subsystem for Linux (WSL) is to provide a command-line developer environment identical to those on *nix systems without requiring a virtual machine. This primarily consists of access to command-line utilities, native ELF64 binary execution, and support for common developer frameworks like the MEAN and LAMP web stacks. Additionally, to make the system feel as native as possible, WSL exposes both the Windows file system and networking stack automatically. All of this is accomplished via a kernel-level emulation layer that translates system calls from Linux to their corresponding NT counterparts. Where there is no direct NT counterpart, WSL fills the gap.
WSL was initially released one year ago at //Build 2016 to an enthusiastic group of users. Since then, our active user count has continued to climb; at the time of writing, WSL is used by over half a million people each month. With a community of this size, we wanted to be transparent about what sort of quality should be expected. Shipping WSL with a beta tag was a no-brainer for us. We knew that the subsystem wasn't yet suited for a production environment and wanted to be clear about what users' expectations should be. To us, the beta tag conveyed the message, "Some things will break and there will be gaps."
As development has progressed, we have looked for ways to quantify WSL's shortcomings and lay out an implementation road map for the missing features needed to reach parity with bare-metal Linux. These efforts have focused on four main pillars: the Linux Test Project, scenario-based testing, fuzzing, and feedback from our community.
LTP
An important tool in improving our compatibility has been the Linux Test Project (LTP). LTP is a jointly developed repository of unit tests designed to validate the system call and file system interfaces of the Linux kernel. It has been invaluable in helping us verify the robustness and stability of our implementation, providing both a standardized regression check and a road map for future development, and it has told us a great deal about the completeness of WSL's system call surface. We also publish our LTP results on GitHub.
LTP in the Anniversary Update
When the Anniversary Update shipped, our LTP pass rate was decent but not great. We started using LTP relatively late in that development cycle, which didn't leave much time to tune our system call implementations to improve the pass rate. The results below use LTP version 20150420.
System Calls
| Result | Count |
| --- | --- |
| Passing | 637 |
| Failing | 280 |
| Skipped | 144 |
| Total | 1061 |
| Pass % (passing/(total - skipped)) | 69.45% |
Filesystem
| Result | Count |
| --- | --- |
| Passing | 20 |
| Failing | 41 |
| Total | 61 |
| Pass % | 32.8% |
LTP in the Creators Update
In the Creators Update, WSL reports Linux kernel version 4.4, and we have bumped our LTP version to 20160510 to match. In the table below you'll notice an additional row that was not present in the Anniversary Update results. The unimplemented designation means the test is failing because the system call is not implemented in WSL at all. This distinction is important to us because we implement system calls as needed, based on user input and telemetry; many of the calls we have not implemented are rarely used, or used only by a very narrow set of applications. The unimplemented set also includes 33 tests that use the /dev/loop device, which we do not support yet.
We've also re-categorized what it means to be a skipped test: every test in the skipped category is now also skipped on native 64-bit Ubuntu. Most of these tests cover the 16-bit versions of the system calls.
The results below use LTP version 20160510.
System Calls
| Result | Count |
| --- | --- |
| Passing | 744 |
| Failing | 93 |
| Unimplemented | 171 |
| Skipped | 102 |
| Total | 1110 |
| Pass % (not including unimplemented) | 88.88% |
| Pass % (including unimplemented) | 73.81% |
Filesystem
Of the failing filesystem tests, the majority are due to missing support for the rt_sigqueueinfo system call.
| Result | Count |
| --- | --- |
| Passing | 52 |
| Failing | 9 |
| Total | 61 |
| Pass % | 85.24% |
LTP Shortcomings
Unfortunately, LTP is not a one-stop shop. Even if we could achieve a 100% pass rate on applicable tests, it wouldn't tell us everything. Many system calls (primarily newer ones) have no LTP coverage at all. Most LTP tests are rudimentary and don't exercise interesting race conditions or the full range of arguments a system call supports. A very large percentage of LTP tests are negative variations that do little more than ensure a given system call returns the correct error code when invalid input is provided. In these cases, the WSL team covers the gaps by writing our own unit tests; at the time of writing the team has written over 150,000 lines of unit test code for system calls and virtual files (/proc, /sys).
Importantly, a 100% LTP pass rate is not the gold standard. Many system calls that LTP covers are rarely used in practice and won't affect the majority of users, which brings us to scenario testing.
Scenario Testing
To improve our coverage, we needed something else that stressed the areas used by our scenarios of focus. For this, we turned to the unit tests that various open source projects use to validate their own products. We found that many of the frameworks we care about have extensive suites of unit tests. Interestingly, there are even a small number of tests that pass on WSL but do not pass on native Ubuntu 16.04.
| Test Name | Version | WSL Failing Tests | WSL Pass % | Native Failing Tests | Native Pass % | Total Tests | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Nginx | 1.4.6 | 0 | 100.00% | 0 | 100.00% | 99 | |
| Django | 1.10.x (master) | 4 | 99.97% | 4 | 99.97% | 11776 | |
| Flask | 0.11 (master) | 1 | 99.69% | 1 | 99.69% | 327 | |
| pip (Python 2.7) | master | 3 | 99.57% | 3 | 99.57% | 700 | 11 skipped on both |
| Grunt | master | 0 | 100.00% | 0 | 100.00% | 390 | |
| Gulp | master | 0 | 100.00% | 0 | 100.00% | 31 | |
| Express.js | 4.x (master) | 0 | 100.00% | 0 | 100.00% | 799 | |
| Bower | v1.8 (master) | 0 | 100.00% | 0 | 100.00% | 539 | 17 skipped on both |
| json-server | master | 0 | 100.00% | 0 | 100.00% | 77 | |
| CoffeeScript | master | 0 | 100.00% | 1 | 99.88% | 822 | |
| Ember.js | master | 0 | 100.00% | 0 | 100.00% | 20642 | |
| TypeScript | master | 0 | 100.00% | 0 | 100.00% | 52976 | |
| nvm | master | 1 | 99.01% | 1 | 99.01% | 101 | |
| PhantomJS | master | 12 | 94.50% | 12 | 94.50% | 218 | |
| Rails | 5.0.0.1 | 0 | 100.00% | 2 | 99.99% | 14056 | |
| Rake | master | 0 | 100.00% | 0 | 100.00% | 573 | 1 skipped on WSL |
| RVM | v1.27.0 | 37 | 93.03% | 37 | 93.03% | 531 | |
| Sinatra | master | 0 | 100.00% | 0 | 100.00% | 2365 | |
| Sinatra | stable | 0 | 100.00% | 0 | 100.00% | 2354 | |
| JUnit | master | 0 | 100.00% | 0 | 100.00% | 985 | 4 skipped on both |
| Maven | master | 0 | 100.00% | 0 | 100.00% | 494 | |
| Struts | master | 0 | 100.00% | 0 | 100.00% | 317 | |
| R | 3.3.2 | 0 | 100.00% | 0 | 100.00% | 1088 | |
| PostgreSQL | 9.5.3 | 0 | 100.00% | 0 | 100.00% | 157 | |
| Cassandra | master | 0 | 100.00% | 12 | 96.20% | 316 | |
While these tests give us a good idea of what's working, they don't tell the whole story: a framework's unit tests passing at 100% does not guarantee that a user will never hit a bug in that framework.
In the future, we will continue to build out the infrastructure for running these tests in a more automated fashion. We will also add more tests to this list as we identify top frameworks and tools. If you don't see your favorite here, don't worry; we'll get there.
Stress Testing and Fuzzing
Another important part of the WSL testing story is stress and fuzz testing. We have leveraged Trinity for system call fuzzing and incorporated it into our automated test process. For stress testing, we rely on stress-ng. Both tools have helped us find critical bugs in hard-to-reach areas of our driver.
Community/GitHub
The final piece of our approach to improving WSL has been our GitHub repository and our incredible Windows Insider users. The tens of thousands of users previewing the Creators Update over the last six months have filed nearly 2,000 issues on everything from network connection problems to requests for more color support in the console. We look at every issue that is filed and make sure we have a plan to address those that are actionable and applicable. This has helped us stay closely in sync with our community, occasionally responding with fixes in a matter of weeks. If you are interested in receiving the latest WSL bits months before they are publicly available, be sure to join the Windows Insider program.
Conclusion
We are thrilled with the progress that WSL has made in the past year. We also see a path forward to equally impactful features and fixes that will increase our compatibility and make WSL an even more compelling development tool. But we can't do it without you! Continue to give us feedback on where you would like us to focus our efforts. Try your development stack on WSL. Let us know when you hit problems and help us understand your scenarios. Join the Windows Insider program and file issues on our GitHub.
Thank you for supporting us with your usage and your bugs. We’re excited about the future of WSL. Stay tuned!
Comments
- Anonymous (April 11, 2017): Thank you, Jack. I would like to personally thank the developers of the above test tools that we have used and the open source community at large, whose contributions have had (and continue to have) a huge impact in helping MSFT deliver a quality product.
- Anonymous (April 11, 2017): Hi there, what can be done to help pass Django? :)
  - Anonymous (April 12, 2017): You can try to run the tests and check exactly what fails.
- Anonymous (April 11, 2017): What could Perl do to get added to the test suite?
- Anonymous (April 12, 2017): Why don't you just run a VM?
- Anonymous (April 12, 2017): Could you guys contribute your new unit tests back to LTP? Or even publish them separately? Linux NEEDS more testing.
  - Anonymous (April 12, 2017): More test contributions would indeed be appreciated by the community.
  - Anonymous (April 25, 2017): That is a great suggestion; contributing variations back to LTP is something that we should look into.
- Anonymous (April 12, 2017): The improvements that have been made recently in WSL are spectacular. If anything, this will make Windows a better operating system to develop on. I was wondering if there was any plan/hope/possibility of leveraging your own test suite for Linux kernel development (maybe working directly with the LTP)? This could help to ensure compatibility, and also confirm the test suite's correctness (though I believe you are probably testing some parts of it against vanilla Linux).
- Anonymous (April 12, 2017): Lovely stuff Jack. Thanks for the writeup.
- Anonymous (April 12, 2017): I'm surprised you are not testing Mono and .NET Core; is this intentionally not supported?
  - Anonymous (April 13, 2017): Nope, we do not intentionally withhold support for anything. We have actually tested .NET extensively.
- Anonymous (April 14, 2017): Hey there, is there any news on when we are going to get the full version of Windows Subsystem for Linux? It's still in beta. Also I would like to know when we will get features like interface customization. I have customized my terminal with bashrc, but it's not working as it should; I can use a maximum of 7 colors.
- Anonymous (April 14, 2017): Great, I would just add Wine as a very interesting candidate for scenario testing.
- Anonymous (April 14, 2017): Nice blog post. A worthwhile test to add to your list is libuv, which underpins NodeJS. It currently has 11 fails as of this writing. Passes clean on Real Linux™. https://github.com/libuv/libuv
- Anonymous (April 19, 2017): Thanks for this awesome project! It can allow me to ship my software packages via yum also to Windows :) Where can I find which syscalls (with which arguments) are supported? I'm interested especially in the futex syscall because of https://github.com/Microsoft/BashOnWindows/issues/1955
  - Anonymous (April 25, 2017): Good question; probably the best place to look would be the LTP results we publish with each release.
- Anonymous (May 03, 2017): It's very gratifying to find out that you are using my stress-ng tool for the stress testing. Let me know if there is anything we can do to help with this stress work. See http://kernel.ubuntu.com/~cking/stress-ng/
  - Anonymous (May 11, 2017): Yes, thanks Colin. The tool has been very handy as part of our test arsenal; we run it regularly. The options to exclude providers and randomize them have been very useful. We appreciate your work, and thanks for reaching out. If we find opportunities to collaborate, we will reach out through your GitHub page.
- Anonymous (May 21, 2017): As a complement to the LTP, you should also try the test suite of GNU gnulib. It tests against hundreds of possible bugs in the libc and kernel.