WSL File System Support

This is the fourth in a series of blog posts on the Windows Subsystem for Linux (WSL). For background information you may want to read the architectural overview, introduction to pico processes and WSL system calls blog posts.

Posted on behalf of Sven Groot.

Introduction

One of the key goals for the Windows Subsystem for Linux is to allow users to work with their files as they would on Linux, while giving full interoperability with files the user already has on their Windows machine. Unlike a virtual machine, where you have to use network shares or other solutions to share files between the host and guest OS, WSL has direct access to all your Windows drives to allow for easy interop.

Windows file systems differ substantially from Linux file systems, and this post looks into how WSL bridges those two worlds.

File systems on Linux

Linux abstracts file system operations through the Virtual File System (VFS), which provides both an interface for user mode programs to interact with the file system (through system calls such as open, read, chmod, stat, etc.) and an interface that file systems have to implement. This allows multiple file systems to coexist, providing the same operations and semantics, with VFS giving a single namespace view of all these file systems to the user.

File systems are mounted on different directories in this namespace. For example, on a typical Linux system your hard drive may be mounted at the root, /, with directories such as /dev, /proc, /sys, and /mnt/cdrom all mounting different file systems which may be on different devices. Examples of file systems used on Linux include ext4, rfs, FAT, and others.

VFS implements the various system calls for file system operations by using a number of data structures such as inodes, directory entries and files, and related callbacks that file systems must implement.

Inodes

The inode is the central data structure used in VFS. It represents a file system object such as a regular file, directory, symbolic link, etc. An inode contains information about the file type, size, permissions, last modified time, and other attributes. For many common Linux disk file systems such as ext4, the on-disk data structures used to represent file metadata directly correspond to the inode structure used by the Linux kernel.

While an inode represents a file, it does not represent a file name. A single file may have multiple names, or hard links, but only one inode.
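The distinction between names and inodes can be observed from user mode. The following is a minimal sketch, assuming a Linux file system that supports hard links; the file names and the helper function are made up for the illustration.

```python
import os
import tempfile

def hard_link_demo():
    """Create one file with two names and report what stat sees."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")      # hypothetical file names
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)                # add a second name for the same inode
        a, b = os.stat(first), os.stat(second)
        # Both names report the same inode number and a link count of two.
        return a.st_ino == b.st_ino, a.st_nlink

same_inode, link_count = hard_link_demo()
print(same_inode, link_count)
```

Both stat results describe the same inode, which is why attributes such as permissions and size are shared by every name of a file.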

File systems provide a lookup callback to VFS which is used to retrieve an inode for a particular file, based on the parent inode and the child name. File systems must implement a number of other inode operations such as chmod, stat, open, etc.

Directory entries

VFS uses a directory entry cache to represent your file system namespace. Directory entries only exist in memory, and contain a pointer to the inode for the file. For example, if you have a path like /home/user/foo, there are directory entries for home, user, and foo, each with a pointer to an inode. Directory entries are cached for fast lookup, but if an entry is not yet in the cache, the inode lookup operation is used to retrieve the inode from the file system so a new directory entry can be created.
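The interplay between the cache and the lookup callback can be modeled in a few lines. This is a toy model only, not kernel code: the ToyVfs and Inode classes, and the idea of keying the cache by parent inode number and child name, are assumptions made for the illustration.

```python
class Inode:
    def __init__(self, number, children=None):
        self.number = number
        self.children = children or {}    # name -> Inode, stands in for on-disk data

class ToyVfs:
    def __init__(self, root):
        self.root = root
        self.dentry_cache = {}            # (parent inode number, name) -> Inode
        self.lookups = 0                  # how often the file system was asked

    def _fs_lookup(self, parent, name):
        # Stand-in for the lookup callback a file system provides to VFS.
        self.lookups += 1
        return parent.children[name]

    def resolve(self, path):
        inode = self.root
        for name in filter(None, path.split("/")):
            key = (inode.number, name)
            if key not in self.dentry_cache:      # cache miss: ask the file system
                self.dentry_cache[key] = self._fs_lookup(inode, name)
            inode = self.dentry_cache[key]
        return inode

foo = Inode(4)
root = Inode(1, {"home": Inode(2, {"user": Inode(3, {"foo": foo})})})
vfs = ToyVfs(root)
vfs.resolve("/home/user/foo")   # three cache misses, one per path component
vfs.resolve("/home/user/foo")   # served entirely from the cache
```

Resolving the same path twice only invokes the lookup callback for the first resolution, which is the benefit the cache is meant to provide.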

File objects

When an inode is opened, a file object is created for that file which keeps track of things like the file offset and whether the file was opened for read, write or both. File systems must provide a number of file operations such as read, write, sync, etc.

File descriptors

Applications refer to file objects through file descriptors. These are numeric values, unique to a process, that refer to any files the process has open. File descriptors can refer to other types of objects that provide a file-like interface in Linux, including ttys, sockets, and pipes. Multiple file descriptors can refer to the same file object, e.g. through use of the dup system call.
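Because the file offset lives in the file object rather than in the descriptor, two descriptors created with dup share it. A small sketch of that behavior, assuming a POSIX system (the helper name is made up for the example):

```python
import os
import tempfile

def dup_shares_offset():
    """Read through two descriptors that refer to one file object."""
    with tempfile.TemporaryFile() as f:
        f.write(b"abcdef")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)      # rewind the shared offset
        fd2 = os.dup(fd)                  # second descriptor, same file object
        try:
            first = os.read(fd, 3)        # advances the shared offset to 3
            second = os.read(fd2, 3)      # continues where the first read stopped
        finally:
            os.close(fd2)
        return first, second

print(dup_shares_offset())
```

The second read does not start at the beginning of the file, since both descriptors refer to the same file object and therefore the same offset.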

Special file types

Besides just regular files and directories, Linux supports a number of additional file types. These include device files, FIFOs, sockets, and symbolic links.

Some of these files affect how paths are parsed. Symbolic links are special files that refer to a different file or directory, and following them is handled seamlessly by VFS. If you open the path /foo/bar/baz and bar is a symbolic link to /zed, then you will actually open /zed/baz instead.
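The /foo/bar/baz example can be reproduced inside a temporary directory. This is a sketch assuming a POSIX system where the current user may create symbolic links; the directory layout mirrors the names used above, placed under a temporary root rather than the real /.

```python
import os
import tempfile

def resolve_through_symlink():
    with tempfile.TemporaryDirectory() as d:
        d = os.path.realpath(d)           # the temp directory itself may sit behind a link
        os.makedirs(os.path.join(d, "zed"))
        os.makedirs(os.path.join(d, "foo"))
        with open(os.path.join(d, "zed", "baz"), "w") as f:
            f.write("payload")
        # foo/bar is a symbolic link to zed
        os.symlink(os.path.join(d, "zed"), os.path.join(d, "foo", "bar"))
        via_link = os.path.join(d, "foo", "bar", "baz")
        with open(via_link) as f:         # the link is followed transparently
            content = f.read()
        resolved = os.path.relpath(os.path.realpath(via_link), d)
        return content, resolved

print(resolve_through_symlink())
```

Opening foo/bar/baz yields the contents of zed/baz without the application doing anything special.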

Similarly, a directory may be used as a mount point for another file system. In this case, when a path crosses this directory, all inode operations below the mount point go to the new file system.

Special and pseudo file systems

Linux uses a number of file systems that don’t read files from a disk. TmpFs is used as a temporary, in-memory file system, whose contents will not be persisted. ProcFs and SysFs both provide access to kernel information about processes, devices and drivers. These file systems do not have a disk, network or other device associated with them, and instead are virtualized by the kernel.

File systems on Windows

Windows generalizes all system resources into objects. These include not just files, but also things like threads, shared memory sections, and timers, just to name a few. All requests to open a file ultimately go through the Object Manager in the NT kernel, which routes the request through the I/O Manager to the correct file system driver. The interface that file system drivers implement in Windows is more generic and enforces fewer requirements. For example, there is no common inode structure or anything similar, nor is there a directory entry; instead, file system drivers such as ntfs.sys are responsible for resolving paths and opening file objects.

File systems in Windows are typically mounted on drive letters like C:, D:, etc., although they can be mounted on directories in other file systems as well. These drive letters are actually a construct of Win32, and not something that the Object Manager directly deals with. The Object Manager keeps a namespace that looks similar to the Linux file system namespace, rooted in \, with file system volumes represented by device objects with paths like \Device\HarddiskVolume1.

When you open a file using a path like C:\foo\bar, the Win32 CreateFile call translates this to an NT path of the form \DosDevices\C:\foo\bar, where \DosDevices\C: is actually a symbolic link to, for example, \Device\HarddiskVolume4. Therefore, the real full path to the file is actually \Device\HarddiskVolume4\foo\bar. The object manager resolves each component of the path, similar to how VFS would in Linux, until it encounters the device object. At this point, it forwards the request to the I/O manager, which creates an I/O Request Packet (IRP) with the remaining path, which it sends to the file system driver for the device.
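The resolution steps can be mimicked with a toy namespace. The sketch below is purely illustrative: the two tables and both functions are invented for the example, the real Object Manager is far more involved, and the volume number simply mirrors the one used in the text.

```python
# Toy object-manager namespace: symbolic links map one prefix to another.
SYMLINKS = {"\\DosDevices\\C:": "\\Device\\HarddiskVolume4"}
DEVICES = {"\\Device\\HarddiskVolume4"}

def win32_to_nt(path):
    # Rough stand-in for the translation done by the Win32 layer.
    return "\\DosDevices\\" + path

def resolve(nt_path):
    """Return (device object, remaining path handed to the file system driver)."""
    parts = nt_path.strip("\\").split("\\")
    current = ""
    for i, part in enumerate(parts):
        current += "\\" + part
        current = SYMLINKS.get(current, current)    # follow a symbolic link if present
        if current in DEVICES:
            # Stop at the device object; the rest belongs to the file system driver.
            return current, "\\" + "\\".join(parts[i + 1:])
    raise FileNotFoundError(nt_path)

device, remainder = resolve(win32_to_nt("C:\\foo\\bar"))
print(device, remainder)
```

In this model the remainder, \foo\bar, corresponds to the path carried in the IRP that the file system driver receives.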

File objects

When a file is opened, the object manager creates a file object for it. Instead of file descriptors, the object manager provides handles to file objects. Handles can actually refer to any object manager object, not just files.

When you call a system call like NtReadFile (typically through the Win32 ReadFile function), the I/O manager again creates an IRP to send down to the file system driver for the file object to perform the request.

Because there are no inodes or anything similar in NT, most operations on files in Windows require a file object.

Reparse points

Windows only supports two file types: regular files and directories. Both files and directories can be reparse points, which are special files that have a fixed header and a block of arbitrary data. The header includes a tag that identifies the type of reparse point, which must be handled by a file system filter driver, or for built-in reparse point types, the I/O manager itself.

Reparse points are used to implement symbolic links and mount points. In these cases, the tag indicates that the reparse point is a symbolic link or mount, and the data associated with the reparse point contains the link target, or volume name for mount points. Reparse points can also be used for other functionality such as the placeholder files used by OneDrive in Windows 8.

Case sensitivity

Unlike Linux, Windows file systems are by default case preserving, but not case sensitive. In actuality, Windows and NTFS do support case sensitivity, but this behavior is not enabled by default.

File systems in WSL

The Windows Subsystem for Linux must translate various Linux file system operations into NT kernel operations. WSL must provide a place where Linux system files can exist, with all the functionality required for that including Linux permissions, symbolic links and other special files such as FIFOs; it must provide access to the Windows volumes on your system; and it must provide special file systems such as ProcFs.

To facilitate this, WSL has a VFS component that is modeled after the VFS on Linux. The overall architecture is shown below.

[Figure: WSL file system architecture]

When an application calls a system call, this is handled by the system call layer, which defines the various kernel entry points such as open, read, chmod, stat, etc. For these file-related system calls, the system call layer has very little functionality; it basically just forwards the call to VFS.

For operations that use paths (such as open or stat), VFS resolves the path using a directory entry cache. If an entry is not in the cache, it calls into one of several file system plugins to create an inode for the entry. These plugins provide inode operations like lookup, chmod, and others, similar to the inode operations used by the Linux kernel. When a file is opened, VFS uses the file system’s inode open operation to create a file object, and returns a file descriptor for that file object. System calls operating on the file descriptor (such as read, write or sync) call file operations defined by the file systems. This system is deliberately very close to how Linux behaves, so WSL can support the same semantics.

VFS defines several file system plugins: VolFs and DrvFs are used to represent files on disk, and the remainder are the in-memory file system TmpFs and pseudo file systems such as ProcFs, SysFs, and CgroupFs.

VolFs and DrvFs are where Linux file systems meet Windows file systems. They are how WSL interacts with files on your disks, and serve two different purposes: VolFs is designed to provide full support for Linux file system features, and DrvFs is designed for interop with Windows.

Let’s look at these file systems in more detail.

VolFs

The primary file system used by WSL is VolFs. It is used to store the Linux system files, as well as the content of your Linux home directory. As such, VolFs supports most features the Linux VFS provides, including Linux permissions, symbolic links, FIFOs, sockets, and device files.

VolFs is used to mount the VFS root directory, using %LocalAppData%\lxss\rootfs as the backing storage. In addition, a few additional VolFs mount points exist, most notably /root and /home which are mounted using %LocalAppData%\lxss\root and %LocalAppData%\lxss\home respectively. The reason for these separate mounts is that when you uninstall WSL, the home directories are not removed by default, so any personal files stored there will be preserved.
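The mount layout described above amounts to a longest-prefix mapping from Linux paths to backing Windows directories. The following sketch only illustrates that mapping: the backing_path function is hypothetical, %LocalAppData% is left unexpanded, and file name escaping is ignored.

```python
# Mount table taken from the text; %LocalAppData% is left unexpanded.
VOLFS_MOUNTS = {
    "/": r"%LocalAppData%\lxss\rootfs",
    "/root": r"%LocalAppData%\lxss\root",
    "/home": r"%LocalAppData%\lxss\home",
}

def backing_path(linux_path):
    """Map an absolute Linux path to its backing Windows path (longest mount wins)."""
    best = "/"
    for mount in VOLFS_MOUNTS:
        # A mount matches only on whole path components, so /homework is not under /home.
        if linux_path == mount or linux_path.startswith(mount.rstrip("/") + "/"):
            if len(mount) > len(best):
                best = mount
    rest = linux_path[len(best):].strip("/")
    base = VOLFS_MOUNTS[best]
    return base + ("\\" + rest.replace("/", "\\") if rest else "")

print(backing_path("/home/user/foo"))
print(backing_path("/etc/passwd"))
```

Anything under /home or /root lands outside rootfs, which is what allows those directories to survive an uninstall.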

Note that all these mount points use directories in your Windows user folder for storage. Each Windows user has their own WSL environment, and can therefore have Linux root privileges and install applications without affecting other Windows users.

Inodes and file objects

Since Windows has no related inode concept, VolFs must keep a handle to a Windows file object in an inode. When VFS requests a new inode using the lookup callback, VolFs uses the handle from the parent inode and the name of the child to perform a relative open and get a handle for the new inode. These handles are opened without any read/write access to the files, and can only be used for metadata requests.

When a file is opened, VolFs creates a Linux file object that points to the inode. It also reopens the inode’s file handle with the requested read/write access and stores the new handle in the file object. This handle is then used to satisfy file operations like read and write.
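Linux has a loosely comparable pattern, which may help in picturing the two kinds of handles. The sketch below is an analogy only, not how VolFs is implemented: it assumes Linux, where O_PATH gives a descriptor that is usable for metadata but not for data, and dir_fd performs an open relative to a parent. Here the reopen is done by name rather than from the existing handle.

```python
import os
import tempfile

def lookup_then_open():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "child"), "w") as f:
            f.write("data")
        parent = os.open(d, os.O_PATH)                   # metadata-only handle to the parent
        try:
            # Relative open: parent handle plus child name, still without data access.
            inode_fd = os.open("child", os.O_PATH, dir_fd=parent)
            try:
                size = os.fstat(inode_fd).st_size        # metadata requests work
                try:
                    os.read(inode_fd, 4)                 # data access is refused
                    readable = True
                except OSError:
                    readable = False
            finally:
                os.close(inode_fd)
            # Opening the file for real: obtain a second handle with read access.
            data_fd = os.open("child", os.O_RDONLY, dir_fd=parent)
            try:
                content = os.read(data_fd, 4)
            finally:
                os.close(data_fd)
        finally:
            os.close(parent)
        return size, readable, content

print(lookup_then_open())
```

The metadata-only descriptor plays the role of the handle kept in the inode, and the second descriptor plays the role of the handle stored in the file object.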

Emulating Linux features

As discussed above, Linux diverges from Windows in several ways for file systems. VolFs must provide support for several Linux features that are not directly supported by Windows.

Case sensitivity is handled by Windows itself. As mentioned earlier, Windows and NTFS actually support case sensitive operations, so VolFs simply requests the Object Manager to treat paths as case sensitive regardless of the global registry key controlling this behavior.

Linux also supports nearly all characters as legal characters in file names. NT has more restrictions, where some characters are not allowed at all and others may have special meanings (such as ‘:’ denoting an alternate data stream). To support all Linux file names, VolFs escapes illegal characters in file names.
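A reversible escaping scheme could look roughly like the following. The text does not specify the format VolFs uses, so the '#' plus four hexadecimal digits format, the character set, and both function names are assumptions made for this sketch.

```python
import re

# Characters that NT file names reject or treat specially. The escape format
# ('#' followed by four hex digits) is an assumption for this illustration.
ILLEGAL = set('<>:"\\|?*') | {chr(c) for c in range(1, 32)}

def escape(name):
    # The escape character itself is escaped too, so decoding is unambiguous.
    return "".join(
        "#%04X" % ord(ch) if ch in ILLEGAL or ch == "#" else ch
        for ch in name
    )

def unescape(name):
    return re.sub(r"#([0-9A-F]{4})", lambda m: chr(int(m.group(1), 16)), name)

print(escape("notes: draft?"))
```

Whatever the concrete format, the important property is that every Linux name maps to a distinct legal NT name and back again.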

Linux has some different semantics surrounding unlinking and renaming. Specifically, a file can be unlinked even if there are open file descriptors to the file. Similarly, a file can be overwritten as the target of a rename operation even if it’s still open. In Windows, if a file is requested to be deleted, it will only be deleted once the last handle to that file is closed, leaving the name visible in the file system until then. To support Linux unlink semantics, VolFs renames unlinked files to a hidden temporary directory before requesting deletion.
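The Linux behavior that VolFs has to reproduce is easy to demonstrate. A minimal sketch, assuming a Linux (or other POSIX) system; the file name is arbitrary:

```python
import os
import tempfile

def unlink_while_open():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "scratch")
        with open(path, "w+") as f:
            f.write("still here")
            f.flush()
            os.unlink(path)                       # remove the name while the file is open
            name_gone = not os.path.exists(path)  # the name disappears immediately
            f.seek(0)
            return name_gone, f.read()            # the data stays reachable via the open file

print(unlink_while_open())
```

The name vanishes from the directory at once while the open file keeps working, which is the visible effect that moving the file to a hidden directory is meant to achieve.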

Inodes in Linux have a number of attributes which don’t exist in Windows, including their owner and group, the file mode, and others. These attributes are stored in NTFS Extended Attributes associated with the files on disk. The following information is stored in the Extended Attributes:

  • Mode: this includes the file type (regular, symlink, FIFO, etc.) and the permission bits for the file.
  • Owner: the user ID and group ID of the Linux user and group that own the file.
  • Device ID: for device files, the device major and minor number of the device. Note that WSL currently does not allow users to create device files on VolFs.
  • File times: the file accessed, modified and changed times on Linux use a different format and granularity than on Windows, so these are also stored in the EAs.

In addition, if a file has any file capabilities, these are stored in an alternate data stream for the file. Note that WSL currently does not allow users to modify file capabilities for a file.

The remaining inode attributes, such as inode number and file size, are derived from information kept by NTFS.
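The Mode attribute listed above combines two pieces of information in one integer. Python's standard stat module can split a Linux mode value into those parts; the describe_mode helper is just a convenience for this example and says nothing about how the value is laid out inside the EA.

```python
import stat

def describe_mode(mode):
    """Split a Linux st_mode value into its file-type and permission parts."""
    return {
        "type_bits": oct(stat.S_IFMT(mode)),      # regular file, FIFO, symlink, ...
        "permissions": oct(stat.S_IMODE(mode)),   # rwx bits plus setuid/setgid/sticky
        "display": stat.filemode(mode),           # the form shown by ls -l
    }

regular = describe_mode(stat.S_IFREG | 0o644)
fifo = describe_mode(stat.S_IFIFO | 0o600)
print(regular["display"], fifo["display"])
```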

Interoperability with Windows

While VolFs files are stored in regular files on Windows in the directories mentioned above, interoperability with Windows is not supported. If a new file is added to one of these directories from Windows, it lacks the EAs needed by VolFs, so VolFs doesn’t know what to do with the file and simply ignores it. Many editors will also strip the EAs when saving an existing file, again making the file unusable in WSL.

Additionally, since VFS caches directory entries, any modifications to those directories that are made from Windows while WSL is running may not be accurately reflected.

DrvFs

To facilitate interoperability with Windows, WSL uses the DrvFs file system. WSL automatically mounts all fixed drives with supported file systems under /mnt, such as /mnt/c, /mnt/d, etc. Currently, only NTFS and ReFS volumes are supported.

DrvFs operates in a similar fashion to VolFs. When creating inodes and file objects, handles are opened to Windows files. However, in contrast to VolFs, DrvFs adheres to Windows rules (with a few exceptions, noted below). Windows permissions are used, only legal NTFS file names are allowed, and special file types such as FIFOs and sockets are not supported.

DrvFs permissions

Linux usually uses a simple permission model where a file allows read, write or execute access to either the owner of the file, the group, or everyone else. Windows instead uses Access Control Lists (ACLs) that specify complex access rules for each individual file and directory (Linux does also have the ability to use ACLs, but this is not currently supported in WSL).

When opening a file in DrvFs, Windows permissions are used based on the token of the user that executed bash.exe. So in order to access files under C:\Windows, it’s not enough to use “sudo” in your bash environment, which gives you root privileges in WSL but does not alter your Windows user token. Instead, you would have to launch bash.exe elevated to gain the appropriate permissions.

In order to give the user a hint about the permissions they have on files, DrvFs checks the effective permissions a user has on a file and converts those to read/write/execute bits, which can be seen for example when running “ls -l”. However, there is not always a one-to-one mapping; for example, Windows has separate permissions for the ability to create files or subdirectories in a directory. If the user has either of these permissions, DrvFs will report write access on the directory, while in fact some operations may still fail with access denied.
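The lossy nature of this conversion can be shown with a small sketch. The access-mask constants are the standard NT directory access rights; the mapping itself (which right feeds which bit) is an assumption for illustration and may differ from what DrvFs actually does.

```python
# NT access-mask bits for directories, as defined in the Windows SDK headers.
FILE_LIST_DIRECTORY = 0x0001
FILE_ADD_FILE = 0x0002
FILE_ADD_SUBDIRECTORY = 0x0004
FILE_TRAVERSE = 0x0020

def directory_rwx(effective_access):
    """Collapse an effective NT access mask into an rwx-style hint (assumed mapping)."""
    r = "r" if effective_access & FILE_LIST_DIRECTORY else "-"
    # Either create right is reported as write, which is why the hint can overpromise.
    w = "w" if effective_access & (FILE_ADD_FILE | FILE_ADD_SUBDIRECTORY) else "-"
    x = "x" if effective_access & FILE_TRAVERSE else "-"
    return r + w + x

# A user who may create subdirectories but not files still shows up with "w".
print(directory_rwx(FILE_LIST_DIRECTORY | FILE_ADD_SUBDIRECTORY | FILE_TRAVERSE))
```

In that last case the directory appears writable, yet creating a regular file in it would fail with access denied.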

Since your effective access to a file may differ depending on whether bash.exe was launched elevated or not, the file permissions shown in DrvFs will also change when switching between elevated and non-elevated bash instances.

When calculating the effective access to a file, DrvFs takes the read-only attribute into account. A file with the read-only attribute set in Windows will show up in WSL as not having write permissions. Chmod can be used to set the read-only attribute (by removing all write permissions, e.g. “chmod a-w some_file”) or clear it (by adding any write permissions, e.g. “chmod u+w some_file”). This behavior is similar to the CIFS file system in Linux, which is used to access Windows SMB shares.

Case sensitivity

Since the support is there in Windows and NTFS, DrvFs supports case sensitive files. This means it’s possible to create two files whose names differ only by case in DrvFs. Note that many Windows applications may not be able to handle this situation, and may not be able to open one or both of the files.

Case sensitivity is disabled on the root of your volumes, but is enabled everywhere else. So in order to use case sensitive files, do not attempt to create them under /mnt/c, but instead create a directory where you can create the files.

Symbolic links

While NT supports symbolic links, we could not rely on this support because symbolic links created by WSL may point to paths like /proc which have no meaning in Windows. Additionally, NT requires administrator privileges to create symbolic links. So, another solution had to be found.

Unlike VolFs, we could not rely on EAs to indicate a file is a symbolic link in DrvFs. Instead, WSL uses a new type of reparse point to represent symbolic links. As a result, these links will work only inside WSL and cannot be resolved by other Windows components such as File Explorer or cmd.exe. Note that since ReFS lacks support for reparse points, it also doesn’t support symbolic links in WSL. NTFS however now has full symbolic link support in WSL.

Interoperability with Windows

Unlike VolFs, DrvFs does not store any additional information. Instead, all inode attributes are derived from information used in NT, by querying file attributes, effective permissions, and other information. DrvFs also disables directory entry caching to ensure it always presents the correct, up-to-date information even if a Windows process has modified the contents of a directory. As such, there is no restriction on what Windows processes can do with the files while DrvFs is operating on them.

DrvFs also uses Windows delete semantics for files, so a file cannot be unlinked if there are any open file descriptors (or handles from Windows processes) to the file.

ProcFs and SysFs

As in Linux, these special file systems do not show files that exist on disk, but instead represent information kept by the kernel about processes, threads, and devices. These files are dynamically generated when read. In some cases, the information for the files is kept entirely inside the lxcore.sys driver. In other cases, such as the CPU usage of a process, WSL queries the NT kernel for this information. However, there is no interaction here with Windows file systems.
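From an application's point of view these generated files are read like any other file. A minimal sketch, assuming a Linux-compatible environment with ProcFs mounted at /proc (the helper name is made up):

```python
import os

def pid_from_procfs():
    """Read this process's PID from the dynamically generated status file."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("Pid:"):
                return int(line.split()[1])
    return None

print(pid_from_procfs() == os.getpid())
```

The file has no on-disk counterpart; its contents are produced at the moment it is read.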

Conclusion

WSL provides access to Windows files by emulating full Linux behavior for the internal Linux file system with VolFs, and by providing full access to Windows drives and files through DrvFs. As of this writing, DrvFs enables some of the functionality of Linux file systems, such as case sensitivity and symbolic links, while still supporting interoperability with Windows.

In the future, we will continue to improve our support for Linux file system features, not only in VolFs but also in DrvFs. The goal is to reduce the number of scenarios that require you to stay in the VolFs mounts with all the limitations on interoperability that entails. These improvements are driven by the great feedback we get from the community on GitHub and User Voice to help us target the most important scenarios.

 

Sven Groot and Seth Juarez explore WSL file system support.

Comments

  • Anonymous
    June 15, 2016
    "Inodes in Linux have a number of attributes which don’t exist in Linux." In Windows?
  • Anonymous
    June 15, 2016
    Previous comment is broken. "Inodes in Linux have a number of attributes which don’t exist in Linux..." In Windows?
    • Anonymous
      August 14, 2016
      The original comment actually meant: "Inodes in Linux have a number of attributes which don’t exist in Windows Subsystem for Linux."
  • Anonymous
    June 16, 2016
    How do you handle Unicode characters in paths, both in VolFS and DrvFS? utf-8 on the WSL side, hopefully? What happens when you hit a path that is not valid utf-16 on the NT side or a path that is not valid utf-8 (or whatever is used) on the WSL side?
    Does DrvFS see NT symbolic links and junctions as symbolic links? What about NT mount points?
    • Anonymous
      June 16, 2016
      The comment has been removed
      • Anonymous
        July 01, 2016
        Thanks!
  • Anonymous
    June 16, 2016
    Another great article! Possible typo: "Inodes in Linux have a number of attributes which don’t exist in Linux"
  • Anonymous
    June 17, 2016
    Two suggestions:
    Use interoperable NTFS symlinks in DrvFS when possible: https://wpdev.uservoice.com/forums/266908-command-prompt-console-bash-on-ubuntu-on-windo/suggestions/14840844-use-interoperable-ntfs-symlinks-in-drvfs-when-poss
    Add symlinks for standard %USERPROFILE% subdirectories to $HOME: https://wpdev.uservoice.com/forums/266908-command-prompt-console-bash-on-ubuntu-on-windo/suggestions/14840448-add-symlinks-for-standard-userprofile-subdirecto
  • Anonymous
    June 23, 2016
    By the way, do you have plans to support fuse filesystems? What about NFSv4 (iirc the Windows NFS client does not support v4, only the server does) and NFSv4 ACLs?
  • Anonymous
    July 19, 2016
    Really great article :-)
  • Anonymous
    July 22, 2016
    inotify does not work in directories.
  • Anonymous
    July 24, 2016
    Please update us on inotify status. Github://Microsoft/bashonwindows/issues/216
  • Anonymous
    July 25, 2016
    Where is the WSL File-system stored within windows? I'd like to make sure I am capturing backups of it periodically.
    • Anonymous
      August 12, 2016
      %USERPROFILE%\AppData\Local\lxss\rootfs
  • Anonymous
    July 25, 2016
    While I haven't used WSL yet, I do have a question about development and portability. To preface my thought process: since Windows 8, we've had the ability to mount VHD(X) files as a logical device. Would it be possible to expand the VolFs and DrvFs support to use a VHD(X) as the base file system (either as just a configured NTFS volume for WSL to use, or even more directly as a device within the pico processes emulating the Linux kernel)?
    Ideally, this would make a WSL subsystem configuration transportable, similar to a client OS in a Hyper-V environment. This would help with development boxes, where the tools, makefiles, and other files of a project (or perhaps a core utilities set distributed within an organization) could then be easily transported.
  • Anonymous
    August 17, 2016
    I'm quite excited having played around a bit with WSL after the anniversary update of Windows 10. But I discovered one issue. In the Windows Disk Management I have mounted one hard drive as a subdirectory on the C: drive. But I cannot access let alone browse this directory from bash on Windows. Are there plans to address this problem?
  • Anonymous
    August 31, 2016
    I can't activate Bash on Windows build 14393.0, 32-bit. Does Bash on Windows support 32-bit Windows? If it doesn't, please add support; I want to run Bash on 32-bit Windows.
  • Anonymous
    August 31, 2016
    How are you handling the MAX_PATH limitation of 260 chars as imposed by the Win32 API? NPM (Node Package Manager) in particular tends to create folder hierarchies that exceed this limitation, and if this isn't resolved here, it would preclude the usage of WSL for many popular repositories.
    • Anonymous
      August 31, 2016
      Additionally, it appears that Git has issues when working on shared directories, which is how the windows filesystem is represented. Do you have any proposed solutions to that problem as well?
    • Anonymous
      December 29, 2016
      The upper limit on Windows NT is 32767, not 260. It has been for a long time; you just have to know how to ask. Just prefix the drive letter with \\?\. For more information, see https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
      That was done so that an app does not accidentally get a > 260 character path.
      On Windows 10, you can enable longer paths in the core APIs system wide via Group Policy. You can also enable them in individual applications via the application manifest file; instructions are in the article. The catch is that you may have apps that allocate 260 character buffers, so when you copy 10000 characters into that buffer, you're overwriting who knows what. The app could crash, it could create security issues, etc. So it is best to enable this only if you control both the app and any extensions that might make assumptions about the buffer size.
  • Anonymous
    September 11, 2016
    Why does WSL not recognize the JDK installation done with a Windows Application Installer (taken from Oracle Java website) ?
    • Anonymous
      November 03, 2016
      WSL applications cannot directly interact with true Win32 apps (a Windows-installed JVM is a Win32 app).
  • Anonymous
    September 28, 2016
    There might be some interoperability issues with MKS. I have an old copy of this toolkit which I use for basic unix style command lines tools such as ls, awk and grep. But my real reason is the vi editor. I do a lot of manuscript preparation and want the convenience of vi. I know this is also available through gvim (and others). But the main point is that after going through the installation steps, the bash command does not bring up a license. I might remove the MKS toolkit and try again.
  • Anonymous
    November 30, 2016
    I need to be able to access USB devices. Example: /dev/ttyUSB0 etc.
    When will this be supported? How do I get to know of updates? BTW, how do I check my current installed version?
  • Anonymous
    October 19, 2017
    The comment has been removed
    • Anonymous
      October 19, 2017
      I think I figured out what is going on. In my Cygwin world, in /etc/passwd, my home dir is /home/kbotts. But /home/kbotts is a symlink to c:/kbotts (it appears in Cygwin syntax, that is, /c/kbotts, but that is a detail). It is a native Windows symlink, which Cygwin understands, but I think WSL does not. The HOME envar contains c:/kbotts. The login process emulator in cygwin1.dll makes the translation, and my login shells start in /c/kbotts. Under WSL, there is another /home/kbotts, which masks the Cygwin dir, even for a Cygwin bash. Thus, if WSL is installed, even the Cygwin bash starts in the WSL /home/kbotts.
      I suppose I could copy everything under the Cygwin /home/kbotts to the WSL /home/kbotts. That is not so easy, for several reasons, both because it is managed in revision control, and because there is at least some Cygwin-specific stuff in some of the files there. (It contains a lot more than just the shell startup scripts.) I do have the truly portable stuff factored out, so that part I share in common with my environment on a native Linux host, kept in sync via revision control. But I cannot use Windows tools (including Cygwin tools, which actually are native Windows tools) to do anything to WSL files, right? So if I move all these files to where WSL can access them, I can no longer access them from Cygwin or Windows cmd.exe, right? So I gotta have two copies on the same host, one of which I can only access using WSL tools, right? Ugh.
      I know, you think this is not your problem: the problem is that I use Cygwin. But consider: many members of the target audience for WSL have been using Cygwin for years. I think it is worth your while to help us be able to use WSL without having to immediately give up Cygwin entirely. Make sense?