System.IO in .NET Core 2.1 sneak peek

Now that .NET Core 2.1 Preview 1 has been released, it is time for a preview of what you can expect to see in Preview 2 and beyond in the System.IO namespace. The 2.1 changes are the largest in quite some time.

Goals

Cross platform

Code should run as consistently as possible across platforms. You should also be able to write code that works in mixed environments. Accessing a Unix volume from Windows should just work, for example.

Light touch

System.IO's aggressive handholding blocked a number of scenarios, including the mixed environments described above. Historically this was driven partly by Code Access Security (CAS) in .NET Framework (e.g. 4.x). Aggressively trying to predict OS behavior was also more plausible when .NET was a Windows-only solution. .NET judging the validity of paths is the key example of this well-intentioned but heavy hand. We now strive not to replicate or predict the outcome of OS API calls.

Performant

This is always important to us. We want to add as little overhead as possible on your platform and enable you to build performant solutions.

Flexible

We don't want to have an API for every single conceivable scenario, but neither do we want to make it impossible/difficult to build solutions. Part of addressing this is just being flexible and forward-thinking in the API set in general. Another part of this can be addressed by providing extension points that allow you to build complicated solutions in a performant way without resorting to P/Invoking directly into the underlying platform.

Overview

What are you getting from this release? Here are the highlights:

  1. More consistent behavior across platforms
  2. Fixed edge cases, particularly on Windows
  3. New Span<char> APIs in System.IO.Path
  4. Path.Join() APIs that don't have root checking behavior like Path.Combine()
  5. Path.GetFullPath() overload that avoids using the current working directory
  6. A number of new directory enumeration options
    1. Filtering out specific attributes
    2. Simple matching behavior
    3. Ignoring inaccessible files
    4. Specifying buffer size (notably for Windows UNC scenarios)
  7. A low-level enumeration extensibility API
  8. Significantly faster directory enumeration
    1. Typically 2-4x faster on Windows (Unix 1.3x - 1.4x)
    2. Significantly lower allocation counts (2x - 40x lower, GC collections cut 8x+)
  9. Faster Path APIs (GetFullPath() is now 2x as fast as .NET Core 2.0)

Key behavior changes

We've changed behavior to fix issues. Here are the key impacts:

  • Exceptions for "bad" paths are thrown when used, not when normalizing (as we can't know what is "valid" without using them)
  • Cross-platform scenarios are no longer blocked (mounting Unix volumes/shares on Windows, for example)
  • Performance is significantly increased
  • Behavior is more consistent when running code across platforms

In more detail this means:

  • We've made matching consistent with legacy behavior on non-Windows platforms (????.txt matches the same on all platforms)
  • Match expressions no longer match 8.3 file names on Windows (*.htm will match only *.htm, not *.html or *.htmz, even if the volume has 8.3 name generation turned on)
  • We've fixed enumeration of unusual Windows files - you can now get *Info classes out successfully and use their methods on files that end in spaces or periods
  • We do very little validation of paths up front as there is very little we can accurately predict (unblocking numerous cross-platform and some existing Windows scenarios)
  • We only check for embedded nulls, no other chars are rejected, including wildcards (as nulls are never supported and OS APIs almost universally take null terminated strings)
    • We don't check for "proper" colon placement
    • We don't check for segment length or total path length
    • We don't check for "proper" UNCs
    • We still throw for null or empty paths on all platforms or paths of all spaces on Windows (as they throw from the Win32 GetFullPathName)
  • We don't trim leading spaces on any paths on Windows anymore (we did for some, not others)
  • We don't trim whitespace characters from the end of paths on Windows (such as nbsp)
  • GetPathRoot, GetDirectoryName, etc. don't throw for empty paths anymore; they return null (like they do for null).
  • GetPathRoot now works consistently with various Windows prefixes (e.g. \\?\, \\.\)
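To make the validation changes above concrete, here is a small sketch. The class and method names are hypothetical and only for illustration. It probes whether normalizing a path throws; the assumption is that an embedded null is rejected with an ArgumentException, while wildcards pass through untouched.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class PathValidationDemo
{
    // Returns true when normalizing the path is rejected up front.
    public static bool NormalizationThrows(string path, string basePath)
    {
        try
        {
            // basePath must be fully qualified; no file system access is needed.
            Path.GetFullPath(path, basePath);
            return false;
        }
        catch (ArgumentException)
        {
            // Assumed to be raised only for embedded nulls in this scenario.
            return true;
        }
    }
}
```

With Path.GetTempPath() as the base, a name such as "re*port?.txt" should normalize without complaint, while "bad\0name.txt" should be rejected. Path.GetPathRoot("") and Path.GetDirectoryName("") should both return null rather than throw.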

New System.IO.Path APIs

The majority of what we have here are new ReadOnlySpan<char> overloads, which allow you to avoid unnecessary string allocations. Spans are a key new feature in .NET Core 2.1. You can pass strings as spans using the .AsSpan() extension (there is an implicit conversion as well). ReadOnlySpan<char> has a number of extensions that allow you to evaluate it as you would a string.
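As a quick illustration of the span overloads, the sketch below checks a file extension without allocating a substring. The helper name is made up for this example. It relies on the Path.GetExtension(ReadOnlySpan<char>) overload and the span Equals extension that takes a StringComparison.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class SpanPathDemo
{
    // Compares the extension (including the leading period) case-insensitively.
    public static bool HasExtension(string path, string extension)
    {
        // The returned span is a slice of the input; no new string is created.
        ReadOnlySpan<char> actual = Path.GetExtension(path.AsSpan());
        return actual.Equals(extension.AsSpan(), StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, SpanPathDemo.HasExtension("logs/app.TXT", ".txt") returns true.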

Join() APIs allow putting path segments together without analysis of segment rooting. On Windows, Combine("C:\Foo", "\Bar") gives you "\Bar", because a rooted second segment discards the first. Join gives you "C:\Foo\Bar".
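The difference is easy to see side by side. This sketch uses forward slashes so it behaves the same on Windows and Unix; the wrapper class is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical wrapper that runs both APIs on the same inputs.
public static class JoinVersusCombine
{
    public static (string combined, string joined) Build(string first, string second)
    {
        // Combine drops 'first' entirely when 'second' is rooted.
        string combined = Path.Combine(first, second);

        // Join only concatenates, adding a separator if neither side has one.
        string joined = Path.Join(first.AsSpan(), second.AsSpan());

        return (combined, joined);
    }
}
```

JoinVersusCombine.Build("/foo", "/bar") returns ("/bar", "/foo/bar").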

Path.GetFullPath(path, basePath) is an important addition. Normalizing paths that aren't fully qualified (e.g. relative to the current directory) is dangerous and discouraged, because the current directory is a process-wide setting. Getting into a state where a separate thread unexpectedly changes the working directory is common enough that you should make every attempt to avoid the existing single-argument GetFullPath().
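A minimal sketch of resolving against an explicit base instead of the current directory follows. The helper name and the choice of base directory are assumptions for the example; the base path must itself be fully qualified.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class FullPathDemo
{
    // Resolves a possibly relative path against a caller-supplied root,
    // so the result never depends on the process-wide current directory.
    public static string Resolve(string path, string baseDirectory)
    {
        return Path.GetFullPath(path, baseDirectory);
    }
}
```

For instance, FullPathDemo.Resolve(Path.Combine("data", "..", "settings.json"), Path.GetTempPath()) yields a fully qualified path to settings.json directly under the temp directory, with the "data/.." segments removed.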

New EnumerationOptions

There have been numerous requests for directory enumeration options over the years. To fulfill some of them, and to make it possible to address more, we're introducing the EnumerationOptions class. The existing Directory and DirectoryInfo enumeration methods now have overloads that take this new class.

This class has some defaults that are different than historical behavior:

  1. Matching is simple: '?' is always one character, '*' is always 0+, "*.*" is any filename with a period
  2. Hidden and system files are skipped by default: typically one doesn't want .vs, .git, $RECYCLE.BIN, Thumbs.db and similar files and folders in results.

You can, of course, choose any setting you want.
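Here is a rough sketch of overriding those defaults. The helper name is hypothetical, and the property names assume the EnumerationOptions shape that shipped in the final 2.1 release.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class EnumerationOptionsDemo
{
    // Finds *.txt files at any depth, including hidden and system ones.
    public static IEnumerable<string> FindTextFiles(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            AttributesToSkip = 0,                       // skip nothing (default skips Hidden | System)
            IgnoreInaccessible = true,                  // silently skip entries we cannot open
            MatchCasing = MatchCasing.CaseInsensitive   // same matching on every platform
        };

        return Directory.EnumerateFiles(root, "*.txt", options);
    }
}
```

Because MatchCasing is set explicitly, a file named b.TXT is returned on Linux as well as on Windows.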

New enumeration extensibility model

Those options not enough? Now you can write your own fast, low-level enumerators. Want your own matching algorithm? Want to total file sizes? Want to match all files with a set of extensions? Almost anything is possible with the new APIs that live in System.IO.Enumeration. It is a large topic, so I'll address it in the next post.
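To give a taste ahead of that post, here is a small sketch that totals file sizes. It assumes the FileSystemEnumerable<TResult> and FileSystemEntry shapes from the final 2.1 release; the wrapper class is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Enumeration;
using System.Linq;

// Hypothetical helper built on the extensibility API.
public static class SizeTotals
{
    // Sums the length of every file under 'root' without creating
    // FileInfo objects or path strings for each entry.
    public static long TotalFileSize(string root)
    {
        var sizes = new FileSystemEnumerable<long>(
            root,
            (ref FileSystemEntry entry) => entry.Length,             // transform: entry -> result
            new EnumerationOptions { RecurseSubdirectories = true })
        {
            // Only files contribute to the total.
            ShouldIncludePredicate = (ref FileSystemEntry entry) => !entry.IsDirectory
        };

        return sizes.Sum();
    }
}
```

Note that the default options still apply here, so hidden and system files are left out of the total unless AttributesToSkip is changed.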

FAQ

What exceptions do I get from bad paths now?

Whatever the OS tells us, we'll tell you. This will always be some sort of IOException for a bad path. Sometimes it might be a FileNotFoundException or DirectoryNotFoundException. The key thing to remember is that you get exceptions when you try to use the path. GetFullPath() doesn't throw for these as it has no practical way of checking.
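A small sketch of what that looks like in practice follows. The helper is hypothetical, and the exact exception type depends on the OS and file system, so the code only assumes it derives from IOException.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class BadPathDemo
{
    // Tries to open the file and reports what happened.
    public static string TryOpen(string path)
    {
        try
        {
            using (File.OpenRead(path))
            {
                return "opened";
            }
        }
        catch (IOException e)
        {
            // The concrete type is whatever the OS error maps to.
            return e.GetType().Name;
        }
    }
}
```

Normalizing a name such as "missing*?.txt" with Path.GetFullPath(name, Path.GetTempPath()) succeeds. Passing the result to BadPathDemo.TryOpen is where the failure surfaces, as the name of some IOException-derived type.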

What if I still want to check for invalid characters?

You can do this manually using Path.GetInvalidPathChars(). We don't recommend it, as the result isn't guaranteed to be correct on any platform: you may have NTFS/FAT volumes mounted on Unix or vice versa.
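If you do want the manual check, it can be as small as the sketch below (the helper name is made up). Keep in mind that it only reflects the set the current platform reports, not what a given volume will actually accept.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class InvalidCharCheck
{
    // True when the path contains a character from the platform's reported set.
    public static bool ContainsInvalidPathChars(string path)
    {
        return path.IndexOfAny(Path.GetInvalidPathChars()) >= 0;
    }
}
```

An embedded null is flagged on every platform; most other characters depend on where the code runs.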

Where did the speed improvements come from?

Primarily from reducing allocations. Keeping work on the stack and out of the heap can have a dramatic impact on large sets. Some wins come from smarter use of available OS APIs. A chunk comes from not validating paths on every single API call.

When does AttributesToSkip get applied?

First thing. One effect of this is that filtering out FileAttributes.Directory makes the RecurseSubdirectories option meaningless.
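The sketch below shows that interaction, with hypothetical names. Because directories are skipped before recursion is considered, only the top-level files come back even though RecurseSubdirectories is set.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class AttributesToSkipDemo
{
    public static IEnumerable<string> TopLevelFilesOnly(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,               // has no effect here...
            AttributesToSkip = FileAttributes.Directory // ...because directories are dropped first
        };

        return Directory.EnumerateFileSystemEntries(root, "*", options);
    }
}
```

To recurse but return only files, leave Directory out of AttributesToSkip and use one of the EnumerateFiles overloads instead.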

What about other file matching types?

We definitely want to add more in the future. Things like globstar (**), POSIX pattern matching notation (glob), possibly regex. With the extensibility APIs writing a custom matcher is easy. If you're passionate about any of these we're always open to contributions.
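As a rough idea of what a custom matcher can look like, this sketch accepts a file when it matches any of several simple patterns. The class is hypothetical and assumes the FileSystemEnumerable<TResult> and FileSystemName APIs as they shipped in 2.1.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

// Hypothetical helper built on the extensibility API.
public static class MultiPatternMatcher
{
    // Returns full paths of files whose names match at least one pattern.
    public static IEnumerable<string> Find(string root, params string[] patterns)
    {
        return new FileSystemEnumerable<string>(
            root,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            new EnumerationOptions { RecurseSubdirectories = true })
        {
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
            {
                if (entry.IsDirectory) return false;

                foreach (string pattern in patterns)
                {
                    // Simple matching: '?' is one character, '*' is zero or more.
                    if (FileSystemName.MatchesSimpleExpression(pattern.AsSpan(), entry.FileName, ignoreCase: true))
                        return true;
                }
                return false;
            }
        };
    }
}
```

Calling MultiPatternMatcher.Find(root, "*.htm", "*.css") picks up index.htm and site.css but not page.html, in line with the simple matching rules described earlier.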

Why did you change enumeration defaults?

Note that we only changed them when you use the new EnumerationOptions class. Existing APIs should behave as they did (modulo the matching consistency fixes mentioned in the post). We picked new defaults based on OS defaults (shell & command line) and what one would expect when enumerating end-user files (e.g. to present to users). These obviously won't be right for all scenarios, but changing the settings is easy.
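If you want the new overloads but something close to the historical behavior, the settings can be dialed back. The sketch below is an approximation under our own assumptions, not an official compatibility mode, and the names are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
public static class LegacyLikeEnumeration
{
    // Approximates the older defaults: Win32-style matching, nothing skipped,
    // and access errors surfaced rather than ignored.
    public static EnumerationOptions CreateOptions()
    {
        return new EnumerationOptions
        {
            MatchType = MatchType.Win32,
            AttributesToSkip = 0,
            IgnoreInaccessible = false
        };
    }

    // Counts the files in 'root' that match 'pattern' under the given options.
    public static int Count(string root, string pattern, EnumerationOptions options)
    {
        int count = 0;
        foreach (string file in Directory.EnumerateFiles(root, pattern, options))
        {
            count++;
        }
        return count;
    }
}
```

In a folder holding README and notes.txt, the pattern "*.*" finds only notes.txt with a default EnumerationOptions, while the Win32-style options treat "*.*" as "everything" and find both.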

What about the more obscure Windows matching characters?

'<', '>', and '"' were unblocked in 2.0. They're still supported, but only on Windows through the normal APIs. You can, if you want, use them on Unix if you use the extensibility points described in the next post.