Wildcards in Windows
In my last post I posed a challenge: how do you match files whose names start with, end with, or simply contain a period? It isn't easy to do.
Wildcards in paths seem simple at first. You have '?' to match any single character and '*' to match zero or more of any characters.
Reality doesn't quite match up. '*' gives you all files. '?' gives you all files whose names are just one character. '*a*' gives you all files that have an 'a' in them. '*_*' gives you all files that have an underscore in them. '*.*' gives you ... all files, including those with no period at all.
The seemingly odd matching behavior dates back to the DOS days. Every file was in 8.3 form: 8 characters for the name, a period, and a 3-character file extension. You couldn't use a period in the filename or extension; it was always the separator between the two filename components. (The period wasn't even stored on disk.)
MS-DOS was based on QDOS (later 86-DOS), which was written by Tim Paterson for Seattle Computer Products for their 8086 computer kit. It was designed to be compatible with the dominant microcomputer OS of the late 70s, CP/M. Both QDOS and CP/M had 8.3 filenames and used similar wildcard matching with ? and *. CP/M was inspired primarily by DEC OSes, which had the same style of filenames and wildcard matching. (Most DEC OSes had 6.3 filenames, and also used the colon ':' to specify devices, akin to 'C:'.)
With the advent of 32-bit Windows (Windows 95 and NT), 8.3 filenames were no longer a limitation. Filenames could now be up to 255 characters long, and the period became a legitimate character. After 25 years or so of '*.*', keeping similar semantics made a lot of sense. Here are some of the common patterns in DOS:
| Pattern | Matches |
|---|---|
| *.* | All file names, all extensions (all files). |
| *. | All file names without extensions. |
| foo*.* | All files beginning with "foo". |
| foo??.* | All files beginning with "foo" and up to two additional characters. |
The real rules were subtle and had some "surprising" behaviors that were not carried forward to Win32. Raymond Chen describes the algorithm in "How did wildcards work in MS-DOS?".
So how exactly do wildcards work now? The patterns mentioned above work as expected, but how? It obviously isn't exactly correct that '?' matches any single character and '*' matches zero or more characters. The answer is that there are actually five wildcards in Windows:
| Wildcard | Matches |
|---|---|
| * | Zero or more characters. |
| ? | Exactly one character. |
| < | Zero or more characters until encountering and matching the final period in the name. (DOS_STAR) |
| > | Any single character, or zero characters if it is to the left of a period or at the end of the string, or is contiguous to other '>' in such a position. (DOS_QM) |
| " | A period, or zero characters at the end of the string. (DOS_DOT) |
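As a rough illustration of the table above, here is a minimal Python sketch of a matcher for the five wildcards. It is a simplified model built only from the descriptions in the table, not a port of FsRtlIsNameInExpression, so edge cases may differ from what Windows actually does. The function name `matches` and the case-insensitive comparison via `str.upper()` are assumptions made for the example.

```python
def matches(expr: str, name: str) -> bool:
    """Simplified model of the five Windows wildcards (assumed semantics)."""

    def go(e: int, n: int) -> bool:
        # Expression exhausted: succeed only if the name is exhausted too.
        if e == len(expr):
            return n == len(name)
        c = expr[e]
        at_end = n == len(name)

        if c == "*":
            # Zero characters, or consume one character and stay on '*'.
            return go(e + 1, n) or (not at_end and go(e, n + 1))
        if c == "<":  # DOS_STAR
            # Like '*', but never consumes the final period in the name.
            if go(e + 1, n):
                return True
            if at_end:
                return False
            if name[n] == "." and "." not in name[n + 1:]:
                return False
            return go(e, n + 1)
        if c == "?":
            # Exactly one character.
            return not at_end and go(e + 1, n + 1)
        if c == ">":  # DOS_QM
            # Zero characters at a period or at the end, otherwise one.
            if at_end or name[n] == ".":
                return go(e + 1, n)
            return go(e + 1, n + 1)
        if c == '"':  # DOS_DOT
            # Zero characters at the end of the name, otherwise a period.
            if at_end:
                return go(e + 1, n)
            return name[n] == "." and go(e + 1, n + 1)
        # Literal character, compared case-insensitively (an assumption).
        return not at_end and name[n].upper() == c.upper() and go(e + 1, n + 1)

    return go(0, 0)


if __name__ == "__main__":
    print(matches("<", "foo"))       # True: no extension
    print(matches("<", "foo.txt"))   # False: '<' stops at the final period
    print(matches("*.<", "a.b.c"))   # True: the name contains a period
```

The recursion is exponential in the worst case, which is fine for short file names in an illustration but not something to ship.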
If these are the rules, why don't they work that way? The answer is that Win32 implicitly changes your wildcards. This is another part of the mystery of wildcards. The rules are as follows:
- All '?' are changed to '>'.
- All '.' followed by '?' or '*' are changed to '"'.
- If the path ended in '*.' before normalization (which strips the trailing period), the final '*' is changed to '<'.
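The three rewriting rules can be sketched as a small Python function. This is only an illustration of the rules as listed here, not the actual Win32 code; the name `translate_win32_pattern` is made up for the example, and it takes just the filename portion of the path, before normalization. It does not model the rest of path normalization (for instance, stripping trailing spaces).

```python
def translate_win32_pattern(pattern: str) -> str:
    """Apply the three implicit wildcard rewrites to a filename pattern."""
    out = []
    last = len(pattern) - 1
    for i, ch in enumerate(pattern):
        if ch == "?":
            # Rule 1: every '?' becomes '>' (DOS_QM).
            out.append(">")
        elif ch == ".":
            if i == last and i >= 1 and pattern[i - 1] == "*":
                # Rule 3: a pattern ending in '*.' loses the period and
                # the final '*' becomes '<' (DOS_STAR).
                out[-1] = "<"
            elif i < last and pattern[i + 1] in "?*":
                # Rule 2: a '.' followed by '?' or '*' becomes '"' (DOS_DOT).
                out.append('"')
            else:
                out.append(".")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for p in ["*.*", "*.", "foo??.*", ".*", "*.txt"]:
        print(p, "->", translate_win32_pattern(p))
```

With this sketch, '*.*' comes out as '*"*' and '*.' comes out as '<', which lines up with the DOS patterns listed earlier.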
The last rule brings up the last piece of the wildcard puzzle. As described in my previous posts, path normalization eats trailing periods and spaces. This can make it difficult to search for filenames with trailing spaces/periods.
Now that we have the details of wildcard matching, the challenges from my last post can be answered.
List files ending with a space
You need to skip normalization for this to work. dir "\\?\C:\* " will give you all files that end with a space in "C:\".
List files starting with a period
This is actually pretty simple. dir .* will give you all files that start with a period in the current directory.
List files with a period
This involves using the "DOS" wildcards directly (which need to be escaped at the command line). dir *.^< will give you all files in the current directory whose names contain a period. The pattern '*.<' is necessary because, per the rules, '*.*' would become '*"*', which is equivalent to '*'.
List files ending with a period
You need to skip normalization for this to work. dir \\?\C:\*. will give you all files that end with a period in "C:\".
APIs in play
Searching for files always goes through one of the FindFirstFile APIs. FindFirstFile will normalize the path, cutting it into the directory portion of the path and the filename. The filename wildcards are munged as explained above. FindFirstFile opens a file handle on the directory and calls NtQueryDirectoryFile with the altered filename. (Full details of the API are in the kernel mode version ZwQueryDirectoryFile.)
When the filesystem driver responds to the query it will use FsRtlIsNameInExpression to do its matching. RtlIsNameInExpression is the user mode version of this API. (Note that the documentation can be a little misleading. When you do a case-insensitive search and don't provide a translation table, the name is converted to uppercase. The expression (pattern) is not; you have to do this manually. RtlUpcaseUnicodeString is the API for this.)
If you really want the plain "*" and "?" semantics, you could potentially call NtQueryDirectoryFile directly. Your search string won't be rewritten, though all five wildcards are still processed, of course.