File URIs in Windows
Invalid file URIs are among the most common illegal URIs that we were forced to accommodate in IE7. As I mentioned in a previous blog post, there is much confusion over how to handle file URIs. While the standard for the file scheme defines its syntax, it doesn’t give specific instructions on how to convert a file system path for a specific operating system into a file URI; that conversion is left up to the implementers. In this post, I describe the conversion we use in IE and list best practices to use when constructing or manipulating file URIs.
For the UNC Windows file path
\\laptop\My Documents\FileSchemeURIs.doc
The corresponding valid file URI in Windows is the following:
file://laptop/My%20Documents/FileSchemeURIs.doc
For the local Windows file path
C:\Documents and Settings\davris\FileSchemeURIs.doc
The corresponding valid file URI in Windows is:
file:///C:/Documents%20and%20Settings/davris/FileSchemeURIs.doc
The important factors here are the use of percent-encoding and the number of slashes following the ‘file:’ scheme name.
In order to avoid ambiguity, and for your Windows file paths to be interpreted correctly, characters that are important to URI parsing that are also allowed in Windows file paths must be percent-encoded. This includes ‘#’ and ‘%’. Characters that aren’t allowed in URIs but are allowed in Windows file paths should also be percent-encoded. This includes ‘ ‘, ‘{‘, ‘}’, ‘`’, ‘^’ and all control characters. Note, for instance, that the spaces in the example URIs above have been percent-encoded to ‘%20’. See the latest URI standard for the full list of characters that aren’t allowed in URIs.
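As a rough illustration of the encoding rule above, the following Python sketch percent-encodes a single file or directory name. The function name `encode_segment` and the exact `SAFE` set are assumptions made for this example; this is not the logic IE uses, and it only considers ASCII names.

```python
from urllib.parse import quote

# Characters kept as-is inside a path segment (an assumption for this sketch).
# quote() never encodes letters, digits, or "-._~".
SAFE = "!$&'()*+,;=:@"

def encode_segment(segment: str) -> str:
    """Percent-encode one ASCII file or directory name for use in a file URI."""
    return quote(segment, safe=SAFE)

print(encode_segment("My Documents"))  # My%20Documents
print(encode_segment("100% C#.txt"))   # 100%25%20C%23.txt
print(encode_segment("{a}`b^"))        # %7Ba%7D%60b%5E
```

Encoding one segment at a time keeps the '/' separators out of the encoder, so a name can never be mistaken for a path delimiter.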
The number of slashes following the ‘file:’ is dictated by the same rules as other well-known schemes like http and ftp. The text following two slashes is the hostname. In the case of the UNC Windows file path, the hostname appears immediately following the ‘//’. In the case of a local Windows file path, there is no hostname, and thus another slash and the path immediately follow.
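The slash rule can be checked with any generic URI parser. The sketch below uses Python's `urllib.parse.urlsplit` (an assumption for illustration, not the parser IE uses) on the two example URIs from this post.

```python
from urllib.parse import urlsplit

unc = urlsplit("file://laptop/My%20Documents/FileSchemeURIs.doc")
local = urlsplit("file:///C:/Documents%20and%20Settings/davris/FileSchemeURIs.doc")

# Two slashes followed by text: that text is the hostname.
print(repr(unc.netloc), unc.path)      # 'laptop' /My%20Documents/FileSchemeURIs.doc

# Three slashes: the hostname is empty and the path starts at the third slash.
print(repr(local.netloc), local.path)  # '' /C:/Documents%20and%20Settings/davris/FileSchemeURIs.doc
```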
The username, password, and port components of a file URI in Windows are not used. In IE, including any of these components means you won’t be able to navigate to the URI. In contrast, the query and fragment components may be used. The query component will not be used when locating the resource, but the application that displays the content from the file URI may use the query component. For example, if an html document contains script, the script may read the query component of its URI when accessed via the file scheme. Similarly, the fragment will be used like a fragment in any other URI scheme.
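A minimal sketch of how these components can be separated, again using Python's `urllib.parse` as a stand-in parser. The helper name `has_unsupported_parts`, the `#top` fragment, and the second sample URI are hypothetical and exist only for this example.

```python
from urllib.parse import urlsplit, parse_qs

def has_unsupported_parts(uri: str) -> bool:
    """True if a file URI carries a username, password, or port."""
    parts = urlsplit(uri)
    return any(p is not None for p in (parts.username, parts.password, parts.port))

uri = "file:///C:/Program%20Files/Music/Web%20Sys/main.html?REQUEST=RADIO#top"
parts = urlsplit(uri)

print(parse_qs(parts.query))       # {'REQUEST': ['RADIO']}
print(parts.fragment)              # top
print(has_unsupported_parts(uri))  # False

# Hypothetical URI with userinfo and a port, which IE will not navigate to.
print(has_unsupported_parts("file://user:secret@laptop:8080/share/a.txt"))  # True
```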
The following are some examples of poorly formed file URIs with which we’ve dealt. (Paths have been modified to hide the identity of the culprits. :-) These “bad” URIs will continue to work in IE7; however, you should steer clear of them for the reasons stated and because there’s no guarantee of support in the future.
Incorrect: file://D:\Program Files\Viewer\startup.htm
Correct: file:///D:/Program%20Files/Viewer/startup.htm
A large set of invalid file URIs come from the common but incorrect notion that it’s acceptable to place a Windows file path after the text ‘file://’ and call it a file URI. This is bad because Windows file paths, as mentioned earlier, may contain characters that aren’t allowed in URIs or that are important to the parsing of URIs. For instance, if a ‘#’ is in a Windows file path and that Windows file path is simply appended to the text ‘file://’ then we can’t know if the ‘#’ is supposed to be part of the path or if it’s supposed to delimit the fragment as it would in an actual URI. Similarly, if the path contains a ‘%’ then we can’t determine whether the ‘%’ identifies a percent-encoded octet, or if it is just a plain percent character in the Windows file path. Zeke Odins-Lucas wrote an informative and entertaining blog post on this topic.
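The ‘#’ ambiguity is easy to reproduce with a strict URI parser. In the sketch below, the path `D:\Projects\C#\readme.txt` is a hypothetical example and Python's `urlsplit` stands in for a standards-following parser; IE's legacy handling differs.

```python
from urllib.parse import urlsplit, unquote

path = r"D:\Projects\C#\readme.txt"  # hypothetical path containing '#'

# Naive approach: paste the Windows path after "file://".
naive = urlsplit("file://" + path)
print(naive.fragment)  # \readme.txt  (the tail of the path was taken as a fragment)

# Well-formed URI: '#' is percent-encoded as %23, so nothing is lost.
good = urlsplit("file:///D:/Projects/C%23/readme.txt")
print(repr(good.fragment))  # ''
print(unquote(good.path))   # /D:/Projects/C#/readme.txt
```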
Incorrect: C:\Program Files\Music\Web Sys\main.html?REQUEST=RADIO
Correct: file:///C:/Program%20Files/Music/Web%20Sys/main.html?REQUEST=RADIO
In many places inside IE, we allow a Windows file path as input when the input is actually specified as a URI. For example, the function CreateURLMonikerEx takes a string URI, but a Windows file path may be provided instead. Despite this, it is important to realize that a Windows file path is not a URI and a URI is not a Windows file path. You should not, as is done in this example, place a ‘?’ character after a Windows file path and provide a query component. The Windows file path has no such construct. If you wish to reference a file and provide a query then you must use a file URI.
Incorrect: file:////applib/products/a%2Db/abc%5F9/4148.920a/media/start.swf
Correct: file://applib/products/a-b/abc_9/4148.920a/media/start.swf
The author of this URI was heading in the correct direction. They converted the backslashes in their Windows file path to forward slashes and they percent-encoded characters they thought should be encoded. Although they meant well, there are a couple of problems. First, ‘applib’ is meant to be the host, but is preceded by two extra slashes. If interpreted as an actual URI, then applib isn’t the host but rather part of the path. If interpreted as a legacy file URI (as described by Zeke in his previously mentioned blog post) then those percent-encoded octets will be interpreted literally. Additionally, the characters ‘-‘ and ‘_’ are percent-encoded in this example, but shouldn’t be, as stated by the URI standard.
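The over-encoding half of this problem can be undone mechanically: RFC 3986 lists letters, digits, ‘-’, ‘.’, ‘_’ and ‘~’ as unreserved characters that never need percent-encoding. The sketch below decodes only those escapes. The function name `decode_unreserved` is an assumption for this example, and it deliberately does not touch the extra slashes before the host.

```python
import re
import string

# Unreserved characters per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def decode_unreserved(uri: str) -> str:
    """Replace %XX escapes that stand for unreserved characters; keep all others."""
    def repl(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)

print(decode_unreserved("file://applib/products/a%2Db/abc%5F9/4148.920a/media/start.swf"))
# file://applib/products/a-b/abc_9/4148.920a/media/start.swf
```

Escapes such as `%20` or `%23` are left alone, because decoding them would change how the URI parses.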
Characters outside of US-ASCII may appear in Windows file paths and accordingly they’re allowed in file IRIs. (URIs are defined as US-ASCII only and so when including non-US-ASCII characters in a string, what you've actually created is called an IRI: Internationalized Resource Identifier.) Don’t use percent-encoded octets to represent non-US-ASCII characters because, in file URIs, each percent-encoded octet is interpreted as a byte in the user’s current codepage. The meaning of a URI containing percent-encoded octets for bytes outside of US-ASCII will change depending on the locale in which the document is viewed. Instead, to represent a non-US-ASCII character you should use that character directly in the encoding of the document in which you are writing the IRI. For instance:
Incorrect: file:///C:/example%E3%84%93.txt
Correct: file:///C:/exampleㄓ.txt
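The codepage dependence described above can be simulated outside the browser. This sketch decodes the same three percent-encoded octets under two encodings with Python's `unquote`; the choice of UTF-8 and Windows-1252 is an assumption for illustration, since the codepage actually used depends on the user's system.

```python
from urllib.parse import unquote

encoded = "example%E3%84%93.txt"

# Read as UTF-8, the three octets form the single character the author intended.
print(unquote(encoded, encoding="utf-8"))   # exampleㄓ.txt

# Read as Windows-1252, each octet becomes a separate, unrelated character.
print(unquote(encoded, encoding="cp1252"))  # exampleã„“.txt
```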
In the latest URI standard, IPv6 literals are a part of the URI host syntax. In Windows, file URIs are dereferenced by converting them to their corresponding Windows file path and then using Windows file APIs to access the Windows file path. Since there’s no way to include an IPv6 address in a Windows file path, there’s no corresponding file URI and so there’s no way to incorporate an IPv6 address in file URIs in Windows. You can still use a hostname that resolves to an IPv6 address in the file URI, just not the IPv6 literal itself.
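To make the dereferencing step concrete, here is a minimal sketch of converting a well-formed file URI back to a Windows file path, rejecting IPv6 literals along the way. The function name `file_uri_to_windows_path` is hypothetical, percent-encoded octets are decoded as UTF-8 (an assumption; as noted earlier, Windows uses the current codepage), and legacy file URIs are not handled.

```python
from urllib.parse import urlsplit, unquote

def file_uri_to_windows_path(uri: str) -> str:
    """Convert a well-formed file URI to a Windows file path (sketch only)."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI")
    if parts.netloc.startswith("["):
        # An IPv6 literal host has no Windows file path equivalent.
        raise ValueError("IPv6 literal hosts are not supported")
    # Swap separators first so an encoded slash cannot become a separator.
    path = unquote(parts.path.replace("/", "\\"))
    if parts.netloc:
        return "\\\\" + parts.netloc + path  # UNC path: \\host\share\...
    return path.lstrip("\\")                 # local path: C:\...

print(file_uri_to_windows_path("file://laptop/My%20Documents/FileSchemeURIs.doc"))
# \\laptop\My Documents\FileSchemeURIs.doc
print(file_uri_to_windows_path("file:///C:/Documents%20and%20Settings/davris/FileSchemeURIs.doc"))
# C:\Documents and Settings\davris\FileSchemeURIs.doc
```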
To reiterate the points above, please construct and use well-formed file URIs. If you’re writing code that generates or interprets file URIs, use the functions PathCreateFromUrl and UrlCreateFromPath to convert between Windows file paths and file URIs. These functions will work correctly with well-formed file URIs and legacy file URIs. Even if your file URI syntax looks reasonable and works in one case, that doesn’t mean it will work correctly in corner cases like paths that contain the ‘#’ or ‘%’ characters.
If you know of other interesting misuses of file URIs or have other related comments please let us know!
Dave Risney
Software Design Engineer
edit: added incorrect/correct wording, link update
edit: Corrected URI/IRI language in the Non US-ASCII Characters section
Anonymous
January 01, 2003
Hi, I just want to see if I can post a comment. Is that all right?
Anonymous
December 06, 2006
The comment has been removed
Anonymous
December 06, 2006
This is one of those things that I have often thought about, but has never seemed important enough to
Anonymous
December 06, 2006
Very informative. Thanks for trying to get compliant with the URI spec.
Anonymous
December 06, 2006
So funny that you say all this without mentioning Netscape or Unix!
Anonymous
December 06, 2006
Regarding IPv6 literals, IE7 under both XP SP2 and Vista accepts them in the form http://[IPv6_literal]. I realize that is a URL, and you are talking about URIs, but can't they be included in a URI using the same syntax?
Anonymous
December 06, 2006
@Mike Brown: Thanks for your expressive comment. I'll respond to your comment in parts. With respect to '|' as a drive delimiter: I'm glad you're against using the '|'. I agree with your points. That said, IE will continue to support this as it has done in the past in order to maintain compat. with older applications that depend on this. With respect to '%3A' as a drive delimiter: IE7 doesn't treat this as a drive delimiter. As you note we as implementers of this scheme get leeway and so we don't need to equate ':' with '%3A' in this case. But on the other hand there's no reason not to from a design perspective and I like your points. I'll put this down to consider for future changes.
Anonymous
December 06, 2006
@Andrew Sherman: I'm not sure what to say regarding file URIs and Netscape or Unix. The file URI spec leaves resolution of file URIs up to the implementer, meaning that Netscape and other browsers on Unix may have very different concepts of file URIs. This post was meant to highlight best practices for file URIs in Windows w/ IE6 or IE7. Do you have any specific questions regarding Unix and Netscape?
Anonymous
December 06, 2006
@John Baird: Actually, what I wrote about IPv6 literals applies only to file URIs in IE7. It doesn't apply to http, ftp, or any other URIs in IE7. As you correctly note, http URIs with IPv6 literals work in Vista and XPSP2 w/ IE7. It's just file URIs that don't work with IPv6 literals. With regard to URI vs URL, check out RFC 3986 section 1.1.3, which describes the relationship between URI, URL, and URN: http://tools.ietf.org/html/rfc3986#section-1.1.3
Anonymous
December 06, 2006
@Mike Brown: Wrt your comment about a more formal publication. This document is an informal attempt to highlight best practices for file URIs in IE. This means, as you noted, I didn't get into the details of the various bad/deprecated practices for file URIs in IE like use of '|' or the legacy file URI syntax I mention. I'd be happy to work on a more formal publication. Did you have a publication channel in mind?
Anonymous
December 06, 2006
@Mike Brown: With respect to the drivespec in the authority: I find myself agreeing with you yet again. That's an interesting point, looking at drivespec in the authority with an eye towards what functionality it enables. However, defining file URIs in this manner in IE isn't feasible because of support for the legacy file syntax. The legacy file syntax is described by Zeke in his blog: http://blogs.msdn.com/freeassociations/archive/2005/05/19/420059.aspx The syntax of good-natured and well-meaning file URIs w/ drivespecs in the authority would conflict with the syntax of the legacy file URIs. We can't remove support for legacy file URIs because we need to maintain application compat. with a significant number of applications. If the day comes when we can remove the legacy syntax, discussion on drivespec in the authority can be reopened.
Anonymous
December 06, 2006
The comment has been removed
Anonymous
December 07, 2006
A little bit off topic here, but I want to highlight to you again a problem which is causing a lot of complaints from our users after the IE7 rollout: headers disappear when printing from Outlook with IE7 installed. This seems like a problem which a lot of other people also face: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=953593&SiteID=1 I hope the IE team can release a fix for this problem soon.
Anonymous
December 07, 2006
Great post, thanks guys. I am, however, seeing a strange effect in IE7 with a link to a local file. Can anyone comment? I know the link is wrong, but shouldn't IE still work with it? Other browsers are OK. The link is shown near the end of my post here (I won't reprint it directly in case the comment system mucks it up. Someone add a Preview mode soon!). See the part marked "Update 12 April 2006" above the first comment. http://www.designdetector.com/2006/04/first-impressions-of-ie7.php In other browsers, the link is fixed and works. But in IE7, the drive letter is doubled! Hence the link doesn't work. It is OK in IE6.
Anonymous
December 07, 2006
The comment has been removed
Anonymous
December 07, 2006
To honour file:// links in web (non-local) pages is a bad design decision. Firefox, for security purposes (with default configuration), does not follow this type of link ( http://kb.mozillazine.org/Links_to_local_pages_don't_work )
Anonymous
December 07, 2006
@Andrew Sherman: 'So funny that you say all this without mentioning netscape or unix!' Why? This is an IE7 blog with an entry about the handling of URIs. Why does there need to be any mention of other browsers, OSes, etc.? Bizarre.
Anonymous
December 07, 2006
The comment has been removed
Anonymous
December 07, 2006
There is a small triangle icon at the side of each icon in the toolbar. Can that triangle icon be at the bottom of the actual icon? This is the case in Windows Media Player (11). This way it allows me to add one more icon for each triangle I remove. Also, could you give an option to NOT show text like "Page", "Tools" etc. beside the icon? (Is there already such an option?)
Anonymous
December 07, 2006
Oh, and if it hasn't been said a 1,000 times already, please make Node (from the DOM), a first class JavaScript citizen in IE7. Being able to prototype on this, WOULD allow developers to SOLVE almost all DOM related bugs in IE7+!
Anonymous
December 07, 2006
The comment has been removed
Anonymous
December 07, 2006
@EricLaw: Thanks for the answer about dropping the fragment for a file URL. Can we expect a fix for this soon, and can you provide some workaround that can be used in the meantime?
Anonymous
December 07, 2006
The comment has been removed
Anonymous
December 07, 2006
I think on IE 7 beta 3 an option to bring back the tab information (ShowTabsWelcome in the registry) is available in the tabbed browsing settings. I can't find it in the tab settings at all. Was that option removed from the tab settings in the final version of IE 7? The option is available in the registry only. Can I bring back the information tab or tabs welcome by putting 1 and hexadecimal? HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\TabbedBrowsing ShowTabsWelcome (0)
Anonymous
December 07, 2006
Is this a permission or a URI problem? Local files are inaccessible by the native XMLHttpRequest introduced in IE7. Here's a testcase. Just set up a 1.html with whatever in it, then run it. It fails on IE7 but works fine on other browsers such as Gecko-based Firefox, WebKit-based Safari. Anyway, this works fine too by using the ActiveX XMLHttpRequest.
<script>
var url = "1.html"; // 1.html is a local file
var http = new XMLHttpRequest();
http.open("GET", url, false); // false = synchronous request
http.send(null); // the request must be sent before the status can be read
// A status of 0 is treated as success because local files have no HTTP status
alert((http.status==200 || http.status==0) ? http.responseText : "HttpGet Error Status: " + http.status);
</script>
Anonymous
December 08, 2006
The comment has been removed
Anonymous
December 08, 2006
I have several searchURL keys defined under HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\SearchUrl. Any keyword that uses file:/// is not recognized in the Address bar, and it performs the default Google search instead. This is not happening with http URLs. This was working fine in IE6. An example file URL entry is mysearch=file:///C:/test/ps/portal_load_dev.html%s. Instead of running the file:/// URL when I type "mysearch 1" in the address bar, it tries to run http://mysearch%201/
Anonymous
December 08, 2006
I read the articles about JS optimization in IE, but discussion there is closed. The fact is, you have an optimization issue that comes before all the other optimization issues, from a marketing point of view. IE7 is slow at loading. I am sure that's not the first time you hear this lol. You know why? Because it is, sadly, true. It is darn slow at loading. And I have 516 Ram and a 2000+ pentium and a Win XPP. That must be enough for a browser to load fast. Loading fast has always been the major reason I preferred IE to all the rest. I never really cared if IE6 couldn't parse a CSS fixed layer or other CSS amenities of real concern only for those who think that CSS is everything. But IE7 is truly slower than IE6. That 'connecting' thing takes several seconds, which IE6 didn't. It can be enough to make it as slow as Firefox to load. I know as 'customers' we may seem 'hysterical' - but why have you neglected loading speed, IE's most coveted characteristic? Why such a bad new 'feature' in such a good product? You may jeopardize the whole effort just because of that. The next fix should be focused entirely on that... A guy who loves IE is talking here. It's slow at loading. It's a fact. :(
Anonymous
December 09, 2006
The comment has been removed
Anonymous
December 09, 2006
The comment has been removed
Anonymous
December 09, 2006
This is totally off-topic, I'm afraid, but I haven't been able to locate anywhere else to report this; a pointer would be appreciated if any. As a sysadmin, I normally use Group Policy to configure the IE proxy settings automatically, to avoid complication when dealing with users. Unfortunately, in IE7, this greys out the proxy settings for LAN in the IE options, such that they can't be altered by the users. This would have been fine in the old days, where off-site users would connect to the Internet via dial-up connections. However, as wireless networks of all kinds are considered LAN connections, this means that the users are unable to access the web using IE when connected via wireless, as they are configured to use the proxy and cannot change this. Obviously, this is something of a problem. Are there any plans to modify this in future releases of IE, 7 or later?
Anonymous
December 09, 2006
The comment has been removed
Anonymous
December 09, 2006
Interesting, I never realised you could link to UNC paths using URIs.
Anonymous
December 10, 2006
@Aedrin: Re: "If you are going to do a test, do it fairly and use the whole character range. Picking out a specific range you know that IE doesn't handle well is a little bit unfair. You sound like a politician." Sure thing, no problem... pick a range, any range you want... the 3 or 4 I picked out were at random... you're welcome to post stats on any other range, but I highly doubt you will find stats in the reverse.
Anonymous
December 11, 2006
The comment has been removed
Anonymous
December 11, 2006
What about IE7 and the URL: ftp://username:password@host.domain This worked in IE6 but does not in IE7.
Anonymous
December 11, 2006
The IE team have written their interpretation of the file:// URI specs. With most operating systems the file: URI is simple, due to the common root used by most non-Microsoft operating systems. For example on Linux /home/dave/index.html would be file:/..
Anonymous
December 12, 2006
The comment has been removed
Anonymous
December 12, 2006
@steve_web So.. the ability to see unicode 'shapes' (that do not appear to be related to a language) - in the URL of IE equates to a "significant lack of Unicode support"? Hmm.
Anonymous
December 12, 2006
@steve_web To go a step further - I would argue that IE is correct in omitting this range. These are untypeable characters and therefore their appearance in a URL is questionable. I expect that the 'latest URI standard' that Dave Risney linked to in his post highlights this - there's a new task for you!
Anonymous
December 13, 2006
The comment has been removed
Anonymous
December 13, 2006
@Andrew. NO, was not talking about the URL at all. We're talking about in page rendering of unicode chars. So, If I want to use characters, to draw images, or use special 1/2, 1/3, 1/4 type characters. There are many arrows, blocks and other useful shapes that would be very handy to be able to use, but for reasons yet unexplained, support for vast ranges of unicode is missing.
Anonymous
December 15, 2006
The comment has been removed
Anonymous
May 26, 2007
The comment has been removed
Anonymous
February 04, 2009
The office connector build uses cargo to run Confluence for functional tests. The latest snapshot of Confluence can't load bundled plugins on Windows because of this problem in the logs. {noformat} java.io.FileNotFoundException: file:C:projects...