Understanding Domain Names in Internet Explorer
Web browsers use domain names for a variety of purposes, but how they’re used is much more complicated than most developers realize. In this post, I’ll attempt to cover the most important aspects of this topic.
When talking about “domains,” the terminology alone is confusing (and contentious). So, let’s start with some simplistic definitions for terms used in this post:
- A label is a single component of a domain name string, delimited by periods. For instance, “www”, “microsoft”, and “com” are the three labels in the domain name “www.microsoft.com”.
- A plainhostname is an unqualified, single label hostname like “Payroll”, which typically refers to a server on a local intranet.
- An FQDN is an absolute, fully-qualified domain name, like “www.microsoft.com”.
- A Public Suffix is the suffix portion of an FQDN under which independent entities may register subdomains. For example, “ltd.co.im” is a Public Suffix. A Public Suffix contains one or more labels. Sometimes the term “effective TLD” is used as a synonym.
- A TLD is a top-level domain, the right-most label of a domain name.
- A gTLD is a generic TLD, like “.com”, “.net”, “.gov”, etc.
- A ccTLD is a country-code TLD, like “.us” or “.ru”.
- ICANN (the Internet Corporation for Assigned Names and Numbers) is responsible for the creation and management of TLDs.
When web developers talk about “the domain,” they’re often referring to what this post calls the Private Domain:
- A Private Domain is a single label with a Public Suffix appended.
- Synonyms for Private Domain are “base domain” and “Public Suffix plus 1” (PS+1).
For instance, the two Private Domains “Acme.ltd.co.im” and “Bayden.ltd.co.im” are each independently operated subdomains of the Public Suffix “ltd.co.im”.
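The Private Domain definition boils down to “find the longest Public Suffix at the end of the host, then keep one more label.” The following Python sketch shows just that rule. It assumes a hypothetical, deliberately tiny PUBLIC_SUFFIXES set, and the function name private_domain is illustrative only; it is not how IE actually decides what counts as a Public Suffix.

```python
from typing import Optional

# Hypothetical, tiny Public Suffix set used only for illustration.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "im", "co.im", "ltd.co.im"}

def private_domain(host: str) -> Optional[str]:
    """Return the Private Domain (Public Suffix plus one label), or None."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix (the whole host) to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # i == 0 means the host is itself a Public Suffix: no Private Domain.
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No known suffix matched (e.g. a plainhostname like "Payroll").
    return None

print(private_domain("Acme.ltd.co.im"))     # acme.ltd.co.im
print(private_domain("www.microsoft.com"))  # microsoft.com
print(private_domain("ltd.co.im"))          # None
```

Checking the longest suffix first matters: “co.uk” must win over “uk”, otherwise every site under “co.uk” would collapse into the single Private Domain “co.uk”.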
Okay, now on to the fun stuff.
Domains and the IURI Interface
First, some foreshadowing…
IE7 and above use a Consolidated URI handling feature which exposes the IURI interface. Let’s have a quick look at a partial list of IURI property values for a sample URI: http://www.example.com/path/file.ext?query=val#frag
- Uri_PROPERTY_ABSOLUTE_URI: "http://www.example.com/path/file.ext?query=val#frag"
- Uri_PROPERTY_DISPLAY_URI: "http://www.example.com/path/file.ext?query=val#frag"
- Uri_PROPERTY_RAW_URI: "http://www.example.com/path/file.ext?query=val#frag"
- Uri_PROPERTY_SCHEME_NAME: "http"
- Uri_PROPERTY_DOMAIN (aka Private Domain): "example.com"
- Uri_PROPERTY_HOST (aka FQDN or plainhostname): "www.example.com"
- Uri_PROPERTY_HOST_TYPE: 1
- Uri_PROPERTY_PORT: 80
- Uri_PROPERTY_PATH: "/path/file.ext"
- Uri_PROPERTY_QUERY: "?query=val"
It’s important to note that if the URI contains only a plainhostname (e.g. “https://example/”) or a Public Suffix (e.g. “https://co.uk/”), then Uri_PROPERTY_DOMAIN is null.
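IURI is a COM interface, so the following is only a rough Python analogy built on the standard library’s urllib.parse.urlsplit, not IURI itself. The dictionary keys simply mirror the Uri_PROPERTY_* names above, and private_domain with its tiny PUBLIC_SUFFIXES set is a hypothetical stand-in for CURI’s Private Domain logic. It reproduces the null-DOMAIN behavior for plainhostnames and bare Public Suffixes.

```python
from typing import Optional
from urllib.parse import urlsplit

PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}  # hypothetical, tiny
DEFAULT_PORTS = {"http": 80, "https": 443}

def private_domain(host: str) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

def uri_properties(url: str) -> dict:
    """Approximate a few IURI-style properties for url."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "SCHEME_NAME": parts.scheme,
        "HOST": host,
        "DOMAIN": private_domain(host),  # None plays the role of "null"
        "PORT": parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme),
        "PATH": parts.path,
        "QUERY": "?" + parts.query if parts.query else "",
        "FRAGMENT": "#" + parts.fragment if parts.fragment else "",
    }

print(uri_properties("http://www.example.com/path/file.ext?query=val#frag"))
```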
Why Do Browsers Care About Domains?
Every browser must be able to determine the Private Domain for a number of uses, but in this post I’ll concentrate on IE’s use of this information.
1. Domain Highlighting in the Address Bar
IE8’s Domain Highlighting feature renders the Private Domain in black text and the rest of the URL in gray to help prevent the use of misleading URLs in spoofing attacks.
If the URL contains a plainhostname, the address bar will render the plainhostname in black instead.
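To render the highlight, the address bar needs to know where the Private Domain (or the plainhostname) sits inside the URL string. This Python sketch splits a URL into the gray prefix, the black text, and the gray remainder. The helper names and the tiny PUBLIC_SUFFIXES set are hypothetical, and IPv6 literals are ignored for simplicity.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}  # hypothetical, tiny

def private_domain(host: str) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

def highlight(url: str) -> Tuple[str, str, str]:
    """Split url into (gray prefix, black text, gray suffix)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")
    # Highlight the Private Domain, or the whole plainhostname if there is none.
    target = private_domain(host) or host
    # The host starts after "scheme://" and after any "user:pass@" prefix.
    host_start = len(parts.scheme) + 3 + parts.netloc.rfind("@") + 1
    end = host_start + len(host)
    start = end - len(target)
    return url[:start], url[start:end], url[end:]

print(highlight("http://www.example.com@evil.co.uk/"))
# ('http://www.example.com@', 'evil.co.uk', '/')
```

The last example is the kind of spoofing URL the feature targets: the misleading “www.example.com” stays gray, and only the real Private Domain is shown in black.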
2. Quota management for Local Storage
IE8 applies a per-Private-Domain quota to values stored using the HTML5 Local Storage API.
If Uri_PROPERTY_DOMAIN is null (for example, because the URL contains a plainhostname), the browser enforces the quota against Uri_PROPERTY_HOST instead.
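A minimal Python sketch of that bookkeeping, assuming a hypothetical LocalStorageQuota class, a tiny PUBLIC_SUFFIXES set, and an arbitrary byte limit chosen by the caller (it does not reflect IE8’s actual quota size). The key point is the fallback in quota_key: use the Private Domain when there is one, otherwise the host.

```python
from collections import defaultdict
from typing import Optional

PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}  # hypothetical, tiny

def private_domain(host: str) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

def quota_key(host: str) -> str:
    """Private Domain when available, else the host itself."""
    return private_domain(host) or host.lower()

class LocalStorageQuota:
    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used = defaultdict(int)  # quota key -> bytes stored so far

    def try_store(self, host: str, nbytes: int) -> bool:
        key = quota_key(host)
        if self.used[key] + nbytes > self.limit_bytes:
            return False  # would exceed this key's quota
        self.used[key] += nbytes
        return True

q = LocalStorageQuota(100)
q.try_store("www.example.com", 60)   # True
q.try_store("mail.example.com", 60)  # False: same Private Domain, shared quota
q.try_store("payroll", 60)           # True: plainhostname gets its own quota
```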
3. document.domain relaxation
The Same-Origin Policy typically means that two pages must have exactly matching FQDNs (along with matching schemes and ports) in order to script against each other’s DOM. However, HTML allows a page to relax its document.domain property to a suffix of its current value, enabling cross-host DOM communication within a single Private Domain. Script is not permitted to set document.domain to a string shorter than the Private Domain. This prevents sites from unrelated organizations from intentionally or inadvertently scripting against each other’s DOM.
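The two constraints (a dot-delimited suffix of the current host, but no shorter than the Private Domain) can be sketched in Python as follows. The function names and the tiny PUBLIC_SUFFIXES set are hypothetical, and refusing any change on a plainhostname is an assumption made for this sketch.

```python
from typing import Optional

PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}  # hypothetical, tiny

def private_domain(host: str) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

def can_set_document_domain(current_host: str, new_value: str) -> bool:
    current, new = current_host.lower(), new_value.lower()
    # The new value must be the current host or a dot-delimited suffix of it.
    if new != current and not current.endswith("." + new):
        return False
    base = private_domain(current)
    if base is None:
        # Assumption: plainhostnames and bare Public Suffixes cannot relax.
        return new == current
    # The new value must not be shorter than the Private Domain.
    return new == base or new.endswith("." + base)

can_set_document_domain("www.example.com", "example.com")  # True
can_set_document_domain("www.example.com", "com")          # False
```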
4. HTTP Cookies
When setting a cookie, a website may specify which hosts the cookie should be sent to using the domain attribute. The browser must block attempts to set a cookie where the domain attribute does not end with the current page’s Private Domain. Failure to do so results in privacy and security concerns.
Privacy: Allowing unrelated domains to share cookies can result in “super-cookies”: cookies which are sent to multiple unrelated organizations that happen to share a Public Suffix.
Security: It also enables session-fixation attacks. If a good site and an evil site share a Public Suffix, the evil site can set a malicious cookie on that Public Suffix, and the browser will then send the evil cookie to the good site.
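Here is a minimal Python sketch of a cookie jar that enforces the rule, assuming a hypothetical CookieJar class and a tiny PUBLIC_SUFFIXES set. For brevity it treats a missing domain attribute as if it named the page’s own host, which glosses over real host-only cookie semantics.

```python
from typing import Dict, List, Optional, Tuple

PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}  # hypothetical, tiny

def private_domain(host: str) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

class CookieJar:
    def __init__(self) -> None:
        self.cookies: List[Tuple[str, str, str]] = []  # (domain, name, value)

    def set_cookie(self, page_host: str, name: str, value: str,
                   domain: Optional[str] = None) -> bool:
        host = page_host.lower()
        domain = (domain or host).lower().lstrip(".")
        # The page may only name its own host or a parent of it.
        if host != domain and not host.endswith("." + domain):
            return False
        # Block attributes broader than the page's Private Domain.
        base = private_domain(host) or host
        if domain != base and not domain.endswith("." + base):
            return False
        self.cookies.append((domain, name, value))
        return True

    def cookies_for(self, host: str) -> Dict[str, str]:
        host = host.lower()
        return {n: v for d, n, v in self.cookies
                if host == d or host.endswith("." + d)}

jar = CookieJar()
jar.set_cookie("evil.co.uk", "SESSIONID", "attacker", domain=".co.uk")  # False
jar.cookies_for("good.co.uk")  # {}: the fixation attempt never reached the jar
```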
5. Security Zones – Mapping Domains to Zones
Because Public Suffixes are typically shared by multiple unrelated organizations, URLMon does not permit users to add all sites within a given Public Suffix to a security zone.
We are aware that there are scenarios where such assignments may be desirable to some organizations (e.g. I might like to assign *.mil to the Trusted Sites Zone).
6. Security Zones – Automatic Zone Determination
7. Per-site ActiveX
When the user uses the Information Bar to allow an ActiveX control to run, Internet Explorer 8’s Per-Site ActiveX feature adds the current Private Domain to the Allow list for that control.
8. Compatibility View
Internet Explorer 8’s Compatibility View button adds the current Private Domain to the compatibility view list.
9. XSS Filter
IE8’s XSS Filter uses the Private Domain to determine whether a given navigation crosses from one Private Domain to another.
10. InPrivate Filtering
IE8’s InPrivate Filtering feature uses Private Domain information to help determine whether a given request is being sent to a third-party site.
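A “third-party” test of this kind can be sketched as a comparison of Private Domains, falling back to the host for plainhostnames. This is a hypothetical Python illustration with a tiny PUBLIC_SUFFIXES set, not InPrivate Filtering’s actual logic; the same style of comparison could also express the XSS Filter’s cross-Private-Domain check.

```python
from typing import Optional
from urllib.parse import urlsplit

PUBLIC_SUFFIXES = {"com", "net", "uk", "co.uk"}  # hypothetical, tiny

def private_domain(host: str) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

def site_key(url: str) -> str:
    host = urlsplit(url).hostname or ""
    return private_domain(host) or host

def is_third_party(page_url: str, request_url: str) -> bool:
    return site_key(page_url) != site_key(request_url)

is_third_party("http://www.example.com/", "http://images.example.com/a.png")  # False
is_third_party("http://www.example.com/", "http://ads.tracker.net/px.gif")    # True
```

Note that “a.co.uk” and “b.co.uk” are correctly treated as different parties only because “co.uk” is known to be a Public Suffix.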
11. Preserve Favorite Website Data
IE8’s Delete Browsing History feature includes a new “Preserve Favorites website data” option. As I described back in this post from June, this feature relies on the Private Domain to help determine whether stored data is related to one of the user’s favorite websites.
The Challenge of ccTLDs
In the early days of the web, most ccTLDs were organized in such a way that it was relatively easy to heuristically determine the Public Suffix of any FQDN. Over time, however, different ccTLDs decided that they wanted to create new Public Suffixes within their ccTLD, or decided to allow registration of Private Domains that the heuristics would incorrectly treat as Public Suffixes. Some nations (like Tuvalu) have outsourced registration of subdomains and allow anyone to obtain Private Domains within their ccTLD (.TV).
Prior to IE8, there was no single codepath in IE where the Private Domain was calculated, so over time several point-fixes were made to liberalize cookie setting in certain ccTLDs.
The heuristic Private Domain determination algorithm in IE5+ is:
1> If the final label is empty, drop it for the purposes of this algorithm
Otherwise "www.example.com." would have four labels "www", "example", "com", "". Instead, we drop the final label.
2> Name the labels Ln,...,L3,L2,L1; decreasing from start (Leftmost=Ln) to finish (Rightmost=L1).
If at any point in this algorithm the result demands >n labels, getPrivateDomain returns "".
3> Check n > 1. If not, there's no Public Suffix, just a plainhostname. Return ""; exit.
Dotless hostnames consist of a host only; there is no domain.
4> Check L1 == "tv". If so, getPrivateDomain returns L2.L1; exit.
"tv" is a special-case "completely flat" ccTLD for historical reasons.
5> Check Len(L1) > 2. If so, getPrivateDomain returns L2.L1; exit.
Len(L1)>2 suggests L1 is a gTLD rather than a ccTLD.
If Len(L1)<=2 we assume L1 is a part of a ccTLD.
6> Check if L2 in gTLD list "com,edu,net,org,gov,mil,int". If so, getPrivateDomain returns L3.L2.L1; exit.
gTLDs, when they appear immediately left of a ccTLD (modulo the exception in step 4), are considered part of the Public Suffix.
7> If L1 is in the list "GR,PL" AND L2 is NOT in the gTLD list, getPrivateDomain returns L2.L1; exit.
GR and PL are considered "flat" ccTLDs EXCEPT when a gTLD appears in L2.
getPrivateDomain("a.pl") returns "a.pl"
getPrivateDomain("a.uk") returns ""
8> If Len(L2) < 3 getPrivateDomain returns L3.L2.L1; exit.
getPrivateDomain("aa.bb.cc") returns "aa.bb.cc"
9> Otherwise, getPrivateDomain returns L2.L1
getPrivateDomain("aa.bbb.cc") returns "bbb.cc"
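The nine steps above translate almost line for line into code. The following Python sketch is a reading of the published steps, not IE’s source; get_private_domain mirrors the getPrivateDomain name used above, and comparisons are made case-insensitively (an assumption, since the step text mixes “tv” with “GR,PL”).

```python
GTLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}
FLAT_CCTLDS = {"gr", "pl"}

def get_private_domain(host: str) -> str:
    labels = host.lower().split(".")
    # Step 1: drop an empty final label ("www.example.com.").
    if labels and labels[-1] == "":
        labels.pop()
    n = len(labels)

    def rightmost(k: int) -> str:
        # Step 2: return Lk...L1, or "" if the result demands more than n labels.
        return ".".join(labels[-k:]) if k <= n else ""

    if n <= 1:                 # Step 3: plainhostname, no Public Suffix
        return ""
    l1, l2 = labels[-1], labels[-2]
    if l1 == "tv":             # Step 4: "completely flat" special case
        return rightmost(2)
    if len(l1) > 2:            # Step 5: L1 looks like a gTLD
        return rightmost(2)
    if l2 in GTLDS:            # Step 6: gTLD left of a ccTLD, e.g. com.au
        return rightmost(3)
    if l1 in FLAT_CCTLDS:      # Step 7: flat ccTLDs GR and PL
        return rightmost(2)
    if len(l2) < 3:            # Step 8: short L2 assumed part of the suffix
        return rightmost(3)
    return rightmost(2)        # Step 9

get_private_domain("a.pl")       # "a.pl"
get_private_domain("a.uk")       # ""
get_private_domain("aa.bbb.cc")  # "bbb.cc"
```

Running it on “acme.ltd.co.im” returns “ltd.co.im”, i.e. the Public Suffix itself: the kind of misclassification that motivates the special-case list described below.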
While this heuristic worked pretty well for many years (and still works reasonably well in general), it was clearly becoming increasingly complicated because each ccTLD established different operating practices (and those, in turn, changed over time).
Changes in Internet Explorer 8
For IE8, we’ve updated major codepaths to use CURI’s Uri_PROPERTY_DOMAIN for Private Domain determination, helping to ensure consistency throughout the various browser components.
IE8's version of URLMon maintains a list of special cases which are used as exceptions to the default heuristics that CURI uses. The list is maintained as an XML resource inside URLMon.dll. It contains elements which should be treated as Public Suffixes (the XML nodes named “tld”) and elements which should be treated as Private Domains (the XML nodes named “domain”).
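The following Python sketch shows how such a list could be layered over a heuristic. Only the element names “tld” and “domain” come from the description above; the XML layout, the entries (“ab.fr” is hypothetical), and the function names are assumptions. The heuristic is passed in as a parameter (for example, a getPrivateDomain-style function), so any fallback can be plugged in.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Set, Tuple

# Hypothetical XML layout; only the "tld" and "domain" element names are from the text.
SAMPLE_XML = """<exceptions>
  <tld>ltd.co.im</tld>
  <domain>ab.fr</domain>
</exceptions>"""

def load_exceptions(xml_text: str) -> Tuple[Set[str], Set[str]]:
    root = ET.fromstring(xml_text)
    tlds = {(e.text or "").strip().lower() for e in root.iter("tld")}
    domains = {(e.text or "").strip().lower() for e in root.iter("domain")}
    return tlds, domains

def private_domain_with_exceptions(host: str, tlds: Set[str], domains: Set[str],
                                   heuristic: Callable[[str], str]) -> str:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # check the longest suffix first
        suffix = ".".join(labels[i:])
        if suffix in domains:
            return suffix  # listed explicitly as a Private Domain
        if suffix in tlds:
            # Listed as a Public Suffix: keep one more label, if there is one.
            return ".".join(labels[i - 1:]) if i > 0 else ""
    return heuristic(host)  # no special case applies

tlds, domains = load_exceptions(SAMPLE_XML)
last_two = lambda h: ".".join(h.lower().split(".")[-2:])  # stand-in heuristic
private_domain_with_exceptions("acme.ltd.co.im", tlds, domains, last_two)  # "acme.ltd.co.im"
```

Under the IE5+ heuristic, “acme.ltd.co.im” would yield “ltd.co.im” and “www.ab.fr” would yield “www.ab.fr” (its L2 has fewer than three characters); a “tld” entry and a “domain” entry respectively correct those two cases.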
From a browser architecture perspective, lists like this one are the option of last resort, for a number of important reasons. However, there’s currently no standard that promises relief. One proposal which has been discussed in a few forums is to allow the DNS itself to indicate (via a new record) which names are part of a Public Suffix and which are part of a Private Domain, but that approach is not without problems.
The (Coming) Challenges with gTLDs
ICANN recently voted to allow organizations to create new generic TLDs. New gTLDs may introduce additional problems, because previously most of the “special cases” were found only in ccTLDs. Other parties (like Certificate Authorities) would also likely be significantly impacted by this liberalization of gTLDs.
As this area is still developing, it will likely be the topic of a future post. (For now, see this one)
UPDATE: Internet Explorer on Windows 10 now uses the TLD list from PublicSuffix.org. https://blogs.msdn.com/b/ie/archive/2014/10/01/internet-explorer-and-the-windows-10-technical-preview.aspx
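The PublicSuffix.org list expresses Public Suffixes as rules that may contain a “*” wildcard label and “!” exception rules. The following Python sketch follows the matching procedure described on publicsuffix.org: an exception rule prevails; otherwise the matching rule with the most labels prevails; if nothing matches, an implicit “*” rule applies. The RULES tuple is a small sample written in the list’s syntax rather than the real list, and nothing here describes how Internet Explorer implements the lookup.

```python
from typing import List, Optional, Sequence

# Small sample in Public Suffix List syntax; the real list is far larger.
RULES = ("com", "uk", "co.uk", "jp", "*.kawasaki.jp", "!city.kawasaki.jp")

def rule_matches(rule_labels: List[str], host_labels: List[str]) -> bool:
    """True if the rule matches the host's rightmost labels ("*" matches one label)."""
    if len(rule_labels) > len(host_labels):
        return False
    tail = host_labels[-len(rule_labels):]
    return all(r == "*" or r == h for r, h in zip(rule_labels, tail))

def registrable_domain(host: str, rules: Sequence[str] = RULES) -> Optional[str]:
    labels = host.lower().rstrip(".").split(".")
    exception_lens, normal_lens = [], []
    for rule in rules:
        rule_labels = rule.lstrip("!").split(".")
        if rule_matches(rule_labels, labels):
            target = exception_lens if rule.startswith("!") else normal_lens
            target.append(len(rule_labels))
    if exception_lens:
        # An exception rule prevails; its Public Suffix drops the leftmost label.
        suffix_len = exception_lens[0] - 1
    elif normal_lens:
        suffix_len = max(normal_lens)  # the matching rule with the most labels
    else:
        suffix_len = 1  # implicit "*" rule
    if len(labels) <= suffix_len:
        return None  # the host is itself a Public Suffix
    return ".".join(labels[-(suffix_len + 1):])

registrable_domain("www.bbc.co.uk")     # "bbc.co.uk"
registrable_domain("city.kawasaki.jp")  # "city.kawasaki.jp" (exception rule)
```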