CreateUri function
Creates a new IUri instance, and initializes it from a Uniform Resource Identifier (URI) string. CreateUri also normalizes and validates the URI.
Syntax
STDAPI CreateUri(
  _In_       LPCWSTR   pwzURI,
  _In_       DWORD     dwFlags = Uri_CREATE_CANONICALIZE,
  _Reserved_ DWORD_PTR dwReserved,
  _Out_      IUri      **ppURI
);
Parameters
pwzURI [in]
A constant pointer to a UTF-16 character string that specifies the URI.
dwFlags [in]
A valid combination of the following flags.
Uri_CREATE_ALLOW_RELATIVE (0x0001)
Default. If the scheme is unspecified and not implicitly "file", assume relative.
Uri_CREATE_ALLOW_IMPLICIT_WILDCARD_SCHEME (0x0002)
If the scheme is unspecified and not implicitly "file", assume wildcard.
Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME (0x0004)
Default. If the scheme is unspecified and the URI starts with a drive letter (X:) or UNC path (\\), assume "file".
Uri_CREATE_NOFRAG (0x0008)
If there is a query string, don't look for a fragment.
Uri_CREATE_NO_CANONICALIZE (0x0010)
Do not canonicalize the scheme, host, authority, path, query, or fragment.
Uri_CREATE_CANONICALIZE (0x0100)
Default. Canonicalize the scheme, host, authority, path, query, and fragment.
Uri_CREATE_FILE_USE_DOS_PATH (0x0020)
Use DOS path compatibility mode to create "file" URIs.
Uri_CREATE_DECODE_EXTRA_INFO (0x0040)
Default. Perform the percent-encoding and percent-decoding canonicalizations on the query and fragment. This flag takes precedence over Uri_CREATE_NO_CANONICALIZE.
Uri_CREATE_NO_DECODE_EXTRA_INFO (0x0080)
Do not perform the percent-encoding or percent-decoding canonicalizations on the query and fragment. This flag takes precedence over Uri_CREATE_CANONICALIZE.
Uri_CREATE_CRACK_UNKNOWN_SCHEMES (0x0200)
Default. Hierarchical URIs with unrecognized schemes will be treated like hierarchical URIs.
Uri_CREATE_NO_CRACK_UNKNOWN_SCHEMES (0x0400)
Hierarchical URIs with unrecognized schemes will be treated like opaque URIs.
Uri_CREATE_PRE_PROCESS_HTML_URI (0x0800)
Default. Perform preprocessing on the URI to remove control characters and white space, as if the URI had come from the raw href value of an HTML page.
Uri_CREATE_NO_PRE_PROCESS_HTML_URI (0x1000)
Do not perform preprocessing to remove control characters and white space as appropriate.
Uri_CREATE_IE_SETTINGS (0x2000)
Use Internet Explorer registry settings to determine default URL-parsing behavior.
Uri_CREATE_NO_IE_SETTINGS (0x4000)
Default. Do not use Internet Explorer registry settings.
Uri_CREATE_NO_ENCODE_FORBIDDEN_CHARACTERS (0x8000)
Do not percent-encode characters that are forbidden by RFC-3986. Use with Uri_CREATE_FILE_USE_DOS_PATH to create file monikers.
Uri_CREATE_NORMALIZE_INTL_CHARACTERS (0x00010000)
Default. Percent encode all extended Unicode characters, then decode all percent encoded extended Unicode characters (except those identified as dangerous).
dwReserved [in]
Reserved. Must be set to 0.
ppURI [out]
An IUri interface pointer that receives the new instance.
Return value
Returns one of the following values.
Return code | Description |
---|---|
S_OK | Success. |
E_INVALIDARG | dwFlags conflict, or ppURI is NULL. |
E_OUTOFMEMORY | There is insufficient memory to create the IUri. |
INET_E_INVALID_URL | The string does not contain a recognized URI format. |
INET_E_SECURITY_PROBLEM | The URI contains syntax that attempts to bypass security. |
E_FAIL | Unknown error while parsing the URI. |
Remarks
CreateUri returns E_INVALIDARG if conflicting flags are specified in dwFlags, for example Uri_CREATE_DECODE_EXTRA_INFO together with Uri_CREATE_NO_DECODE_EXTRA_INFO, or Uri_CREATE_ALLOW_RELATIVE together with Uri_CREATE_ALLOW_IMPLICIT_WILDCARD_SCHEME. INET_E_SECURITY_PROBLEM is returned if the URI specifies userinfo but the Windows Internet Explorer feature control FEATURE_HTTP_USERNAME_PASSWORD_DISABLE is enabled.
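A caller can screen a flag combination before calling CreateUri. The following is a minimal, portable sketch rather than the function's actual validation logic. The constants are local copies of values from the dwFlags table, and HasConflictingFlags and FlagPair are hypothetical names. Only the first two pairs are named as conflicts in this topic; treating Uri_CREATE_CANONICALIZE and Uri_CREATE_NO_CANONICALIZE as a conflicting pair is an assumption.

```cpp
#include <cstdint>

// Local copies of values from the dwFlags table, so the sketch compiles
// without Urlmon.h. Real code should use the Uri_CREATE_* constants instead.
constexpr std::uint32_t kAllowRelative         = 0x0001;
constexpr std::uint32_t kAllowImplicitWildcard = 0x0002;
constexpr std::uint32_t kNoCanonicalize        = 0x0010;
constexpr std::uint32_t kDecodeExtraInfo       = 0x0040;
constexpr std::uint32_t kNoDecodeExtraInfo     = 0x0080;
constexpr std::uint32_t kCanonicalize          = 0x0100;

struct FlagPair
{
    std::uint32_t first;
    std::uint32_t second;
};

// Hypothetical helper (not part of Urlmon): returns true when both flags
// of any listed pair are set at the same time.
inline bool HasConflictingFlags(std::uint32_t flags)
{
    static const FlagPair kPairs[] = {
        { kDecodeExtraInfo, kNoDecodeExtraInfo },   // named in the Remarks
        { kAllowRelative, kAllowImplicitWildcard }, // named in the Remarks
        { kCanonicalize, kNoCanonicalize },         // assumption, not documented here
    };
    for (const FlagPair& pair : kPairs)
    {
        if ((flags & pair.first) != 0 && (flags & pair.second) != 0)
        {
            return true;
        }
    }
    return false;
}
```

For example, HasConflictingFlags(kNoCanonicalize | kDecodeExtraInfo) returns false, which matches the flag table: Uri_CREATE_DECODE_EXTRA_INFO is described as taking precedence over Uri_CREATE_NO_CANONICALIZE rather than conflicting with it.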
Hierarchical vs. Opaque Protocol Schemes
Hierarchical URIs and opaque URIs are mutually exclusive. A hierarchical URI conforms to the RFC-defined syntax for URIs. (Refer to RFC3986: Uniform Resource Identifier (URI), Generic Syntax.) An opaque URI is parsed without an authority in the following manner.
scheme ":" path [ "#" fragment ]
By default, all URIs are treated as hierarchical unless the Uri_CREATE_NO_CRACK_UNKNOWN_SCHEMES flag is set. (Unknown protocol schemes are those not defined in the URL_SCHEME enumeration.) The two flags Uri_CREATE_ALLOW_RELATIVE and Uri_CREATE_ALLOW_IMPLICIT_WILDCARD_SCHEME apply only if the input string is neither an implicit file path nor an absolute (hierarchical) URI. The syntax for relative URIs is a shortened form of the syntax for absolute URIs, where some prefix of the URI is missing and the path segments "." and ".." are allowed to remain until combined with a base URI. The wildcard URI scheme might be stated explicitly as "*:[[//]authority][path]" or implicitly by the "authority[path]" form.
CreateUri can parse URIs in both the URL syntax and the Uniform Resource Name (URN) syntax. The difference between URLs and URNs is whether there is a protocol that enables access to the identified resource. Accessing the resource identified by an IUri is outside the scope of the Consolidated URL (cURL) API.
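The opaque form described above (scheme ":" path [ "#" fragment ]) can be illustrated with a small portable sketch. This is not how CreateUri is implemented; OpaqueParts and SplitOpaqueUri are hypothetical names, and the sketch performs no validation of scheme characters.

```cpp
#include <string>

// Hypothetical result type for the sketch.
struct OpaqueParts
{
    bool valid = false;        // false when no scheme delimiter was found
    bool hasFragment = false;  // true when a '#' followed the path
    std::string scheme;
    std::string path;
    std::string fragment;
};

// Splits a string according to: scheme ":" path [ "#" fragment ].
// There is no authority and no query in this form, so a '?' stays in the path.
inline OpaqueParts SplitOpaqueUri(const std::string& uri)
{
    OpaqueParts parts;
    const std::string::size_type colon = uri.find(':');
    if (colon == std::string::npos || colon == 0)
    {
        return parts; // no scheme present
    }
    parts.scheme = uri.substr(0, colon);

    const std::string::size_type hash = uri.find('#', colon + 1);
    if (hash == std::string::npos)
    {
        parts.path = uri.substr(colon + 1);
    }
    else
    {
        parts.path = uri.substr(colon + 1, hash - colon - 1);
        parts.fragment = uri.substr(hash + 1);
        parts.hasFragment = true;
    }
    parts.valid = true;
    return parts;
}
```

With the made-up input "myscheme:some/opaque?data#frag", the sketch yields the scheme "myscheme", the path "some/opaque?data", and the fragment "frag".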
Creating File Schemes from File Paths
There are two kinds of file scheme URIs. The first is the well-formed, or "healthy," URL style that supports query strings, fragments, percent-encoded octets, and so on. The other is essentially a DOS file path with "file://" prepended. This latter form is generated when Uri_CREATE_FILE_USE_DOS_PATH is set and should be used only for legacy communication.
Warning: Legacy file scheme URIs should be used only with legacy APIs that will not accept healthy file scheme URIs. Legacy file scheme URIs do not allow percent-encoded octets, which can lead to ambiguity. Therefore, legacy file scheme URIs should not be used unless absolutely necessary.
The following is a comparison of the two forms of file scheme URIs.
DOSPATH: C:\Windows\My Documents 100%20\file.txt
HEALTHY: file:///C:/Windows/My%20Documents%20100%2520/file.txt
LEGACY: file://C:\Windows\My Documents 100%20\file.txt
DOSPATH: \\server\share\My Documents 100%20\file.txt
HEALTHY: file://server/share/My%20Documents%20100%2520/file.txt
LEGACY: file://\\server\share\My Documents 100%20\file.txt
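The mapping in the comparison above can be approximated with a short portable sketch. It reproduces only the two examples shown: it converts backslashes to forward slashes and percent-encodes the space and the percent sign. The actual function encodes a larger set of characters. DosPathToHealthyFileUri and DosPathToLegacyFileUri are hypothetical names.

```cpp
#include <string>

// Sketch of the "healthy" form: file:///C:/... for drive paths and
// file://server/share/... for UNC paths.
inline std::string DosPathToHealthyFileUri(const std::string& dosPath)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string uri;
    std::string rest;
    if (dosPath.compare(0, 2, "\\\\") == 0)
    {
        uri = "file://";          // UNC: the server name becomes the authority
        rest = dosPath.substr(2);
    }
    else
    {
        uri = "file:///";         // drive path: the authority is empty
        rest = dosPath;
    }
    for (char raw : rest)
    {
        const unsigned char ch = static_cast<unsigned char>(raw);
        if (ch == '\\')
        {
            uri += '/';
        }
        else if (ch == ' ' || ch == '%')
        {
            // A literal '%' must be encoded so it is not read as an escape.
            uri += '%';
            uri += kHex[ch >> 4];
            uri += kHex[ch & 0x0F];
        }
        else
        {
            uri += raw;
        }
    }
    return uri;
}

// Sketch of the legacy form: the DOS path with "file://" prepended, unchanged.
inline std::string DosPathToLegacyFileUri(const std::string& dosPath)
{
    return "file://" + dosPath;
}
```

Because the legacy form leaves "%20" in the path untouched, a reader cannot tell whether it is a literal "%20" in a file name or an encoded space, which is the ambiguity the warning describes.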
The Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME flag allows the creation of a file scheme URI from a Microsoft Win32 file path. The flag doesn't change how the input string is interpreted: if a Win32 file path is passed in, CreateUri simply succeeds or fails depending on whether Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME is set.
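The flag description says an implicit file path starts with a drive letter (X:) or a UNC prefix (\\). A rough portable check for that shape might look like the following. LooksLikeImplicitFilePath is a hypothetical helper, and the exact rules CreateUri applies (for example, to single-letter schemes) are not specified in this topic.

```cpp
#include <string>

// Hypothetical helper: true when the string begins with "X:" (X being an
// ASCII letter) or with two backslashes. This mirrors the wording of the
// flag description only; it is not the function's real detection logic.
inline bool LooksLikeImplicitFilePath(const std::string& input)
{
    if (input.size() < 2)
    {
        return false;
    }
    if (input[0] == '\\' && input[1] == '\\')
    {
        return true; // UNC path
    }
    const char first = input[0];
    const bool isLetter = (first >= 'A' && first <= 'Z') ||
                          (first >= 'a' && first <= 'z');
    return isLetter && input[1] == ':'; // drive letter
}
```

A caller could use such a check to decide whether a string is likely to need Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME before passing it to CreateUri.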
Understanding Canonicalization
Canonicalization, or conversion into the standard URI format, involves the following steps.
- The scheme is changed to lowercase.
- If the host is an IPv4 or IPv6 address, it is converted to normal form.
- If the host is a named host, it is changed to lowercase. Internationalized Domain Names (IDNs) with labels in Punycode are converted to Unicode.
- If the explicit port is the same as the default port for the scheme, it is removed.
- Backslash (\) characters in the path are changed to forward slash characters (/) in http, https, ftp, news, nntp, snews, and telnet schemes.
- If the URI has an authority but no path, the path is set to "/".
- Relative path segments "./" and "../" are removed, and the path is shortened as appropriate.
- Percent-encoded characters in the format "%XX" (where X is a hexadecimal digit) are decoded, if they are unreserved.
- Characters that are forbidden to appear in a URI are percent encoded. Forbidden characters are those that are neither in the "reserved" nor "unreserved" sets. The percent sign (%), which is used for percent encoding, is allowed. Refer to the following table for details.
Class | Characters |
---|---|
unreserved | alphanumeric, hyphen (-), period (.), underscore (_), and tilde (~) |
reserved | gen-delims + sub-delims |
gen-delims | colon (:), slash (/), question mark (?), hash (#), square brackets ([]), and at sign (@) |
sub-delims | exclamation point (!), dollar sign ($), ampersand (&), single quote ('), parentheses (()), asterisk (*), plus sign (+), comma (,), semicolon (;), and equal sign (=) |
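The two percent-encoding rules and the character classes in the table can be sketched in portable C++. This is an illustration of the rules as stated here, not the function's implementation. All function names are hypothetical, and the sketch treats every non-ASCII byte as forbidden, which ignores the separate handling of international characters.

```cpp
#include <string>

inline bool IsUnreserved(unsigned char c)
{
    const bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                       (c >= 'a' && c <= 'z');
    return alnum || c == '-' || c == '.' || c == '_' || c == '~';
}

inline bool IsReserved(unsigned char c)
{
    // gen-delims followed by sub-delims, as listed in the table.
    static const std::string kReserved = ":/?#[]@!$&'()*+,;=";
    return kReserved.find(static_cast<char>(c)) != std::string::npos;
}

inline int HexValue(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1; // not a hexadecimal digit
}

// Decodes %XX only when it represents an unreserved character, and
// percent-encodes characters that are neither reserved nor unreserved.
inline std::string NormalizePercentEncoding(const std::string& in)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == '%' && i + 2 < in.size())
        {
            const int hi = HexValue(static_cast<unsigned char>(in[i + 1]));
            const int lo = HexValue(static_cast<unsigned char>(in[i + 2]));
            if (hi >= 0 && lo >= 0)
            {
                const unsigned char decoded =
                    static_cast<unsigned char>((hi << 4) | lo);
                if (IsUnreserved(decoded))
                {
                    out += static_cast<char>(decoded);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
        }
        if (IsUnreserved(c) || IsReserved(c) || c == '%')
        {
            out += static_cast<char>(c); // allowed as-is
        }
        else
        {
            out += '%';                  // forbidden: percent-encode
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Under these rules "%45" becomes "E" while "%3A" stays encoded, because the colon is reserved. A single quote is left alone because it is a sub-delim, while a backtick, angle brackets, a vertical bar, and a space are encoded.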
The following is a raw URI value.
hTTp://us%45r%3Ainfo@examp%4CE.com:80/path/a/b/./c/../%2E%2E/Forbidden`<|> Characters
After canonicalization, the absolute URI appears as follows.
http://usEr%3Ainfo@example.com/path/a/Forbidden%60%3C%7C%3E%20Characters
- In the username component, the %45 is decoded to "E" because it is in the unreserved set, while the %3A (:) is not decoded because the colon is a reserved character.
- In the host component, the %4C is first decoded to "L" and then changed to lowercase.
- The port "80" (the default port for http) is removed.
- The "./" in the path is removed.
- The "../" following the "c/" in the path is removed along with its logical parent, the "c/" path segment.
- The %2E characters are in the unreserved set and are converted to "." forming "../". This new "../" is removed along with its logical parent path segment, which in this case is "b/".
- All of the characters between "Forbidden" and "Characters" (including the space) are percent encoded because they are forbidden to appear in a URI.
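The path-shortening steps in the list above can be traced with a small portable sketch. It assumes the %2E sequences have already been decoded, and it uses a simple segment stack rather than whatever algorithm CreateUri uses internally. RemoveDotSegments is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Removes "." and ".." segments from a slash-separated path.
// ".." discards the most recently kept segment, if there is one.
inline std::string RemoveDotSegments(const std::string& path)
{
    const bool absolute = !path.empty() && path[0] == '/';
    std::vector<std::string> kept;
    bool trailingSlash = false;

    std::string::size_type start = absolute ? 1 : 0;
    for (;;)
    {
        const std::string::size_type slash = path.find('/', start);
        const bool last = (slash == std::string::npos);
        const std::string segment =
            last ? path.substr(start) : path.substr(start, slash - start);

        if (segment == ".")
        {
            trailingSlash = last;       // "a/." collapses to "a/"
        }
        else if (segment == "..")
        {
            if (!kept.empty())
            {
                kept.pop_back();        // drop the logical parent
            }
            trailingSlash = last;
        }
        else if (segment.empty() && last)
        {
            trailingSlash = true;       // the path ended with '/'
        }
        else
        {
            kept.push_back(segment);
        }

        if (last)
        {
            break;
        }
        start = slash + 1;
    }

    std::string result = absolute ? "/" : "";
    for (std::vector<std::string>::size_type i = 0; i < kept.size(); ++i)
    {
        if (i != 0)
        {
            result += '/';
        }
        result += kept[i];
    }
    if (trailingSlash && !kept.empty())
    {
        result += '/';
    }
    return result;
}
```

Applied to "/path/a/b/./c/../../Forbidden", the sketch drops "./", then removes "c/" for the first "../" and "b/" for the second, leaving "/path/a/Forbidden".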
Examples
The following example creates an IUri object from a NULL-terminated URI string and then uses IUri::GetHost to retrieve the host value.
// pwszUri is assumed to point to a caller-supplied, null-terminated URI string.
IUri *pIUri = NULL;
HRESULT hr = CreateUri(
    pwszUri,                   // Null-terminated URI
    Uri_CREATE_ALLOW_RELATIVE, // Flags to control behavior
    0,                         // Reserved; must be 0
    &pIUri);
if (SUCCEEDED(hr))
{
    BSTR bstrHost = NULL;
    hr = pIUri->GetHost(&bstrHost);
    if (S_OK == hr)
    {
        // Host exists. Do something with it.
        SysFreeString(bstrHost);
    }
    else if (S_FALSE == hr)
    {
        // No Host in this URI.
    }
    pIUri->Release();
}
Requirements
Requirement | Value |
---|---|
Minimum supported client | Windows XP with SP2 |
Minimum supported server | Windows Server 2003 with SP1 |
Product | Internet Explorer 7 |
Header | Urlmon.h |
Library | Urlmon.lib |
DLL | Urlmon.dll |