Output Encoding
Hi, Anil Chintala here.
I am a developer on the CISG team, working out of the Hyderabad campus in India. I am responsible for building security software for the information security group within Microsoft IT. I have a bachelor's degree in mechanical engineering and have worked in various roles, from development to managing a dev team for a startup in India, delivering technical solutions and managing customer relations, where I gained more knowledge of cryptography, general security awareness and techniques, and secure coding skills. Before joining Microsoft, I worked as a consultant for the ACE Team in Redmond and was involved in designing and building business-critical applications supporting their information security program. In December 2007 I left V-Empower and the United States to take up a full-time (or "FTE" in MSFT speak) position on the ACE Engineering team in India. I am currently working as a developer on the AntiXSS team, building the next generation of the AntiXSS library. I also just started my personal blog (can't believe I've waited this long), where I intend to post frequently on technical content, share interesting links, and give my opinion on security, software engineering, process, tools and technologies. Apart from working for this amazing team, I enjoy watching movies and playing video games, and I recently started playing tennis to keep myself fit.
Today, as a gentle introduction, I'll try to show you how to prevent XSS vulnerabilities in your ASP.NET applications.
Cross-Site Scripting (XSS) vulnerabilities occur in your ASP.NET applications when a malicious script or unvalidated user input is executed while a user views dynamically generated pages.
In general, XSS vulnerabilities can be prevented with the following countermeasures:
- Validate Input - Constrain all user input, including special characters, to an acceptable range, type and length.
- Encode Output - Encode any output sent to the browser that includes user input.
I'll leave "Validating Input" as a subject for another day (for more information on input validation, see How To: Use Regular Expressions to Constrain Input in ASP.NET) and limit the scope of this post to output encoding techniques that prevent XSS in ASP.NET applications. As I mentioned above, one way to prevent XSS vulnerabilities is to encode values before they are rendered to users.
The Microsoft Patterns and Practices guide How To: Prevent Cross-Site Scripting in ASP.NET makes the following valid recommendations, with excellent code examples:
- Use the HttpUtility.HtmlEncode method to encode output if it contains input from the user or from other sources such as databases.
- Similarly, use HttpUtility.UrlEncode to encode output URLs if they are constructed from input.
Although the HttpUtility.HtmlEncode and HttpUtility.UrlEncode methods prevent XSS when the attack depends on characters like "<", ">" and "&", code can still be vulnerable when user input contains characters outside this limited set. Below are examples that show how code can be vulnerable even after using the HttpUtility.HtmlEncode method.
Insecure Example 1 - VulnerablePage.aspx
<input type=text value=<%= HttpUtility.HtmlEncode(Request.QueryString["name"]) %> ></input>
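Why Example 1 is exploitable can be sketched outside ASP.NET. The JavaScript below uses a hypothetical helper, encodeKnownBad, as a simplified stand-in for an exclusion-style encoder that escapes only <, >, & and the double quote. It is not the real HttpUtility.HtmlEncode, and the payload is just one assumed example. Because the value attribute is unquoted, an input containing a space passes through unchanged and contributes an attribute of its own.

```javascript
// Hypothetical stand-in for an exclusion-style encoder: it escapes only
// the characters it knows to be dangerous and passes everything else through.
function encodeKnownBad(input) {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Mimics the template in Insecure Example 1 (unquoted value attribute).
function renderInput(name) {
  return "<input type=text value=" + encodeKnownBad(name) + " ></input>";
}

// Assumed attack input: none of <, >, & or " is needed to break out.
const payload = "x onmouseover=alert(1)";
const html = renderInput(payload);
console.log(html);
// prints: <input type=text value=x onmouseover=alert(1) ></input>
```

The browser reads this as an input whose value is "x" plus a separate onmouseover handler, even though every character the encoder knows about was handled.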
Insecure Example 2 - VulnerablePage.aspx
<head runat="server">
    <title>Untitled Page</title>

    <script>
        function fnEvil()
        {
            var id = '<%= Server.HtmlEncode(name)%>';
        }
    </script>

</head>
Insecure Example 2 - VulnerablePage.aspx.cs
protected string name = string.Empty;

protected void Page_Load(object sender, EventArgs e)
{
    name = Request.QueryString["name"];
}

Response.Write(HttpUtility.HtmlEncode(Request.Form["name"]));
Now consider a user input like " '; alert(XSS);// ", which results in the following insecure code.
<script type="text/javascript">
    function fnEvil()
    {
        var id = ''; alert(XSS);//';
    }
</script>
The reason is that System.Web.HttpUtility follows a principle of exclusion, escaping only known dangerous characters (such as <, > and &), whereas the AntiXSS library follows a principle of inclusion: it leaves only a small set of safe characters unencoded and encodes everything else. The safe characters are:
a-z (lower case)
A-Z (upper case)
0-9 (Numeric values)
, (Comma)
. (Period)
_ (Underscore)
- (dash)
(Space) - excluded for UrlEncode
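The inclusion principle can be sketched in a few lines of JavaScript. The function name and the choice of &#DECIMAL; output below are assumptions made for illustration. This is not the AntiXSS source, only a minimal model of "keep the safe list, encode everything else".

```javascript
// Safe list from the post: a-z, A-Z, 0-9, comma, period, underscore, dash, space.
const SAFE_CHARS = /^[a-zA-Z0-9,._\- ]$/;

// Hypothetical helper: encode every character that is NOT on the safe list
// as a numeric character reference (&#DECIMAL;).
function encodeAllowList(input) {
  let out = "";
  for (const ch of input) { // iterates by Unicode code point
    out += SAFE_CHARS.test(ch) ? ch : "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

console.log(encodeAllowList("Anil_Chintala, 2008"));
// prints: Anil_Chintala, 2008
console.log(encodeAllowList("'; alert(XSS);//"));
// prints: &#39;&#59; alert&#40;XSS&#41;&#59;&#47;&#47;
```

The encoder never needs to know which characters are dangerous in a given context; anything it does not recognise as safe is neutralised by default.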
Below is sample code using the encoding functions from the AntiXSS library.
Secure Example 1:
<input type=text value=<%= AntiXss.HtmlAttributeEncode(Request.QueryString["name"]) %> ></input>
Secure Example 2 - VulnerablePage.aspx
<head runat="server">
    <title>Untitled Page</title>

    <script>
        function fnEvil()
        {
            var id = <%= AntiXss.JavaScriptEncode(name)%>;
        }
    </script>

</head>
In the above scenario, user input is used in a JavaScript context. AntiXss provides the JavaScriptEncode method, which uses \xSINGLE_BYTE_HEX and \uDOUBLE_BYTE_HEX notation to encode unsafe characters and also wraps the output in single quotes to make it a string literal, which is why the page markup adds no quotes of its own.
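That description can be modelled roughly as follows. This JavaScript sketch is an illustration under stated assumptions, not the library's implementation: the helper name jsEncodeSketch is hypothetical, the safe list is the one given earlier in this post, and characters above U+FFFF are handled per UTF-16 code unit, which the real library may do differently.

```javascript
// Hypothetical model of a JavaScript-context encoder:
// safe characters pass through, everything else becomes \xHH or \uHHHH,
// and the result is wrapped in single quotes.
function jsEncodeSketch(input) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const code = input.charCodeAt(i); // UTF-16 code unit
    if (/[a-zA-Z0-9,._\- ]/.test(ch)) {
      out += ch;
    } else if (code < 0x100) {
      out += "\\x" + code.toString(16).padStart(2, "0");
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return "'" + out + "'";
}

console.log(jsEncodeSketch("'; alert(XSS);//"));
```

Because the quote character itself is encoded, the input can no longer terminate the string literal it is placed in.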
Now, given the same input " '; alert(XSS);// ", AntiXSS generates the following safe output.
<script type="text/javascript">
    function fnEvil()
    {
        var id = '\x27\x3b alert\x28XSS\x29\x3b\x2f\x2f';
    }
</script>
The code samples above demonstrate that the white-list approach of the AntiXSS library provides stronger protection than the classic HtmlEncode and UrlEncode utilities: it encodes everything except a small set of safe characters, whereas they encode only known bad items. I like AntiXSS because it looks for "good things" and not "bad things". :)
Thanks and more later...
Comments
- Anonymous
August 28, 2008
As promised, I am back sooner than you expected! and I know you are one of the two people who visit my
- Anonymous
September 08, 2008
Anil Chintala here... I told you in my previous blog about AntiXSS Output Encoding methodology and why
- Anonymous
September 09, 2008
Why does your example use HTML entity encoding inside a JavaScript block? This should use JavaScript escaping. The spec defines a few special characters that have specific encodings. All characters not known to be safe should use \xHH or \uHHHH format.
- Anonymous
September 10, 2008
Thank you, Jeff, for pointing out the wrong method used in the sample code. I have corrected it now. I appreciate your input.