September 2009

Volume 24 Number 09

Extreme ASP.NET - Search Engine Optimization with ASP.NET 4.0, Visual Studio 2010 and IIS7

By Scott Allen | September 2009

Anyone with a public Web site knows that search engines play a key role in bringing visitors to the site. It's important to be seen by the search engines and rank highly in their query results. Higher rankings can bring you more visitors, which can lead to more paying customers and higher advertisement revenue. Search engine optimization (SEO) is the practice of fine-tuning a site to achieve higher rankings in search results. In this article, we'll take a look at SEO practices you can apply when using the latest Microsoft Web technologies.

SEO Basics

There are many factors in play when a search engine formulates the relative rank of your site, and some of the more important factors are not under your direct control. For example, we know search engines like to see incoming links to your site. An incoming link is a hyperlink on an outside domain that points into your domain. When a search engine sees many incoming links to a site, it assumes the site has interesting or important content and ranks the site accordingly. The SEO community describes this phenomenon using technical terms like "link juice" and "link love." The more "link juice" a site possesses, the higher the site will appear in search results.

If your site is interesting, then the rest of the world will naturally start adding links to your site. Because Visual Studio doesn't come with a "Make My Site More Interesting" button, you'll ultimately have to work hard at providing link-worthy content for the Web.

Once you have great content in place, you'll want to make sure the search engines can find and process your content. We don't know the exact algorithms used by search engines like Bing and Google. However, most search engines have published design and content guidelines you can follow to help boost your ranking. The Internet community has also compiled an extensive amount of knowledge acquired through experimentation and trial and error.

Here's the key: you want to think like a search engine. Search engines don't execute scripts or recognize the shapes in the images on your site. Instead, they methodically follow links to parse, index and rank the content they find in HTML. When thinking like a search engine, you'll focus on your HTML.

Quick and Valid HTML

Visual Studio has a long history in WYSIWYG development for both the desktop and the Web. The Web Forms designer allows you to drag and drop server controls on the design surface, and set values for controls in the Properties window. You can quickly create a Web page without ever seeing HTML. If you're focused on HTML, however, you'll want to work in the Source view window. The good news is you can work in the Source view without sacrificing speed or accuracy in Visual Studio 2010.

Visual Studio 2010 will ship with a number of HTML IntelliSense code snippets for creating common HTML tags and server-side controls with a minimal number of keystrokes. For example, when you are in the Source view of an .aspx file, you can type img and then hit the Tab key to generate the markup shown in Figure 1. Four keystrokes produce more than 20 characters of markup you would otherwise type by hand!

Notice how the editor highlights the src and alt values in Figure 1. When using code snippets, you can tab between highlighted areas and begin typing to overwrite the values inside. This feature is another productivity bonus that saves you the effort of navigating to the proper insertion point and manually deleting the existing value.

Both ASP.NET Web Forms and ASP.NET MVC projects will have HTML snippets available in Visual Studio 2010 to create everything from ActionLinks to XHTML DOCTYPE declarations. The snippets are extensible, customizable and based on the same snippet engine that has been available since Visual Studio 2005. See Lorenzo Minore's MSDN article for more details on snippets.

Figure 2 Validation Settings


Creating valid HTML is crucial if you want search engines to index your site. Web browsers are forgiving and will try to render a page with malformed HTML as best they can, but if a search engine sees invalid HTML, it may skip important content or reject the entire page.

Since there are different versions of the HTML specifications available, every page you deliver from your application should include a DOCTYPE element. The DOCTYPE element specifies the version of HTML your page is using. Web browsers, search engines and other tools will examine the DOCTYPE so that they know how to interpret your markup. Visual Studio will place a DOCTYPE in the proper locations when you create new Web form pages and master pages. The default DOCTYPE, as shown in the following code snippet, specifies that the page will comply with the XHTML 1.0 specification:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Although you don't need to know all the subtle differences between the HTML specifications, you do need to know if your page conforms to a specific DOCTYPE. Visual Studio has included HTML validation features since the 2005 release, and validation is on by default. You can control the validation settings and validation target type by navigating to the Tools | Options | Text Editor | HTML | Validation settings, as shown in Figure 2.

The "as warnings" setting means HTML validation problems will not stop your build, but will show as warnings in the Error window of Visual Studio. In the source view for Web forms, the text editor will draw your attention to HTML validation errors using squiggly lines. You can mouse over the element to see the exact error message, as we see in Figure 3.

Descriptive HTML

The img tag in Figure 3 is a good example of how you need to think like a search engine. As I said earlier, a search engine doesn't see or interpret the shapes and words in an image, but we can give the search engine some additional information about the graphical content by using the alt attribute. If the image is a company logo, your alt text might be "company logo," but it would be better to include the name of the company in the logo's alt text. A search engine will use the alt text as another clue in understanding the theme and essence of the page.
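As a quick sketch of the difference (the file name and company name here are hypothetical), compare a generic alt attribute with a descriptive one:

```html
<!-- weak: tells the search engine nothing beyond "logo" -->
<img src="header.png" alt="logo" />

<!-- better: names the company shown in the image -->
<img src="header.png" alt="Fabrikam Coffee company logo" />
```

The second version gives the crawler real keywords to associate with the page, and it also helps visitors using screen readers.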

Search engines are always looking for these types of clues, and to the search engine some clues are more important than others. For example, we typically use header tags, like h1 tags, to make certain pieces of content stand out in a page. Search engines will generally give more weight to a keyword inside an h1 tag than to the same keyword inside a normal paragraph. You'll want to make sure your h1 content is descriptive and uses keywords related to the theme of your page. A best practice for SEO work is to include at least one h1 tag in every page.
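For example, a recipe page might arrange its headings like this (the page content is hypothetical):

```html
<h1>Classic Beef Taco Recipe</h1>
<p>An easy weeknight taco recipe...</p>

<h2>Ingredients</h2>
<p>...</p>

<h2>Cooking Instructions</h2>
<p>...</p>
```

The h1 carries the page's primary keywords, while the h2 elements describe each section, giving both readers and crawlers a clear outline of the content.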

If you look back at the headers I've chosen for this article, you'll see they revolve around phrases like "Valid HTML," "SEO Basics," and so on. These are all descriptive phrases that will give both the reader and the search engine a good idea of what the article is about.

Descriptive Titles and Metadata

Another important area for descriptive keywords is inside the head tag. The head section from one of the pages in the associated code download is shown here:

<head runat="server">
  <title>Programming meta tags in ASP.NET 4.0</title>
  <meta name="keywords" content="ASP.NET, ASP.NET 4.0, SEO, meta" />
  <meta name="description"
    content="How to use Page.MetaKeywords and Page.MetaDescription in ASP.NET" />
</head>

The words inside the page title tag are heavily weighted, so you'll want to choose a good title. The head tag can also enclose meta tags. You'll want to use two meta tags for SEO work -- one to set the page's associated keywords and one to set the page's description. Visitors will generally not see this meta information, but some search engines do display the meta description of a page in search results. The meta keywords are another place to advertise the real meaning of your page by feeding the search engine important words to associate with the page.

If you are building dynamic content, or changing the title and meta data on a frequent basis, then you don't want to hard code this content in an .aspx file. Fortunately, Web Forms in ASP.NET 4.0 makes it easy to manipulate the title, keywords and description of a page from code-behind:

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        Page.Title = "Programming meta tags in ASP.NET 4.0";
        Page.MetaKeywords = "ASP.NET 4.0, meta, SEO, keywords";
        Page.MetaDescription =
            "How to use Page.MetaKeywords and Page.MetaDescription in ASP.NET";
    }
}

You can see how to use the page's Title, MetaKeywords and MetaDescription properties during the Page_Load event. The Title property has been in ASP.NET since version 2.0, but MetaKeywords and MetaDescription are new in ASP.NET 4.0. Although we are still using hard-coded strings, you could load these property values from any data source. You could then allow someone in charge of marketing the Web site to tweak the metadata for the best search engine results, and they would not have to edit the source code for the page.

SEO No-Nos

Some sites attempt to game the Internet search engines by filling their pages with irrelevant keywords, too many keywords, or by duplicating keywords repeatedly. This practice (known as "keyword stuffing") is an attempt to gain a high search engine rank for specific search terms without providing useful content for real visitors. The unfortunate visitors who land on such a site are invariably disappointed because they don't find any content of substance, but the visit still counts as a hit when the site totals its advertising revenue.

Search engines try to detect deceptive behavior like keyword stuffing to protect the quality of their search results. You don't want a search engine to accidentally categorize your site as misleading because you've used too many keywords in too many places. Search engine penalties can range from lowering the relative importance of a page in search results, to dropping the content of a site from the search index entirely.

Another practice to avoid is serving different content to a search engine crawler than you would serve to a regular visitor. Some sites will do this by sniffing the user agent header or IP address of an incoming request. Although you might be able to think of some useful features for this type of behavior, too many sites have used this technique to hide well-known malware and phishing content from search engines. If a search engine detects this behavior (known as cloaking), you'll be penalized. Stay honest, provide good content, and don't try to game or manipulate the search engine results.

While effective keywords and descriptions might give you a bit of an edge in search engine results, content is still king. We'll return to see more HTML tips later in the article, but in the next couple of sections we'll see how URLs can play an important role in how your content is located and ranked.

Canonical URLs

Duplicate content generally presents a problem for search engines. For example, let's say the search engine sees a recipe for your famous tacos at two different URLs. Which URL should the search engine prefer and provide as a link in search results? Duplicate content is even more of an optimization problem when it comes to incoming links. If the "link love" for your taco recipe is distributed across two different URLs, then your famous taco recipe might not have the search engine ranking it deserves.

Unfortunately, you might be duplicating content without realizing it. If search engines can read your site from a URL with a www prefix and without a www prefix, they'll see the same content under two different URLs. You want both URLs to work, but you want just one URL to be the standard or canonical URL.

As an example, consider the microsoft.com Web site. Both microsoft.com and www.microsoft.com will take you to the same content. But watch closely if you go to the home page using microsoft.com: the site will redirect your browser to www.microsoft.com. Microsoft uses redirection to enforce www.microsoft.com as its canonical URL.

Luckily, redirecting visitors to your canonical URL is easy with ASP.NET. All you need to do is provide some logic during the application pipeline's BeginRequest event. You can do this by implementing a custom HTTP module, or by using an Application_BeginRequest method in global.asax. Figure 4 shows what the logic for this feature would look like.

The code in Figure 4 is using another new feature in ASP.NET 4.0 -- the RedirectPermanent method of the HttpResponse object. The traditional Redirect method in ASP.NET sends an HTTP status code of 302 back to the client. A 302 tells the client that the resource temporarily moved to a new URL, and the client should go to the new URL, just this once, to find the resource. The RedirectPermanent method sends a 301 code to the client. The 301 tells the client that the resource moved permanently, and it should look for the resource at the new URL for all future requests. Notice the call to RedirectPermanent also uses a new feature in C# 4.0 -- the named parameter syntax. Although this syntax isn't required for the method call, the named parameter syntax does make the intent of the parameter explicit.
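On the wire, the difference between the two redirects is a single status line. A sketch of the responses (the host name and path here are placeholders):

```text
Redirect (temporary - the client should keep using the old URL):
HTTP/1.1 302 Found
Location: http://www.example.com/recipes/tacos

RedirectPermanent (permanent - the client should update its records):
HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/recipes/tacos
```

The 301 is what tells a search engine to transfer the old URL's accumulated ranking to the canonical URL instead of splitting it between the two.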

With a redirect in place, both Web browsers and search engines should be using only your canonical URL. Your "link love" will consolidate and search engine rankings should improve.

Descriptive URLs

Figure 4 RedirectPermanent Method of the HttpResponse Object

void Application_BeginRequest(object sender, EventArgs e)
{
    HttpApplication app = sender as HttpApplication;
    if (app != null)
    {
        string domain = ""; // set to your canonical host, e.g. "www.example.com"
        string host = app.Request.Url.Host.ToLower();
        string path = app.Request.Url.PathAndQuery;
        if (!String.Equals(host, domain))
        {
            Uri newURL = new Uri(app.Request.Url.Scheme +
                "://" + domain + path);
            app.Response.RedirectPermanent(
                newURL.ToString(), endResponse: true);
        }
    }
}

In the January 2009 issue of MSDN Magazine I wrote about how to use the routing features of .NET 3.5 SP1 with ASP.NET Web Forms. As I said then, the clean and descriptive URLs you can achieve with routing are important to both users and search engines. Both will find more meaning in a URL like /recipes/tacos than they will in /recipe.aspx?category=40&topic=32. In the former, the search engine will consider "recipes" and "tacos" as important keywords for the resource. The problem with the latter URL is that many search engine crawlers don't work well when a URL requires a query string with multiple parameters, and the numbers in the query string are meaningless outside the application's backend database.

The ASP.NET team has added some additional classes to the 4.0 release that make routing with Web Forms easy. In the code download for this article, I've re-implemented January's demo Web site with the new classes in ASP.NET 4.0. Routing begins by describing the routes your application will process during the Application_Start event. The following code is a RegisterRoutes method that the site invokes during the Application_Start event in global.asax:

void RegisterRoutes()
{
    RouteTable.Routes.Add(
        "recipe",
        new Route("recipe/{name}",
            new PageRouteHandler("~/RoutedForms/RecipeDisplay.aspx")));
}
URL Rewrite by Carlos Aguilar Mares

URL Rewrite for IIS 7.0 is a tool Microsoft makes available as a free download. This tool can perform all of the URL canonicalization work for you without requiring any code. The tool will do host header normalization, lowercasing and more. The tool can also help you "fix" broken links by rewriting or redirecting with a rewrite map, so you don't even need to change your application or HTML.

URL Rewrite can also produce "descriptive" URLs for any version of ASP.NET, and its performance is far superior to any other existing option, including ASP.NET routing, because the tool works with kernel-mode caching.
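As a sketch of the canonicalization the sidebar describes, a URL Rewrite rule in web.config might look like the following (the domain is a placeholder; consult the URL Rewrite documentation for the full schema):

```xml
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Canonical Host Name" stopProcessing="true">
        <match url="(.*)" />
        <conditions>
          <!-- match any request whose host is not the canonical www host -->
          <add input="{HTTP_HOST}" pattern="^www\.example\.com$" negate="true" />
        </conditions>
        <!-- issue a permanent (301) redirect to the canonical host -->
        <action type="Redirect" url="http://www.example.com/{R:1}"
                redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

This achieves the same result as the Application_BeginRequest code shown earlier, but the check happens in IIS before your application code runs.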

Figure 5 Get Name Parameter From RouteData to Display Information About a Recipe

private void DisplayRecipe()
{
    var recipeName = RouteData.Values["name"] as string;
    if (recipeName != null)
    {
        var recipe = new RecipeRepository().GetRecipe(recipeName);
        if (recipe != null)
        {
            _name.Text = recipe.Name;
            _ingredients.Text = recipe.Ingredients;
            _instructions.Text = recipe.Instructions;
        }
    }
}

If you review my January article, you'll remember how every route must specify a route handler. In RegisterRoutes, we are setting the handler for the "Recipe" route to an instance of the new PageRouteHandler class in ASP.NET 4.0. The routing engine will direct any incoming request URLs in the form of recipe/{name} to this route handler, where {name} represents a route parameter the routing engine will extract from the URL.

A Web Form has access to all of the route parameters the routing engine extracts from the URL via a RouteData property. This property is new for the Page class in 4.0. The code in Figure 5 gets the name parameter from RouteData and uses the name to look up and display information about a recipe.

One of the great features of the routing engine is its bidirectional nature. Not only can the routing engine parse URLs to govern HTTP requests, but it can also generate URLs to reach specific pages. For example, if you want to create a link that will lead a visitor to the recipe for tacos, you can use the routing engine to generate a URL based on the routing configuration (instead of hard-coding the URL). ASP.NET 4.0 introduces a new expression builder you can use in your markup to generate URLs from the routing configuration table:

<asp:HyperLink NavigateUrl="<%$ RouteUrl:RouteName=recipe,name=tacos %>"
  Text="Titillating Tacos" runat="server" />

The preceding code shows the new RouteUrl expression builder in action. This expression builder will tell the routing engine to generate a link for the route named "recipe" and include a name parameter in the URL with the value "tacos". The preceding markup will generate the following HTML:

<a href="/recipe/tacos">Titillating Tacos</a>

The preceding URL is friendly, descriptive, and optimized for a search engine. However, this example brings up a larger issue with ASP.NET. Server controls for Web Forms often abstract away the HTML they produce, and not all the server controls in ASP.NET are search engine friendly. It's time we return to talk about HTML again.

HTML Mistakes

If we created a link to the taco recipe using a LinkButton instead of a Hyperlink, we'd find ourselves with different markup in the browser. The code for the LinkButton and the HTML it generates is shown here:

<asp:LinkButton runat="server" Text="Tacos"
  PostBackUrl="<%$ RouteUrl:RouteName=recipe,name=tacos %>" />
<!-- generates the following (excerpted): -->
<a href="javascript:WebForm_DoPostBackWithOptions(...)">Tacos</a>

We still have an anchor tag for the user to click on, but the anchor tag uses JavaScript to force the browser to postback to the server. The LinkButton renders this HTML in order to raise a server-side click event when the user clicks on the link. Unfortunately, JavaScript postback navigation and search engines don't work together. The link is effectively invisible to search engines, and they may never find the destination page.

Figure 6 IIS 7 Manager

Because server-side ASP.NET controls abstract away HTML, you have to choose your server controls wisely. If you want complete control over HTML markup in an ASP.NET environment, then you should consider using the ASP.NET MVC framework. Server controls are verboten when using the MVC framework, and the infrastructure and APIs are in place to work with only HTML markup.

If you are using ASP.NET Web Forms and are optimizing for search engines, you'll want to view the HTML source produced by server controls. Every Web browser will give you this option. In Internet Explorer, use the View -> Source command. Be careful with any control that renders a combination of HTML and JavaScript in navigational scenarios. For example, using a DropDownList with the AutoPostBack property set to true will require JavaScript to work. If you rely on the automatic postback to navigate to new content, you'll be making the content invisible to search engines.
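As an illustration (the recipe categories are hypothetical), compare script-driven navigation with plain anchors a crawler can follow:

```aspx
<!-- navigation a crawler cannot follow: selecting an item
     triggers a JavaScript postback -->
<asp:DropDownList runat="server" AutoPostBack="true">
  <asp:ListItem Text="Tacos" Value="tacos" />
  <asp:ListItem Text="Burritos" Value="burritos" />
</asp:DropDownList>

<!-- navigation a crawler can follow: ordinary anchors -->
<ul>
  <li><a href="/recipe/tacos">Tacos</a></li>
  <li><a href="/recipe/burritos">Burritos</a></li>
</ul>
```

If you need the drop-down for human visitors, consider also rendering the plain links somewhere on the page so search engines have a crawlable path to the same content.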

Obviously, AJAX-heavy applications can present a problem for search engines. The UpdatePanel control and content generated by Web service calls from JavaScript are not friendly to search engines. Your safest approach for SEO work is to place content directly into your HTML to make it easily discoverable.

After you've tweaked your HTML, your keywords, and your URLs, how do you measure the results? Although your search engine ranking is the ultimate judge of your SEO effort, it would be nice if you could find any problems before a site goes live and a search engine crawls your pages. Although Visual Studio can tell you about HTML validation problems, it doesn't warn you about missing metadata and canonical URLs. This is the job of a new product -- the IIS SEO Toolkit.

Figure 7 Report Summary

The IIS SEO Toolkit

The IIS SEO Toolkit is a free download for IIS 7. The toolkit includes a crawling engine that will index your local Web application just like a search engine, and provide you with a detailed site analysis report. The toolkit can also manage robots.txt and sitemap files. The robots file uses a standardized format to tell search engines what to exclude from indexing, while sitemap files can point search engines to content you want to include. You can also use sitemap files to tell the search engine the priority, rate of change and the date a resource changed.
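For reference, a robots.txt file is just a few plain-text directives (for example, `User-agent: *` followed by `Disallow: /admin/` to exclude an admin section), while a sitemap entry carries the metadata described above. A minimal sitemap might look like this (the URL and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- the canonical URL for the resource -->
    <loc>http://www.example.com/recipe/tacos</loc>
    <lastmod>2009-08-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Note that the loc element should use your canonical URL, reinforcing the redirection strategy discussed earlier.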

For SEO work, the site analysis report is invaluable. The report will tell you everything about your site from the perspective of a search engine. After you've installed the toolkit, a Site Analysis option will appear for your sites in the IIS 7 Manager window, as shown in Figure 6.

Double-clicking the icon will take you to a list of previously run reports, with an Action option of running a new analysis. Running an analysis is as easy as pointing the tool to a local HTTP URL and clicking OK. When the analysis is finished, the toolkit will open a report summary, as Figure 7 shows.

The toolkit applies a number of rules and heuristics to make you aware of SEO- and performance-related problems. You can find broken links, missing titles, descriptions that are too short, descriptions that are too long and a host of other potential issues. The toolkit will analyze links and provide reports on the most linked pages, and the paths a visitor would need to follow to reach a specific page. The toolkit even provides a textual analysis of each page's content. You can use this analysis to find the best keywords for a page.

The IIS SEO Toolkit allows you to discover the SEO work you need to perform and to validate any SEO work you've already completed. At the time of writing, the toolkit is in a beta 1 release. You can expect that future versions will continue to add rules and analysis features, in addition to some intelligence that can automatically fix specific problems for you.

Easy and Effective

Even if you have the greatest content in the world, you need to make that content discoverable for search engines to bring you visitors. SEO is the practice of thinking like a search engine and making your site appeal to the crawlers and ranking algorithms. Visual Studio 2010 and ASP.NET 4.0 introduce new features that make SEO work easier, while the IIS SEO Toolkit is a fantastic tool dedicated to making your site better for search engines. Using the three in combination can make your SEO work both easy and effective.

K. SCOTT ALLEN is a member of the Pluralsight technical staff and founder of OdeToCode. You can read his blog at OdeToCode.com.