Get a Web Page's Title from a URL (C#)

I was creating an app that saves URLs copied to the clipboard into an XML file.  This little bit of code came in handy, so I thought it'd be worth sharing.

This code checks that the URL points to an HTML page by first checking the type of request, then checking the Content-Type header of the response.  If it is an HTML page, the page is downloaded and a regular expression pulls out the <title> contents.

Example Code: GetWebPageTitle.zip

Uses namespaces: System.Net, System.Collections.Generic, System.Text.RegularExpressions

public static string GetWebPageTitle(string url)
{
   // Create a request to the url
   HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;

   // If the request wasn't an HTTP request (like a file), ignore it
   if (request == null) return null;

   // Use the user's credentials
   request.UseDefaultCredentials = true;

   // Obtain a response from the server; if there was an error, return nothing
   HttpWebResponse response = null;
   try { response = request.GetResponse() as HttpWebResponse; }
   catch (WebException) { return null; }
   if (response == null) return null;

   // Regular expression for an HTML title; the lazy quantifier stops at the first </title>
   string regex = @"(?<=<title[^>]*>)[\s\S]*?(?=</title>)";

   // Dispose the response when done so its connection is released;
   // otherwise repeated calls can hang waiting for a free connection
   using (response)
   {
      // If the correct HTML header exists for HTML text, continue
      if (new List<string>(response.Headers.AllKeys).Contains("Content-Type"))
         if (response.Headers["Content-Type"].StartsWith("text/html"))
         {
            // Download the page
            using (WebClient web = new WebClient())
            {
               web.UseDefaultCredentials = true;
               string page = web.DownloadString(url);

               // Extract the title
               Regex ex = new Regex(regex, RegexOptions.IgnoreCase);
               return ex.Match(page).Value.Trim();
            }
         }
   }

   // Not a valid HTML page
   return null;
}
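
The extraction step can be tried on its own, without any network access. The following is a minimal sketch, not the code from the zip: the class and method names (TitleExtractionDemo, ExtractTitle) are made up for illustration, the pattern uses a capture group with a lazy quantifier so the match stops at the first </title>, and it assumes .NET 4 or later for WebUtility.HtmlDecode.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TitleExtractionDemo
{
    // Hypothetical helper: pulls the <title> contents out of HTML that is already in memory.
    public static string ExtractTitle(string html)
    {
        // Lazy match, so a later </title> (for example inside inline SVG) is not swallowed
        Match match = Regex.Match(html, @"<title[^>]*>([\s\S]*?)</title>", RegexOptions.IgnoreCase);
        if (!match.Success) return null;

        // Decode entities such as &amp; and then trim the surrounding whitespace
        return WebUtility.HtmlDecode(match.Groups[1].Value).Trim();
    }
}
```

Calling TitleExtractionDemo.ExtractTitle on a page string (such as the one returned by DownloadString) gives the decoded title, or null when no <title> element is present. Returning null in that case is a choice made for this sketch; the function above returns an empty string instead.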

Comments

  • Anonymous
    February 19, 2007
    Handy, but a bit expensive; it requires downloading of the entire page in order to get at something usually at the start of the HTML.

  • Anonymous
    February 19, 2007
    Peter, you're right: this piece of code is obviously not designed for performance-critical scenarios, but it is easy and works well for my client-side apps.  If performance were a concern, one could combine it into a single request that downloads byte by byte and parses the text as it arrives (instead of running a RegEx over the whole page), checking the header and body and stopping the download as soon as the wrong header or the </title> tag is found.

  • Anonymous
    March 02, 2007
    It's too bad everyone gripes about the expense.  If it does what you want then use it.  If not, or if you have a better solution, back it up with the code for the rest of us to look at.

  • Anonymous
    May 22, 2008
    Thanks for regular expression : @"(?<=<title.*>)([\s\S]*)(?=</title>)"; That's what I wanted.... Thanks

  • Anonymous
    January 06, 2009
    No really, I think that if you look more you will get a free version that does the same thing. Just try more

  • Anonymous
    November 24, 2009
    thanx for ur ,,, can u help me to obtain the title of a live score web site , in which the title changes when the page gets refreshed

  • Anonymous
    March 28, 2010
    My homegrown bookmark utility didn't work with Chrome because it doesn't use the "FileGroupDescriptor" like FF and IE do. Your code just fixed that! That it takes two requests is a small price to pay for being able to use Chrome! Thank you very much.

  • Anonymous
    May 01, 2010
    Hi! I tried to use your code in my app and it works mostly, but when I try to get the title of a webpage like http://www.arabic-keyboard.org/ it shows this: "Arabic Keyboard ™ لوحة المفاتيح العربية". I tried using System.Web.HttpUtility.HtmlDecode, but it still doesn't work. Is it in the encoding or something else? Does anyone know a solution to this problem? Thanks!

  • Anonymous
    September 08, 2010
    hi, nice example. I noticed however, that a repeated call won't come back. At least in my config with .net 4. Closing the response resolves this issue ;) --> response.Close();

  • Anonymous
    September 07, 2011
    Nice, was looking exactly for this.
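
Several comments above point out that downloading the whole page is expensive, and the reply suggests stopping once the </title> tag has been seen. A rough sketch of that idea follows. It is only one possible approach, not code from the original post: the names StreamingTitleReader and ReadTitle and the maxChars limit are assumptions made for illustration. It reads in chunks from any TextReader and returns as soon as a complete <title> element has arrived.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class StreamingTitleReader
{
    // Hypothetical helper: reads from 'reader' only until </title> appears
    // or 'maxChars' characters have been consumed, whichever comes first.
    public static string ReadTitle(TextReader reader, int maxChars)
    {
        StringBuilder seen = new StringBuilder();
        char[] buffer = new char[1024];
        int read;

        while (seen.Length < maxChars &&
               (read = reader.Read(buffer, 0, Math.Min(buffer.Length, maxChars - seen.Length))) > 0)
        {
            seen.Append(buffer, 0, read);

            // Rescan everything read so far, so a tag split across two chunks is still found
            Match match = Regex.Match(seen.ToString(), @"<title[^>]*>([\s\S]*?)</title>", RegexOptions.IgnoreCase);
            if (match.Success) return match.Groups[1].Value.Trim();
        }

        // No complete <title> element within the limit
        return null;
    }
}
```

With the response from the article's function, the reader could be created as new StreamReader(response.GetResponseStream(), Encoding.UTF8), which also avoids the second request. Passing UTF-8 explicitly is an assumption about the page; a page served in another encoding would need that encoding instead, which may be relevant to the garbled non-Latin title reported in an earlier comment. Rescanning the buffer on every chunk is simple rather than fast, which should be acceptable while maxChars stays small.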