Get a Web Page's Title from a URL (C#)

02/19/2007

I was creating an app that saves URLs copied to the clipboard into an XML file. This little bit of code came in handy, so I thought it would be worth sharing.

This code checks that the URL points to a valid HTML page by first checking the type of request, then checking the Content-Type header of the response. If it is an HTML page, the page is downloaded and a regular expression is used to pull out the <title> contents.

Example Code: GetWebPageTitle.zip

Uses namespaces: System.Net, System.Collections.Generic, System.Text.RegularExpressions

public static string GetWebPageTitle(string url)
{
    // Create a request to the URL
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

    // If the request wasn't an HTTP request (like a file), ignore it
    if (request == null) return null;

    // Use the user's credentials
    request.UseDefaultCredentials = true;

    // Obtain a response from the server; if there was an error, return nothing
    HttpWebResponse response = null;
    try { response = request.GetResponse() as HttpWebResponse; }
    catch (WebException) { return null; }

    // Read the Content-Type header (null if absent), then close the response
    // so that repeated calls do not hang waiting for a free connection
    string contentType = response.Headers["Content-Type"];
    response.Close();

    // If the header does not mark the response as HTML text, it is not a valid HTML page
    if (contentType == null || !contentType.StartsWith("text/html")) return null;

    // Download the page
    WebClient web = new WebClient();
    web.UseDefaultCredentials = true;
    string page = web.DownloadString(url);

    // Regular expression for an HTML title; [^>]* stays inside the opening tag
    // and the lazy *? stops at the first closing </title>
    string regex = @"(?<=<title[^>]*>)[\s\S]*?(?=</title>)";

    // Extract the title
    Regex ex = new Regex(regex, RegexOptions.IgnoreCase);
    return ex.Match(page).Value.Trim();
}
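To see what the title expression matches without making a network request, it can be run against an in-memory string. The following is a minimal sketch rather than part of the original download: the class name TitleRegexDemo and the method ExtractTitle are hypothetical, and the pattern is a slightly stricter variant of the one above (it uses `[^>]*` and a lazy `*?`) so that matching stops at the first closing `</title>` even when the page contains other title elements, such as those inside inline SVG.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleRegexDemo
{
    // Variant of the article's pattern (an assumption of this sketch):
    // [^>]* keeps the lookbehind inside the opening tag, and the lazy *?
    // stops at the first closing </title>.
    private static readonly Regex TitlePattern = new Regex(
        @"(?<=<title[^>]*>)[\s\S]*?(?=</title>)", RegexOptions.IgnoreCase);

    // Returns the trimmed title text, or an empty string when no title is found.
    public static string ExtractTitle(string html)
    {
        return TitlePattern.Match(html).Value.Trim();
    }
}
```

For example, calling TitleRegexDemo.ExtractTitle on a page whose head contains `<TITLE lang="en"> Sample Page </TITLE>` and whose body contains `<svg><title>Icon</title></svg>` should yield only the text of the first title element, with surrounding whitespace trimmed.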

Comments

Anonymous
February 19, 2007
Handy, but a bit expensive; it requires downloading of the entire page in order to get at something usually at the start of the HTML.
Anonymous
February 19, 2007
Peter, you're right, this piece of code is obviously not designed for performance-critical scenarios, but it is easy and works well for my client-side apps. If performance were a concern, one could combine it into a single request that downloads byte by byte and uses text parsing (instead of a regex) as the file is downloaded to look at the header and body, and stops downloading once the wrong header or the </title> tag is found.
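The single-request idea described in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name StreamingTitleReader, the method ReadTitle, the 256-character chunk size, and the maxChars limit are all hypothetical, and it reads in small chunks rather than literally byte by byte. It takes any TextReader, so it can be tried on a string without a network connection.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamingTitleReader
{
    // Reads from the reader in small chunks and stops as soon as a closing
    // </title> tag has been seen or maxChars characters have been read.
    // Returns the trimmed title text, or null when no complete title was found.
    public static string ReadTitle(TextReader reader, int maxChars)
    {
        StringBuilder seen = new StringBuilder();
        char[] buffer = new char[256];
        int read;

        while (seen.Length < maxChars && (read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            seen.Append(buffer, 0, read);

            // Search the accumulated text so a tag split across chunks is still found.
            // (Rebuilding the string each time is simple, not fast; maxChars bounds the cost.)
            string text = seen.ToString();

            int close = text.IndexOf("</title>", StringComparison.OrdinalIgnoreCase);
            if (close < 0) continue;             // closing tag not read yet

            int open = text.LastIndexOf("<title", close, StringComparison.OrdinalIgnoreCase);
            if (open < 0) return null;           // closing tag without an opening tag

            int start = text.IndexOf('>', open); // end of the opening tag
            if (start < 0 || start > close) return null;

            return text.Substring(start + 1, close - start - 1).Trim();
        }

        return null; // limit reached or input ended before </title>
    }
}
```

In a real request the reader would presumably be a StreamReader wrapped around response.GetResponseStream(), created only after the Content-Type check passes, with the response closed once ReadTitle returns so the rest of the page is never downloaded.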
Anonymous
March 02, 2007
It's too bad everyone gripes about the expense. If it does what you want, then use it. If not, or if you have a better solution, back it up with the code for the rest of us to look at.
Anonymous
May 22, 2008
Thanks for the regular expression: @"(?<=<title.*>)([\s\S]*)(?=</title>)"; That's what I wanted.... Thanks
Anonymous
January 06, 2009
No really, I think that if you look more you will get a free version that does the same thing. Just try more
Anonymous
November 24, 2009
Thanks for your ,,, Can you help me obtain the title of a live score web site, in which the title changes when the page gets refreshed?
Anonymous
March 28, 2010
My homegrown bookmark utility didn't work with Chrome because it doesn't use the "FileGroupDescriptor" like FF and IE do. Your code just fixed that! That it takes two requests is a small price to pay for being able to use Chrome! Thank you very much.
Anonymous
May 01, 2010
Hi! I tried to use your code in my app and it works mostly, but when I try to get the title of a webpage like http://www.arabic-keyboard.org/ it shows this: "Arabic Keyboard ™ Ů„ŮŘŘ© Ř§Ů„Ů…ŮŘ§ŘŞŮŠŘ Ř§Ů„ŘąŘ±Ř¨ŮŠŘ©". I tried using System.Web.HttpUtility.HtmlDecode, but it still doesn't work. Is it in the encoding or something else? Does anyone know a solution to this problem? Thanks!
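One possible explanation, offered as an assumption rather than something confirmed in this thread, is that the page's bytes were decoded with a different encoding than the one the server declared, which HtmlDecode cannot undo. A minimal sketch of decoding the raw bytes using the charset named in the Content-Type header follows. The class name PageDecoder, the method Decode, and the UTF-8 fallback are hypothetical choices for illustration.

```csharp
using System;
using System.Text;

public static class PageDecoder
{
    // Decodes raw page bytes using the charset named in a Content-Type value
    // such as "text/html; charset=utf-8". Falls back to UTF-8 (an assumption
    // of this sketch) when no usable charset is present.
    public static string Decode(byte[] bytes, string contentType)
    {
        Encoding encoding = Encoding.UTF8;

        if (contentType != null)
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (!p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase)) continue;

                // Strip optional quotes around the charset name
                string name = p.Substring("charset=".Length).Trim(' ', '"', '\'');
                try { encoding = Encoding.GetEncoding(name); }
                catch (ArgumentException) { /* unknown charset: keep the fallback */ }
                break;
            }
        }

        return encoding.GetString(bytes);
    }
}
```

With WebClient, the bytes could come from DownloadData(url) and the header value from ResponseHeaders["Content-Type"] after the download; setting the client's Encoding property to Encoding.UTF8 before calling DownloadString is a simpler alternative when the page is known to be UTF-8. Pages that declare their charset only in a meta tag are not handled by this sketch.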
Anonymous
September 08, 2010
hi, nice example. I noticed however, that a repeated call won't come back. At least in my config with .net 4. Closing the response resolves this issue ;) --> response.Close();
Anonymous
September 07, 2011
Nice, was looking exactly for this.
