Revisiting My Classic ASP and URL Rewrite for Dynamic SEO Functionality Examples
Last year I wrote a blog titled Using Classic ASP and URL Rewrite for Dynamic SEO Functionality, in which I described how you could combine Classic ASP and the URL Rewrite module for IIS to dynamically create Robots.txt and Sitemap.xml files for your website, thereby helping with your Search Engine Optimization (SEO) results. A few weeks ago I received a follow-up question which I thought was worth answering in a blog post.
Overview
Here is the question that I was asked:
"What if I don't want to include all dynamic pages in sitemap.xml but only a select few or some in certain directories because I don't want bots to crawl all of them. What can I do?"
That's a great question, and it wasn't tremendously difficult for me to update my original code samples to address this request. First of all, the majority of the code from my last blog will remain unchanged; here's the file-by-file breakdown of the changes that need to be made:
Filename | Changes |
---|---|
Robots.asp | None |
Sitemap.asp | See the sample later in this blog |
Web.config | None |
So if you are already using the files from my original blog, no changes need to be made to your Robots.asp file or the URL Rewrite rules in your Web.config file, because the question only concerns the files that are returned in the output for Sitemap.xml.
Updating the Necessary Files
The good news is that I wrote most of the heavy-duty code in my last blog; only a few changes needed to be made in order to accommodate the requested functionality. The main difference is that the original Sitemap.asp file had a section that recursively parsed the entire website and listed all of its files, whereas this new version moves that section of code into a separate subroutine (ParseFolder) to which you pass the folder name to parse recursively. This allows you to specify only those folders within your website that you want in the resultant sitemap output.
With that being said, here's the new code for the Sitemap.asp file:
<%
Option Explicit
On Error Resume Next
Response.Clear
Response.Buffer = True
Response.AddHeader "Connection", "Keep-Alive"
Response.CacheControl = "public"
Dim strUrlRoot, strPhysicalRoot, strFormat
Dim objFSO, objFolder, objFile
strPhysicalRoot = Server.MapPath("/")
Set objFSO = Server.CreateObject("Scripting.Filesystemobject")
strUrlRoot = "https://" & Request.ServerVariables("HTTP_HOST")
' Check for XML or TXT format.
If UCase(Trim(Request("format")))="XML" Then
strFormat = "XML"
Response.ContentType = "text/xml"
Else
strFormat = "TXT"
Response.ContentType = "text/plain"
End If
' Add the UTF-8 Byte Order Mark.
Response.Write Chr(CByte("&hEF"))
Response.Write Chr(CByte("&hBB"))
Response.Write Chr(CByte("&hBF"))
If strFormat = "XML" Then
Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf
End if
' Always output the root of the website.
Call WriteUrl(strUrlRoot,Now,"weekly",strFormat)
' Output only specific folders.
Call ParseFolder("/marketing")
Call ParseFolder("/sales")
Call ParseFolder("/hr/jobs")
' --------------------------------------------------
' End of folder output.
' --------------------------------------------------
If strFormat = "XML" Then
Response.Write "</urlset>"
End If
Response.End
' ======================================================================
'
' Recursively walks a folder path and returns URLs based on the
' static *.html files that it locates.
'
' strParentFolder = The base path for recursion
'
' ======================================================================
Sub ParseFolder(strParentFolder)
On Error Resume Next
Dim strChildFolders, lngChildFolders
Dim strUrlRelative, strExt
' Get the list of child folders under a parent folder.
strChildFolders = GetFolderTree(Server.MapPath(strParentFolder))
' Loop through the collection of folders.
For lngChildFolders = 1 to UBound(strChildFolders)
strUrlRelative = Replace(Mid(strChildFolders(lngChildFolders),Len(strPhysicalRoot)+1),"\","/")
Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative))
' Loop through the collection of files.
For Each objFile in objFolder.Files
strExt = objFSO.GetExtensionName(objFile.Name)
If StrComp(strExt,"html",vbTextCompare)=0 Then
If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then
Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat)
End If
End If
Next
Next
End Sub
' ======================================================================
'
' Outputs a sitemap URL to the client in XML or TXT format.
'
' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never
' tmpStrFormat = TXT|XML
'
' ======================================================================
Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat)
On Error Resume Next
Dim tmpDate : tmpDate = CDate(tmpLastModified)
' Check if the request is for XML or TXT and return the appropriate syntax.
If tmpStrFormat = "XML" Then
Response.Write " <url>" & vbCrLf
Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf
Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf
Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf
Response.Write " </url>" & vbCrLf
Else
Response.Write tmpStrUrl & vbCrLf
End If
End Sub
' ======================================================================
'
' Returns a string array of folders under a root path
'
' ======================================================================
Function GetFolderTree(strBaseFolder)
Dim tmpFolderCount,tmpBaseCount
Dim tmpFolders()
Dim tmpFSO,tmpFolder,tmpSubFolder
' Define the initial values for the folder counters.
tmpFolderCount = 1
tmpBaseCount = 0
' Dimension an array to hold the folder names.
ReDim tmpFolders(1)
' Store the root folder in the array.
tmpFolders(tmpFolderCount) = strBaseFolder
' Create file system object.
Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject")
' Loop while we still have folders to process.
While tmpFolderCount <> tmpBaseCount
' Set up a folder object to a base folder.
Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1))
' Loop through the collection of subfolders for the base folder.
For Each tmpSubFolder In tmpFolder.SubFolders
' Increment the folder count.
tmpFolderCount = tmpFolderCount + 1
' Increase the array size
ReDim Preserve tmpFolders(tmpFolderCount)
' Store the folder name in the array.
tmpFolders(tmpFolderCount) = tmpSubFolder.Path
Next
' Increment the base folder counter.
tmpBaseCount = tmpBaseCount + 1
Wend
GetFolderTree = tmpFolders
End Function
%>
As you can see, the code is largely unchanged from my previous blog.
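If the list of folders grows, one option is to keep it in a single array and loop over it instead of repeating the ParseFolder call. The following is only a sketch and not part of my original sample: the arrFolders and strFolder names are hypothetical, and the FolderExists check is an added guard so that a missing folder is skipped explicitly rather than having its error swallowed by On Error Resume Next. It is meant to replace the three ParseFolder calls in Sitemap.asp:

```vbscript
' Hypothetical variation: drive the folder list from one array.
' arrFolders and strFolder are illustrative names, not from the original sample.
Dim arrFolders, strFolder
arrFolders = Array("/marketing", "/sales", "/hr/jobs")

For Each strFolder In arrFolders
    ' Only parse folders that actually exist on disk.
    If objFSO.FolderExists(Server.MapPath(strFolder)) Then
        Call ParseFolder(strFolder)
    End If
Next
```

This relies on the script-level objFSO object that Sitemap.asp already creates, so the fragment needs to appear after that object has been set.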
In Closing...
One last thing to consider: I didn't make any changes to the Robots.asp file in this blog. That being said, when you do not want specific paths crawled, you should add rules to your Robots.txt file to disallow those paths. For example, here is a simple Robots.txt file which allows crawling of your entire website:
# Robots.txt
# For more information on this file see:
# https://www.robotstxt.org/
# Define the sitemap path
Sitemap: https://localhost:53644/sitemap.xml
# Make changes for all web spiders
User-agent: *
Allow: /
Disallow:
To deny crawling on certain paths, add each path that you do not want crawled to your Robots.txt file, as in the following example:
# Robots.txt
# For more information on this file see:
# https://www.robotstxt.org/
# Define the sitemap path
Sitemap: https://localhost:53644/sitemap.xml
# Make changes for all web spiders
User-agent: *
Disallow: /foo
Disallow: /bar
With that being said, if you are using the Robots.asp file from my last blog, you would need to update the section of code that defines the paths so that it matches my previous example:
Response.Write "# Make changes for all web spiders" & vbCrLf
Response.Write "User-agent: *" & vbCrLf
Response.Write "Disallow: /foo" & vbCrLf
Response.Write "Disallow: /bar" & vbCrLf
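If you have more than a couple of paths to block, the same output can be produced from an array. This is a minimal sketch rather than code from my original Robots.asp: the arrDisallow and strPath names are hypothetical, and /foo and /bar are the same placeholder paths used above.

```vbscript
' Hypothetical variation: write one Disallow rule per entry in an array.
' arrDisallow and strPath are illustrative names, not from the original sample.
Dim arrDisallow, strPath
arrDisallow = Array("/foo", "/bar")

Response.Write "# Make changes for all web spiders" & vbCrLf
Response.Write "User-agent: *" & vbCrLf
For Each strPath In arrDisallow
    Response.Write "Disallow: " & strPath & vbCrLf
Next
```

Keeping the paths in one place makes it easier to compare them against the folders that you pass to ParseFolder in Sitemap.asp, so that you do not list a folder in the sitemap that you are also disallowing.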
I hope this helps. ;-]
Comments
- Anonymous
June 24, 2016
Dear Robmcm, thanks for your script, which I have applied on my site. I am planning to rewrite it in MVC C#, but in the meantime I would like to continue to use this site, which is written in ASP. My problem is related to dynamic SEO-friendly URLs; in other words, I would like a shorter link to appear in the top bar. Analysing your script, I guess it should not be difficult for you, because I believe it is a matter of Web.config. How can I change your Web.config script to make it happen? I need to convert it from static to dynamic. I already have the ASP links in the robots and sitemap files, therefore it should be a matter of passing a variable in the Web.config. Can you help me? Thanks