The VB version of the Blog Crawler
This is the VB.NET 2005 version of the Blog Crawler. It’s based on the FoxPro version, but it uses SQL Server Everywhere, so you can deploy it on your mobile device! It crawls a blog and stores all entries in a SQL Server Everywhere table, including blog comments and Cascading Style Sheets.
I had to wait to post this blog entry because the SQL Everywhere CTP public release is today (announced at Tech Ed)!
To run it, you only need to copy a few files from this link (1.6 megabytes) into a directory on your machine and start BlogCrawl.Exe. No registration or install of any kind is required, except the .NET Framework 2.0 (which is installed with Visual Studio 2005, or you can download the runtime). The source code is here and can be unzipped into the same folder. The program (including SQL/E) is totally isolated to the install folder, except for the My.Settings XML file, which stores your preferences in your local settings folder. It doesn’t touch your registry or install any other files.
When you start the program, the top part shows a grid of already crawled blog posts. The bottom part shows each post in a web control as it looked at the time of download. The links on the page are live. When first starting, there will be no data. If you click the Crawl button, it will start a background thread that scans the blog and downloads any entries that have not been downloaded yet. The status bar shows crawl progress.
It takes about 20 minutes to crawl my blog and download my 240 posts. You can stop and continue the background thread at any time by hitting the same Crawl button. The data is stored as a SQL Mobile database in the same folder in a file called <blogname>.sdf.
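The stop-and-continue behavior of a single Crawl button can be sketched as follows. This is a minimal Python illustration, not the VB.NET source: the class name CrawlWorker, its toggle method, and the sleep that stands in for a download are all hypothetical.

```python
import threading
import time


class CrawlWorker:
    """Hypothetical worker; toggle() plays the role of the Crawl button."""

    def __init__(self, urls):
        self.pending = list(urls)            # links still to download
        self.done = []                       # links already processed
        self._running = threading.Event()    # set = crawling, clear = paused
        self._finished = threading.Event()   # set once every link is processed
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()                 # thread starts in the paused state

    def toggle(self):
        """Start or resume the crawl if paused, pause it if running."""
        if self._running.is_set():
            self._running.clear()
        else:
            self._running.set()

    def _run(self):
        while self.pending:
            self._running.wait()             # blocks here while paused
            url = self.pending.pop(0)
            time.sleep(0.01)                 # stand-in for downloading the page
            self.done.append(url)
        self._finished.set()

    def wait_finished(self, timeout=None):
        """Return True if the crawl completed within the timeout."""
        return self._finished.wait(timeout)
```

The worker only checks the flag between links, so a pause takes effect after the download in progress completes.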
You can type a search string in the textbox and click the Search button to limit the records in the grid to those blog posts containing the search string.
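A search like this boils down to a substring filter over the stored posts. Here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for SQL Server Everywhere; the posts table, its columns, and the sample rows are assumptions made for illustration, not the program's actual schema.

```python
import sqlite3


def search_posts(conn, text):
    """Return titles of posts whose title or body contains the search string."""
    # Note: % and _ inside `text` act as LIKE wildcards in this simple sketch.
    pattern = "%" + text + "%"
    rows = conn.execute(
        "SELECT title FROM posts WHERE title LIKE ? OR body LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


# Hypothetical in-memory table standing in for the <blogname>.sdf database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (url TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        ("u1", "Blog Crawler", "Crawls a blog into a local table"),
        ("u2", "Regular Expressions", "Get hyperlinks in blogs"),
        ("u3", "Sounds", "Turning off the Information Bar sound"),
    ],
)
```

Calling search_posts(conn, "blog") returns the first two titles, because SQLite's LIKE is case-insensitive for ASCII text by default.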
It’s customized for blogs hosted on https://blogs.msdn.com: it parses out the blog entry publication date and determines which pages are blog posts and which are just intermediate pages (like February posts). I haven’t tested it with all the various blog CSS styles, but the source can be modified.
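One way to tell a post from an intermediate page is to look at the shape of the URL. The Python sketch below assumes that post links end in /archive/YYYY/MM/DD/name.aspx while monthly pages end in /archive/YYYY/MM.aspx. That URL layout, the classify function, and reading the date from the URL rather than from the page are all assumptions for illustration, and the program itself may do this differently.

```python
import re
from datetime import date

# Assumed post URL shape: .../archive/2006/06/13/629276.aspx
POST_RE = re.compile(r"/archive/(\d{4})/(\d{2})/(\d{2})/[^/]+\.aspx$", re.IGNORECASE)


def classify(url):
    """Return ("post", publication_date) or ("intermediate", None)."""
    match = POST_RE.search(url)
    if match is None:
        return ("intermediate", None)
    year, month, day = (int(part) for part in match.groups())
    try:
        return ("post", date(year, month, day))
    except ValueError:
        # The digits did not form a real calendar date, so do not treat it as a post.
        return ("intermediate", None)
```

A blog with a different URL scheme would need its own pattern, which is the kind of change the source allows.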
The program defaults to crawling my blog, but allows you to switch to other blogs. Click the Blog Options button to crawl your favorite blog.
If you change the Followed value for a particular entry to 0, the next crawl will recrawl that link, which is useful if you want to get the latest comments.
It uses the new My.Settings feature to persist user settings, such as window position and which blog was last crawled. The new SplitContainer class allows you to move the splitter bar between the grid and the web control, and the SplitterDistance is persisted in My.Settings.
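The effect of the Followed value can be sketched as a simple work queue. This Python example uses sqlite3 as a stand-in for SQL Server Everywhere; the links table, the crawl_once function, and the download callback are hypothetical, and only the Followed column name comes from the program.

```python
import sqlite3


def pending(conn):
    """Links that the next crawl would visit: those with Followed = 0."""
    rows = conn.execute("SELECT url FROM links WHERE followed = 0 ORDER BY url")
    return [row[0] for row in rows]


def crawl_once(conn, download):
    """Download every unfollowed link and mark it as followed."""
    urls = pending(conn)
    for url in urls:
        download(url)  # caller-supplied stand-in for fetching the page
        conn.execute("UPDATE links SET followed = 1 WHERE url = ?", (url,))
    return urls


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (url TEXT PRIMARY KEY, followed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO links (url) VALUES (?)", [("post1",), ("post2",)])
```

After one crawl_once call nothing is pending, and setting followed back to 0 for a single row makes only that link eligible for the next crawl.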
One of my machines was playing a sound while my web crawler was crawling. The culprit was Control Panel->Sounds->Sound->Windows Explorer->Information Bar.
See also
Use Regular Expressions to get hyperlinks in blogs
Comments
Anonymous
June 13, 2006
Calvin,
I've been talking to Steve Lasker and other members of the SQL Everywhere team at TechEd. They mentioned that they do not support ODBC. What are the implications for using existing SQL Pass-Through code against SQL Everywhere?
++Alan
Anonymous
June 15, 2006
Cool app, Calvin!
I noticed mine takes a bit longer than 20 minutes, should I file a bug somewhere? :-)
Anonymous
June 15, 2006
I’ve had several requests that require customizing the Blog Crawler.
The entire source code...
Anonymous
July 06, 2006
Why not use SQL express? :)
Anonymous
July 10, 2006
Hey Calvin, thanks for all the fixes you made to support http://msmvps.com/blogs/coad! This is a very handy tool and something I've been wishing for a long time. Again, thank you!
Anonymous
July 10, 2006
I’ve been looking for awhile for a way to back up my blog by capturing each post in a nice, MHTML...
Anonymous
July 10, 2006
Check out this awesome little utility I found. Its not something I would use all the time, but I you...
Anonymous
July 11, 2006
Calvin has written a blog crawler with both VFP and VB.NET versions that allows you to back up your own...
Anonymous
July 18, 2006
My prior post (Create a .Net UserControl that calls a web service that acts as an ActiveX control to...
Anonymous
October 01, 2007
Windows Mobile 5.0 comes with a Web Browser (v6 is due out any day now). It runs on Pocket PCs and SmartPhones.
Anonymous
April 03, 2008
My prior post ( Create your own Test Host using XAML to run your unit tests ) shows how to create a form
Anonymous
May 15, 2008
I received a question: Simply, is there a way of interrupting a vfp sql query once it has started short
Anonymous
November 16, 2010
Hi Calvin... Is there any way I can view the source code?