Tony McIntyre and Crawl Performance and a quick comment on Incremental Crawls

Keith just pointed out that another "SharePoint Legend" at Microsoft has started blogging. Tony, welcome to the SharePoint Blogosphere!

His opening post is a good one, and an excellent complement to a post Keith did earlier here.

One thing I will just throw in quickly: I have often heard customers ask why their "Incremental Crawls" are not that much faster than their Full Crawls. The reason is in the way SharePoint actually determines whether or not a document has changed and should therefore be included in the incremental crawl. It does this by performing a "hash" of the document; if the hash differs between crawls, then the document has changed. Unfortunately, in order to compute the hash, SharePoint first has to download the document. This takes a significant amount of time (especially over slow links), so you don't necessarily save as much time as you might have first thought. A small point, but one to keep in mind.
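
To make the cost concrete, here is a minimal sketch of hash-based change detection as described above. It is not SharePoint's code or API: the `incremental_crawl` function, the `fetch` callable, and the choice of SHA-256 are all assumptions made purely for illustration. The thing to notice is that every document is downloaded in full before we can tell whether it changed.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple


def incremental_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],      # hypothetical downloader, not a SharePoint API
    previous_hashes: Dict[str, str],    # hashes recorded by the previous crawl
) -> Tuple[List[str], Dict[str, str], int]:
    """Return (changed urls, new hashes, total bytes downloaded)."""
    changed: List[str] = []
    new_hashes: Dict[str, str] = {}
    bytes_downloaded = 0

    for url in urls:
        # The full download happens here, whether or not the document changed.
        content = fetch(url)
        bytes_downloaded += len(content)

        # SHA-256 is an assumption; the post does not say which hash is used.
        digest = hashlib.sha256(content).hexdigest()
        new_hashes[url] = digest

        # Only now can we decide whether the document needs re-indexing.
        if previous_hashes.get(url) != digest:
            changed.append(url)

    return changed, new_hashes, bytes_downloaded
```

Run it twice over the same set of documents with one small edit in between and the second pass reports only one changed document, yet it still downloads everything. The saving is in the indexing work that gets skipped, not in the transfer.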

Anyway, debugging crawl performance is one of the most difficult tasks facing SharePoint Administrators. The difficulty comes from many angles: there is LOTS going on, you often don't get any errors (i.e. it's just slow), there isn't really a UI on the actual crawl (it's not interactive), and you only really know something is wrong after it has all happened.

Both Keith's and Tony's posts give you some great information to get started on crawl performance issues. With the counters in place, you get a pair of special "SharePoint glasses" that help you watch a crawl as it actually happens.