Search Tips and Tricks
I was troubleshooting an issue where some users had added links like https://www.microsoft.comdefault.aspx, which caused errors in the crawl logs because a '/' is missing from the URL. We had about four links like this, and they are very difficult to find through the UI because the portal is huge, with many site collections and subsites. They can, however, be found easily from the SSP_Search database.
If a page contains a URL like https://, it will be crawled as anchor text, and there should be a corresponding entry in the MSSAnchortext table. I ran the following query to find out more about the URL.
select sourcedocid from mssanchortext where link like 'https://www.microsoft.com/%'
The above query returned the sourcedocid for the URL. Next, we need to look in the MSSCrawlURL table to see which page is the referrer, or source, of the link. The sourcedocid value maps to the DocId column in MSSCrawlURL, so the following query, using the sourcedocid returned above (1415 here), gives you the source where the incorrect link is stored.
select * from msscrawlurl where docid = 1415
Combining the two queries gives:
select * from msscrawlurl where docid in (select sourcedocid from mssanchortext where link like 'https://www.microsoft.com/%')
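To try the lookup without touching a production database, here is a minimal sketch using Python's built-in sqlite3 module with toy stand-ins for the two tables. The table names and the sourcedocid, docid, and link columns follow the queries above; the url column, the portal.example addresses, the sample rows, and the find_link_sources helper are all made up for illustration, and the real SSP_Search tables have many more columns. It uses a join rather than a subquery so that each source page is returned alongside the offending link, and the pattern omits the trailing slash so that it also matches a link where the slash is missing.

```python
import sqlite3


def find_link_sources(conn, link_pattern):
    """Return (docid, url, link) for every page whose anchor text matches link_pattern.

    Hypothetical helper: joins the anchor-text rows to the crawl-URL rows
    on sourcedocid = docid, as described in the post.
    """
    query = """
        SELECT c.docid, c.url, a.link
        FROM msscrawlurl AS c
        JOIN mssanchortext AS a ON a.sourcedocid = c.docid
        WHERE a.link LIKE ?
        ORDER BY c.docid
    """
    return conn.execute(query, (link_pattern,)).fetchall()


# Toy in-memory tables; the 'url' column and all rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE msscrawlurl (docid INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE mssanchortext (sourcedocid INTEGER, link TEXT);
    INSERT INTO msscrawlurl VALUES (1415, 'http://portal.example/sites/hr/links.aspx');
    INSERT INTO msscrawlurl VALUES (2001, 'http://portal.example/sites/it/home.aspx');
    INSERT INTO mssanchortext VALUES (1415, 'https://www.microsoft.comdefault.aspx');
    INSERT INTO mssanchortext VALUES (2001, 'https://www.example.org/index.html');
""")

# No trailing slash in the pattern, so the malformed link is matched too.
rows = find_link_sources(conn, "https://www.microsoft.com%")
for docid, url, link in rows:
    print(docid, url, link)
```

Against the real database the same join can be run directly in a SQL query window; only the selected column names would need adjusting to match the actual schema.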