Use crawl rules to determine what content gets crawled (Search Server 2008)
Applies To: Microsoft Search Server 2008
Topic Last Modified: 2009-08-10
Note
Unless otherwise noted, the information in this article applies to both Microsoft Search Server 2008 and Microsoft Search Server 2008 Express.
In this article:
Create a crawl rule
Edit a crawl rule
Delete a crawl rule
Reorder crawl rules
Before you perform these procedures, confirm that:
- You have read the topic Limit or increase the quantity of content that is crawled (Search Server 2008).
Important
You must be a search services administrator to perform the procedures in this article. For more information, see Add or remove a search services administrator (Search Server 2008).
You can create new crawl rules or edit existing crawl rules to determine what content gets crawled. You can also reorder crawl rules to specify the order in which these rules are applied.
Create a crawl rule
Use the following procedure to create a crawl rule.
Create a crawl rule
On the Search Administration page, in the Crawling section, click Crawl rules.
On the Manage Crawl Rules page, click New Crawl Rule.
On the Add Crawl Rule page, in the Path section, in the Path box, type the path affected by this rule. You can use standard wildcard characters in the path. For example:
http://server1/folder* matches all Web resources with a URL that starts with http://server1/folder.
*://*.txt matches every document with the txt file extension.
In the Crawl Configuration section, select one of the following:
Exclude all items in this path. Select this option if you want all items in the specified path to be excluded from the crawl.
Include all items in this path. Select this option if you want all items in the path to be crawled.
If you chose to exclude all items in this path, skip ahead to the step in which you click OK. Otherwise, you can further refine the inclusion by selecting any combination of the following:
Follow links on the URL without crawling the URL itself. Select this option if you want to crawl links contained within the URL, but not the URL itself.
Crawl complex URLs (URLs that contain a question mark (?)). Select this option if you want to crawl URLs that contain parameters that use the question mark (?) notation.
Crawl SharePoint content as HTTP pages. Normally, SharePoint content is crawled by using a special protocol. Select this option if you want SharePoint content to be crawled as HTTP pages instead. When the content is crawled by using the HTTP protocol, item permissions are not stored. This means that all items that match a particular search query appear on search results pages, regardless of whether the user that initiated the query has access to those items.
The purpose of this setting is to enable search administrators to crawl remote SharePoint sites that they do not control. On such sites, administrators cannot ensure that the domain account used for crawling has been granted full-read permissions.
Note
For information about the settings in the Specify Authentication section, see Use crawl rules to specify a different content access account or authentication method (Search Server 2008).
Click OK.
Repeat this procedure, starting from step 2, for each new crawl rule that you want to create.
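To illustrate how the wildcard paths in the procedure above select URLs, the following is a minimal Python sketch. It is not part of Search Server. The function names path_matches and is_complex_url are hypothetical, and the sketch assumes that the asterisk (*) is the only wildcard character and that matching ignores case. The actual matching behavior of the product might differ.

```python
import re


def path_to_regex(rule_path: str) -> re.Pattern:
    """Translate a crawl-rule path into a regular expression.

    Assumption: only '*' is a wildcard; every other character is literal.
    Assumption: matching is case-insensitive.
    """
    literal_parts = (re.escape(part) for part in rule_path.split("*"))
    return re.compile(".*".join(literal_parts), re.IGNORECASE)


def path_matches(rule_path: str, url: str) -> bool:
    """Return True if the whole URL matches the wildcard path."""
    return path_to_regex(rule_path).fullmatch(url) is not None


def is_complex_url(url: str) -> bool:
    """Return True if the URL contains a question mark (?)."""
    return "?" in url


if __name__ == "__main__":
    # The two example paths from the procedure.
    print(path_matches("http://server1/folder*", "http://server1/folder/page.aspx"))
    print(path_matches("*://*.txt", "file://share/docs/readme.txt"))
    # A URL with parameters counts as a complex URL.
    print(is_complex_url("http://server1/list.aspx?id=3"))
```

Testing candidate paths against a few known URLs in this way can help you confirm that a path is neither too broad nor too narrow before you create the rule.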
Edit a crawl rule
You can edit an existing crawl rule at any time by clicking it, and then making the necessary changes to the path and configuration, as described in the previous procedure.
Note
Editing a crawl rule requires a full crawl of the content that is affected by the altered crawl rule.
Delete a crawl rule
Use the following procedure to delete a crawl rule that is no longer needed.
Delete a crawl rule
On the Shared Services Administration page, in the Search section, click Search settings.
On the Configure Search Settings page, in the Crawl Settings section, click Crawl rules.
On the Manage Crawl Rules page, point to the crawl rule that you want to delete, click the arrow that appears, and then click Delete on the menu that appears.
Click OK to confirm the deletion.
Note
Deleting a crawl rule requires a full crawl of the content that is affected by the deleted crawl rule.
Reorder crawl rules
After you create new crawl rules, we recommend that you specify the order in which you want the rules to be applied while content is being crawled. Crawl rules are applied in the order in which they are listed. Therefore, if two rules cover the same or overlapping content, the first rule that is listed is applied. Use the following procedure to specify the order of your crawl rules.
Reorder crawl rules
On the Shared Services Administration page, in the Search section, click Search settings.
On the Configure Search Settings page, in the Crawl Settings section, click Crawl rules.
On the Manage Crawl Rules page, in the Order column in the list of crawl rules, select a value in the list that specifies the position you want the rule to occupy. Other values are shifted accordingly.
You can also use a global exclusion rule, which applies regardless of the order in which it is listed. For more information about administering crawl rules, see the Administrating Crawl Rules section in the following resource: Book Excerpt - Chapter 16 Enterprise search and indexing architecture and administration.
Note
Reordering crawl rules requires a full crawl of the content that is affected by the repositioned crawl rule.
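The effect of rule order described in this section can be illustrated with a short Python sketch. It is not part of Search Server. The CrawlRule class and the first_matching_rule function are hypothetical names, the sketch assumes that the asterisk (*) is the only wildcard character, and it does not model global exclusion rules. What happens to a URL that matches no rule is not covered in this article, so the sketch returns None in that case.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CrawlRule:
    path: str      # wildcard path; assumption: only '*' is a wildcard
    include: bool  # True = include all items, False = exclude all items


def _matches(rule_path: str, url: str) -> bool:
    """Match the whole URL against the wildcard path, ignoring case."""
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.fullmatch(pattern, url, re.IGNORECASE) is not None


def first_matching_rule(rules: Sequence[CrawlRule], url: str) -> Optional[CrawlRule]:
    """Return the first listed rule whose path matches the URL, or None."""
    for rule in rules:
        if _matches(rule.path, url):
            return rule
    return None


if __name__ == "__main__":
    # Two overlapping rules: the narrower exclusion is listed first.
    rules = [
        CrawlRule("http://server1/folder/private*", include=False),
        CrawlRule("http://server1/folder*", include=True),
    ]
    url = "http://server1/folder/private/a.htm"

    # The exclusion is listed first, so it is the rule that applies.
    print(first_matching_rule(rules, url))
    # With the order reversed, the broader inclusion applies instead.
    print(first_matching_rule(list(reversed(rules)), url))
```

In this sketch, listing the narrower rule before the broader one is what allows the exclusion to take effect, which is why the order of overlapping rules matters.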