Manage the crawling impact of the FAST Search specific connectors

Published: May 12, 2010

This topic explains how to manage the crawling impact of the indexing connectors that are specific to FAST Search Server 2010 for SharePoint: the FAST Search database connector, the FAST Search Lotus Notes connector, and the FAST Search Web crawler.

Managing the impact of the FAST Search database connector

The FAST Search database connector itself does not have a throttling mechanism to reduce the impact of a crawl on the source database or on the search engine.

However, it is unlikely that the FAST Search database connector will put a significant load on the source database. The connector is able to extract rows from the source database at a much higher rate than the search engine is able to index them. Only when the SELECT query used by the FAST Search database connector is very complex may there be an increased load on the database.

In some types of SQL/JDBC implementations, selecting a large dataset can cause a large amount of memory to be allocated to the FAST Search database connector. In some cases, the complete dataset is transferred to the FAST Search database connector client before it can be processed. You can avoid this by using a server-side cursor. If you are using SQL Server, add “;selectMethod=cursor” to the JDBCURL parameter in the FAST Search database connector configuration file. Note that this increases memory consumption on the database server, because the result set is held in memory there until it is transferred to the connector.
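As an illustration of the JDBCURL change, the following minimal Python sketch builds a URL that ends in ";selectMethod=cursor". The function name, host name, and database name are hypothetical and are not part of the connector. The sketch assumes a simple SQL Server JDBC URL whose property values contain no escaped semicolons.

```python
def with_server_side_cursor(jdbc_url: str) -> str:
    """Return jdbc_url with selectMethod=cursor as its last property.

    Simplification: splits on every ';', so property values that contain
    escaped semicolons (inside braces) are not supported.
    """
    head, *props = jdbc_url.split(";")
    # Drop empty fragments and any existing selectMethod property.
    kept = [
        p for p in props
        if p and p.partition("=")[0].strip().lower() != "selectmethod"
    ]
    return ";".join([head, *kept, "selectMethod=cursor"])


# Hypothetical host and database names, for illustration only.
url = with_server_side_cursor("jdbc:sqlserver://dbhost:1433;databaseName=Sales")
print(url)
```

The resulting string is the value you would place in the JDBCURL parameter; the helper only shows the expected shape of that value.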

Managing the impact of the FAST Search Lotus Notes connector

A single instance of the FAST Search Lotus Notes connector is typically able to extract documents from the Domino server significantly faster than the search engine is capable of indexing them. Because of this, the load on the Domino server is unlikely to increase significantly when the FAST Search Lotus Notes connector is running.

However, the connector automatically slows down its extraction rate to match the feed rate to the search engine. This means that the FAST Search Lotus Notes connector could increase the load on the search engine backend.

To throttle the extraction rate from the Domino server, you can configure the AdapterThrottleSleepMS parameter in the FAST Search Lotus Notes connector configuration file. The value of this parameter, located in the ConnectorExecution group, sets how many milliseconds the connector sleeps after each Note it extracts from the Domino server.

Note that the sleep interval applies to each adapter thread in the connector. If you use multiple adapter threads, as specified in the configuration parameter ConnectorExecution/NumAdapters, the maximum extraction rate for the connector increases accordingly. For example, suppose you set the parameter ConnectorExecution/AdapterThrottleSleepMS to 200. Each adapter thread can then extract a maximum of 5 documents per second from Domino. If you use the default value for the parameter ConnectorExecution/NumAdapters, which is 3, the connector can extract at most 15 documents per second. The actual number will be lower because, in addition to the sleep interval, the extraction itself takes some time.
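The arithmetic in this example can be sketched in a few lines of Python. The function name and the extraction_ms argument are hypothetical; extraction_ms stands in for the time the extraction itself takes, which the connector does not report as a setting.

```python
def max_extraction_rate(sleep_ms: float,
                        num_adapters: int = 3,
                        extraction_ms: float = 0.0) -> float:
    """Upper bound on documents per second extracted by the connector.

    sleep_ms      -- value of ConnectorExecution/AdapterThrottleSleepMS
    num_adapters  -- value of ConnectorExecution/NumAdapters (default 3)
    extraction_ms -- assumed time spent extracting one Note (hypothetical)
    """
    per_note_ms = sleep_ms + extraction_ms
    if per_note_ms <= 0:
        raise ValueError("sleep_ms + extraction_ms must be positive")
    # Each adapter thread handles one Note per (sleep + extraction) interval.
    return num_adapters * 1000.0 / per_note_ms


print(max_extraction_rate(200))                    # sleep only: 15.0
print(max_extraction_rate(200, extraction_ms=50))  # with extraction time: 12.0
```

With a nonzero extraction time the bound drops, which is why the observed rate stays below the theoretical maximum.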

Managing the impact of the FAST Search Web crawler

When you configure the FAST Search Web crawler, make sure that you avoid overloading Web servers. Several settings affect the load that the FAST Search Web crawler places on the servers that it crawls.

Each Node Scheduler crawls a number of Web sites (and servers) at the same time, as configured by the max_sites setting. A request is issued to each of these Web sites and servers at the interval, in seconds, configured in the delay setting. The max_pending setting limits the number of requests that can be pending to any server at any time. This combination of settings can result in a significant load, especially if multiple Web sites are hosted on the same server.
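To get a feel for how these settings combine, the following Python sketch estimates the request rate that one Node Scheduler could direct at a single server. It is a rough model, not the crawler's scheduling algorithm: it assumes one request per site every delay seconds, that at most max_sites sites are crawled concurrently, and it does not model max_pending. The function name and the sample values are illustrative only.

```python
def requests_per_second_on_server(sites_on_server: int,
                                  delay: float,
                                  max_sites: int) -> float:
    """Rough estimate of requests per second sent to one physical server.

    sites_on_server -- Web sites hosted on that server (assumed known)
    delay           -- value of the delay setting, in seconds
    max_sites       -- value of the max_sites setting
    """
    if delay <= 0:
        raise ValueError("delay must be positive")
    # No more than max_sites sites are crawled at the same time.
    active_sites = min(sites_on_server, max_sites)
    # Assumed model: one request per active site every `delay` seconds.
    return active_sites / delay


# Illustrative values: ten sites on one server, one request every 2 seconds each.
print(requests_per_second_on_server(sites_on_server=10, delay=2.0, max_sites=128))
```

Under this model the load on a shared server grows linearly with the number of hosted sites, so a delay that is acceptable for a single site may need to be increased for shared hosting.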

Enabling JavaScript support (with the use_javascript setting) causes an additional load, because both JavaScript and CSS dependencies have to be downloaded. It is not uncommon for a Web item to refer to around ten to thirty external JavaScript files. To improve performance on Web items that contain many external JavaScript files, the FAST Search Web crawler by default uses a request delay of 0 seconds for these dependencies. You can increase this delay by using the javascript_delay setting. If you increase the javascript_delay setting, make sure that you also adjust the processing timeouts in the Browser Engine, to avoid timeouts while it is downloading dependencies.
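The relationship between javascript_delay and the Browser Engine timeouts can be approximated with a short Python sketch. It assumes, as a simplification that the documentation does not confirm, that the dependencies of one Web item are requested one after another with javascript_delay seconds between requests. The function name and the fetch_seconds argument are hypothetical.

```python
def min_dependency_fetch_seconds(num_dependencies: int,
                                 javascript_delay: float,
                                 fetch_seconds: float = 0.0) -> float:
    """Estimated time to download all dependencies of one Web item.

    num_dependencies -- external JavaScript and CSS files the item refers to
    javascript_delay -- value of the javascript_delay setting, in seconds
    fetch_seconds    -- assumed download time per dependency (hypothetical)
    """
    # Assumed model: dependencies are fetched sequentially, each one
    # costing the configured delay plus its own download time.
    return num_dependencies * (javascript_delay + fetch_seconds)


# Thirty dependencies with a 2-second delay take at least a minute in this model.
print(min_dependency_fetch_seconds(30, 2.0))
```

A processing timeout shorter than this estimate would likely expire before all dependencies are available, so the estimate gives a lower bound to aim above when you adjust the timeouts.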

Change history

Date Description Reason

May 12, 2010

Initial publication