Using the Catalog Manager

The ISearchCatalogManager and ISearchCatalogManager2 interfaces provide methods to manage a search catalog, such as forcing re-indexing or setting time-outs. While Windows Search currently uses only one catalog, these interfaces were designed to give you greater control for managing multiple catalogs independently. They manage the catalog in the following ways:

  • Access to other interfaces: retrieving the other search-related interfaces used for crawl scope management, data change notifications, and query building (ISearchQueryHelper).
  • Catalog contents: ensuring that new data is indexed and that other applications and components work properly by forcing a re-indexing of all or part of the catalog or by resetting the entire catalog.
  • Catalog properties: setting properties that determine how the catalog manages time-outs when connecting to protocol handlers and how diacritical marks are treated in searches.
  • Catalog status: getting information about the catalog, including status, size, and current activity state.

Some useful interfaces in the Windows Search platform require an instance of the Catalog Manager before they can be used. To create a Catalog Manager for a specified catalog, call the ISearchManager::GetCatalog method. The methods of the Catalog Manager can then be used to instantiate and return interfaces that are based on the specified catalog.

Method Description
GetQueryHelper Gets an instance of the ISearchQueryHelper interface for the current catalog, to enable you to build queries easily.
GetCrawlScopeManager Gets an instance of ISearchCrawlScopeManager for this search catalog, to enable developers to modify the crawl scope of the Windows Search Indexer.
GetItemsChangedSink Gets an instance of the ISearchItemsChangedSink interface, which client applications use to notify the indexer of changes when the client wants indexing status information about the item to support provider-managed notifications. See Notifying the Index of Changes for more information.
GetPersistentItemsChangedSink Gets an instance of ISearchPersistentItemsChangedSink, which client applications use to notify the indexer of changes when the client does not want indexing status information (indexer-managed notifications). See Notifying the Index of Changes for more information.

Managing the Catalog Contents

There are two primary tasks involved in managing the catalog: re-indexing all or some of the URLs in the indexer's crawl scope, and resetting the entire underlying catalog. When you re-index URLs, old data remains in the catalog until it is replaced by new data. When you reset the catalog, the entire catalog is rebuilt and all URLs in the crawl scope are re-indexed. This process can take a long time and should be used only as a last resort for solving issues such as a possibly corrupted index.

When you install a new application, protocol handler, or filter, the setup application should add its directory or root to the crawl scope to ensure that the indexer includes the location of that application's data. If the data does not appear in the catalog after the indexer has crawled its crawl scope, first ensure that the location of the data is included in the crawl scope. You can add it by using the user interface for Windows Search options or the Crawl Scope Manager. If the location is already in the crawl scope, you can manually force a re-indexing of all URLs in the indexer's crawl scope, or of a subset, by using the following methods of the ISearchCatalogManager and ISearchCatalogManager2 interfaces.

Re-indexing method Description
ISearchCatalogManager::Reindex Re-indexes all URLs in the catalog. The old information will remain until it is replaced by new information.
ISearchCatalogManager::ReindexMatchingURLs / ISearchCatalogManager::ReindexSearchRoot Re-indexes URLs that match the pattern or start at a particular root (for example, file:///C:\Foldername\Subfoldername\). This is useful for recrawling everything in a particular directory or with a particular extension, as when an application is installed.
ISearchCatalogManager2::PrioritizeMatchingURLs Instructs the indexer to prioritize indexing items with URLs that match a specified pattern over completing other indexing tasks.

Resetting the Index. You can reset the entire index with a call to ISearchCatalogManager::Reset. This resets the underlying catalog by rebuilding the databases and performing a full index of all URLs in the crawl scope. This process can take a lot of time and should be used only as a last resort for solving issues such as a possibly corrupted index.

Important

Because these methods can slow down indexing, use them carefully when you are trying to identify indexing or catalog problems. First, make sure your search roots and scope rules are added in the Crawl Scope Manager, and then make sure that the FANCI bit (File Attribute Not Content Indexed) is set properly for files and folders. If you have confirmed that these are correct, try ReindexSearchRoot first and Reindex last. If neither of these works, then try Reset as a last resort.
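
The escalation order described in this note can be written down as a small sketch. The RepairStep enumeration and the NextRepairStep and ApplyRepairStep helpers are hypothetical names introduced only for illustration. The template parameter stands in for ISearchCatalogManager so that the sketch compiles without the Windows SDK, and the long return value stands in for HRESULT. A successful call means only that the request was accepted, so whether to escalate still depends on checking that the missing data has appeared.

```cpp
// Troubleshooting steps in the order the text recommends,
// from least to most disruptive. (Illustrative names.)
enum class RepairStep
{
    VerifyCrawlScope,   // search roots and scope rules are registered
    VerifyFanciBit,     // FANCI bit is set as intended on files and folders
    ReindexSearchRoot,  // re-index a single root
    Reindex,            // re-index every URL in the catalog
    Reset,              // rebuild the catalog: last resort
    Exhausted           // nothing further to try
};

// Returns the step that follows 'current' in the escalation order.
constexpr RepairStep NextRepairStep(RepairStep current)
{
    return current == RepairStep::Exhausted
               ? RepairStep::Exhausted
               : static_cast<RepairStep>(static_cast<int>(current) + 1);
}

// Issues the Catalog Manager call for a step, if the step has one.
// CatalogManager is assumed to expose ReindexSearchRoot, Reindex, and
// Reset with the same shapes as ISearchCatalogManager.
template <typename CatalogManager>
long ApplyRepairStep(CatalogManager& catalog, RepairStep step,
                     const wchar_t* rootUrl)
{
    switch (step)
    {
    case RepairStep::ReindexSearchRoot:
        return catalog.ReindexSearchRoot(rootUrl);
    case RepairStep::Reindex:
        return catalog.Reindex();
    case RepairStep::Reset:
        return catalog.Reset();
    default:
        return 0;  // manual verification steps: no API call to make
    }
}
```

With a real catalog pointer, a call might look like ApplyRepairStep(*pCatalog, RepairStep::ReindexSearchRoot, rootUrl), where rootUrl is the URL of the root to recrawl.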

For related information, see Notifying the Index of Changes and Querying the Index with ISearchQueryHelper.

Managing Catalog Status

The Catalog Manager can be used to get the status of the catalog for applications that want to customize how the catalog is managed, such as a custom "Catalog Status" monitoring application or a Control Panel-style application. Most search-related development scenarios do not require it.

The following table describes the methods of ISearchCatalogManager used for managing catalog status.

Method Description
URLBeingIndexed Gets the URL that is currently being indexed. This method would be useful if you were trying to identify whether the indexer was "stuck" on an item.
NumberOfItems Gets the number of items in the catalog.
NumberOfItemsToIndex Retrieves the following information about items to be indexed:
  • plIncrementalCount - the number of items to be indexed in the next incremental index
  • plNotificationQueue - the number of items in the notification queue. This information would be useful to a notification application that needed to check whether the indexer is receiving the notifications that the application is sending.
  • plHighPriorityQueue - the number of items in the high-priority queue. Items in the plHighPriorityQueue are indexed first.
GetCatalogStatus Gets the status of the catalog and returns an enumeration value that gives the current status. The following are possible catalog states:
  • Idle: No indexing is needed.
  • Paused: Indexing is paused (due to low battery or high CPU usage, for example).
  • Recovering: Indexing is recovering.
  • Full crawl: Indexer is performing a full crawl of the crawl scope.
  • Incremental crawl: Indexer is performing an incremental crawl.
  • Processing notifications: Indexer is processing notifications.
  • Shutting down: Indexer is shutting down.
get_Name Gets the name of the current catalog that is specified in the ISearchManager::GetCatalog method. Currently, the only catalog supported is SystemIndex.
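
As an illustration of the NumberOfItemsToIndex row above, the following sketch reads the three counts into one structure. The IndexerBacklog structure and the ReadBacklog helper are hypothetical names. The template parameter stands in for ISearchCatalogManager so the sketch compiles without the Windows SDK, and long stands in for both LONG and HRESULT.

```cpp
// Snapshot of the three counts reported by NumberOfItemsToIndex.
// (Illustrative type, not part of the Windows Search API.)
struct IndexerBacklog
{
    long incremental = 0;        // plIncrementalCount
    long notificationQueue = 0;  // plNotificationQueue
    long highPriorityQueue = 0;  // plHighPriorityQueue
};

// Fills 'backlog' from the catalog. Returns true when the call succeeded,
// that is, when the returned HRESULT-style value is not negative.
// CatalogManager is assumed to expose NumberOfItemsToIndex with three
// long* output parameters, as ISearchCatalogManager does.
template <typename CatalogManager>
bool ReadBacklog(CatalogManager& catalog, IndexerBacklog& backlog)
{
    const long hr = catalog.NumberOfItemsToIndex(&backlog.incremental,
                                                 &backlog.notificationQueue,
                                                 &backlog.highPriorityQueue);
    return hr >= 0;
}
```

A notification application could, for example, take one snapshot before sending notifications and another afterward, then compare notificationQueue to check whether the indexer is receiving them.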

Managing Catalog Properties

There are three catalog properties that you can manage with the Catalog Manager:

  • Diacritic sensitivity. Diacritics are accent marks added to letters to signify a word's meaning or pronunciation. This property determines whether the catalog is sensitive to diacritics, and is important when you or your users search and index text in multiple languages. For example, with this property set to FALSE, the catalog would treat "resume" and "resumé" as if they were the same word.
  • Connection timeouts. This property represents the amount of time to wait for a connection response from a server or data store, as represented in a TIMEOUT_INFO structure. You can use this property to fine-tune Windows Search.
  • Data timeouts. This property represents the amount of time to wait for a data transaction between the indexer and a protocol handler or filter, as represented in a TIMEOUT_INFO structure. If this time elapses, the Filter Daemon process is terminated to prevent deadlock and other resource problems.

The last two properties are intended primarily for future use. Each of these properties has get and put methods.

Method Description
get_DiacriticSensitivity / put_DiacriticSensitivity TRUE if the catalog should differentiate words with diacritics. FALSE if the catalog should ignore diacritics. Changing this property requires rebuilding the index because the keys of the index may become invalid.
get_ConnectTimeout / put_ConnectTimeout The time, in seconds, that the indexer should wait for a connection response from a server or data store. Setting this too high can cause delays if many sites are unresponsive. Setting it too low can result in some sites not being crawled.
get_DataTimeout / put_DataTimeout The time, in seconds, that the indexer should wait for a data transaction.
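
Because changing diacritic sensitivity requires rebuilding the index, one reasonable pattern is to read the current value first and write only when it differs. The sketch below assumes that pattern. EnsureDiacriticSensitivity is a hypothetical helper name, the template parameter stands in for ISearchCatalogManager so the sketch compiles without the Windows SDK, and int and long stand in for BOOL and HRESULT. On a real system the put call updates the catalog, so the application would need to run elevated.

```cpp
// Sets diacritic sensitivity only if it differs from the requested value,
// so that an unnecessary index rebuild is not triggered.
// *pChanged is set to true only when the property was actually written.
// Returns the HRESULT-style value of the last call made.
template <typename CatalogManager>
long EnsureDiacriticSensitivity(CatalogManager& catalog, bool wantSensitive,
                                bool* pChanged)
{
    *pChanged = false;

    int current = 0;  // stands in for BOOL, which is an int typedef
    long hr = catalog.get_DiacriticSensitivity(&current);
    if (hr < 0)
    {
        return hr;  // could not read the current setting
    }
    if ((current != 0) == wantSensitive)
    {
        return hr;  // already as requested: leave the index alone
    }

    hr = catalog.put_DiacriticSensitivity(wantSensitive ? 1 : 0);
    if (hr >= 0)
    {
        *pChanged = true;
    }
    return hr;
}
```

The same read-then-write shape would apply to get_ConnectTimeout / put_ConnectTimeout and get_DataTimeout / put_DataTimeout, whose values are times in seconds.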

Running in Elevated Mode

Any method call that updates the SystemIndex catalog requires your application to run elevated. Otherwise, the call fails with an Access Denied error.
