

Using the Catalog Manager

The ISearchCatalogManager interface provides methods to manage a search catalog, such as forcing a reindex or setting timeouts. Although Windows Search currently uses only one catalog, the interface was designed to let you manage multiple catalogs independently of one another. The interface manages the catalog in four areas:

  • Access to other interfaces - retrieve other search-related interfaces, including the Crawl Scope Manager, the data change notification sinks, and the ISearchQueryHelper interface.
  • Catalog contents - ensure that new data gets indexed and that other applications and components work properly by forcing a reindex of all or part of the catalog or by resetting the entire catalog.
  • Catalog properties - set properties that determine how the catalog manages timeouts when connecting to protocol handlers and how diacritical marks are treated in searches.
  • Catalog status - get information about the catalog including status, size, and current activity state.

This topic contains the following sections:

  • Accessing Related Interfaces
  • Managing the Catalog Contents
  • Managing Catalog Status
  • Managing Catalog Properties
  • Related Topics

 

Accessing Related Interfaces

Several useful interfaces in the Windows Search platform can be obtained only through an instance of the Catalog Manager. Once you have an ISearchCatalogManager, its methods instantiate and return these interfaces, each bound to the catalog you named when you retrieved the catalog manager with the ISearchManager::GetCatalog() method.

  • GetQueryHelper - Gets an instance of ISearchQueryHelper interface for the current catalog to enable you to build queries easily.
  • GetCrawlScopeManager - Gets an instance of ISearchCrawlScopeManager for this search catalog to enable developers to modify the crawl scope of the Windows Search Indexer.
  • GetItemsChangedSink - Gets an instance of the ISearchItemsChangedSink interface, which client applications use to notify the indexer of changes when the client wants indexing status information about the item (provider-managed notifications).
  • GetPersistentItemsChangedSink - Gets an instance of ISearchPersistentItemsChangedSink, which client applications use to notify the indexer of changes when the client does not want indexing status information (Indexer-managed notifications).
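As a rough sketch of the sequence described above, the following Windows-only C++ obtains the catalog manager for SystemIndex and then asks it for two of the related interfaces. It assumes the Windows SDK header searchapi.h, an MSVC-style compiler (for __uuidof), and that COM has already been initialized on the calling thread. The helper names OpenSystemIndex and GetRelatedInterfaces and the constant kSystemCatalogName are illustrative and not part of the API.

```cpp
// Name of the only catalog Windows Search currently supports.
static const wchar_t kSystemCatalogName[] = L"SystemIndex";

#ifdef _WIN32
#include <windows.h>
#include <searchapi.h>

// Hypothetical helper: creates the search manager and returns the catalog
// manager for SystemIndex. The caller must Release() *ppCatalog.
inline HRESULT OpenSystemIndex(ISearchCatalogManager** ppCatalog)
{
    *ppCatalog = nullptr;
    ISearchManager* pManager = nullptr;
    HRESULT hr = CoCreateInstance(__uuidof(CSearchManager), nullptr,
                                  CLSCTX_SERVER, IID_PPV_ARGS(&pManager));
    if (SUCCEEDED(hr))
    {
        hr = pManager->GetCatalog(kSystemCatalogName, ppCatalog);
        pManager->Release();
    }
    return hr;
}

// Hypothetical helper: retrieves the query helper and the crawl scope
// manager bound to the given catalog. The caller releases both on success.
inline HRESULT GetRelatedInterfaces(ISearchCatalogManager* pCatalog,
                                    ISearchQueryHelper** ppHelper,
                                    ISearchCrawlScopeManager** ppScope)
{
    *ppHelper = nullptr;
    *ppScope = nullptr;
    HRESULT hr = pCatalog->GetQueryHelper(ppHelper);
    if (SUCCEEDED(hr))
    {
        hr = pCatalog->GetCrawlScopeManager(ppScope);
        if (FAILED(hr))
        {
            // Do not hand back a half-initialized pair.
            (*ppHelper)->Release();
            *ppHelper = nullptr;
        }
    }
    return hr;
}
#endif // _WIN32
```

A caller would typically run CoInitializeEx, call OpenSystemIndex, use the returned interfaces, release them, and then call CoUninitialize. The sink methods are omitted here because GetItemsChangedSink takes additional notification parameters.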

 

Managing the Catalog Contents

There are two primary tasks involved in managing the content of the catalog: reindexing all or some of the URLs in the Indexer's crawl scope, and resetting the entire underlying catalog. When you reindex URLs, old data remains in the catalog until it is replaced by new data. When you reset the catalog, the entire catalog is rebuilt and all URLs in the crawl scope are reindexed. A reset can take a significant amount of time and should be used only as a last resort for resolving issues such as a suspected corrupt index.

When you install a new application, protocol handler, or filter, the setup application should add its directory or root to the crawl scope to ensure the Indexer includes the location of that application's data. If the data doesn't appear in the catalog after the Indexer has crawled its crawl scope, then you should first ensure the location of the data is included in the crawl scope. You can add it using the Windows Search Options dialog or using the Crawl Scope Manager. If the location appears to be in the crawl scope, you can manually force a reindex of all URLs in the Indexer's crawl scope or a subset, using the following methods.

  • Reindex - Reindexes all URLs in the catalog. The old information remains until it is replaced by new information.
  • ReindexSearchRoot - Reindexes URLs that match the pattern or start at a given root (e.g., file:///C:\Foldername\Subfoldername\). This is useful to re-crawl everything in a particular directory or with a particular extension, as when an application is installed.
  • ReindexMatchingURLs - Reindexes exact URLs (e.g., file:///C:\Foldername\Subfoldername\) or URLs that match a given pattern (e.g., file:///C:\Foldername\*\).

Resetting the Index

You can reset the entire index with a call to ISearchCatalogManager::Reset. This resets the underlying catalog by rebuilding the databases and performing a full index of all URLs in the crawl scope. This process can take a significant amount of time and should be used only as a last resort for resolving issues such as a suspected corrupt index.

Important  Because these methods can slow down indexing, use them carefully when you are trying to identify indexing or catalog problems. First, make sure your search roots and scope rules have been added in the Crawl Scope Manager, and then make sure that the FANCI bit (File Attribute Not Content Indexed, FILE_ATTRIBUTE_NOT_CONTENT_INDEXED) is set correctly for files and folders. If you have confirmed that these are correct, try ReindexSearchRoot first, ReindexMatchingURLs second, and Reindex last. If none of these work, try Reset as a last resort.
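The escalation order recommended above can be sketched as a small C++ helper. This is only an illustration under stated assumptions: RepairStep, RepairIndex, and the stillMissing callback are hypothetical names, and the callback is assumed to wait until indexing has settled before reporting whether the expected items are still absent (reindexing is asynchronous, so a real check needs its own waiting logic). The function is written as a template so it can be exercised with a stand-in object; on Windows you would pass the dereferenced ISearchCatalogManager pointer obtained from ISearchManager::GetCatalog().

```cpp
#include <cassert>

// Which step, if any, made the missing items show up in the index.
enum class RepairStep { SearchRoot, MatchingUrls, FullReindex, Reset, Unresolved };

// Catalog is any type exposing the four ISearchCatalogManager methods used
// below. stillMissing is a caller-supplied (hypothetical) check that returns
// true while the expected items are still absent from the index.
template <typename Catalog, typename StillMissing>
RepairStep RepairIndex(Catalog& catalog, const wchar_t* rootUrl,
                       const wchar_t* pattern, StillMissing stillMissing)
{
    assert(rootUrl != nullptr && pattern != nullptr);

    // HRESULT convention: negative values indicate failure. A failed call
    // simply falls through to the next, broader step.
    if (catalog.ReindexSearchRoot(rootUrl) >= 0 && !stillMissing())
        return RepairStep::SearchRoot;
    if (catalog.ReindexMatchingURLs(pattern) >= 0 && !stillMissing())
        return RepairStep::MatchingUrls;
    if (catalog.Reindex() >= 0 && !stillMissing())
        return RepairStep::FullReindex;

    // Last resort: rebuilds the whole catalog and can take a long time.
    if (catalog.Reset() >= 0 && !stillMissing())
        return RepairStep::Reset;
    return RepairStep::Unresolved;
}
```

In practice you would confirm the crawl scope and the FANCI bit before calling a helper like this, and you might prefer to stop before the Reset step and ask the user for confirmation.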

 

Managing Catalog Status

The Catalog Manager can report the status of the catalog. The status methods are intended for applications that customize how the catalog is managed, such as a "Catalog Status" monitoring application or a Control Panel-style application, and are not typically required for most search-related development scenarios.

  • URLBeingIndexed - Gets the URL that is currently being indexed. This method is useful when you are trying to identify whether the Indexer is "stuck" on an item.
  • NumberOfItems - Gets the number of items in the catalog.
  • NumberOfItemsToIndex - Gets the number of items to be indexed within the catalog. The counts are returned through three output parameters:
      • plIncrementalCount - The number of items to be indexed in the next incremental index.
      • plNotificationQueue - The number of items in the notification queue. This value is useful to a notifications application that needs to check that the Indexer is receiving the notifications it sends.
      • plHighPriorityQueue - The number of items in the high priority queue. Items in this queue are indexed first.
  • GetCatalogStatus - Gets the status of the catalog and returns an enumeration with the current status. The following are possible catalog states:
      • Idle: No indexing is needed
      • Paused: Indexing is paused (due to low battery or high CPU usage, for example)
      • Recovering: Indexing is recovering
      • Full crawl: Indexing is performing a full crawl of the crawl scope
      • Incremental crawl: Indexing is performing an incremental crawl
      • Processing notifications: Indexer is processing notifications
      • Shutting down: Indexer is shutting down
  • get_Name - Gets the name of the current catalog, which is the name passed to the ISearchManager::GetCatalog() method. Currently the only catalog supported is SystemIndex.
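A minimal sketch of how a monitoring application might read these values follows. It assumes the searchapi.h signatures NumberOfItems(LONG*) and NumberOfItemsToIndex(LONG*, LONG*, LONG*), and it assumes that the CatalogStatus enumeration values run from 0 (idle) to 6 (shutting down) in the order listed above. IndexBacklog, ReadBacklog, and CatalogStatusName are hypothetical names. The reader function is a template so it can be tried with a stand-in object; on Windows you would pass the dereferenced ISearchCatalogManager pointer.

```cpp
#include <cassert>
#include <string>

// Snapshot of the item counts exposed by the catalog manager.
struct IndexBacklog {
    long itemsInCatalog = 0;     // NumberOfItems
    long incremental = 0;        // plIncrementalCount
    long notificationQueue = 0;  // plNotificationQueue
    long highPriorityQueue = 0;  // plHighPriorityQueue

    bool HasPendingWork() const {
        return incremental > 0 || notificationQueue > 0 || highPriorityQueue > 0;
    }
};

// Fills *out from the catalog. Returns the first failing HRESULT (negative),
// or the result of the last call on success.
template <typename Catalog>
long ReadBacklog(Catalog& catalog, IndexBacklog* out)
{
    assert(out != nullptr);
    long hr = catalog.NumberOfItems(&out->itemsInCatalog);
    if (hr < 0)
        return hr;
    return catalog.NumberOfItemsToIndex(&out->incremental,
                                        &out->notificationQueue,
                                        &out->highPriorityQueue);
}

// Maps a CatalogStatus value (cast to int) to display text.
// The numeric values are an assumption about the enumeration's layout.
inline std::string CatalogStatusName(int status)
{
    switch (status) {
    case 0: return "Idle";
    case 1: return "Paused";
    case 2: return "Recovering";
    case 3: return "Full crawl";
    case 4: return "Incremental crawl";
    case 5: return "Processing notifications";
    case 6: return "Shutting down";
    default: return "Unknown";
    }
}
```

With the real interface, the status value would come from GetCatalogStatus and be passed as static_cast<int>(status). Strings returned by URLBeingIndexed and get_Name are allocated by the callee, so a real caller would also need to free them with CoTaskMemFree.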

 

Managing Catalog Properties

There are three catalog properties that you can manage with the Catalog Manager:

  • Diacritic sensitivity - Diacritics are accent marks added to letters to alter a word's meaning or pronunciation. This property determines whether the catalog is sensitive to diacritics and is important when you or your users search and index text in multiple languages. For example, with this property set to FALSE, the catalog would treat "resume" and "resumé" as if they were the same word.
  • Connection timeouts - This property represents the amount of time to wait for a connection response from a server or data store, as represented in a TIMEOUT_INFO structure. You can use this to fine-tune Windows Search.
  • Data timeouts - This property represents the amount of time to wait for a data transaction between the indexer and a protocol handler or filter, as represented in a TIMEOUT_INFO structure. If this time elapses, the Filter Daemon process is terminated to prevent deadlock and other resource problems.

The last two are primarily intended for future use. Each of these properties has get and put methods.

  • get_DiacriticSensitivity / put_DiacriticSensitivity - TRUE if the catalog should differentiate words with diacritics; FALSE if the catalog should ignore diacritics. Changing this property requires the index to be rebuilt because the keys of the index may become invalid.
  • get_ConnectTimeout / put_ConnectTimeout - The amount of time, in seconds, the Indexer should wait for a connection response from a server or data store. Setting this too high can cause delays if many sites are unresponsive. Setting it too low can cause some sites not to be crawled.
  • get_DataTimeout / put_DataTimeout - The amount of time, in seconds, the Indexer should wait for a data transaction.
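Because changing diacritic sensitivity forces an index rebuild, it can be worth reading the current value first and writing only when it differs. The sketch below illustrates that idea and also reads the two timeouts. It assumes the accessor names declared in searchapi.h, where the connection timeout pair is spelled get_ConnectTimeout and put_ConnectTimeout, with BOOL treated as int and DWORD as unsigned long. SetDiacriticSensitivity and ReadTimeouts are hypothetical helper names. Both are templates so they can be tried with a stand-in object; on Windows you would pass the dereferenced ISearchCatalogManager pointer.

```cpp
#include <cassert>

// Writes the property only when the stored value differs from `wanted`,
// so an unnecessary index rebuild is avoided. *changed reports whether a
// put was issued and succeeded.
template <typename Catalog>
long SetDiacriticSensitivity(Catalog& catalog, bool wanted, bool* changed)
{
    assert(changed != nullptr);
    *changed = false;

    int current = 0;  // BOOL in the Windows headers
    long hr = catalog.get_DiacriticSensitivity(&current);
    if (hr < 0)
        return hr;               // could not read the current value
    if ((current != 0) == wanted)
        return hr;               // already set as requested, nothing to do

    hr = catalog.put_DiacriticSensitivity(wanted ? 1 : 0);
    *changed = (hr >= 0);
    return hr;
}

// Reads both timeouts, in seconds. Returns the first failing HRESULT.
template <typename Catalog>
long ReadTimeouts(Catalog& catalog, unsigned long* connectSeconds,
                  unsigned long* dataSeconds)
{
    assert(connectSeconds != nullptr && dataSeconds != nullptr);
    long hr = catalog.get_ConnectTimeout(connectSeconds);
    if (hr < 0)
        return hr;
    return catalog.get_DataTimeout(dataSeconds);
}
```

Since the timeout properties are described as primarily intended for future use, the sketch only reads them and leaves the put methods alone.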