Installing and Registering Protocol Handlers (Windows Search)
Installing a protocol handler involves copying the DLL(s) to an appropriate location in the Program Files directory, and then registering the protocol handler through the registry. The installation application can also add a search root and scope rules to define a default crawl scope for the Shell data source.
This topic is organized as follows:
- About URLs
- Implementing Protocol Handler Interfaces
- Implementing Filter Handlers for Containers
- Installing and Registering a Protocol Handler
- Ensuring that Your Items are Indexed
- Related topics
About URLs
Windows Search uses URLs to uniquely identify items in the hierarchy of your Shell data source. The URL that is the first node in the hierarchy is called the search root; Windows Search will begin indexing at the search root, requesting that the protocol handler enumerate child links for each URL.
The typical URL structure is:
<protocol>:// [{user SID}/] <localhost>/<path>/[<ItemID>]
The URL syntax is described in the following table.
Syntax | Description |
---|---|
<protocol> | Identifies which protocol handler to invoke for the URL. |
{user SID} | Identifies the user security context under which the protocol handler is called. If no user security identifier (SID) is identified, the protocol handler is called in the security context of the system service. |
<path> | Defines the hierarchy of the store, where each forward slash ('/') is a separator between folder names. |
<ItemID> | Represents a unique string that identifies the child item (for example, the file name). |
The Windows Search Indexer trims the final slash from URLs. As a result you cannot rely on the existence of a final slash to identify a directory versus an item. Your protocol handler must be able to handle this URL syntax. Ensure that the protocol name that you select to identify your Shell data source does not conflict with current ones. We recommend this naming convention: companyName.scheme.
For more information on creating a Shell data source, see Implementing the Basic Folder Object Interfaces.
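The URL structure above can be illustrated with a small, portable parsing sketch. This is only an illustration under stated assumptions: `ParsedUrl` and `ParseStoreUrl` are hypothetical names that are not part of the Windows Search API, and the example protocol name `contoso.store` is made up to follow the companyName.scheme convention. The sketch treats a URL with and without a final slash identically, because the indexer trims that slash.

```cpp
#include <optional>
#include <string>

// Hypothetical helper type; not part of the Windows Search API.
struct ParsedUrl {
    std::wstring protocol; // text before "://"
    std::wstring userSid;  // contents of the optional {user SID} segment, empty if absent
    std::wstring host;     // the <localhost> segment
    std::wstring path;     // everything after the host, without a trailing slash
};

// Splits "<protocol>://[{user SID}/]<localhost>/<path>/[<ItemID>]" into its parts.
// Returns std::nullopt when the protocol separator or a closing brace is missing.
std::optional<ParsedUrl> ParseStoreUrl(const std::wstring& url)
{
    const std::wstring sep = L"://";
    const size_t schemeEnd = url.find(sep);
    if (schemeEnd == std::wstring::npos || schemeEnd == 0) {
        return std::nullopt;
    }

    ParsedUrl out;
    out.protocol = url.substr(0, schemeEnd);
    std::wstring rest = url.substr(schemeEnd + sep.size());

    // The indexer trims the final slash, so accept both forms.
    while (!rest.empty() && rest.back() == L'/') {
        rest.pop_back();
    }

    // Optional {user SID}/ segment.
    if (!rest.empty() && rest.front() == L'{') {
        const size_t close = rest.find(L'}');
        if (close == std::wstring::npos) {
            return std::nullopt;
        }
        out.userSid = rest.substr(1, close - 1);
        rest.erase(0, close + 1);
        if (!rest.empty() && rest.front() == L'/') {
            rest.erase(0, 1);
        }
    }

    // The first remaining segment is the host; the remainder is the path.
    const size_t slash = rest.find(L'/');
    out.host = rest.substr(0, slash);
    if (slash != std::wstring::npos) {
        out.path = rest.substr(slash + 1);
    }
    return out;
}
```

Because the parsed result is the same whether or not the final slash is present, a handler built this way still has to ask its own store whether a given path is a container or an item.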
Implementing Protocol Handler Interfaces
Creating a protocol handler requires the implementation of the following three interfaces:
- ISearchProtocol to manage UrlAccessor objects.
- IUrlAccessor to expose properties and identify appropriate filters for items in the Shell data source.
- IFilter to filter proprietary files or to enumerate and filter hierarchically stored files.
All other interfaces are optional; implement whichever of them suit your protocol handler.
ISearchProtocol and ISearchProtocol2
The SearchProtocol interfaces initialize and manage your protocol handler UrlAccessor objects. The ISearchProtocol2 interface is an optional extension of ISearchProtocol, and includes an extra method to specify more information about the user and the item.
IUrlAccessor, IUrlAccessor2, IUrlAccessor3, and IUrlAccessor4
The IUrlAccessor interfaces are described in the following table.
Interface | Description |
---|---|
IUrlAccessor | For a specified URL, the IUrlAccessor interface provides access to the properties of the item that is exposed in the URL. It can also bind those properties to a protocol handler-specific filter (that is, a filter other than the one associated with the file name). |
IUrlAccessor2 (optional) | The IUrlAccessor2 interface extends IUrlAccessor with methods that get a code page for the item's properties and its display URL, and that get the type of item in the URL (document or directory). |
IUrlAccessor3 (optional) | The IUrlAccessor3 interface extends IUrlAccessor2 with a method that gets an array of user SIDs, enabling the search protocol host to impersonate these users to index the item. |
IUrlAccessor4 (optional) | The IUrlAccessor4 interface extends the functionality of the IUrlAccessor3 interface with a method that identifies whether the content of the item should be indexed. |
The UrlAccessor object is instantiated and initialized by a SearchProtocol object. The IUrlAccessor interfaces provide access to important pieces of information through the methods described in the following table.
Method | Description |
---|---|
IUrlAccessor::GetLastModified | Returns the time that the URL was last modified. If this time is more recent than the last time the indexer processed this URL, filter handlers (implementations of the IFilter interface) are called to extract the (possibly) changed data for that item. Modified times for directories are ignored. |
IUrlAccessor::IsDirectory | Identifies whether the URL represents a folder containing child URLs. |
IUrlAccessor::BindToStream | Binds to an IStream interface that represents the data of a file in a custom data store. |
IUrlAccessor::BindToFilter | Binds to a protocol handler-specific IFilter, which can expose properties for the item. |
IUrlAccessor4::ShouldIndexItemContent | Identifies whether the content of the item should be indexed. |
IProtocolHandlerSite
The IProtocolHandlerSite interface is used to instantiate a filter handler, which is hosted in an isolated process. The appropriate filter handler is obtained for a specified persistent class identifier (CLSID), document storage class, or file name extension. The benefit of asking the host process to bind to IFilter is that the host process can manage the process of locating an appropriate filter handler, and control the security involved in calling the handler.
Implementing Filter Handlers for Containers
If you are implementing a hierarchical protocol handler, then you must implement a filter handler for a container that enumerates child URLs. A filter handler is an implementation of the IFilter interface. The enumeration process is a loop through the IFilter::GetChunk and IFilter::GetValue methods of the IFilter interface; each child URL is exposed as the value of the property.
IFilter::GetChunk returns the properties of the container. To enumerate child URLs, IFilter::GetChunk returns either of the following:
- PKEY_Search_UrlToIndex: The URL to the item without the last modified time. IFilter::GetValue returns a PROPVARIANT containing the child URL.
- PKEY_Search_UrlToIndexWithModificationTime: The URL and the last modified time. IFilter::GetValue returns a PROPVARIANT containing a vector of the child URL and the last modified time.
Returning PKEY_Search_UrlToIndexWithModificationTime is more efficient because the indexer can immediately determine whether the item needs to be indexed without calling the ISearchProtocol::CreateAccessor and IUrlAccessor::GetLastModified methods.
The following example code demonstrates how to return the PKEY_Search_UrlToIndexWithModificationTime property.
Important
Copyright (c) Microsoft Corporation. All rights reserved.
// Parameters are assumed to be valid
HRESULT GetPropVariantForUrlAndTime
    (PCWSTR pszUrl, const FILETIME &ftLastModified, PROPVARIANT **ppPropValue)
{
    *ppPropValue = NULL;

    // Allocate the propvariant pointer.
    size_t const cbAlloc = sizeof(**ppPropValue);
    *ppPropValue = (PROPVARIANT *)CoTaskMemAlloc(cbAlloc);
    HRESULT hr = *ppPropValue ? S_OK : E_OUTOFMEMORY;
    if (SUCCEEDED(hr))
    {
        PropVariantInit(*ppPropValue); // Zero init the value

        // Now allocate enough memory for 2 nested PropVariants.
        // PKEY_Search_UrlToIndexWithModificationTime is an array of two PROPVARIANTs.
        PROPVARIANT *pVector = (PROPVARIANT *)CoTaskMemAlloc(sizeof(*pVector) * 2);
        hr = pVector ? S_OK : E_OUTOFMEMORY;
        if (SUCCEEDED(hr))
        {
            // Set the container PROPVARIANT to be a vector of two PROPVARIANTS.
            (*ppPropValue)->vt = VT_VARIANT | VT_VECTOR;
            (*ppPropValue)->capropvar.cElems = 2;
            (*ppPropValue)->capropvar.pElems = pVector;

            PWSTR pszUrlAlloc;
            hr = SHStrDup(pszUrl, &pszUrlAlloc);
            if (SUCCEEDED(hr))
            {
                // Now fill the array of PROPVARIANTS.
                // Put the pointer to the URL into the vector.
                (*ppPropValue)->capropvar.pElems[0].vt = VT_LPWSTR;
                (*ppPropValue)->capropvar.pElems[0].pwszVal = pszUrlAlloc;

                // Put the FILETIME into vector.
                (*ppPropValue)->capropvar.pElems[1].vt = VT_FILETIME;
                (*ppPropValue)->capropvar.pElems[1].filetime = ftLastModified;
            }
            else
            {
                // The URL copy failed; release the vector before the container is freed below.
                CoTaskMemFree(pVector);
            }
        }

        if (FAILED(hr))
        {
            CoTaskMemFree(*ppPropValue);
            *ppPropValue = NULL;
        }
    }
    // Return the real result so that callers see allocation failures.
    return hr;
}
Note
A container IFilter component should always enumerate all child URLs even if the child URLs have not changed, because the indexer detects deletions through the enumeration process. If the date output in a PKEY_Search_UrlToIndexWithModificationTime indicates that the data has not changed, the indexer does not update the data for that URL.
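The reason for always enumerating every child can be sketched with a simplified model of the bookkeeping described in the note. This is not the indexer's actual implementation, which is internal to Windows Search; `ChildEntry`, `CrawlPlan`, and `PlanCrawl` are hypothetical names, and a 64-bit tick count stands in for FILETIME. The model assumes that each child URL is enumerated at most once.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model types; not part of the Windows Search API.
struct ChildEntry {
    std::wstring url;
    std::uint64_t lastModified; // stand-in for a FILETIME value
};

struct CrawlPlan {
    std::vector<std::wstring> toIndex;  // new or modified children
    std::vector<std::wstring> toDelete; // previously indexed, no longer enumerated
};

// Compares what a container filter enumerated against what is already indexed.
// 'indexed' maps each known child URL to the modified time recorded for it.
CrawlPlan PlanCrawl(const std::map<std::wstring, std::uint64_t>& indexed,
                    const std::vector<ChildEntry>& enumerated)
{
    CrawlPlan plan;
    std::map<std::wstring, std::uint64_t> remaining = indexed;

    for (const ChildEntry& child : enumerated) {
        auto it = remaining.find(child.url);
        // Unknown URLs and URLs with a newer modified time need (re)indexing.
        if (it == remaining.end() || child.lastModified > it->second) {
            plan.toIndex.push_back(child.url);
        }
        if (it != remaining.end()) {
            remaining.erase(it);
        }
    }

    // Anything never enumerated is treated as deleted from the store.
    for (const auto& entry : remaining) {
        plan.toDelete.push_back(entry.first);
    }
    return plan;
}
```

In this model, a filter that skipped unchanged children would leave them in `remaining`, and they would be reported as deleted. That is why unchanged URLs must still be enumerated.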
Installing and Registering a Protocol Handler
Installing protocol handlers involves copying the DLL(s) to an appropriate location in the Program Files directory, and then registering the DLL(s). Protocol handlers should implement self-registration for installation. The installation application can also add a search root, and scope rules to define a default crawl scope for the Shell data source, which is discussed in Ensuring that Your Items are Indexed at the end of this topic.
Guidelines for Registering a Protocol Handler
You should follow these guidelines when registering a protocol handler:
- The installer must be either an EXE or an MSI installer.
- Release notes must be provided.
- An Add/Remove Programs entry must be created for each add-in installed.
- The installer must take over all registry settings for the particular file type or store that the current add-in understands.
- If a previous add-in is being overwritten, the installer should notify the user.
- If a newer add-in has overwritten the previous add-in, there should be the ability to restore the previous add-in's functionality and make it the default add-in for that file type again.
- The installer should define a default crawl scope for the indexer by adding a search root, and scope rules using the Crawl Scope Manager (CSM).
Registering a Protocol Handler
You need to make fourteen entries in the registry to register the protocol handler component, where:
- Ver_Ind_ProgID is the version independent ProgID of the protocol handler implementation.
- Ver_Dep_ProgID is the version dependent ProgID of the protocol handler implementation.
- CLSID_1 is the CLSID of the protocol handler implementation.
To register a protocol handler:
Register the version independent ProgID with the following keys and values:
HKEY_CLASSES_ROOT\<Ver_Ind_ProgID>
    (Default) = <Protocol Handler Class Description>
HKEY_CLASSES_ROOT\<Ver_Ind_ProgID>\CLSID
    (Default) = {CLSID_1}
HKEY_CLASSES_ROOT\<Ver_Ind_ProgID>\CurVer
    (Default) = <Ver_Dep_ProgID>
Register the version dependent ProgID with the following keys and values:
HKEY_CLASSES_ROOT\<Ver_Dep_ProgID>
    (Default) = <Protocol Handler Class Description>
HKEY_CLASSES_ROOT\<Ver_Dep_ProgID>\CLSID
    (Default) = {CLSID_1}
Register the protocol handler's CLSID with the following keys and values:
HKEY_CLASSES_ROOT\CLSID\{CLSID_1}
    (Default) = <Protocol Handler Class Description>
HKEY_CLASSES_ROOT\CLSID\{CLSID_1}\InprocServer32
    (Default) = <DLL Install Path>
    ThreadingModel = Both
HKEY_CLASSES_ROOT\CLSID\{CLSID_1}\ProgID
    (Default) = <Ver_Dep_ProgID>
HKEY_CLASSES_ROOT\CLSID\{CLSID_1}\ShellFolder
    Attributes = dword:a0180000
HKEY_CLASSES_ROOT\CLSID\{CLSID_1}\TypeLib
    (Default) = {LIBID of PH Component}
HKEY_CLASSES_ROOT\CLSID\{CLSID_1}\VersionIndependentProgID
    (Default) = <Ver_Ind_ProgID>
Register the protocol handler with Windows Search. In the following example, <Protocol Name> is the name of the protocol itself, such as file, mapi, and so forth:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Search\ProtocolHandlers
    <Protocol Name> = <Ver_Dep_ProgID>
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows Search\ProtocolHandlers
    <Protocol Name> = <Ver_Dep_ProgID>
Prior to Windows Vista:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows Desktop Search\DS\Index\ProtocolHandlers\<Protocol Name>
    HasRequirements = dword:00000000
    HasStartPage = dword:00000000
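One way to keep these entries manageable in a self-registering handler is to describe them as data and let the installer loop over the list. The following portable sketch only builds the string-valued entries from the steps above and does not touch the registry. `RegEntry`, `HandlerInfo`, and `BuildStringEntries` are hypothetical names, and the sketch assumes the standard COM layout in which class entries live under the CLSID key of HKEY_CLASSES_ROOT, with the InprocServer32 key and the ThreadingModel value. The DWORD-valued and TypeLib entries are omitted for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one registry value; not a Windows type.
struct RegEntry {
    std::wstring root;      // hive name, for example L"HKEY_CLASSES_ROOT"
    std::wstring subKey;    // key path below the hive
    std::wstring valueName; // empty string means the (Default) value
    std::wstring data;      // string data (REG_SZ)
};

// Hypothetical bundle of the placeholders used in the registration steps.
struct HandlerInfo {
    std::wstring verIndProgId;
    std::wstring verDepProgId;
    std::wstring clsid;        // including the surrounding braces
    std::wstring description;
    std::wstring dllPath;
    std::wstring protocolName;
};

// Builds the string-valued entries; assumes the standard COM key layout.
std::vector<RegEntry> BuildStringEntries(const HandlerInfo& h)
{
    const std::wstring hkcr = L"HKEY_CLASSES_ROOT";
    const std::wstring clsidKey = L"CLSID\\" + h.clsid;
    return {
        // Version independent ProgID.
        {hkcr, h.verIndProgId, L"", h.description},
        {hkcr, h.verIndProgId + L"\\CLSID", L"", h.clsid},
        {hkcr, h.verIndProgId + L"\\CurVer", L"", h.verDepProgId},
        // Version dependent ProgID.
        {hkcr, h.verDepProgId, L"", h.description},
        {hkcr, h.verDepProgId + L"\\CLSID", L"", h.clsid},
        // The protocol handler's CLSID.
        {hkcr, clsidKey, L"", h.description},
        {hkcr, clsidKey + L"\\InprocServer32", L"", h.dllPath},
        {hkcr, clsidKey + L"\\InprocServer32", L"ThreadingModel", L"Both"},
        {hkcr, clsidKey + L"\\ProgID", L"", h.verDepProgId},
        {hkcr, clsidKey + L"\\VersionIndependentProgID", L"", h.verIndProgId},
        // Registration with Windows Search (machine-wide).
        {L"HKEY_LOCAL_MACHINE",
         L"SOFTWARE\\Microsoft\\Windows Search\\ProtocolHandlers",
         h.protocolName, h.verDepProgId},
    };
}
```

On Windows, an installer could write each entry with a registry function such as RegSetKeyValueW and remove the same entries on uninstall. Keeping the list in one place makes it easier to keep the two paths symmetric.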
Registering the Protocol Handler's File Type Handler
You need to make two entries in the registry to register the protocol handler's file type handler (which is also known as a Shell extension).
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Desktop\NameSpace\{CLSID of PH Implementation}
    (Default) = <Shell Implementation Description>
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Extensions\Approved
    {CLSID of PH Implementation} = <Shell Implementation Description>
Ensuring that Your Items are Indexed
After you have implemented your protocol handler, you must specify which Shell items your protocol handler is to index. You can use the Catalog Manager to initiate re-indexing (for more information, see Using the Catalog Manager). You can also use the Crawl Scope Manager (CSM) to set up default rules indicating the URLs that you want the indexer to crawl (for more information, see Using the Crawl Scope Manager and Managing Scope Rules), or add a search root (for more information, see Managing Search Roots). Another option is to follow the procedure in the ReIndex sample in Windows Search Code Samples.
The ISearchCrawlScopeManager interface provides methods that notify the search engine of containers to crawl and/or watch, and items under those containers to include or exclude when crawling or watching. In Windows 7 and later, ISearchCrawlScopeManager2 extends ISearchCrawlScopeManager with the ISearchCrawlScopeManager2::GetVersion method that gets the version, which informs clients whether the state of the CSM has changed.
Related topics
-
Conceptual