

Implementing a Protocol Handler for WDS

Creating a protocol handler involves implementing ISearchProtocol to manage UrlAccessor objects, IUrlAccessor to generate metadata about and to identify appropriate filters for items in the data store, IProtocolHandlerSite to instantiate a SearchProtocol object and identify appropriate filters, and IFilter to filter proprietary files or to enumerate and filter hierarchically stored files. The protocol handler must be multi-threaded.

This section contains the following topics:

  • Note on URLs
  • Protocol Handler Interfaces
  • IFilters for Containers
  • Adding Protocol Handler Options Functionality
  • Related Topics

Note on URLs

Microsoft Windows Desktop Search (WDS) uses URLs to uniquely identify items in a file system, inside a database-like store, or on the Web. A URL that defines an entry node is called a start page; WDS begins at that start page and recursively crawls the data store. The typical URL structure is:

protocol://host/path/name.extension
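As a rough illustration of that structure, the following portable C++ sketch splits a URL of the form protocol://host/path/name.extension into its parts. UrlParts and SplitUrl are hypothetical helpers, not part of the WDS API, and the sketch assumes a well-formed URL with no query string, fragment, or percent-encoding.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical container for the pieces of protocol://host/path/name.extension.
struct UrlParts {
    std::wstring protocol;   // e.g. L"contoso.notes"
    std::wstring host;       // e.g. L"server"
    std::wstring path;       // directory part, without leading/trailing '/'
    std::wstring name;       // item name without the extension
    std::wstring extension;  // extension without the dot (may be empty)
};

// Splits a URL into its parts; returns std::nullopt if "://" is missing.
// Assumes no query string, fragment, or percent-encoding.
std::optional<UrlParts> SplitUrl(const std::wstring& url) {
    const auto schemeEnd = url.find(L"://");
    if (schemeEnd == std::wstring::npos) return std::nullopt;

    UrlParts parts;
    parts.protocol = url.substr(0, schemeEnd);

    const std::wstring rest = url.substr(schemeEnd + 3);
    const auto hostEnd = rest.find(L'/');
    parts.host = rest.substr(0, hostEnd);
    if (hostEnd == std::wstring::npos) return parts;  // host only, no path

    const std::wstring tail = rest.substr(hostEnd + 1);
    const auto lastSlash = tail.rfind(L'/');
    std::wstring fileName;
    if (lastSlash == std::wstring::npos) {
        fileName = tail;
    } else {
        parts.path = tail.substr(0, lastSlash);
        fileName = tail.substr(lastSlash + 1);
    }

    // Everything after the last dot of the final segment is the extension.
    const auto dot = fileName.rfind(L'.');
    if (dot == std::wstring::npos) {
        parts.name = fileName;
    } else {
        parts.name = fileName.substr(0, dot);
        parts.extension = fileName.substr(dot + 1);
    }
    return parts;
}
```

For example, SplitUrl(L"contoso.notes://server/projects/2024/report.txt") yields the protocol contoso.notes, the host server, the path projects/2024, the name report, and the extension txt.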

Note  

When you add a new data store, you need to choose a protocol name (the URL scheme) that does not conflict with existing ones. We recommend the naming convention companyName.scheme.
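A minimal sketch of checking a candidate name against that convention follows. IsValidHandlerScheme is hypothetical; it assumes the name must be a valid URI scheme under RFC 3986 (an ASCII letter followed by letters, digits, '+', '-', or '.') and must have a non-empty company part and scheme part separated by a dot. It cannot detect a collision with a scheme already in use on the machine; that still has to be checked separately.

```cpp
#include <cassert>
#include <string>

static bool IsAsciiAlpha(wchar_t c) { return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z'); }
static bool IsAsciiDigit(wchar_t c) { return c >= L'0' && c <= L'9'; }

// Hypothetical check for the recommended companyName.scheme convention.
bool IsValidHandlerScheme(const std::wstring& name) {
    // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    if (name.empty() || !IsAsciiAlpha(name[0])) return false;
    for (wchar_t c : name) {
        if (!IsAsciiAlpha(c) && !IsAsciiDigit(c) && c != L'+' && c != L'-' && c != L'.') {
            return false;
        }
    }
    // Require a company part before the first dot and a scheme part after it.
    const auto dot = name.find(L'.');
    return dot != std::wstring::npos && dot > 0 && dot + 1 < name.size();
}
```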

Protocol Handler Interfaces

ISearchProtocol

The ISearchProtocol interface invokes, initializes, and manages UrlAccessor objects. For more information on implementing it, see the ISearchProtocol interface reference.
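Because the protocol handler must be multi-threaded, any bookkeeping your SearchProtocol object keeps about live UrlAccessor objects should be synchronized, assuming accessors may be requested and released from more than one thread. The portable sketch below models only that bookkeeping with std::mutex. AccessorState, AccessorRegistry, Create, and Close are hypothetical names standing in for work done inside the real COM methods (such as ISearchProtocol::CreateAccessor and ISearchProtocol::CloseAccessor), whose actual signatures use COM types not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the state one UrlAccessor object holds.
struct AccessorState {
    std::wstring url;
};

// Thread-safe registry of live accessors, keyed by an opaque handle.
class AccessorRegistry {
public:
    // Records a new accessor for a URL and returns its handle.
    std::size_t Create(const std::wstring& url) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t handle = nextHandle_++;
        live_.emplace(handle, AccessorState{url});
        return handle;
    }

    // Releases an accessor; returns false if the handle is unknown.
    bool Close(std::size_t handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.erase(handle) == 1;
    }

    // Number of accessors that have been created but not yet closed.
    std::size_t LiveCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.size();
    }

private:
    mutable std::mutex mutex_;  // guards every member below
    std::size_t nextHandle_ = 1;
    std::map<std::size_t, AccessorState> live_;
};
```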

IUrlAccessor

For a specified URL, the IUrlAccessor interface generates metadata about the location structure and the items it contains, and binds those items to a filter. The IUrlAccessor object is instantiated and initialized by a SearchProtocol object; however, you can also implement an internal initialization method so that your IUrlAccessor object can perform initialization tasks specific to your protocol handler, such as validating the URL of the item being accessed or checking its last modified time to determine whether it must be processed in the current crawl.

Note  

Modified times for directories are ignored. The IUrlAccessor object must enumerate the child objects to determine whether there have been any modifications or deletions.
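The decision described above can be sketched as follows, assuming the accessor knows the time of the previous crawl. ItemInfo and MustProcessInThisCrawl are hypothetical, and times are kept as 64-bit FILETIME-style tick counts (100-nanosecond intervals since January 1, 1601) so the sketch stays portable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an item as seen by an accessor.
struct ItemInfo {
    bool isDirectory;        // container in a hierarchical store
    std::uint64_t modified;  // last modified time, in FILETIME ticks
};

// Returns true if the item has to be processed in the current crawl.
bool MustProcessInThisCrawl(const ItemInfo& item, std::uint64_t lastCrawl) {
    // Directory modified times are ignored: a container is always
    // enumerated so that changed and deleted children can be found.
    if (item.isDirectory) return true;
    // A file is reprocessed only if it changed after the previous crawl.
    return item.modified > lastCrawl;
}
```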

Much of the design of the UrlAccessor object depends on whether the data store is hierarchical or link-based. For hierarchical data stores, the UrlAccessor object must find a filter that can enumerate the contents of each container. Another distinction between hierarchical and link-based protocol handlers is the use of the IsDirectory method: link-based protocol handlers should return S_FALSE from this method, while hierarchical protocol handlers must return S_OK for containers.
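The IsDirectory rule can be summarized in a small portable sketch. The values 0 and 1 match the Windows definitions of S_OK and S_FALSE, but kSOk, kSFalse, and IsDirectoryResult are hypothetical stand-ins so the example compiles without Windows headers, and the sketch assumes non-container items in a hierarchical store also return S_FALSE. In a real handler this logic would live in your IUrlAccessor::IsDirectory implementation.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for the HRESULT values S_OK (0) and S_FALSE (1).
constexpr std::int32_t kSOk = 0;
constexpr std::int32_t kSFalse = 1;

// What IsDirectory should report for an item.
std::int32_t IsDirectoryResult(bool hierarchicalStore, bool itemIsContainer) {
    // Only containers in a hierarchical store report S_OK, so that a
    // container filter is used to enumerate their children.
    if (hierarchicalStore && itemIsContainer) return kSOk;
    // Link-based handlers (and, by assumption, non-container items) report S_FALSE.
    return kSFalse;
}
```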

For further instructions on implementing an IUrlAccessor interface, see the IUrlAccessor Interface reference.

IProtocolHandlerSite

This interface is used to instantiate a SearchProtocol object and also provides the UrlAccessor object with an appropriate filter for a specified class ID (CLSID). For more information, see the IProtocolHandlerSite reference.

IFilters for Containers

If you are implementing a hierarchical protocol handler, you must implement a container IFilter component that enumerates URLs representing containers or folders. The enumeration is a loop of calls to the GetChunk and GetValue methods of the IFilter interface that, together, return one URL for each item in the container.

First, GetChunk returns a FULLPROPSPEC with the property set GATHER_PROPSET and either:

  • PID_GTHR_DIRLINK, the URL to the item without the last modified time, or
  • PID_GTHR_DIRLINK_WITH_TIME, the URL along with the last modified time

The property set GUID for GATHER_PROPSET is 0B63E343-9CCC-11D0-BCDB-00805FCCCE04. The PROPSPEC property ID is either PID_GTHR_DIRLINK (2) or PID_GTHR_DIRLINK_WITH_TIME (12, decimal).

Returning PID_GTHR_DIRLINK_WITH_TIME is more efficient because the indexer can determine immediately whether the item needs to be indexed, without calling the ISearchProtocol::CreateAccessor and IUrlAccessor::GetLastModified methods.
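A FILETIME stores a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC), split into dwLowDateTime and dwHighDateTime. The sketch below uses a portable look-alike struct (FileTimeParts, hypothetical) and a hypothetical NeedsReindex helper to show the kind of comparison that can be made from the enumerated time alone, without creating an accessor; it is not a description of the indexer's internal logic. On Windows, CompareFileTime performs the same ordering on real FILETIME values.

```cpp
#include <cassert>
#include <cstdint>

// Portable look-alike of the Windows FILETIME structure.
struct FileTimeParts {
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Combines the two halves into one 64-bit tick count.
std::uint64_t ToTicks(const FileTimeParts& ft) {
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
}

// True if the enumerated time is newer than the time recorded at the last index.
bool NeedsReindex(const FileTimeParts& enumerated, const FileTimeParts& indexed) {
    return ToTicks(enumerated) > ToTicks(indexed);
}
```

Combining the halves before comparing matters: a time of {0, 2} is later than {0xFFFFFFFF, 1} even though its low part is smaller.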

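Then GetValue returns a PROPVARIANT that holds the URL (and the last modified time, if used) as either:

  • VT_LPWSTR, the URL of the child item (for PID_GTHR_DIRLINK), or
  • VT_VARIANT | VT_VECTOR, a vector containing the URL (VT_LPWSTR) followed by its last modified time (VT_FILETIME) (for PID_GTHR_DIRLINK_WITH_TIME)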
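Taken together, the GetChunk/GetValue sequence can be modeled as one chunk per child item, each tagged with GATHER_PROPSET and one of the two property IDs, until the container runs out of children. The GUID string and the values 2 and 12 come from the text above; ChildItem, DirLinkChunk, and ContainerEnumerator are hypothetical, and the sketch deliberately omits the real IFilter, STAT_CHUNK, FULLPROPSPEC, and PROPVARIANT types. In a real filter, GetChunk signals the end of the enumeration by returning FILTER_E_END_OF_CHUNKS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Values taken from the documentation text.
const std::wstring kGatherPropset = L"0B63E343-9CCC-11D0-BCDB-00805FCCCE04";
constexpr std::uint32_t kPidGthrDirlink = 2;
constexpr std::uint32_t kPidGthrDirlinkWithTime = 12;

// Hypothetical child entry of a container.
struct ChildItem {
    std::wstring url;
    std::optional<std::uint64_t> modified;  // FILETIME ticks, if known
};

// Hypothetical model of one chunk: the property it carries plus its value.
struct DirLinkChunk {
    std::wstring propset;
    std::uint32_t propid;
    std::wstring url;
    std::optional<std::uint64_t> modified;
};

// Walks the children one at a time, the way GetChunk/GetValue are called.
class ContainerEnumerator {
public:
    explicit ContainerEnumerator(std::vector<ChildItem> children)
        : children_(std::move(children)) {}

    // Returns the next chunk, or std::nullopt when no children remain
    // (the point where a real GetChunk returns FILTER_E_END_OF_CHUNKS).
    std::optional<DirLinkChunk> NextChunk() {
        if (next_ >= children_.size()) return std::nullopt;
        const ChildItem& child = children_[next_++];
        // Prefer the WITH_TIME form whenever the modified time is available.
        const std::uint32_t pid = child.modified ? kPidGthrDirlinkWithTime : kPidGthrDirlink;
        return DirLinkChunk{kGatherPropset, pid, child.url, child.modified};
    }

private:
    std::vector<ChildItem> children_;
    std::size_t next_ = 0;
};
```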

The following sample code demonstrates how to build the PROPVARIANT for PID_GTHR_DIRLINK_WITH_TIME.

Note  

THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright (c) Microsoft Corporation. All rights reserved.

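// Headers: <objbase.h> (CoTaskMemAlloc, CoTaskMemFree), <propidl.h> (PROPVARIANT,
// PropVariantInit), <shlwapi.h> (SHStrDup; link with shlwapi.lib)
#include <windows.h>
#include <objbase.h>
#include <propidl.h>
#include <shlwapi.h>

// params are assumed to be valid
// On success, *ppPropValue receives a CoTaskMemAlloc'd PROPVARIANT of type
// VT_VARIANT | VT_VECTOR holding the URL (VT_LPWSTR) followed by the FILETIME.
// The caller frees it with PropVariantClear followed by CoTaskMemFree.

HRESULT GetPropVariantForUrlAndTime(PCWSTR pszUrl, const FILETIME &ftLastModified, PROPVARIANT **ppPropValue)
{
    *ppPropValue = NULL;

    // allocate the PROPVARIANT structure itself; sizeof(**ppPropValue) is the
    // size of the structure, while sizeof(*ppPropValue) is only a pointer
    *ppPropValue = (PROPVARIANT *)CoTaskMemAlloc(sizeof(**ppPropValue));
    HRESULT hr = *ppPropValue ? S_OK : E_OUTOFMEMORY;

    if (SUCCEEDED(hr))
    {
        PropVariantInit(*ppPropValue);  // zero init the value (VT_EMPTY)

        // now allocate enough memory for 2 nested PROPVARIANTs.
        // PID_GTHR_DIRLINK_WITH_TIME is an array of 2 PROPVARIANTs
        PROPVARIANT *pVector = (PROPVARIANT *)CoTaskMemAlloc(sizeof(*pVector) * 2);
        hr = pVector ? S_OK : E_OUTOFMEMORY;

        if (SUCCEEDED(hr))
        {
            // mark the container PROPVARIANT as a vector of 2 PROPVARIANTs
            (*ppPropValue)->vt = VT_VARIANT | VT_VECTOR;
            (*ppPropValue)->capropvar.cElems = 2;
            (*ppPropValue)->capropvar.pElems = pVector;

            // copy the URL; SHStrDup allocates the copy with CoTaskMemAlloc
            PWSTR pszUrlAlloc;
            hr = SHStrDup(pszUrl, &pszUrlAlloc);

            if (SUCCEEDED(hr))
            {
                // element 0: the URL of the child item
                (*ppPropValue)->capropvar.pElems[0].vt = VT_LPWSTR;
                (*ppPropValue)->capropvar.pElems[0].pwszVal = pszUrlAlloc;

                // element 1: the last modified time of the child item
                (*ppPropValue)->capropvar.pElems[1].vt = VT_FILETIME;
                (*ppPropValue)->capropvar.pElems[1].filetime = ftLastModified;
            }
            else
            {
                CoTaskMemFree(pVector);
            }
        }

        if (FAILED(hr))
        {
            // release the outer PROPVARIANT on any failure
            CoTaskMemFree(*ppPropValue);
            *ppPropValue = NULL;
        }
    }
    return hr;  // report allocation or SHStrDup failures to the caller
}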

Note  

A container IFilter component should always enumerate all child URLs, even if the child items have not changed, because the indexer detects deletions through the enumeration process. If the time returned with PID_GTHR_DIRLINK_WITH_TIME indicates that an item has not changed, the indexer does not update the data for that URL.
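The reason for enumerating unchanged children can be illustrated with a set difference: a child present in the previous enumeration but missing from the current one has been deleted, while a child whose time is unchanged can be skipped. Enumeration, CrawlDelta, and Diff are hypothetical; the sketch assumes each enumeration is a map from child URL to its last modified time in FILETIME ticks and does not describe how the indexer actually stores this data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Enumeration = std::map<std::wstring, std::uint64_t>;  // URL -> modified ticks

struct CrawlDelta {
    std::vector<std::wstring> added;
    std::vector<std::wstring> changed;
    std::vector<std::wstring> deleted;
    std::vector<std::wstring> unchanged;  // existing data is left as-is
};

CrawlDelta Diff(const Enumeration& previous, const Enumeration& current) {
    CrawlDelta delta;
    for (const auto& [url, modified] : current) {
        const auto it = previous.find(url);
        if (it == previous.end()) delta.added.push_back(url);
        else if (modified != it->second) delta.changed.push_back(url);
        else delta.unchanged.push_back(url);
    }
    // Only a complete enumeration reveals deletions: a child that is no
    // longer reported is treated as gone.
    for (const auto& entry : previous) {
        if (current.count(entry.first) == 0) delta.deleted.push_back(entry.first);
    }
    return delta;
}
```

If a container filter skipped unchanged children, they would be missing from the current enumeration and this comparison would wrongly report them as deleted.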

The physical URL is the URL that the UrlAccessor object processes. If the filter does not emit a user-friendly DisplayUrl, WDS displays the physical URL to the user as part of the search results. The WDS schema contains two properties to control what is displayed to the end user, as shown in the table below.

GUID                                   PROPSPEC        Description
D5CDD505-2E9C-101B-9397-08002B2CF9AE   DisplayFolder   Folder path displayed to the user in search results
D5CDD505-2E9C-101B-9397-08002B2CF9AE   FolderName      Display name of the parent folder

If your code does not emit a DisplayFolder or FolderName, these values are computed from the DisplayUrl. Forward slashes in the URL denote containers within the store or file system.
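The text does not spell out exactly how WDS computes these values, so the following is only an assumed approximation: it treats forward slashes in the DisplayUrl as container separators, takes everything before the last slash as the folder path, and takes the last component of that path as the folder name. FolderInfo and DeriveFolderInfo are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical result approximating the DisplayFolder and FolderName properties.
struct FolderInfo {
    std::wstring displayFolder;
    std::wstring folderName;
};

// Assumed derivation of folder values from a display URL.
FolderInfo DeriveFolderInfo(const std::wstring& displayUrl) {
    FolderInfo info;
    const auto lastSlash = displayUrl.rfind(L'/');
    if (lastSlash == std::wstring::npos) return info;  // no container information
    info.displayFolder = displayUrl.substr(0, lastSlash);
    // The parent folder's name is the final component of the folder path.
    const auto parentSlash = info.displayFolder.rfind(L'/');
    info.folderName = (parentSlash == std::wstring::npos)
                          ? info.displayFolder
                          : info.displayFolder.substr(parentSlash + 1);
    return info;
}
```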

Adding Protocol Handler Options Functionality

For your protocol handler to have a default start page (and entry node URL), you must implement the ISearchProtocolOptions interface. In future versions of WDS, this interface will provide hooks to the Options dialog for an enhanced user experience. This interface provides the following functionality:

  • Determines whether the requirements for your protocol handler are met. For example, your protocol handler may need access to a particular application to index that application's data properly; if the application is unavailable, the requirements are not met.
  • Identifies the minimum requirements your protocol handler needs to process an item. Requirements can be expressed in a user-friendly, localized description, such as "Microsoft Outlook 2000 or greater."
  • Defines the URLs your protocol handler should process by default.

ISearchProtocolOptions

The following table describes the methods you need to implement for the ISearchProtocolOptions interface.

Method                 Description
CheckRequirements      Determines whether a custom protocol handler's minimum requirements are met
GetDefaultCrawlScope   Returns a list of default URLs within a given store for a custom protocol handler
GetRequirements        Identifies a user-friendly, localized description of minimum requirements for a custom protocol handler
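To make the division of labor among these three methods concrete, the portable sketch below models them for a hypothetical handler that depends on an application being installed. ProtocolOptionsModel, its members, and the example URL are hypothetical; the real methods are COM methods whose signatures should be taken from the ISearchProtocolOptions reference. Returning an empty default scope when requirements are unmet is also an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the information behind ISearchProtocolOptions.
class ProtocolOptionsModel {
public:
    ProtocolOptionsModel(bool appInstalled, std::wstring requirementText,
                         std::vector<std::wstring> defaultScope)
        : appInstalled_(appInstalled),
          requirementText_(std::move(requirementText)),
          defaultScope_(std::move(defaultScope)) {}

    // Models CheckRequirements: are the minimum requirements met?
    bool CheckRequirements() const { return appInstalled_; }

    // Models GetRequirements: user-friendly, localized requirement text.
    const std::wstring& GetRequirements() const { return requirementText_; }

    // Models GetDefaultCrawlScope: start pages to crawl by default
    // (empty when the requirements are not met, by assumption).
    std::vector<std::wstring> GetDefaultCrawlScope() const {
        return CheckRequirements() ? defaultScope_ : std::vector<std::wstring>{};
    }

private:
    bool appInstalled_;
    std::wstring requirementText_;
    std::vector<std::wstring> defaultScope_;
};
```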