Insights into MS IFilter Testing Strategy.

Ever since I started working with filters, I've seen numerous questions along the lines of "What does proper validation of an IFilter mean? What tests should we execute, and how do we execute them?" Hence, it's only appropriate that we publish a document detailing our rigorous test procedure so that everyone targeting components at MS Search products can benefit from it.

Disclaimer: The following list presents only a subset of the testing methodologies we apply at MS Search and is by no means meant to be a quick recipe for weeding out ALL security vulnerabilities in your filter. The list is meant to provide an overview of the issues one should think about while implementing and testing filters.

----------------------------------------------------------------------------------------------------------------

A. Architectural Considerations : - COMPLIANCE REQUIRED

 

Filter does not require client installation.
Filter is free from dependencies during runtime.
Filter is free from dependencies during compile time.
Filter DLL is monolithic.

1. The filter DLL does not require the client application to be installed on the indexing machine.
2. The filter DLL does not reference other binaries at compile time.
3. The filter DLL is monolithic and self-contained, with no external dependencies.
For an overview of the problems caused by non-monolithic DLLs, please see:
http://blogs.msdn.com/ifilter/archive/2006/11/20/breaking-the-monolithic-filter-dll.aspx

B. Threading Model: - COMPLIANCE REQUIRED

 

Filter supports "BOTH" threading model.
Filter supports "Free" threading model.

Filter threading model must be marked as either "BOTH" or "Free" under:
HKEY_CLASSES_ROOT\CLSID\{GUID}\InprocServer32

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{GUID}\InprocServer32

We recommend using the "Both" threading model. An object marked with a threading model of "Both" takes on the threading model of the thread that created it. Marking the threading model as "Both" requires the filter to be thread-safe.
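
The registered value can be read back and checked with a short script. The following is a minimal sketch in Python, assuming a Windows machine and the standard `winreg` module; the function names are illustrative, and the case-insensitive comparison is an assumption of this sketch.

```python
ACCEPTED_MODELS = {"both", "free"}


def is_acceptable_threading_model(value):
    """Return True if a ThreadingModel value is "Both" or "Free".

    The comparison ignores case and surrounding whitespace (an assumption
    made for this sketch).
    """
    return isinstance(value, str) and value.strip().lower() in ACCEPTED_MODELS


def read_threading_model(clsid):
    """Read ThreadingModel from HKEY_CLASSES_ROOT\\CLSID\\<clsid>\\InprocServer32.

    `clsid` is the filter's class ID including braces. Windows only.
    """
    import winreg  # standard library module, available only on Windows

    key_path = r"CLSID\%s\InprocServer32" % clsid
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "ThreadingModel")
    return value
```

On the indexing machine, `is_acceptable_threading_model(read_threading_model("{GUID}"))` (with `{GUID}` replaced by the filter's own CLSID) should return `True`.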

C. OS Versions Supported:

Filter Works on WinXP
Filter Works on Win2K3
Filter Works on 32 bit Vista
Filter Works on 64 bit Vista
Filter Works on 32 bit Longhorn Server
Filter Works on 64 bit Longhorn Server

The filter should support the following OS versions:
-> WinXP & Win2K3: Filtering of <document format> should be checked with WDS 3.0.
-> Vista and Longhorn: Use the built-in search facility.

D. Backwards compatibility with SPS 2003 :

Filter works with SPS 2003
1. Register filter dll with SPS 2003.
2. Create a content source with your documents, crawl and query.

E. Loading Mechanisms : - COMPLIANCE REQUIRED

 

Filter Supports IPersistStream
Filter Supports IPersistStorage
Filter Supports IPersistFile

The filter needs to support all three loading mechanisms for backward and forward compatibility. We recommend loading via IPersistStream first and falling back to IPersistStorage or IPersistFile only if IPersistStream is not supported.

The IFilterExplorer can be used to check which loading mechanisms are supported:
http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx

F. Dedicated support for 64 bit platforms :

Dependency walker satisfied for 64 bit filter dll.
For 64 bit platforms, there should be no dependency on 32 bit binaries, i.e., nothing that has to run under WOW64.
Run Depends.exe (Dependency Walker) to check that all dependencies are satisfied; unsatisfied dependencies cause runtime errors.

Known Issue: A dependency on MSJAVA.dll shows up in red in dependency walker. You can safely ignore this.
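
As a complement to Dependency Walker, the target architecture of a DLL can be read directly from its PE header. The sketch below is in Python and relies only on the documented PE layout (the `MZ` header, the PE header offset at 0x3C, and the COFF Machine field); the function names are illustrative. It inspects one file at a time and does not resolve dependencies, so it does not replace Depends.exe.

```python
import struct

# COFF Machine values from the PE format specification.
MACHINE_NAMES = {
    0x014C: "x86 (32 bit)",
    0x8664: "x64 (64 bit)",
    0x0200: "Itanium (64 bit)",
}


def pe_machine(data):
    """Return the Machine field of a PE image given its raw bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # The 4-byte offset of the PE signature is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def describe_dll(path):
    """Return a readable architecture name for the DLL at `path`."""
    with open(path, "rb") as f:
        machine = pe_machine(f.read())
    return MACHINE_NAMES.get(machine, "unknown (0x%04X)" % machine)
```

Running `describe_dll` on the filter DLL and on each binary it loads gives a quick way to spot a 32 bit binary in a 64 bit deployment.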

G. Code Coverage:

We recommend at least 70% code coverage. Coverage can be measured easily with VS 2005 Team System.

H. IFiltTst - Consistency, Legitimacy and Illegitimacy tests:

 

Consistency test with pass rate > 95%
Legitimacy test with pass rate > 99%
Illegitimacy test with pass rate > 90%

IFiltTst can be used to run the following tests:
Consistency Test: The chunks emitted by the filter should be consistent between two runs.
Legitimacy Test: Validates that the filter initializes with a proper configuration and that GetText() and GetValue() function as expected.
Illegitimacy Test: Validates that the filter is well behaved when it is initialized with inappropriate configurations, and when GetText() is called on value-type chunks and GetValue() on text-type chunks.

Details of using IFiltTst can be found here: http://msdn2.microsoft.com/en-us/library/ms692580.aspx
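
Once pass and total counts have been tallied for each test, the thresholds in the checklist above can be checked mechanically. This is a minimal sketch in Python; the function names and the shape of the `results` dictionary are assumptions for illustration, and no particular IFiltTst log format is assumed.

```python
# Pass rates (percent) that must be strictly exceeded, per the checklist.
THRESHOLDS = {"consistency": 95.0, "legitimacy": 99.0, "illegitimacy": 90.0}


def pass_rate(passed, total):
    """Return the pass rate as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def failing_tests(results):
    """Return the sorted names of tests that miss their threshold.

    `results` maps a test name to a (passed, total) tuple. A test that is
    missing from `results` is reported as failing.
    """
    return sorted(
        name
        for name, threshold in THRESHOLDS.items()
        if name not in results or pass_rate(*results[name]) <= threshold
    )
```

For example, a legitimacy run with 990 passes out of 1000 sits exactly at 99% and is reported as failing, because the checklist asks for a rate above 99%.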

I. Security tests with Fuzzing :

Fuzz tested 0.5 million of each document format.
All bugs surfaced due to fuzz tests are fixed.

1. Fuzz a minimum of 0.5 million files of each document format handled by the filter and feed them to FilterTest.
2. Keep PageHeap enabled throughout the fuzz test run.
3. Analyze any heap corruptions, stack overflows, buffer overruns, crashes, etc., and fix the underlying bugs.
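
To illustrate what fuzzing a document means in step 1, here is a very small byte-level mutator in Python. It is only a sketch under stated assumptions: it is not the internal fuzzer, the function names are hypothetical, and real fuzzers use far richer, format-aware mutations.

```python
import random


def mutate(data, rng, max_changes=8):
    """Return a copy of `data` with 1 to `max_changes` random bytes overwritten."""
    if not data:
        return data
    out = bytearray(data)
    for _ in range(rng.randint(1, max_changes)):
        position = rng.randrange(len(out))
        out[position] = rng.randrange(256)
    return bytes(out)


def fuzz_variants(seed_document, count, seed=0):
    """Yield `count` mutated variants of `seed_document` (bytes).

    A fixed `seed` makes the sequence reproducible, which helps when a
    crash has to be replayed.
    """
    rng = random.Random(seed)
    for _ in range(count):
        yield mutate(seed_document, rng)
```

Each variant would be written to disk with the original file extension and then fed to the filter under test.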

PageHeap can be enabled with AppVerifier. Download here:
http://www.microsoft.com/downloads/details.aspx?familyid=bd02c19c-1250-433c-8c1b-2619bd93b3a2&displaylang=en

NOTE: The fuzzer is an internal tool. A list of external fuzzers is provided here: http://www.infosecinstitute.com/blog/2005/12/fuzzers-ultimate-list.html

Again, use these at your own risk:)

J. Performance Scaling:

 

80% scaling achieved with 2 Processors
80% scaling achieved with 3 Processors
80% scaling achieved with 4 Processors

Optimum usage of processors in a server environment is crucial for performance. The goal is to achieve 80% performance scaling with the addition of each new processor. Here's the test outline:
1. On a quad-processor machine, use ifilttst.exe with one thread to filter a large corpus of documents and note the time taken (T1).
-> Now use ifilttst.exe with two threads to filter the same corpus. The time taken should be at most 0.556 * T1.
-> With each subsequent thread, the target time TN for N threads is given by the formula:
TN = T1 * 1/[(1.8)^(log2 N)], where N is the number of threads.
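
The formula above is easy to tabulate. The following Python sketch evaluates it for one to four threads; the function name and the sample one-thread time of 100 minutes are illustrative assumptions, not measurements.

```python
import math


def target_time(t1, threads):
    """Target filtering time for `threads` threads, given one-thread time `t1`.

    Implements T1 * 1 / (1.8 ** log2(N)) from the test outline.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return t1 / (1.8 ** math.log2(threads))


# Hypothetical one-thread time of 100 minutes.
for n in (1, 2, 3, 4):
    print(n, "thread(s):", round(target_time(100.0, n), 1), "minutes")
```

With two threads the target works out to about 55.6 minutes, which matches the 0.556 factor quoted above.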

K. AppVerifier Tests :

Basic tests passed -> Logs provided
Low Resource Simulation passed -> Logs provided
Miscellaneous tests passed -> Logs provided

The AppVerifier tests seek to weed out critical security and performance defects. The tests should be conducted in 3 layers, with each layer executed in a separate test run. The layers are described below.
1. BASIC:
--> Exceptions - Ensures that the application does not hide AVs (access violations) using structured exception handling.
--> Handles - Ensures that the application does not attempt to use invalid handles.
--> Heaps - Checks for memory corruption issues in the heap.
    SETTINGS: Full Page Heap
              Dll: <IFilter Dll>
--> Locks - Verifies correct usage of critical sections and identifies potential deadlocks (timeout 7 minutes).
--> Memory - Ensures that APIs for virtual address space manipulation are used correctly.
--> Threadpool - Checks for dirty threadpool threads and other threadpool-related issues.
--> TLS - Ensures that Thread Local Storage APIs are used correctly.

The expectation for this scenario is that the application does not break into the debugger. This means that you have no errors that need to be addressed.

2. LOW RESOURCE SIMULATION: Accept the default settings. Filter a corpus (a large collection of documents) containing 10000+ files, using IFiltTst to loop through the corpus. As long as the run gets through the corpus without breaking into the debugger, the filter passes.

3. MISCELLANEOUS: Here, check the following:
--> Dangerous APIs - Checks for proper usage of API calls such as "TerminateThread".
--> Dirty Stacks - Helps detect uninitialized variables used in later function calls in the same thread's context.
Accept the default settings here as well.

HOW TO RUN THE TESTS:
1. Start Appverifier.
2. Add your application (IFiltTst) to Appverifier.
3. Check off the tests listed above for the layer being run. You need to run
   the tests three times, once for each layer.
4. Save your application.
5. Set the PROPAGATE property to true -> this ensures appverifier settings are
   applied to any threads spawned by IFiltTst.
6. Run IFiltTst from the command line on a corpus containing 10000+ files.
7. Save the Logs from the three runs.

Detailed information about using Appverifier can be found here:
http://msdn2.microsoft.com/en-us/library/aa480483.aspx

L. Globalization:

 

Arabic
Chinese
Czech
English
French
German
Hindi
Japanese
Polish
Spanish
Thai

If the document format allows marking the language / locale of its contents (e.g. MS Word), filtering of documents marked with the above language tags must be verified. This is important because the filter emits locale information based on the language of the document, which MSSearch uses to invoke the correct word breaker and stemmer for the document.
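
When verifying the emitted locale, it helps to compare only the primary language part of the LCID, since a document may carry any sublanguage (for example, French as used in France or in Canada). The sketch below is in Python; the table holds the Windows primary language identifiers for the languages listed above, and the function names are illustrative.

```python
# Windows primary language identifiers (LANG_* constants).
PRIMARY_LANG_IDS = {
    "Arabic": 0x01,
    "Chinese": 0x04,
    "Czech": 0x05,
    "English": 0x09,
    "French": 0x0C,
    "German": 0x07,
    "Hindi": 0x39,
    "Japanese": 0x11,
    "Polish": 0x15,
    "Spanish": 0x0A,
    "Thai": 0x1E,
}


def primary_lang_id(lcid):
    """Return the primary language ID: the low 10 bits of the LCID."""
    return lcid & 0x3FF


def locale_matches(language, lcid):
    """Return True if `lcid` belongs to `language`, whatever its sublanguage."""
    return primary_lang_id(lcid) == PRIMARY_LANG_IDS[language]
```

For instance, a chunk from an English (United States) document is expected to carry LCID 0x0409, so `locale_matches("English", 0x0409)` is `True`.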

M. Registry and File I/O:

 

No unnecessary File I/O
No temp files created
No independent registry I/O by filter.

1. Use Filemon.exe with the Filemon filter set to the name of your DLL and verify that the IFilter initiates no file system I/O other than reading the documents it is indexing. Take special note if the filter is creating temp files.
2. Use Regmon.exe to verify that no registry read/write operations are performed.

www.sysinternals.com has both 32 and 64 bit versions of Filemon and Regmon.

N. Prefix/Prefast for Vista :

No Prefix/Prefast errors during compilation.

In the Office team, OACR checks for this if we build with the Windows Prefast requirements. However, in other environments, we need to use the Visual Studio build configuration manager to enable Prefast error checking.

More info (MS employees):
PREFIX internal website
PREFAST: wrapped in OACR

WWW Resources:

http://msdn2.microsoft.com/en-us/library/ms933794.aspx 

O. Calls to undocumented Windows APIs :

No calls to undocumented Windows APIs
Run APIScan to ensure that the filter makes no calls to undocumented Windows APIs.

Note: This requirement is solely for MS and MS partners, to avoid situations like the Secret API fiasco.

P. SAL annotation :

No SAL warnings - Logs provided
SAL annotation is an excellent way to weed out potential security flaws in the code. More info at:
http://msdn2.microsoft.com/en-us/library/ms235402(VS.80).aspx

Q. UI Popups :

No UI Pop-ups in filter.
Use FiltDump to filter the document and ensure there are no UI pop-ups.

R. International Sufficiency:

We've seen a lot of issues in the past where Unicode / DBCS characters were not handled correctly by IFilters and Protocol Handlers. The problem is more serious in Protocol Handlers, because the address of the content source might be encoded in a DBCS charset, in which case data retrieval fails.

Use multiple special Unicode characters in the file contents and test for their output. The following figure provides a sample of Unicode characters to test:

[Figure: sample of Unicode characters to test]

S. Security Code Review:

This is the final line of defense against introducing security bugs in your code. DO NOT skimp on this!!! :)

Comments

  • Anonymous
    April 09, 2008
    Hi guys. Nice post, but where can I get a hold of ifilttst.exe? It doesn't seem to live in the Windows SDK any more. I can get FiltDump.exe from a copy of the Windows Search SDK (which I think has been superceded by the Windows SDK). Any chance of a download of the latest versions of these tools? (IFiltTst, FiltDump, and I think it's FiltReg?) Cheers Matt

  • Anonymous
    May 29, 2008
    Hello Deb, will there ever be 64-bit versions of the lrtest and ifilttst tools? As 64-bit processors become more and more common, I would expect Microsoft to release the Windows Server resource kit also for 64-bit... It's hard to test and debug IFilters on 64-bit without lrtest and ifilttst. Best Regards Stephan

  • Anonymous
    August 03, 2010
    Hi Deb, I tried emailing you via this blog's contact link, but I guess you don't check it that often :-P Do you have any examples of how to get an IFilter to return a multi-value/multivalue from IFilter::GetValue? I tried wrapping the COM values in a SAFEARRAY, but Vista's indexing service doesn't recognize it at all.  I'm trying to test on Sharepoint 2010, but still struggling w/ the install for that so haven't been able to yet :-P I have put in enough instrumentation to determine that indexing service only calls ::GetValue once instead of calling it multiple times until it finds no more values, so the only other thing it can return is a SAFEARRAY. Also, are there limitations on multivalue data types?  I.e., can it be a multivalue of ints, dates, etc. instead of only strings?  I've found references that multivalues can be strings, but nothing else...