Recent IFilter implementation and deployment questions.
Please post your questions and comments about using, implementing and deploying IFilters to work with Microsoft Search products here. If necessary, discussion topics of broader interest will be split into separate threads.
Comments
Anonymous
November 29, 2006
The search daemon that loads the filter is a 64-bit process in 64-bit SharePoint. In general, a 64-bit process cannot load a 32-bit COM DLL. The same applies to 64-bit WDS (Windows Desktop Search). On a similar note, if you're writing custom IFilters as plugins for SharePoint/WDS, please compile them as 64-bit DLLs so that they can be loaded by the 64-bit search daemons.
Anonymous
November 30, 2006
Hey, instead of downloading all separate iFilters from different websites is there a pack that installs the most popular iFilters automagically?
- Angus
We have a plan for a filter pack that will include a wide array of MS and third-party filters such as Visio, OneNote, etc. The release ETA is sometime around June/July 2007. Any filter included in the Filter Pack has to pass intense security and fuzz testing. If you're writing a proprietary filter and want it included in our filter pack, please let me know and I can provide the details. Deb.
Anonymous
November 30, 2006
Is there a tool that will allow me to easily manage the filters that are in use? I'm running Vista and trying to work out why PDF files are not being indexed. I assume there is some bug in the Adobe iFilter and have tried to use a different iFilter, but it is tricky to work out which iFilters are being used.
-----------------------------------------------------------------------------------------------------------
There can be two reasons why the PDF files are not indexed.
- The PDF filter is corrupt. In this case, use the ifiltst.exe utility to filter the PDF files and look at the dump. The utility is located here: http://www.microsoft.com/downloads/details.aspx?FamilyID=9d467a69-57ff-4ae7-96ee-b18c4790cffd&DisplayLang=en The documentation can be found on MSDN: http://msdn2.microsoft.com/en-gb/library/ms692580.aspx
- If the filter is functioning properly, then the registry entries might be incorrect. You can use regmon to check whether the right registry keys are accessed, and filemon to check whether the search service actually loads the PDF filter DLL.
By the way, which version of the PDF filter are you using? -Deb
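A related thing to rule out, when the filter DLL is found on disk but never loaded, is the 32-bit/64-bit mismatch mentioned at the top of this thread. The sketch below is one hedged way to check it: it reads the PE header of a DLL and reports whether the file targets x86 or x64. The names ClassifyPe, ClassifyPeFile and PeMachine are made up for this illustration; the header offsets and machine codes (0x014c for x86, 0x8664 for x64) come from the PE file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
enum class PeMachine { NotPe, X86, X64, Other };

// Reads a little-endian 32-bit value; the caller guarantees the range is valid.
static std::uint32_t ReadU32(const std::vector<unsigned char>& b, std::size_t off)
{
    assert(off + 4 <= b.size());
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Classifies an in-memory image by the Machine field of its PE file header.
PeMachine ClassifyPe(const std::vector<unsigned char>& bytes)
{
    // DOS header: "MZ" at offset 0, offset of the PE header (e_lfanew) at 0x3C.
    if (bytes.size() < 0x40 || bytes[0] != 'M' || bytes[1] != 'Z')
        return PeMachine::NotPe;
    const std::size_t pe = ReadU32(bytes, 0x3C);
    // Need the 4-byte "PE\0\0" signature plus the 2-byte Machine field.
    if (pe > bytes.size() || bytes.size() - pe < 6)
        return PeMachine::NotPe;
    if (bytes[pe] != 'P' || bytes[pe + 1] != 'E' || bytes[pe + 2] != 0 || bytes[pe + 3] != 0)
        return PeMachine::NotPe;
    const unsigned machine = bytes[pe + 4] | (bytes[pe + 5] << 8);
    if (machine == 0x014c) return PeMachine::X86;   // IMAGE_FILE_MACHINE_I386
    if (machine == 0x8664) return PeMachine::X64;   // IMAGE_FILE_MACHINE_AMD64
    return PeMachine::Other;
}

// Convenience wrapper: loads the whole file and classifies it.
PeMachine ClassifyPeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return PeMachine::NotPe;
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    return ClassifyPe(bytes);
}
```

Calling ClassifyPeFile on the path of the registered filter DLL and comparing the result with the bitness of the search process shows whether the DLL could be loaded in-process at all.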
Anonymous
December 15, 2006
> In general, a 64 bit process cannot load a 32 bit COM dll.
Is this also true for 64-bit Vista? I've installed Acrobat Reader 8.0, which comes with an IFilter, but Vista's indexing service claims that the 'Registered IFilter is not found' for PDF files. I am wondering if this is a 64/32-bit compatibility issue.
Anonymous
December 16, 2006
Yes, the v8 Reader comes with a 32-bit IFilter, which cannot be loaded by the 64-bit search service in 64-bit Vista.
Anonymous
December 18, 2006
Thank you very much for the confirmation, Deb.
Anonymous
December 24, 2006
Currently I am planning a new version of our 2005 version DWG iFilter. This is to support the newer 2007 DWG file format, and to address questions on its operation with SQL 2005 and MOSS (SharePoint) 2007. If you are interested in information on this product, please visit http://www.cadcompany.nl/ifilter
As I am inexperienced with iFilter development, I have many questions to find answers for. In my preparations, Deb Haldar has provided me with crucial information to help me get on the right track. I would like to share this information on his blog to help make this the "one stop shop" for IFilter-related issues. Probably a separate thread will be created to track the development cycle of an iFilter from scratch. The existing 2005 version iFilter project is coded mostly in C++, in VC 6.0. Some questions I'd like to discuss:
- Should we use C++ or move to .NET, and why?
- What is required for coding a proper iFilter?
- Testing for multithreading compatibility
- Registration with SharePoint and SQL 2005
I would like to start sharing my information soon, and I am interested in your comments. Marco van Schagen
Anonymous
December 30, 2006
I have some documents created on a Solaris platform with StarOffice. I want to index them from SharePoint 2003 and was wondering if Microsoft or Sun has an IFilter for StarOffice?
Anonymous
December 31, 2006
Currently neither Microsoft nor Sun Microsystems has an IFilter for StarOffice/OpenOffice documents. However, you might be able to find a third-party vendor that makes this IFilter. I came across one here: http://www.ifiltershop.com/staroffice-openoffice-ifilter.html Let me know if this serves your purpose.
Anonymous
January 13, 2007
Thanks for the guidance regarding the lack of an available 64-bit PDF IFilter from Adobe. Very helpful. Another workaround for MOSS web farm deployments that has not been mentioned is to deploy a 32-bit index server while keeping the rest of the SharePoint farm 64-bit. It gives you the performance benefits at the database and web front-end tiers, the two most likely points for bottlenecks, until it is OK to upgrade the index server to 64-bit.
Anonymous
February 07, 2007
Deb et al., do you know if there will be a 64-bit IFilter for Adobe PDFs for the 64-bit Vista editions?
Anonymous
February 07, 2007
Brian, so far Adobe has not made any announcement as to whether they will be releasing a 64-bit version of the PDF filter. The issue is discussed at length in this thread: http://blogs.msdn.com/ifilter/archive/2007/01/08/microsoft-s-strategy-for-dealing-with-32-bit-search-binaries-within-64-bit-servers.aspx Deb.
Anonymous
February 23, 2007
Hello all, is there a place where I can get just the Office 2007 iFilter, to install it on an MS Index Server deployment? I'd rather not install all of Office to get just the iFilter. Thanks in advance, Dave
Anonymous
February 23, 2007
Dave, to the best of my knowledge there is presently no such download. However, we're planning a separate downloadable package of MS filters sometime in the middle of 2007. Currently, if you want to index Office 2007 documents, you'd need to install MOSS 2007 or the Office 2007 client.
Anonymous
February 27, 2007
Thanks, Deb. Not the answer I was hoping for, but I'm looking forward to the package of iFilters when it arrives. Dave
Anonymous
March 16, 2007
How do you deploy an x86 version of the Index Service on a 64-bit platform running WSS 3.0? I would like to provide backward compatibility for the iFilter for PDF documents.
Anonymous
March 19, 2007
Ryan, I don't think one can deploy a 32-bit version of the indexing service on a 64-bit Windows machine.
Anonymous
March 21, 2007
> Is there a tool that will allow me to easily manage the filters that are in use?
Yes, see IFilter Explorer at citeknet.com
Anonymous
April 01, 2007
I have to encrypt Office files when I upload them to MOSS, so I can't search my encrypted Office files! I want to decrypt them for indexing. For text files, I made a decrypting filter and replaced the text filter with it. It works well. But for Office files, I can decrypt them but I can't extract text from them, because my filter doesn't know the Office format! I can't find a way to use the original offfilt.dll in my new filter.
Anonymous
April 02, 2007
The comment has been removed
Anonymous
April 07, 2007
Thank you Deb Haldar. I made a wrapper filter and it works well with desktop search. To do that, I replaced the registry entries, because the encrypted files still have the same extensions. I also changed some registry entries for MOSS. But with MOSS I still can't invoke my new filter. MOSS still invokes the old DLL (offfiltx.dll) or just gives me an error saying that filtering failed and I might need to install a new filter. I have spent a long time on this, and it is urgent. Please help me. WHAT SHOULD I CHANGE IN THE REGISTRY TO INVOKE MY NEW FILTER? - IN DETAIL.
Anonymous
April 07, 2007
Lee, the most likely reason the new filter is not invoked is that the CLSID of the actual offfiltx is still present in the registry. Here's what I'd try doing:
- Note down the CLSID of your IFilter wrapper.
- Change the following registry key to the CLSID of your new filter: [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.ext] Default REG_MULTI_SZ = IFilter CLSID
- If you still get an error in filtering, use regmon and filemon to determine which filter binaries were accessed (your wrapper or the original offfiltx.dll) and what sequence of registry calls was made to pick up the filter.
As a matter of curiosity, which company are you making this customization for? It might be interesting to think about providing an OOB way to do this :)
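To make the key in the second step concrete, here is a minimal sketch that composes the per-extension subkey for a given file extension. It assumes the key is the usual backslash-separated path under HKEY_LOCAL_MACHINE for the MOSS 2007 ("12.0") hive; the function name FilterKeyForExtension and the constant name are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed root of the per-extension filter registrations (relative to HKEY_LOCAL_MACHINE).
const std::string kFilterExtensionRoot =
    R"(SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension)";

// Returns the subkey whose default value holds the CLSID of the filter for `ext`.
// Accepts the extension with or without the leading dot ("pdf" or ".pdf").
std::string FilterKeyForExtension(std::string ext)
{
    assert(!ext.empty());
    if (ext[0] != '.')
        ext.insert(ext.begin(), '.');
    return kFilterExtensionRoot + "\\" + ext;
}
```

The resulting string can be pasted into regedit, or passed to `reg query "HKLM\<path>" /ve` to print the default value, and then compared with the CLSID noted in the first step.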
Anonymous
April 07, 2007
The comment has been removed
Anonymous
April 07, 2007
The comment has been removed
Anonymous
April 08, 2007
Thank you very much Deb Haldar. Now it looks like it invokes my DLL. Then I hit an assert. It's from atlcomcli.h: _NoAddRefReleaseOnCComPtr<T>* operator->() const throw() { ATLASSERT(p!=NULL); return (_NoAddRefReleaseOnCComPtr<T>*)p; } When I search through desktop search, the assert doesn't fire. I am so sorry for bothering you. But I have spent too long trying to fix this, and I have made no progress. I'm new to Microsoft programming, and it's very tough for me. If you can, please help me a little more. And I have another question: is it possible to build a 64-bit IFilter on a 32-bit machine?
Anonymous
April 08, 2007
Lee, AFAIK this usually happens when you try to uninitialize a COM interface which still has a non-zero reference count. Please check whether your code actually decrements the ref count to zero before you call CoUninitialize(). Also, defining the ref-count debugging preprocessor symbol _ATL_DEBUG_INTERFACES in stdafx.h before atlbase.h is included should help in isolating the issue. Yes, you can compile a 64-bit version of an IFilter on a 32-bit machine by changing the compiler settings in VS to target 64-bit.
Anonymous
April 08, 2007
Thank you again. I found that it happens because my filter fails when it calls LoadIFilter to invoke offfilt.dll internally. The return value of LoadIFilter is just 'E_FAIL'. Can't I use LoadIFilter in my filter? I also cannot use fopen, fprintf and so on. Is anything needed to use files in ATL? I am coding in C++, and I'm afraid I code in C style because I'm very familiar with C and the Unix environment. Thank you for your time and concern.
Anonymous
April 08, 2007
Lee, it's difficult to debug without seeing the code. However, you can try the following option:
- Use LoadLibrary to get a handle to the module.
- Use GetProcAddress( hMod, "DllGetClassObject" ) to retrieve the function pointer.
- Declare a class factory and use the function pointer from step 2 to retrieve the class factory interface pointer.
- Call CreateInstance on the class factory to instantiate the filter.
AFAIK, the indexing service uses LoadIFilter, but I'm not sure about other search products. Let me know if this works for you. May I also suggest that you contact Microsoft product support for MOSS 2007. These folks are really good at troubleshooting problems like this, and total confidentiality about your implementation is guaranteed.
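The four steps above can be sketched roughly as follows. This is an untested illustration, not the code Lee ended up with: the helper names ResolveGetClassObject and CreateFilterInstance are hypothetical, `__uuidof` is a compiler extension (MSVC), and the non-Windows branch is only a stub so the file still compiles elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <unknwn.h>

// Signature of the DllGetClassObject export that in-process COM servers provide.
typedef HRESULT (STDAPICALLTYPE *DllGetClassObjectFn)(REFCLSID, REFIID, LPVOID*);

// Steps 1 and 2: load the filter DLL and look up DllGetClassObject.
// Returns nullptr if either step fails. On success the module is deliberately
// left loaded, because objects created from it need its code to stay mapped.
DllGetClassObjectFn ResolveGetClassObject(const wchar_t* dllPath)
{
    HMODULE hMod = ::LoadLibraryW(dllPath);
    if (!hMod)
        return nullptr;
    FARPROC proc = ::GetProcAddress(hMod, "DllGetClassObject");
    if (!proc) {
        ::FreeLibrary(hMod);
        return nullptr;
    }
    return reinterpret_cast<DllGetClassObjectFn>(proc);
}

// Steps 3 and 4: obtain the class factory for `clsid`, then create an instance
// and ask it for interface `iid` (for example __uuidof(IFilter) from filter.h).
HRESULT CreateFilterInstance(DllGetClassObjectFn getClassObject,
                             REFCLSID clsid, REFIID iid, void** ppv)
{
    if (!getClassObject || !ppv)
        return E_POINTER;
    *ppv = nullptr;
    IClassFactory* factory = nullptr;
    HRESULT hr = getClassObject(clsid, __uuidof(IClassFactory),
                                reinterpret_cast<void**>(&factory));
    if (FAILED(hr))
        return hr;
    hr = factory->CreateInstance(nullptr, iid, ppv);
    factory->Release();
    return hr;
}
#else
// Non-Windows stub: there is no COM here, so resolution always fails.
using DllGetClassObjectFn = void*;
inline DllGetClassObjectFn ResolveGetClassObject(const wchar_t*) { return nullptr; }
#endif
```

A wrapper filter would call ResolveGetClassObject with the path to the inner filter DLL and then CreateFilterInstance with that filter's CLSID. The returned object still has to be given the document (filters typically expose IPersistFile or IPersistStream for this) before IFilter::Init is called, and every HRESULT should be checked before the pointer is used.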
Anonymous
April 09, 2007
Thank you for all your help, Deb. Finally I fixed it. It was because my decrypter couldn't get the key path, and that made things messy. (In fact, I'm still wondering why it can't process a key path like 'c:\dir\keyfile'. The decrypter gets the path that way and it worked well with desktop search. So I moved the key to the windows/system32 folder.) It was very helpful to use the debugger in the way described in the other post on this blog. Anyway, now I have only the 64-bit problem. Thank you very much for your help.
Anonymous
April 09, 2007
You're welcome Lee! Glad to know you found the info on the blog helpful :)
Anonymous
April 12, 2007
I need some information on iFilters. Are they tightly bound to WDS, or are they a general thing that I can use without any Windows Search product as an intermediary? In general, is it possible to develop my own search engine, use the already available iFilters in it, and leave Windows Search products out entirely?
Anonymous
April 12, 2007
IFilters are not tightly coupled to WDS or to MS Search products in general. As for developing your own search engine using IFilters, the answer is yes. But please keep in mind that IFilters form only a small part of the whole indexing pipeline. I've seen third-party vendors use IFilters to index documents from within their own applications.
Anonymous
April 12, 2007
Thanks Deb for your comments. I have another question: do we need to have a legal copy of MS Office to use the Office iFilters? Also, can I have some links that describe the various methods the Office iFilter offers to its clients for processing Office documents? Thanks, Charan
Anonymous
April 12, 2007
Charan, as of now the only way to get the Office 2007 IFilter is by buying either MOSS 2007 or Office 2007. Thus the answer is yes, you'd need a copy of one of those two products. However, we do have a plan to release the Office filter as a separate (and free!!!) downloadable package sometime in the middle of this year. The methods exposed by offfiltx.dll are the ones described in the IFilter interface documentation on MSDN. This should be sufficient for extracting data and building an index. Regards, Deb.
Anonymous
April 13, 2007
Thanks Deb for the update. I went through the iFilter methods. I don't see a method which gives a preview of (say) a page in a document. Is such a method available? What I precisely want to do is the following:
- Extract text from all the pages of a document (e.g. MS Word).
- Search for a string in the retrieved text.
- If I find a hit on page i, highlight that string on page i and show a preview of that page with the highlighted string.
Are there any methods available to do such things? Thanks a lot for your time, Charan
Anonymous
April 13, 2007
Charan, you can achieve step 1 with the Office IFilter. You'd need your own indexer to do step 2. Step 3 is more complicated and IFilters cannot do this for you. You'd need to write your own plugin for that. But this is an extremely interesting concept - we'd definitely like to hear more about it :)
Anonymous
April 13, 2007
The comment has been removed
Anonymous
April 14, 2007
I wish there were some provision for displaying a preview of a page with the highlighted text :-) (Microsoft should consider this and add some interfaces to the iFilters to support previewing :-) ) Anyway, thanks for your time Deb.
Anonymous
May 25, 2007
Is there a 64-bit version of the IFilter for Visio available? Thanks, -Heath
Anonymous
May 26, 2007
Heath, we're in the process of releasing a 64-bit version in the Filter Pack. Cheers, Deb.
Anonymous
July 13, 2007
How can one index .tif and .mdi files on Vista? The Office 12 MODI IFilters don't work on Vista.
Anonymous
July 16, 2007
Steve, unfortunately there's no way to index TIFF files on Vista :(
Anonymous
July 18, 2007
The comment has been removed
Anonymous
July 18, 2007
Hi Sharad, RTF formats are indexable OOB. Please navigate to Shared Services Administration: SharedServices1 > Search Settings > File Types > New File Type and add <rtf> as a new file type to be indexed by MOSS. Cheers, Deb.
Anonymous
July 18, 2007
That's interesting, Deb. In our case, we are being told that we have to buy a third-party filter for this basic format! I'll confirm and get back. Thanks. -- Sharad
Anonymous
July 24, 2007
I had heard a rumor that MS was going to release an update for MOSS 2007 that includes a TIF IFilter. Is this true, and if so how and when can I get it? Thanks, Adam
Anonymous
July 25, 2007
Hi Deb, I am trying to index RTF documents in MOSS 2007. I added the <rtf> file type as you described and then ran the indexing service successfully, but search does not find the RTF docs. I created an RTF doc using MS Word, wrote some English and non-English text (numbers, special characters) and saved it as "*.rtf". Any help would be much appreciated. Thanks. Ashish Gupta
Anonymous
July 31, 2007
Hi Deb, I am trying to search PDF files in MOSS search, but I am not able to get those files on the results page. What should be done in order to get those PDF files listed on the search results page? Any help would be much appreciated. Thanks. Bala
Anonymous
July 31, 2007
Hello Bala, the easiest way would be to add the pdf extension to the list of crawled file types and then install the Foxit PDF filter. Please follow the steps listed here: http://blogs.msdn.com/ifilter/archive/2007/05/10/long-awaited-64-bit-pdf-ifilter-finally-available.aspx Note that you do not need to change the CLSID anymore, as the latest Foxit installer takes care of it. Thanks, Deb.
Anonymous
July 31, 2007
> Hi Deb, I am trying to index RTF documents in MOSS 2007. I added the <rtf> file type as you described and then ran the indexing service successfully, but search does not find the RTF docs. I created an RTF doc using MS Word, wrote some English and non-English text (numbers, special characters) and saved it as "*.rtf". Any help would be much appreciated. Thanks. Ashish Gupta
=========================================
Ashish, did you recycle the search service after you added the rtf extension? -Deb.
Anonymous
July 31, 2007
Well, you know what they say, Adam - "Don't believe in rumours!" :) On a more serious note, we're planning to release the TIFF filter with the filter pack, the release date of which is still TBD. Cheers, Deb.
Anonymous
August 09, 2007
Deb, you know that the unavailability of a TIF and MDI filter for just the "Microsoft Document Imaging" part of Office keeps us from upgrading to MOSS 2007. How can MS discontinue the capability to index its own Office formats? We need this, and I cannot understand the decision to postpone this functionality so far out. Rolf
Anonymous
August 25, 2007
I saw some comments here about a release of iFilters for various Microsoft formats somewhere in the middle of 2007. Is there any progress on this?
Anonymous
August 25, 2007
I forgot to ask: will that release contain an iFilter for the XML format of Word 2003 (WordML)?
Anonymous
August 27, 2007
Yes, the tentative timeline for the release is towards the end of this year. I don't think it'll have a filter for WordML, but I'll double check.
Anonymous
August 30, 2007
We actually trusted Microsoft when they announced that they would broadly support XML with the release of Office 2003. So we built a solution based on the WordML format... and we are unable to index the documents. The workaround would be to install Office 2003 on the servers where the SQL indexing occurs, which is, in my opinion, nonsense. So I'm very worried about the content of the iFilter package. It looks like Microsoft is not supporting previous versions that much anymore. In the past, we never had to worry about backward compatibility. Now I am starting to fear even for the previous release, while Office 2007 was released only a few months ago. I looked around on the net, and I think we are not the only ones asking for this. It looks like Microsoft wants to forget about its previous release quickly and put all its support behind the Open XML format. I really do hope that the kit will contain the iFilters for at least the previous version of Office. Otherwise it would mean that the only unsupported version of XML would be 2003, since previous binary formats were already supported. Please consider our request. Thanks
Anonymous
September 17, 2007
Deb, will the IFilter for TIF, when it comes out, be able to OCR TIF images as you could in SharePoint 2003? Are there any third-party TIF iFilters that you know of? Is the release of the Filter Pack still planned for the end of the year? Thanks
Anonymous
November 26, 2007
Do you know of any Corel WordPerfect iFilters for a 64-bit MOSS 2007 deployment?
Anonymous
December 05, 2007
Any update on when the iFilter pack will be released?
Anonymous
December 11, 2007
Hi Deb! I've got the same problem indexing .rtf files as Ashish has. I added the rtf file type, restarted both search services and reset IIS (sometimes a miraculous fix ;)). Unfortunately it didn't change anything; RTF documents are still not indexed. Do you have any further ideas about that? Is a solution for this problem included in SP1 for WSS and/or MOSS, which MS has just released? Thanks in advance! Regards, Frank
Anonymous
December 18, 2007
Office 2007 iFilters are now available
Anonymous
December 20, 2007
When will the TIFF IFilter be available? Is a 64-bit TIFF IFilter available now?
Anonymous
January 07, 2008
I hope I'm not missing something obvious, but how do I make MOSS search look inside plain text files that do not have the .txt extension (or any other extension that's indexed by default)? I added the extension I need (.cs) to the list of extensions searched, reset IIS and indexed the files, but the search only looks at the file names; no search of the actual file content is done. Am I missing something?
Anonymous
January 17, 2008
Hi Deb, I corrected the registry key according to your blog. Citeknet also shows that the registration is good. The crawl log also says the VSD is crawled. But why can't it still crawl the body of the Visio file? Following is the search result summary. Thanks a lot! "C:\Program Files\Microsoft Office\Visio11\2052\"BASFLO_M.VST chengxul Microsoft Visio ASB TC010497202052 进程 : 将此形状拖到绘图页上。 : 11 : 1FBE7366-0000-0000-8E40-00608CF305B2 预先定义的进程 : 拖到绘图页后,可以添加一个特定进程,如子例程或模块。
Anonymous
January 18, 2008
The comment has been removed
Anonymous
January 24, 2008
I have installed Adobe IFilter 6.0 on SharePoint Services 3.0 with Acrobat Reader 8.1.1. I made all the necessary changes in the registry and still cannot search PDF files. Any suggestions?
Anonymous
February 07, 2008
WDS does not seem to be able to index the contents of my Visio VSD files. The properties appear to get indexed but not the contents. In advanced options I specified that the contents of VSD files be indexed. Any suggestions?
Anonymous
February 07, 2008
Jim, can you send me the file (or a similar file) that you're trying to index? Thanks, Deb.
Anonymous
February 08, 2008
Hey Deb, I think this might not be a file issue. I initially had trouble getting this working, but after reinstalls of the filter pack and WDS and reboots (in various orders) it worked. Because of these initial problems I wanted to make sure I had a repeatable process that I could share with my co-workers, so I uninstalled WDS and the filter pack, and I have since been unable to get it to work. I think this is more an issue with the filters not being properly registered with WDS. However, I will happily send you an example file; I'm just not sure how to do that. My email is james.nowak@ge.com. Thanks for your help. -Jim
Anonymous
February 09, 2008
Jim, did you take a look at the KB errata published on this blog? Ideally you'd install WDS and then install the filter pack. Also, you can send me the problematic file at debh@microsoft.com. My team does not make the Visio filter; we're just a shipping vehicle for it. Nevertheless, I'd like to have a look if the issue is affecting Microsoft customers. Thanks, Deb.
Anonymous
February 10, 2008
Hello, is there an IFilter for .sql (SQL query files)? Or would it be possible to use the .txt filter for this type?
Anonymous
February 25, 2008
AFAIK, there is no IFilter for .sql (SQL query files), but it might be possible to use tquery.dll to filter these files. You can try following the registration steps for third-party filters mentioned in this blog and register the text IFilter to handle .sql files. But I'm kind of skeptical about the results you'll get back. One way to test-drive the validity of the results is to rename the .sql files to .txt and try searching through them. If this gives you relevant results, registering tquery.dll to handle .sql should work.
Anonymous
February 27, 2008
I face a problem here. I have two machines, one running Windows Server 2003 Standard SP2 (OEM) and one running Windows Server 2003 Standard SP1. I installed the TIF IFilter on both, but it only works on the Windows Server 2003 Standard SP1 machine. I also tried two machines with the same OS, Windows Server 2003 Enterprise Edition SR2. I installed the TIF IFilter on both of them too, but it works on only one and fails on the other. All tests were done on clean machines with the same steps. Why does it work on one machine and not on the other?
Anonymous
February 27, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 28, 2008
The comment has been removed
Anonymous
February 29, 2008
See http://henricodolfing.blogspot.com/2008/02/tiff-ifilter.html
Anonymous
March 06, 2008
The comment has been removed
Anonymous
March 12, 2008
How do I get the iFilter for Office 2007 working with SQL Server 2005? After I installed the filter pack, I checked the sys.fulltext_document_types table and the additional extensions (docx, potx, ...) were recorded. However, full-text search does not seem to index the docx, xlsx, etc. files. Any advice?
Anonymous
March 19, 2008
Hello, I see where to download the iFilter pack at http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en. However, there seems to be a registration process that limits the use of the filters to the following: Office SharePoint Server 2007, Search Server 2008, SharePoint Portal Server 2003, Windows SharePoint Services v3.0, Exchange Server 2007, SQL Server 2005, SQL Server 2008. I have Server 2003 Enterprise with the default indexing server. Will this filter pack allow me to find Office 2007 files? Thanks, Eric
Anonymous
April 06, 2008
Same issue as Eric: Server 2003, IIS 6, no Office and no SharePoint installed. I've installed the Filter Pack but am still unable to index docx files using the default indexing server. Any suggestions on how to get this working in the environment specified? DM
Anonymous
April 07, 2008
Deb and Eric, I've found the solution, thanks to some of the kind folks who provide Premier support. A registry change is required. I've documented this here: http://dorjem.blogspot.com/2008/04/office-2007-files-indexed-by-indexing.html DorjeM
Anonymous
May 22, 2008
Hi, I have a problem with MOSS: it seems that I need to install two different IFilters for my users to open CAD and MPG files. I've tried searching for a DWG IFilter that is not a trial version, and also for an MPG one, both with no success. I would greatly appreciate help from anyone who can assist with this problem. Thanks
Anonymous
June 25, 2008
Hi, I am trying to make an IFilter. I downloaded the WDS SDK and tried using the FilterSample project from Windows Search 3x SDK\Indexing\FilterSample. But I am getting many errors in three .idl files when I try to build the project. The .idl files having problems are mshtml.idl, dimm.idl and mshtmhst.idl. These files came with the Vista SDK. So how should I proceed? Also, where can I get the other IFilter sample code mentioned on MSDN? Thanks.
Anonymous
July 06, 2008
Hi, we also use the WordML format, and we are unable to index those documents. We have MOSS 2007 and WSS 3.0 and badly need an iFilter that handles the WordML format. It was mentioned that "The workaround would be to install Office 2003 on the servers where the indexing in SQL occurs..." Will that work properly? Is there a new kit out there for this problem that can help us? Is there any third-party software that can do the trick? Thanks.
Anonymous
July 08, 2008
Hi, I have MODI set up with MOSS 2007 and it's actually indexing the OCR information correctly when it is available. This is a good thing, but what I want it to do is create OCR information when it's not available. Is that possible? I've set the PerformOCR registry key to 1 and restarted the search service, but that did not work. Any ideas?
Anonymous
November 23, 2008
Hi, I have used the MODI DLL and implemented a method in VC++ to recognize the characters in .tiff as well as .bmp files. One thing I have noticed is that the OCR method of the IDocument interface fails when passed a .tiff or .bmp file containing a single character of size less than 72. Will you please help me fix this problem ASAP? It's urgent.
Anonymous
November 25, 2008
Hi, has anyone managed to install the htmlprop IFilter with Search Server 2008 Express? I am trying to map some crawled properties to types other than strings; for example, map <meta content="28" name="age" /> to a crawled property of type int. I am trying to use the htmlprop.dll that came with the CrawlingMetadataHtmlprop SDK example. Any advice?
Anonymous
December 01, 2008
The comment has been removed
Anonymous
December 28, 2008
I am able to read the properties of MS Office 2007 (.docx, .xlsx) documents using IFilter (code available at http://vbaccelerator.com/home/Resources/Babbage/NET_IFilter/IFilter.zip) but could not read any properties of MS Office 2003 (*.doc) documents. What should be changed in the code in order to read the properties of MS Office 2003 documents as well? Thanks
Anonymous
December 29, 2008
While I'm not intimately familiar with this code, it seems like we're not passing the correct IFilter INIT flags to enable property extraction. The enumeration below should be extended to support the full set of IFILTER_INIT flags if we want to index all properties. http://msdn.microsoft.com/en-us/library/ms691091(VS.85).aspx

    private enum IFILTER_FLAGS : int
    {
        /// <summary>
        /// The caller should use the IPropertySetStorage and IPropertyStorage interfaces to locate additional properties.
        /// When this flag is set, properties available through COM enumerators should not be returned from IFilter.
        /// </summary>
        IFILTER_FLAGS_OLE_PROPERTIES = 1
    }

Anonymous
January 12, 2009
Thank you Deb. I tried to get an IFilter instance using the BindIFilterFromStorage() method, like below:

    IFilter *pFilter = 0;
    HRESULT hr;
    DWORD flags = 0;
    IStorage *pStorage = NULL;
    // Open the document as an OLE compound document.
    hr = ::StgOpenStorageEx(filename, STGM_READ | STGM_SHARE_EXCLUSIVE, STGFMT_STORAGE, 0, NULL, 0, IID_IStorage, (void**)&pStorage);
    if (SUCCEEDED(hr))
        hr = BindIFilterFromStorage(pStorage, 0, (void**)&pFilter);
    else
        return;
    if (FAILED(hr))
    {
        // pFilter may still be null here, so only release it if it was set.
        if (pFilter) pFilter->Release();
        throw exception("BindIFilterFromStorage() failed");
    }
    hr = pFilter->Init(IFILTER_INIT_INDEXING_ONLY | IFILTER_INIT_APPLY_INDEX_ATTRIBUTES | IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES | IFILTER_INIT_FILTER_OWNED_VALUE_OK | IFILTER_INIT_APPLY_OTHER_ATTRIBUTES, 0, 0, &flags);
    if (FAILED(hr))
    {
        pFilter->Release();
        throw exception("IFilter::Init() failed");
    }
    Start();
    STAT_CHUNK stat;
    while (SUCCEEDED(hr = pFilter->GetChunk(&stat)))
    {
        if ((stat.flags & CHUNK_TEXT) != 0)
            ProcessTextChunk(pFilter, stat);
        if ((stat.flags & CHUNK_VALUE) != 0)
            ProcessValueChunk(pFilter, stat);
    }
    Finish();
    pFilter->Release();

But this time too, it cannot read any property of a *.doc file. The flags value is always 1 after calling pFilter->Init(), and pFilter->GetChunk() never returns a CHUNK_VALUE chunk. How do I use the IPropertySetStorage and IPropertyStorage interfaces to locate additional properties? Thanks, Prakash
Anonymous
September 23, 2009
Is there a filter available for Microsoft Works file types (.wps, .xlr, and .wdb)? If not, is one planned? Thanks.
Anonymous
August 03, 2010
Hi Deb, I tried emailing you via this blog's contact link, but I guess you don't check it that often :-P Do you have any examples of how to get an IFilter to return a multi-value property from IFilter::GetValue? I tried wrapping the COM values in a SAFEARRAY, but Vista's indexing service doesn't recognize it at all. I'm trying to test on SharePoint 2010, but I'm still struggling with the install for that, so I haven't been able to yet :-P I have put in enough instrumentation to determine that the indexing service only calls ::GetValue once instead of calling it multiple times until it finds no more values, so the only other thing it can return is a SAFEARRAY. Also, are there limitations on multi-value data types? I.e., can it be a multi-value of ints, dates, etc. instead of only strings? I've found references saying that multi-values can be strings, but nothing else...
Anonymous
August 23, 2011
The comment has been removed