Getting CSV Files Indexed and Searchable in SPS 2003
Note: This post applies primarily to SPS 2003, but the solution for this problem in MOSS 2007 is also very similar. I might write another post for this problem for MOSS 2007 in the future.
Introduction
SPS 2003 crawls text files with the .txt extension, so they appear in the search results when a user searches for text they contain. If, however, you have a text file that you can create and read in Notepad but that has an extension other than .txt, SPS 2003 (and even MOSS 2007) will not recognize it as a text file and will not index it. As a result, if such a file exists in your portal and someone searches for content inside it, the file will not be displayed in the search results. This article explains how you can get your "non-.txt" text files (such as files with a .csv extension) indexed in SPS 2003.
How Files Get Indexed
Before I talk about the solution to this problem, I would like to quickly explain a little about the search architecture of SharePoint (this applies to both SPS 2003 and MOSS 2007). Note that this explanation is high level, just to help you understand how the solution works.
When the SharePoint indexer is at work and comes across a file in a content source, it looks at the list of file types that the administrator has specified as "searchable" (as an administrator, you can define this list from Search Administration). If the file's type is in that list, the indexer then attempts to find an "IFILTER" for that type of file. The IFILTER is a software component (a DLL or an assembly) that tells the SharePoint indexer how to index the file, primarily by describing the file's format. Once the IFILTER for the file extension has been found, the SharePoint indexer performs the indexing, and once this is complete, the file starts appearing in search results.
Based on the above explanation, a file will not be indexed and searchable if either of the following is true:
- The file has not been defined in the list of searchable file types by the administrator, OR
- A corresponding IFILTER has not been installed on the Index Server.
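The two conditions above can be modeled in a few lines of Python. This is only a toy sketch to make the logic concrete; it is not SharePoint's actual code, and the function name, the sets, and the filter labels are all made up for illustration.

```python
# Toy model of the indexer's two checks described above.
# Everything here (names, data) is hypothetical, not a SharePoint API.

def will_be_indexed(file_name, included_types, ifilters):
    """Return (indexed, reason) for a file, mirroring the two conditions."""
    if "." not in file_name:
        return False, "no extension"
    # Use the text after the last dot as the extension, lower-cased.
    ext = file_name.rsplit(".", 1)[1].lower()
    # Check 1: has the administrator listed this file type as searchable?
    if ext not in included_types:
        return False, "extension not in the included file types"
    # Check 2: is an IFILTER registered for this extension?
    if ext not in ifilters:
        return False, "no IFILTER registered for the extension"
    return True, "indexed with " + ifilters[ext]


included = {"txt", "doc"}
filters = {"txt": "text-filter", "doc": "office-filter"}

print(will_be_indexed("report.txt", included, filters))
print(will_be_indexed("data.csv", included, filters))

# Satisfy both conditions for csv: list the type and reuse the txt filter.
included.add("csv")
filters["csv"] = filters["txt"]
print(will_be_indexed("data.csv", included, filters))
```

In this model a csv file fails the first check until the type is listed, and would fail the second check until a filter is associated with it, which is why both need to be addressed.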
Getting .CSV Files Indexed in SPS 2003
By default, neither SPS 2003 nor MOSS 2007 indexes .CSV files, although both index .TXT files. CSV files, as you know, are basically text files that just don't have the .txt extension. To make these files searchable in SPS 2003, you will have to do two things:
1. Adding the .CSV extension to the list of searchable file types.
2. Installing an IFILTER for the .CSV files.
Both of these steps have been described in more detail below:
1. Adding the .CSV extension to the list of searchable file types
This can be done by completing the following steps:
- On the top level site, click on Site Settings
- Under the 'Search Settings and Indexed Content' section, click on 'Configure search and indexing'
- Under 'General Content Settings and Indexing Status' section, click on 'Include File Types'
- Click on 'New File Type'
- Write "csv" in the "File Name Extension" text box and click OK.
2. Installing IFILTER for CSV Files
As you might recall from my explanation above, an IFILTER tells the indexer about the format of the file, that is, how to read the file and get it ready for indexing. In this step, we need to install an IFILTER for CSV files. But hey, wait a minute! The format of CSV files is the same as TXT files, right? Just the extension is different, yeah? And since .TXT files are searchable on the portal, this tells me that there is an IFILTER already installed for TXT files! So here's the trick: instead of creating and writing our own IFILTER, we're gonna tell the SharePoint indexer to use the same IFILTER that it uses for indexing TXT files. This can be done by completing the following steps:
1. On the SharePoint Index Server, open the system registry by going to Start --> Run, typing in regedit and pressing Enter.
2. Go to HKEY_LOCAL_MACHINE --> SOFTWARE --> Microsoft --> SPSSearch --> ContentIndexCommon --> Filters --> Extension --> .txt. This key registers the IFILTER for txt files. Its default value is the CLASS ID of the software component that contains the implementation of the IFILTER.
3. Copy the default value of the key. This is a GUID.
4. Create a new registry key named ".csv" under HKEY_LOCAL_MACHINE --> SOFTWARE --> Microsoft --> SPSSearch --> ContentIndexCommon --> Filters --> Extension.
5. In the default value of this newly created key, paste the GUID that you copied in step 3.
6. Close the registry.
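If you have to repeat the registry steps above on more than one index server, the same change could be written as a .reg file. The Python sketch below only builds the text of such a file. The registry path comes from the steps above, but the function name is made up, and the GUID in the example is an all-zero placeholder, not the real class ID. You still have to copy the actual value from the .txt key on your own server.

```python
import re

# Registry location taken from the steps above.
EXTENSION_ROOT = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch"
    r"\ContentIndexCommon\Filters\Extension"
)

# Assumption: the value copied from the .txt key is a GUID,
# written with or without surrounding braces.
_GUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
GUID_PATTERN = re.compile(r"^(?:\{" + _GUID + r"\}|" + _GUID + r")$")


def build_reg_file(extension, clsid):
    """Return .reg text that sets the default value of the extension key."""
    if not GUID_PATTERN.match(clsid):
        raise ValueError("not a GUID: " + repr(clsid))
    # Normalize "csv", ".csv" and ".CSV" to ".csv".
    ext = "." + extension.lstrip(".").lower()
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + EXTENSION_ROOT + "\\" + ext + "]",
        '@="' + clsid + '"',  # "@" denotes the key's default value
        "",
    ]
    return "\r\n".join(lines)


# Placeholder GUID for illustration only; replace it with the value
# copied from the .txt key in step 3.
print(build_reg_file("csv", "{00000000-0000-0000-0000-000000000000}"))
```

The function passes the GUID through exactly as copied, so the .csv key ends up with the same default value as the .txt key. As with any registry change, try the resulting file on a non-production server first.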
Conclusion
At this point, all the necessary configuration has been completed. Go ahead and restart the search services and perform a full crawl of the content sources containing CSV files. Once the indexing is complete, you'll see that full-text searching is now enabled for CSV files as well.
Comments
Anonymous
June 12, 2009
Did you ever get around to writing up this process for MOSS 2007? I tried approximately what you suggested in MOSS 2007 but without much luck. Was wondering if you ever were successful with it. Thanks for any guidance you can provide. Best Regards, Laura
Anonymous
June 12, 2009
Hi Laura, Yes, I am sure that this works with MOSS 2007. In fact, that's the first place where I saw it working. You may have to modify some different registry keys, though, so you may have been missing something. I will try to write a MOSS counterpart of this article sometime soon.