How to Crawl the Content in .csv Files in Microsoft Office SharePoint Server 2007
SharePoint does not index the content in .csv files out of the box. Even though .csv files are simply text files, for some reason the content is not index like .txt files; thus no results are returned.
While troubleshooting an issue for a customer, I researched several hours on both our internal databases and the Internet. The only information I could find were in these blog posts which didn’t work for me.
I followed the procedures in the blogs but was still unsuccessful in getting my query returned in SharePoint. After more research, I noticed several people asking about how to do this with no answers provided.
Some of the steps involve modifying the Registry. For more information on Registry modification, please refer to KB310516. As always, be extra careful when modifying the Registry as you could cause serious damage. The Microsoft disclaimer is:
Important This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
322756 (https://support.microsoft.com/kb/322756/ ) How to back up and restore the registry in Windows
Now that we have the “Registry” scare out of the way, let’s get started.
The following steps were performed using this configuration:
· All platforms are Windows Server 2008 x64
· One Index Server with Microsoft Office SharePoint Server 2007 with SP2 and the April 09 CU
· One Query Server with Microsoft Office SharePoint Server 2007 with SP2 and the April 09 CU
· One Domain Controller with SQL Server 2008 installed
The same steps work with 32-bit systems. The only thing that will change is the operating system version of Citeknet IFilter Explorer 2.0 (Beta 2).
Determine Which .csv IFilter is Installed (optional)
If you have 32-bit servers, then download the correct version for your farm. Also, note the extensions “may” not be shown IF Office applications are not installed on the server(s). If you do not see CSV under the Extensions column, then move on to “Add the .csv Extension to the List of Searchable File Types”.
1. Download and install the complete setup of IFilterExplorer 2.0 (Beta 2) from Citeknet on the Index Server and ALL Query servers from https://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx
2. Launch IFilter Explorer and configure as follows
a. Show Extensions: With IFilters only
b. Group by: IFilters
c. Platform: your choice of x64 or x86
3. Select the OSS 2007 Search Service tab
4. Scroll down until you find “CSV” under the Extensions column
a. By default, this will be under the “C:\Program Files\Microsoft Office Servers\12.0\Bin\tquery.dll” section
5. On the Index server, the information in this row should show the following
a. Extensions: CSV
b. File Types: Microsoft Office Excel Comma Separated Values File
c. Content Types: application/vnd.ms-excel
d. Methods: File extension (override: CLSID)
e. Signed: yes
f. Persistent Handler Names: Microsoft Office SharePoint Server Plain Text filter
g. Persistent Handlers: {5e941d80-bf96-11cd-b579-08002b30bfed}
h. Persistent Handlers Addins Registered: {4a3dd7ab-0a6b-43b0-8a90-0d8b0cc36aab}
i. Persistent Handlers Addins Registered ProgId:
j. IPersistFile: (yes)
k. IPersistStream: yes
l. IPersistStorage: no
m. TreadingModule: both
i. NOTE: Item “i” is blank
6. On the Query server(s), the information in this row should show the following
a. Extensions: CSV
b. File Types: CSV File
c. Content Types:
d. Methods: File extension (override: CLSID)
e. Signed: yes
f. Persistent Handler Names: Microsoft Office SharePoint Server Plain Text filter
g. Persistent Handlers: {5e941d80-bf96-11cd-b579-08002b30bfed}
h. Persistent Handlers Addins Registered: {4a3dd7ab-0a6b-43b0-8a90-0d8b0cc36aab}
i. Persistent Handlers Addins Registered ProgId:
j. IPersistFile: (yes)
k. IPersistStream: yes
l. IPersistStorage: no
m. TreadingModule: both
i. NOTE: Item “c” and “i” are blank
7. Leave the IFilter Explorer open on all machines
Add the .csv Extension to the List of Searchable File Types
1. Launch Central Admin and go to the Shared Services Administration site
2. Under the “Search” heading, choose either “Search Administration” or “Search Settings”
a. For this, I’ll use Search Administration
3. On the Search Administration page, choose File Types under the Crawling heading
4. From the Manage File Types page, select “New File Type” and type csv in the File Extension box and click OK.
5. You will now see that the csv file type has been added
Add the Text Icon Association to csv File Types
In order to associate the text icon to csv files, modifications must be made to the docicon.xml file. This file is located in the “12 hive” on the SharePoint servers and must be modified on ALL Index and Query servers in the farm. This will associate the Excel icon with .csv files.
1. Open the docicon.xml file in Notepad. By default, the location is \Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\XML
2. Locate the “ByExtension” section in the file
3. Add the following line under the “ByExtension” section
a. <Mapping Key=”csv” Value=”ictxt.gif” OpenControl=””/>
4. Close Notepad and Save the changes
5. Repeat steps 1-4 on every SharePoint server in the farm
Modify the Registry on the Index & Query Server(s)
The screenshots may be different if you do not have the Office applications on your server(s).
1. On each of the SharePoint servers in the farm, perform the following steps
a. Go to Start, Run, and type RegEdit and press enter
b. Expand HKEY_CLASSES_ROOT and locate .csv
c. Expand .csv
d. Right-click .csv and choose Export
e. Provide a file name and note the location
i. This will back-up the .csv key which is a best practice before doing any modifications.
2. Open Notepad
3. Copy the text from the box below and paste it into Notepad
NOTE: Remember to perform this on the Index server and ALL the Query Server(s) in the farm. |
|
4. In Notepad, choose File, Save As with a File Name of csv.reg
5. Change “Save as type” to “All Files (*.*) ”
6. Save the file to the Desktop
7. Go to the Desktop, double-click csv.reg
a. This will install all three keys above to the Registry of the machine
The script above is modifies the HKEY_CLASSES_ROOT\.csv key to ensure it is configured as a text file. It also adds the .csv extension to the Microsoft Office SharePoint Server 2007 Search and Windows SharePoint Services Search Registry settings with a GUID of {4A3DD7AB-0A6B-43B0-8A90-0D8B0CC36AAB}. This GUID is associated with the text parser Ifilter.
Add the csv Extension to the ExtensionList in the Registry
On the Index servers, you will need to manually add the csv extension to the Search\Applications Registry section. This has to be performed manually since each customer may have other file extensions where a certain value may already be used. For example, if a customer has installed the PDF iFilter, then it is very possible that the value of 38 is already used for the pdf files.
1. On the Index server, go to Start, Run and type Regedit and press enter
2. Browse to the following Key
a. HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<some guid>\Gather\Search\Extensions\ExtensionList
i. Replace <some guid> with the GUID’s you see in the Registry
b. Right-click ExtensionList and select New, String Value
c. Replace “New Value #1” with 38 (or the next number available) and press Enter
d. Double-click the new value from the previous step and type csv for the data and press Enter
e. Close the Registry
Verify .csv Class Changed (optional)
1. On the Index server and the Query server(s), go back to IFilter Explorer
2. Ensure the OSS 2007 Search Service tab is selected and click Refresh
a. You should that the CSV file extension is now located under “C:\Program Files\Common Files\Microsoft Shared\Filters\nlhtml.dll” and also see where the File Type has changed along with the Content Type
Reset Content Index
In order to re-index the content in csv files, the content index needs to be reset.
1. Launch Central Administration and go to the Shared Services Administration, Search Administration page
2. Under the Crawling section, click “Reset all Crawled Content”
3. Leave this screen open
Restart Services
NOTE:
Performing a net stop/start on spsearch will only affect the server(s) that have the role of Windows SharePoint Services Search.
1. On the Index Server and ALL Query Servers, go to a command prompt and type the following. Wait for each process to finish before going onto the next. These commands are restarting the IIS Service, the Office SharePoint Server Search service, and the Windows SharePoint Services Search service.
a. Iisreset /noforce
b. Net stop osearch
c. Net stop spsearch
d. Net start osearch
e. Net start spsearch
2. Continue with Step 3 after Step 1 has been completed on all Query servers and the Index server in the farm.
3. From the Search Administration page, select Content Sources, and perform a FULL Crawl of the content source containing the .csv files
4. Monitor the status through the Search Administration page until “Crawl Status” AND “Propagation Status” show as “Idle”
5. From the Search Settings page in the SSP, check “Scopes Needing Update”. If there are any scopes needing updates, select “Start Update Now”.
Test Configuration
Once the content source(s) have been fully crawled and propagated to all the Query server(s), we need to test and make sure we can search for content within .csv files.
1. Open your site collection and perform a search for csv
a. Notice in the screenshot above, there are three files that end in .csv: managers.csv, names.csv, and mscrawledproperties.csv
2. Perform a search for one of the words shown under the file name. In this example, I’ll search for contentclass with a regular user
3. Notice the results returned mscrawledproperties.csv with contentclass highlighted.
You should now be able to successfully search the contents of csv files. The only exception may be with numbers in a csv file since they normally do not have the quotation (“ “) marks around them.
Comments
- Anonymous
September 27, 2011
Very interesting, wow, a lot of work, to be sure. Are you certain that you have to de-populate the entire index just to get a new type crawled? I would have thought launching a new Full Crawl would have sufficed? Is there a technet article on this?