Indexing TIFF files

I have already written about IFilters for SharePoint (Indexing DjVu documents and Indexing PDF files in SharePoint). Today I came across another extremely useful filter, this one for TIFF documents: Captaris TIFF iFilter.

The description is very tempting:

Why Use Captaris TIFF iFilter?

Support Enterprise Search

Add full-text search and retrieval of image documents alongside Microsoft Office and other electronic documents.

Enable E-Discovery.

Search images along with electronic documents to speed up E-discovery tasks and find more business-critical information.

Unique Features

Recognition server automatically performs full-page recognition of TIFF files and sends the information to Search Server 2008 for indexing.

Pre-OCR image cleanup: includes auto-rotation of inverted pages; de-skewing of scanned pages.

Off-center correction and de-speckling to remove black dots caused by fax noise or poor scan quality. Punch hole removal, de-shading and inverse text correction. Image enhancement produces higher accuracy of resulting text sent to the indexing server. The stored image is maintained in its original state.

How Does It Work?

The IFilter interface scans documents for text and properties (also called attributes).

It extracts chunks of text from these documents, filtering out embedded formatting and retaining information about the position of the text.

It also extracts chunks of values, which are properties of an entire document or of well-defined parts of a document.

IFilter provides the foundation for building higher-level applications such as document indexers and application-independent viewers.
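To make the chunk idea above a bit more concrete, here is a rough sketch in Python. It is only a toy model of the concept, not the real thing: the actual IFilter is a Windows COM interface (with methods such as GetChunk, GetText and GetValue), and this code does not call it. The names Chunk, ChunkKind and iter_chunks, the angle-bracket markup and the sample document are all made up for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterator


class ChunkKind(Enum):
    TEXT = "text"    # a run of body text (hypothetical stand-in for a text chunk)
    VALUE = "value"  # a document property (hypothetical stand-in for a value chunk)


@dataclass
class Chunk:
    chunk_id: int    # sequential id, unique within one document
    kind: ChunkKind
    attribute: str   # property name, or "body" for text
    content: str     # text with formatting removed, or the property value
    start: int       # offset of the chunk in the source body (0 for values)


# Assumption: formatting is marked with simple <tag>-style markup.
TAG = re.compile(r"<[^>]+>")


def iter_chunks(properties: Dict[str, object], body: str) -> Iterator[Chunk]:
    """Yield value chunks for the properties, then text chunks for the body."""
    chunk_id = 1
    for name, value in properties.items():
        yield Chunk(chunk_id, ChunkKind.VALUE, name, str(value), 0)
        chunk_id += 1

    offset = 0
    for line in body.splitlines(keepends=True):
        # Filter out embedded formatting but remember where the line started.
        text = TAG.sub("", line).strip()
        if text:
            yield Chunk(chunk_id, ChunkKind.TEXT, "body", text, offset)
            chunk_id += 1
        offset += len(line)


# Made-up sample document: two properties and a lightly marked-up body.
doc_props = {"Title": "Scanned invoice", "PageCount": 2}
doc_body = "<b>Invoice</b> 42\n\nTotal: <i>100</i> EUR\n"

chunks = list(iter_chunks(doc_props, doc_body))
for chunk in chunks:
    print(chunk.chunk_id, chunk.kind.value, chunk.attribute, repr(chunk.content), chunk.start)
```

An indexer built on such a stream could feed the text chunks into the full-text index and store the value chunks as searchable properties. In the case of a TIFF filter, the text would come from the OCR step rather than from stripping markup.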

I would really like to check how well it works with Russian, but I have no time for that :( If anyone tries it out, please share your impressions!

This entry was published on Maxim Voitsekhovsky's Techno-blog. Please leave comments there.

Last updated on 2008-10-05
