Scanning in DjVu

My new project requires scanning colour documents into DjVu format, so I thought I would write about this somewhat unfamiliar file format.

Have you heard of “déjà vu”? As I understand it, in French this means something like “familiar” or “already experienced”. It is used to describe the weird feeling most of us have had, where we come across a new situation or a person and feel like it has happened before, although we cannot recall the exact situation. There could be several religious interpretations of this, but as far as I know there is no accepted scientific explanation yet (at least I couldn't find any).

I don't know why they used the same name, but DjVu is a file format similar to PDF that produces significantly smaller files. It was developed by AT&T, and the commercial rights were later transferred to LizardTech. Last year they were transferred again, this time to Celartem Technology, the parent company of LizardTech. However, DjVu is a free file format, which means the specifications and the reference libraries are freely available. As with PDF, any user can view a DjVu document by installing a freely available browser plug-in. The commercial ownership covers only the encoding technology.

Below are some interesting comparisons from DjVu.org. (I am yet to test these in practice)

  • Scanned pages at 300 DPI in full color can be compressed from 25MB down to files of 30 to 100KB.
  • Black-and-white pages at 300 DPI typically occupy 5 to 30KB when compressed.
  • For color document images that contain both text and pictures, DjVu files are typically 5 to 10 times smaller than JPEG at similar quality.
  • For black-and-white pages, DjVu files are typically 10 to 20 times smaller than JPEG and five times smaller than GIF.
  • DjVu files are also about 3 to 8 times smaller than black-and-white PDF files produced from scanned documents.
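To get a feel for the first figure in the list, here is a small Python sketch of the arithmetic. The page size (US Letter, 8.5 x 11 inches) and the 3 bytes per pixel (24-bit colour) are my own assumptions, not something DjVu.org states; they simply happen to give an uncompressed size close to the quoted 25MB. The function names are made up for this illustration.

```python
# Assumptions for illustration only: US Letter page, 24-bit RGB, 1 KB = 1000 bytes.
DPI = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
BYTES_PER_PIXEL = 3


def raw_size_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Size of an uncompressed scan: pixels across * pixels down * bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed file is than the original."""
    return original_bytes / compressed_bytes


raw = raw_size_bytes(WIDTH_IN, HEIGHT_IN, DPI, BYTES_PER_PIXEL)
print(f"uncompressed page: {raw / 1e6:.1f} MB")

# The quoted DjVu sizes for a full-colour page are 30 to 100 KB.
for kb in (30, 100):
    ratio = compression_ratio(raw, kb * 1000)
    print(f"{kb} KB file is roughly {ratio:.0f} times smaller")
```

Under these assumptions the quoted file sizes correspond to compression ratios of a few hundred to one, which is why the claim looks so striking next to JPEG.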

This is a graphical comparison done by LizardTech:

Several important technologies in DjVu make it possible to have very clear images at such small file sizes. The first is the compression scheme. Unlike other formats, DjVu compresses a page as three images: the foreground image, the background image and the mask image. The mask image, which is stored at high resolution, holds the text layer and uses a special compression technique: it compresses a particular character only once, and for every later occurrence of the same character it records only the location. The other two layers are stored in colour at low resolution. Thanks to this approach, a DjVu file with a lot of text is significantly smaller than a similar PDF file. Decompression of a DjVu file is also done in several steps, so the user gets an initial view very quickly and the full-quality image appears a few moments later.
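The "store each character once, then only its locations" idea can be sketched in a few lines of Python. This is only a toy illustration of the principle, not the actual DjVu encoder: I am using plain strings as stand-ins for character bitmaps, matching them exactly (a real encoder has to cope with scanned shapes that differ slightly), and the function names and sample coordinates are made up.

```python
def encode_glyphs(glyphs):
    """Split a list of (shape, x, y) glyphs into a shape dictionary and placements.

    Each distinct shape is stored once in `dictionary`; every occurrence
    is recorded only as (index into dictionary, x, y).
    """
    dictionary = []   # distinct shapes, in order of first appearance
    index_of = {}     # shape -> its index in `dictionary`
    placements = []   # one small record per glyph on the page
    for shape, x, y in glyphs:
        if shape not in index_of:
            index_of[shape] = len(dictionary)
            dictionary.append(shape)
        placements.append((index_of[shape], x, y))
    return dictionary, placements


def decode_glyphs(dictionary, placements):
    """Rebuild the original glyph list from the dictionary and placements."""
    return [(dictionary[i], x, y) for i, x, y in placements]


# Hypothetical page containing the word "level" on one line.
page = [("l", 0, 0), ("e", 8, 0), ("v", 16, 0), ("e", 24, 0), ("l", 32, 0)]

dictionary, placements = encode_glyphs(page)
print(dictionary)   # three distinct shapes for five glyphs
print(placements)
assert decode_glyphs(dictionary, placements) == page
```

On a real page with thousands of characters but only a few dozen distinct shapes, the dictionary stays small while each occurrence costs just an index and a position, which is where the savings on text-heavy documents would come from.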

These features make DjVu an ideal format for scanning colour text documents for electronic distribution. Who knows, DjVu may even replace PDF, especially when it comes to scanned colour documents such as textbooks. The famous Million Book Collection is an example of extensive use of the DjVu format. It offers more than 1.5 million full-text books freely in open formats such as HTML, TIFF and DjVu.

Some other useful links;