Google has recently re-released the Tesseract OCR software to the open source community. OCR, or optical character recognition, is a technique that converts printed text, such as scanned pages, into machine-readable text. Physical text is passé. With OCR software you can now store the bulk of your older papers in digital form.
Google has also pointed out that it is not the original developer of the software. Tesseract was originally developed at Hewlett Packard Laboratories between 1985 and 1995. Unfortunately, HP then got out of the OCR business, and the software sat unused until Google's recent re-launch.
The Tesseract OCR software currently supports only one language, English. It does not include a page layout analysis module, but it is reportedly far more accurate than any other open source OCR package available.