OCR using Tesseract is a service menu for Dolphin and Konqueror, compatible with KDE4, that lets you OCR documents conveniently from your file manager window.

This is a very simple program. It OCRs a document and writes the text to a file that has the same name as the OCRed image file but with a txt extension.
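
The naming rule above can be sketched in a few lines of shell. This is only an illustration, not the code shipped in the menu: the function names output_base and ocr_tif are hypothetical, and the sketch relies on the standard tesseract command line, which takes an image and an output base name and appends ".txt" to the base on its own.

```shell
# Hypothetical helper (not from the menu): strip the last extension
# to get the output base name that tesseract expects.
output_base() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical wrapper: OCR one tif image. tesseract appends ".txt"
# to the output base itself. The optional second argument is a
# language code such as "eng".
ocr_tif() {
    if [ -n "$2" ]; then
        tesseract "$1" "$(output_base "$1")" -l "$2"
    else
        tesseract "$1" "$(output_base "$1")"
    fi
}

# Usage: ocr_tif foobar.tif eng   (the text lands in foobar.txt)
```

Because the output name is derived only from the input name, any existing file with that name is replaced without a prompt.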

For the menu to be visible and provide its basic functionality (OCR of tif files), tesseract-ocr must be installed and in your path, along with the desired language packages. (The menu has been tested against tesseract-ocr v. 2.03 and 2.04.)

To OCR png and jpeg images, you also need imagemagick installed. To OCR pdf files, you need ghostscript installed.
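
As a rough sketch of why these extra tools are needed: tesseract 2.x reads tif input, so other formats have to be converted first. The functions tool_for and to_tif below are hypothetical names, the 300 dpi resolution and the tiffg4 output device are assumptions, and the actual commands used by the menu may differ.

```shell
# Hypothetical helper: name the extra tool an input file type needs.
# Only the lowercase extensions named in the paragraph above are listed.
tool_for() {
    case "${1##*.}" in
        tif)      echo none ;;
        png|jpg)  echo imagemagick ;;
        pdf)      echo ghostscript ;;
        *)        echo unsupported ;;
    esac
}

# Hypothetical converter: produce a tif next to the source file.
# Assumptions: ImageMagick's "convert" and Ghostscript's "gs" are in the
# path; 300 dpi and the bilevel tiffg4 device are illustrative choices.
to_tif() {
    src=$1
    dst="${src%.*}.tif"
    case "$(tool_for "$src")" in
        imagemagick) convert "$src" "$dst" ;;
        ghostscript) gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
                        -sOutputFile="$dst" "$src" ;;
        none)        : ;;   # already a tif, nothing to convert
        *)           echo "unsupported file type: $src" >&2; return 1 ;;
    esac
}
```

Note that the intermediate tif takes the base name of the source file, so an unrelated file with that name in the same directory would be replaced.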

See the file readme.txt in the archive.


– The menu cannot handle filenames with spaces (though it tolerates directory names with spaces). No warning is given.

– If the working directory contains a file with the same base name as the file to be OCRed and the extension "tif" or "txt", it will be overwritten or deleted. For example, if the file to be OCRed is named foobar.tif, foobar.txt will be overwritten; for foobar.tiff, foobar.png or foobar.jpg, foobar.tif will be deleted and foobar.txt overwritten. No warning is given.

– Uppercase extensions (like JPG or PNG) are not supported and produce a warning that the script does not handle these types of files. The long extension "jpeg" is not supported either.
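
A patch for the limitations listed above might start from helpers like the following. This is a sketch only, not part of the menu: normalized_ext and can_write_txt are hypothetical names, and the approach (quoting every variable, lowercasing the extension, checking for an existing txt file before writing) is one possible fix among several.

```shell
# Hypothetical helper: return the extension in lowercase and fold the
# long form "jpeg" into "jpg", so JPG, PNG and jpeg can be recognised.
normalized_ext() {
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        jpeg) echo jpg ;;
        *)    echo "$ext" ;;
    esac
}

# Hypothetical guard: succeed only if writing the txt result would not
# overwrite an existing file. Quoting "$1" everywhere keeps a file name
# with spaces in one piece.
can_write_txt() {
    [ ! -e "${1%.*}.txt" ]
}

# Usage:
#   if can_write_txt "$file"; then ...run the OCR...; else
#       echo "refusing to overwrite ${file%.*}.txt" >&2
#   fi
```
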

I am afraid I will not be spending more time on this menu to solve these problems myself (I have already surpassed myself in bash with this script), but I will gladly incorporate any patches that anyone sends me or posts here.
Last updated on September 28th, 2010
