Tesseract OCR For Linux


Last updated: Oct 4, 2010. License: Apache License 2.0



Description

Tesseract OCR is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. In 1995, it was among the top three engines evaluated by UNLV. HP and UNLV released it as open source in 2005. The engine reads a binary, grey or color image and outputs text. A built-in TIFF reader handles uncompressed TIFF images, and libtiff can be added to read compressed images.
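As a rough illustration of the image-in, text-out flow described above, the following Python sketch wraps the `tesseract` command-line program using its `tesseract imagename outputbase [-l lang]` form. It assumes the binary is installed and on the PATH. The helper names `build_tesseract_command` and `ocr_image` are hypothetical, introduced only for this example, and are not part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang=None):
    """Build the argument list for: tesseract imagename outputbase [-l lang]."""
    cmd = ["tesseract", str(image_path), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_image(image_path, output_base, lang=None):
    """Run tesseract on an image and return the recognized text.

    Assumption: the `tesseract` binary is on PATH. It writes its result to
    <output_base>.txt, which this helper then reads back.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With an uncompressed TIFF at hand, a call such as `ocr_image("scan.tif", "scan_out", lang="eng")` would return the recognized text; `scan.tif` is a placeholder file name.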

The developers are regularly testing on the following platforms:

• Ubuntu 6.06 (x86/32, x86/64)
• Ubuntu 6.10 (x86/32, x86/64)
• Windows (x86/32)

We believe the code should also run on the following platforms, but we don't have the resources to test on them regularly:

• Recent Linux distributions (x86/32, x86/64)
• Mac OS X (x86, PPC)

If you're interested in supporting other platforms or languages, please get in touch with Ray Smith.

What's new in Tesseract OCR 3.0:

Preparations for thread safety:
• Changed TessBaseAPI methods to be non-static.
• Created a class hierarchy for the directories to hold instance data, and began moving code into the classes.
• Moved thresholding code to a separate class.
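
The thread-safety preparations listed above follow a common pattern: state that used to be shared (static) moves onto instances, so each thread can own its own engine. The sketch below illustrates that pattern in Python under stated assumptions. `Thresholder`, `Engine` and `run_in_threads` are hypothetical stand-ins invented for this example; they are not Tesseract's classes and do not reflect the TessBaseAPI interface.

```python
import threading


class Thresholder:
    """Hypothetical stand-in: maps grey pixel values to 0/1 with a fixed cutoff."""

    def __init__(self, cutoff=128):
        self.cutoff = cutoff

    def binarize(self, pixels):
        return [1 if p >= self.cutoff else 0 for p in pixels]


class Engine:
    """Hypothetical engine whose working state lives on the instance, not in globals."""

    def __init__(self, thresholder):
        self.thresholder = thresholder  # thresholding kept in its own class
        self.pages_seen = 0             # per-instance counter
        self.last_binary = None         # per-instance result

    def process(self, pixels):
        self.last_binary = self.thresholder.binarize(pixels)
        self.pages_seen += 1
        return self.last_binary


def run_in_threads(pages_by_worker):
    """Give each thread its own Engine so no state is shared between threads."""
    engines = [Engine(Thresholder()) for _ in pages_by_worker]
    threads = [
        threading.Thread(target=lambda e=e, pages=pages: [e.process(p) for p in pages])
        for e, pages in zip(engines, pages_by_worker)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return engines
```

Because every `Engine` keeps its own counter and result, the threads never write to the same data, which is the property that non-static methods and instance-held data make possible.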
