Cuneiform

1.0 BSD License    

Cuneiform is a multi-language OCR system originally developed and open sourced by Cognitive Technologies. It was originally a Windows application, which was ported to Linux by Jussi Pakkanen.

Compiling


Extract the source and go to the root folder (the one containing the README file).
Then type the following commands:

mkdir builddir
cd builddir
cmake -DCMAKE_BUILD_TYPE=debug ..
make
make install


By default Cuneiform installs to /usr/local. You can specify a different prefix by passing the command line switch "-DCMAKE_INSTALL_PREFIX=/what/ever/you/want" to CMake.
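
As a rough sketch, the build steps with a custom prefix could be wrapped in a small shell function. The function name build_cuneiform and the example prefix are illustrative assumptions and not part of the Cuneiform package; only the two CMake switches come from the instructions above.

```shell
# Hypothetical helper: configure, build and install into a chosen prefix.
# Call it from the root folder of the extracted source.
build_cuneiform() {
    prefix="$1"
    mkdir -p builddir || return 1
    (
        # The subshell keeps the caller's working directory unchanged.
        cd builddir &&
        cmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX="$prefix" .. &&
        make &&
        make install
    )
}

# Example call (the prefix is only an illustration):
#   build_cuneiform "$HOME/opt/cuneiform"
```

Installing into a directory you own, as in the example call, should avoid the need for elevated permissions during "make install".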

If you have ImageMagick++ on your system, Cuneiform autodetects and builds against it. Then Cuneiform can process any image that ImageMagick knows how to open. Otherwise it can only read uncompressed BMP images.

If you want to run Cuneiform without installing it on your system, you have to point the CF_DATADIR environment variable to a directory containing the .dat files. These can be found in the "datafiles" directory of the source package.
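
A minimal sketch of running an uninstalled build follows. The helper name run_uninstalled is hypothetical, and the location of the built binary is passed in explicitly because it depends on your build directory; only CF_DATADIR and the "datafiles" directory come from the text above.

```shell
# Hypothetical helper: run an uninstalled binary against the bundled .dat files.
#   $1 = root folder of the extracted source
#   $2 = path to the built cuneiform binary (depends on your build layout)
#   remaining arguments are passed through to cuneiform unchanged
run_uninstalled() {
    src="$1"
    binary="$2"
    shift 2
    CF_DATADIR="$src/datafiles" "$binary" "$@"
}

# Example call (both paths are placeholders):
#   run_uninstalled /path/to/source /path/to/built/cuneiform page.bmp
```

Setting the variable only for the single command keeps it out of the rest of your shell session; exporting CF_DATADIR once would work as well.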

Running

After installing, simply run:

cuneiform [-l language -o result_file --html --dotmatrix --fax] <image_file>


Cuneiform assumes that your image contains only a single column of text. Output is written to the file given with the -o switch.

By default Cuneiform recognizes English text. To change the language use the command line switch -l followed by your language string. To get a list of supported languages type "cuneiform -l".

By default Cuneiform outputs plain text. You can specify the "--html" switch to make it output in HTML format.

If you do not define an output file with the -o switch, Cuneiform writes the result to a file "cuneiform-out.[format]". The file extension is either "txt" or "html" depending on your output format.
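
The switches described above can be combined in one call. The sketch below wraps such a call in a hypothetical function, ocr_to_html, that writes the HTML result next to the input image. The language string "eng" is an assumption; check the output of "cuneiform -l" for the strings your build accepts.

```shell
# Hypothetical helper: recognize one image and write HTML next to it.
#   $1 = image file
#   $2 = language string (optional; "eng" is assumed to be valid)
ocr_to_html() {
    image="$1"
    lang="${2:-eng}"
    # ${image%.*} strips the file extension, so page1.bmp becomes page1.html.
    cuneiform -l "$lang" --html -o "${image%.*}.html" "$image"
}

# Example call (the file name is a placeholder):
#   ocr_to_html scans/page1.bmp
```

Naming the output explicitly with -o avoids every run overwriting the same default file when you process several images in a row.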
Last updated on July 1st, 2010
