Cuneiform 1.0

A multi-language OCR system originally developed and open sourced by Cognitive Technologies.

What's new in Cuneiform 0.8:

  • Bugs were fixed.
  • Cygwin is supported.
  • A single column recognition mode was added.
BSD License 
Jussi Pakkanen
Cuneiform is a multi-language OCR system originally developed and open sourced by Cognitive Technologies. It was originally a Windows application, which was ported to Linux by Jussi Pakkanen.


Extract the source and go to its root folder (the top-level directory of the extracted package).
Then type the following commands:

mkdir builddir
cd builddir
cmake -DCMAKE_BUILD_TYPE=debug ..
make install

By default Cuneiform installs to /usr/local. You can specify a different prefix by giving a command line switch "-DCMAKE_INSTALL_PREFIX=/what/ever/you/want" to CMake.
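
As an illustration, the build steps above with a custom prefix might be combined as in the following sketch. The prefix /opt/cuneiform is an assumed example path (installing there may need write permission), and the build only runs when the script is started from the source root with cmake available.

```shell
# Assumed example prefix; replace it with any directory you can write to.
PREFIX="/opt/cuneiform"
CONFIGURE="cmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX=$PREFIX .."
echo "Configure command: $CONFIGURE"

# Build only when we are in the source root and cmake is installed.
if [ -f CMakeLists.txt ] && command -v cmake >/dev/null 2>&1; then
    mkdir -p builddir
    # Subshell keeps the caller's working directory unchanged.
    (cd builddir && $CONFIGURE && make install)
else
    echo "Not in the source root or cmake is missing; nothing was built."
fi
```

The binaries should then end up under the chosen prefix instead of /usr/local.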

If you have ImageMagick++ on your system, Cuneiform autodetects and builds against it. Then Cuneiform can process any image that ImageMagick knows how to open. Otherwise it can only read uncompressed BMP images.

If you want to run Cuneiform without installing it on your system, you have to point the CF_DATADIR environment variable to a directory containing the .dat files. These can be found in the "datafiles" directory of the source package.
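
A minimal sketch of that setup follows. SRC_ROOT is a hypothetical variable standing for wherever you extracted the source; it defaults to the current directory here.

```shell
# Assumed location of the extracted source; override SRC_ROOT if needed.
SRC_ROOT="${SRC_ROOT:-$PWD}"

# Point Cuneiform at the .dat files shipped in the source package.
CF_DATADIR="$SRC_ROOT/datafiles"
export CF_DATADIR

# Sanity check: warn if the directory holds no .dat files.
if ls "$CF_DATADIR"/*.dat >/dev/null 2>&1; then
    echo "Using data files in $CF_DATADIR"
else
    echo "Warning: no .dat files found in $CF_DATADIR" >&2
fi
```

Run the freshly built cuneiform binary from the same shell session so that it inherits the exported variable.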


After installing, simply run:

cuneiform [-l language -o result_file --html --dotmatrix --fax] <image_file>

Cuneiform assumes that your image contains only a single column of text.

By default Cuneiform recognizes English text. To change the language use the command line switch -l followed by your language string. To get a list of supported languages type "cuneiform -l".
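
For instance, listing the languages and then recognizing a page in another language could look like the sketch below. The language string "ger" and the file name page.bmp are assumptions for illustration; use a string that "cuneiform -l" actually prints on your system.

```shell
# Assumed example values; check "cuneiform -l" for the real language strings.
LANG_CODE="ger"
IMAGE="page.bmp"

if command -v cuneiform >/dev/null 2>&1; then
    # With no language given, -l prints the supported language strings.
    cuneiform -l
    # Recognize the image only if it exists.
    if [ -f "$IMAGE" ]; then
        cuneiform -l "$LANG_CODE" "$IMAGE"
    fi
else
    echo "cuneiform is not on PATH; install it first."
fi
```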

By default Cuneiform outputs plain text. You can specify the "--html" switch to make it output in HTML format.

If you do not define an output file with the -o switch, Cuneiform writes the result to a file "cuneiform-out.[format]". The file extension is either "txt" or "html" depending on your output format.
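
The naming rule can be sketched as a small shell helper. The function cf_output_name is hypothetical and not part of Cuneiform; it only mirrors the rule described above so that a script can find the result file.

```shell
# Hypothetical helper: print the file Cuneiform is expected to write to.
#   $1 = output format, "txt" or "html"
#   $2 = value passed to -o (empty if -o was not used)
cf_output_name() {
    format="$1"
    explicit="$2"
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    else
        printf 'cuneiform-out.%s\n' "$format"
    fi
}

cf_output_name txt ""             # prints cuneiform-out.txt
cf_output_name html ""            # prints cuneiform-out.html
cf_output_name html result.html   # prints result.html
```

So after running cuneiform with "--html" and no -o switch, a script would look for cuneiform-out.html in the current directory.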

Last updated on July 1st, 2010
