A document-recognition programming framework.
The goal is to create a tool that leverages their knowledge of the target documents to create custom applications rather than attempting to meet diverse requirements with a monolithic application.
This paper gives an overview of the architecture and design principles of Gamera.
Developing recognition systems for difficult historical documents requires experimentation since the solution is often non-obvious. Therefore, Gamera's primary goal is to support an efficient test-and-refine development cycle.
Virtually every implementation detail is driven by this goal. For instance, Python [Rossum2002] was chosen as the core language because of its introspection capabilities, dynamic typing and ease of use. It has been used as a first programming language with considerable success [Berehzny2001].
C++ is used to write plugins where runtime performance is a priority, but even in that case, the Gamera plugin system is designed to make writing extensions as easy as possible. Gamera includes a full-fledged graphical user interface that provides a number of shortcuts for training, as well as inspection of the results of algorithms at every step.
By improving the ease of experimentation, we hope to put the power to develop recognition systems with those who understand the documents best. We expect at least two kinds of developers to work with the system: those with a technical background adding algorithms to the system, and those working on the higher-level aggregation of those pieces. It is important to note this distinction, since those groups represent different skill sets and requirements.
In addition to its support of test-and-refine development, Gamera also has several other advantages that are important to large-scale digitization projects in general. These are:
· Open source code and standards-compliance so that the software can interact well with other parts of a digitization framework
· Platform independence, running on a variety of operating systems including Linux, Microsoft Windows and Mac OS-X
· A workflow system to combine high-level tasks
· Batch processing
· A unit-testing framework to ensure correctness and avoid regression
· User interface components for development and classifier training
· Recognition confidence output so that collection managers can easily target documents that need correction or different recognition strategies.
Gamera has a modular plugin architecture. These modules typically perform one of five document recognition tasks:
2. Document segmentation and analysis
3. Symbol segmentation and classification
4. Syntactical or structural analysis
Each of these tasks can be arbitrarily complex, involve multiple strategies or modules, or be removed entirely depending on the specific recognition problem at hand. The actual steps that make up a complete recognition system are completely controlled by the user.
Pre-processing involves standard image-processing operations such as noise removal, blurring, de-skewing, contrast adjustment, sharpening, binarization, and morphology. Close attention to and refinement of these steps is particularly important when working with degraded historical documents.
In a hurry? Add it to your Download Basket!
What's New in version 3.2.0
- plugins to_numpy and from_numpy added for support of numpy; the deprecated numeric and numarry modules have been replaced with numpy
- highlight also works with GREYSCALE and ONEBIT images
- corrected resize function in VIGRA
- the knn classifier can now return different confidence measures for the main id that are selectable by the user. See the classifier API documentation for details.