Transcribo 0.7.1

A general-purpose plain text renderer for arbitrary input formats, including frontends for reStructuredText and plain text
Transcribo aims to provide a modular, easy-to-use and powerful cross-platform tool for converting various file formats into accurate plain text. What may seem a somewhat strange goal in the age of PDF and HTML turns out to be very useful, e.g., for output devices that can only handle plain text, such as Braille embossers. Indeed, Transcribo was designed with the specific objective of printing documents in high-quality Braille. However, it should be useful in any context where plain text in complex layouts is needed.

Transcribo has been designed to separate the processing of the input file from the actual rendering algorithm. Hence, one can speak of two layers: in the input layer, format-specific frontends parse the input streams and feed them into the renderer (the second layer). More specifically, the frontends for the supported input formats:

 * parse the input file,
 * derive the layout structure, and
 * call the renderer to
   o generate a proprietary tree representation of the document, and
   o traverse the tree, creating a line-by-line representation.
 * Thereafter, the renderer's paginator is called to insert white space for margins and page breaks and to create headers, footers, etc.
 * Finally, the paginated line-by-line representation is assembled into a plain text file.
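The steps above can be illustrated with a small, self-contained sketch. It is not Transcribo's code: the names `Node`, `txt_frontend`, `render`, `paginate` and `assemble` are hypothetical, it is written for Python 3 rather than Python 2.6, and it assumes a numbered header line per page and form feeds as page breaks. It only shows how a frontend, a tree-walking renderer and a paginator could be chained.

```python
import textwrap
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Hypothetical tree node; Transcribo's real tree classes are not shown here."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

def txt_frontend(source: str) -> Node:
    """Input layer: parse plain text and derive a (flat) layout structure.
    Blank lines separate paragraphs."""
    root = Node("document")
    for chunk in source.split("\n\n"):
        text = " ".join(chunk.split())  # normalise internal white space
        if text:
            root.children.append(Node("paragraph", text))
    return root

def render(node: Node, width: int) -> List[str]:
    """Renderer: traverse the tree depth-first into a line-by-line representation."""
    lines: List[str] = []
    if node.text:
        lines.extend(textwrap.wrap(node.text, width))
        lines.append("")  # blank line after each block
    for child in node.children:
        lines.extend(render(child, width))
    return lines

def paginate(lines: List[str], page_length: int, margin: int) -> List[List[str]]:
    """Paginator: break lines into pages, add a numbered header and a left margin."""
    if page_length < 3:
        raise ValueError("page_length must leave room for header and body")
    body = page_length - 2  # reserve two lines for the header and a blank line
    pages = []
    for number, start in enumerate(range(0, len(lines), body), start=1):
        chunk = lines[start:start + body]
        page = [f"Page {number}", ""]
        page += [(" " * margin + line) if line else "" for line in chunk]
        pages.append(page)
    return pages

def assemble(pages: List[List[str]]) -> str:
    """Join the pages into one plain text string; form feeds mark page breaks."""
    return "\f".join("\n".join(page) + "\n" for page in pages)
```

For example, `assemble(paginate(render(txt_frontend(source), width=10), page_length=5, margin=2))` turns a two-paragraph string into three short pages. Keeping the frontend ignorant of widths and pages is what lets further input formats reuse the same renderer and paginator.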

The renderer allows a specific translator and wrapper to be attached to each content block (paragraph, heading, reference, etc.) in order to perform translations and achieve the required text layout. In combination with frontends for markup languages, this feature gives the user very fine-grained control over the output.
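The per-block translator/wrapper idea can be sketched as a style table that maps a block type to a pair of callables. This is a hypothetical illustration, not Transcribo's styles module: `STYLES`, `make_wrapper` and `render_block` are invented names, `str.upper` stands in for a real translator (a Braille setup would plug in something based on liblouis or YABT), and `textwrap` stands in for the wrapper.

```python
import textwrap
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]
Wrapper = Callable[[str], List[str]]

def make_wrapper(width: int, indent: str = "") -> Wrapper:
    """Build a wrapper that fills text to `width` columns with a fixed indent."""
    def wrap(text: str) -> List[str]:
        return textwrap.wrap(text, width,
                             initial_indent=indent, subsequent_indent=indent)
    return wrap

def no_translation(text: str) -> str:
    """Identity translator: leave the text unchanged."""
    return text

# Hypothetical style table: content-block type -> (translator, wrapper).
STYLES: Dict[str, Tuple[Translator, Wrapper]] = {
    "heading": (str.upper, make_wrapper(20)),
    "paragraph": (no_translation, make_wrapper(20, indent="  ")),
}

def render_block(kind: str, text: str) -> List[str]:
    """Translate first, then wrap, so line breaks reflect the translated length."""
    translate, wrap = STYLES[kind]
    return wrap(translate(text))
```

Translating before wrapping matters for Braille, where contractions change the length of the text and therefore where lines must break.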

Currently there are frontends for reStructuredText and plain text. Additional frontends for formats such as LaTeX, OpenOffice, RTF and HTML would be useful.

2 Installation and usage

2.1 General

Transcribo is developed with Python 2.6. It should run on older versions, possibly with minor changes. There are no dependencies. However, to use the translation features for Braille, you may wish to install a Braille translator such as liblouis or YABT. To use the reStructuredText frontend, you will also need Docutils, because that frontend is essentially a Docutils writer component. Use the transcribo-rst.py script, a Docutils frontend tool, to generate plain text from rST documents. Without Docutils, you can only generate plain text from plain text, using the transcribo-txt.py script. Type python transcribo-txt.py --help to see the command line options.

Transcribo is a pure Python package. To install it, unpack the archive and type something like the following at the shell prompt:

cd < package dir >
python setup.py install


Then run one of the scripts in the scripts/ or test/ subdirectory (see above).

2.2 Using the rST frontend

The module transcribo.rST.py is a Docutils writer component; see the Docutils documentation for background information. It supports a reasonable subset of rST features. Implemented features include paragraphs, sections, section numbers (basic support), bullet lists, enumerations, block quotes, line blocks, references (page references are on the wish list), strong and emphasis (represented by capitalized letters), and inline literals. To translate an rST document into plain text, use the transcribo-rst.py frontend tool. Use the command line or the configuration file to set the page width and the translator to be used (the default is None). All other settings are contained in transcribo.renderer.styles.py.
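To make the output convention for inline markup concrete, here is a hypothetical regex-based approximation. It is only a sketch: the real frontend is a Docutils writer that works on the parsed document tree, not a regular expression over rST source, and `inline_to_plain` is an invented name. It renders strong and emphasis as capital letters and keeps inline literals unchanged.

```python
import re

# Hypothetical helper, not part of Transcribo. A single alternation ensures that
# text inside an inline literal is never treated as emphasis.
_INLINE = re.compile(
    r"``(?P<literal>.+?)``"        # inline literal: keep text unchanged
    r"|\*\*(?P<strong>.+?)\*\*"    # strong: **text**
    r"|\*(?P<emph>.+?)\*"          # emphasis: *text*
)

def _to_plain(match: re.Match) -> str:
    if match.group("literal") is not None:
        return match.group("literal")
    # Strong and emphasis are both represented by capital letters.
    return (match.group("strong") or match.group("emph")).upper()

def inline_to_plain(text: str) -> str:
    """Approximate the plain text rendering of rST inline markup."""
    return _INLINE.sub(_to_plain, text)
```

For instance, `inline_to_plain("This is *very* **important**, see ``a*b*c``.")` gives `"This is VERY IMPORTANT, see a*b*c."`.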

Last updated on: October 30th, 2010, 3:07 GMT
Price: Free
Developed by: Dr. Leo
License type: Other/Proprietary License

What's New in This Release:
  • unified command line frontend using argparse (an additional dependency under Python 2.6)
  • new generic configuration system named yaconfig with cascading style sheets using PyYAML (new dependency)
  • supports multiple YAML files which are successively mixed into a tree of nested dictionaries
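The cascading idea behind the new configuration system, successively mixing several files into one tree of nested dictionaries, can be sketched as a recursive dictionary merge. This is an assumption about the general technique, not yaconfig's actual API: `deep_merge` and `cascade` are hypothetical names, and the sketch works on already-loaded dictionaries (with PyYAML, each file could be loaded with `yaml.safe_load`).

```python
from typing import Any, Dict, Iterable

Config = Dict[str, Any]

def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict in which `override` wins; nested dicts merge recursively."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

def cascade(configs: Iterable[Config]) -> Config:
    """Mix configs in successively: later ones override earlier ones."""
    merged: Config = {}
    for cfg in configs:
        merged = deep_merge(merged, cfg)
    return merged
```

With defaults `{"page": {"width": 40, "length": 25}}` followed by a user file containing `{"page": {"width": 32}}`, the result keeps `length` from the defaults and takes `width` from the user file, much as later rules override earlier ones in cascading style sheets.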