catdvi is a DVI to text/plain translator.
catdvi is a program that translates TeX Device Independent (DVI) files into readable plain text. The program is under development. It produces satisfactory results in many cases, but still has some issues with complicated input.
Actually, "translate to plain text" can mean several different things, depending on the intended use:
· Output formatted text that resembles the layout of the DVI file as closely as possible, suitable for e.g. preview on a character cell terminal or printing on a teletype style printer.
· Output unformatted text in "read order" (rather than "print order", which makes quite a difference with e.g. multi-column page layouts). Useful for searching, indexing and other kinds of postprocessing, and maybe also for export to different text processors.
· Output (not completely plain) text in read order with the formatting distilled into some kind of markup so that paragraph breaks, subscripts, superscripts, etc. can still be recognized. This functionality is essentially a (La-)TeX decompiler, useful for recovery of lost or otherwise unavailable .tex files.
catdvi's principal goal is to create human-readable text files from DVI input, so the first kind of translation is its main focus.
The second kind is supported as well because one of the developers needed it and it could be obtained as an easy by-product (based on the mostly true assumption that read order = order in the source file = order in the DVI file).
The third kind of translation is the most difficult to achieve, since a DVI file contains no logical markup information. The structure of the text has to be guessed from heuristic principles and an analysis of certain characteristics of TeX's output. No attempt in this direction has been made so far. However, knowledge of some aspects of text structure would also help to improve the quality of layout in the first kind of translation. If it turns out that these can be guessed reliably, an option to show them as markup will probably follow. This feature has low priority at the moment, especially since nobody has expressed a need for it.
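The difference between the first two kinds of translation can be illustrated with a small sketch. It is not taken from catdvi's source: the glyph record, the tiny fixed-size page and both function names are assumptions made for illustration only. The sample page has two columns, stored column by column, which stands in for the "order in the DVI file".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical glyph record: a character and its cell on the page. */
struct glyph {
    int row;
    int col;
    char ch;
};

#define ROWS 2
#define COLS 5

/* Second kind ("read order"): emit glyphs in the order they are stored.
 * out must hold at least n + 1 bytes. */
static void render_sequential(const struct glyph *g, size_t n, char *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = g[i].ch;
    out[n] = '\0';
}

/* First kind ("print order"): put each glyph into its cell of a character
 * grid, then write the grid row by row.
 * out must hold at least ROWS * (COLS + 1) + 1 bytes. */
static void render_layout(const struct glyph *g, size_t n, char *out)
{
    char grid[ROWS][COLS];
    size_t i;
    int r;

    memset(grid, ' ', sizeof grid);
    for (i = 0; i < n; i++) {
        if (g[i].row >= 0 && g[i].row < ROWS &&
            g[i].col >= 0 && g[i].col < COLS)
            grid[g[i].row][g[i].col] = g[i].ch;
    }
    for (r = 0; r < ROWS; r++) {
        memcpy(out + r * (COLS + 1), grid[r], COLS);
        out[r * (COLS + 1) + COLS] = '\n';
    }
    out[ROWS * (COLS + 1)] = '\0';
}

/* Two-column sample page: the left column holds "ab" over "cd",
 * the right column holds "ef" over "gh". */
static const struct glyph sample_page[] = {
    {0, 0, 'a'}, {0, 1, 'b'}, {1, 0, 'c'}, {1, 1, 'd'},
    {0, 3, 'e'}, {0, 4, 'f'}, {1, 3, 'g'}, {1, 4, 'h'}
};
```

Called on sample_page, render_sequential keeps each column together, while render_layout interleaves the two columns line by line, as a terminal preview would.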
· You need a hosted ISO C (1990) environment and the Kpathsea library (included with e.g. teTeX) to compile this program. GNU Make makes the compilation pleasant, but is not required. TeX font metric (.tfm) files for the fonts used in your DVI files have to be present at run time.
· The program should be very portable. It is expected and intended that it will work on almost every system where an ISO C compiler and a port of the Kpathsea library are available. This includes most UNIX-like systems and many others.
· Where possible, the code aims at ISO C compliance and makes as few assumptions about the working environment as possible. Searching for .tfm files in the file system is an inherently system-dependent activity and is currently done with the help of the Kpathsea library. Non-kpathsea implementations of that functionality will be accepted if somebody codes them. The most notable known portability problem in other parts of the program is the assumption that CHAR_BIT equals 8; however, this assumption seems safe among contemporary platforms.
· Development is done under GNU/Linux on x86. Additionally, different versions of catdvi have been verified to compile and work under GNU/Linux on Alpha, PPC and UltraSparc architectures, FreeBSD 4.3 on x86, Mac OS X on PPC, and AIX 4.2 on RS6000. If you have catdvi working on another platform, please send a note to the catdvi-misc mailing list (you need not be subscribed to do this). If the program does not work on your system, then please send a note as well so that the problem can be fixed.
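The CHAR_BIT assumption mentioned above can be made explicit at compile time. The following is a minimal sketch, not code from catdvi: the helper name read_u32_be is hypothetical, and it only shows how a program that reads the big-endian multi-byte quantities of a DVI file might guard against platforms where a byte is not 8 bits wide.

```c
#include <limits.h>

/* Refuse to compile where a byte is not exactly 8 bits wide. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes (CHAR_BIT == 8)"
#endif

/* Hypothetical helper: assemble an unsigned 32-bit value from four
 * octets stored most significant byte first, as in DVI files.
 * unsigned long is used because ISO C (1990) guarantees it has at
 * least 32 bits. */
static unsigned long read_u32_be(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24)
         | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)
         |  (unsigned long)p[3];
}
```

A preprocessor check like this turns a silent misbehaviour on an exotic platform into a clear build failure.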
Usage: ./catdvi [options] [file]
-d DEBUGLEVEL, --debug=DEBUGLEVEL
Set the debug output level. Smaller is less output.
-e ENCODING, --output-encoding=ENCODING
Set the output encoding.
(ENCODING can be a number or name from the table below.)
-p PAGESPEC, --first-page=PAGESPEC
Do not output pages before page PAGESPEC.
PAGESPEC is either count0, =physicalpage or chapter:count0 .
-l PAGESPEC, --last-page=PAGESPEC
Do not output pages after page PAGESPEC.
-N , --list-page-numbers
Output physical page count, count0 value and chapter count instead
of page contents.
Show the Unicode number of unknown glyphs instead of `?'.
Do not attempt to reproduce the page layout; output glyphs
in the order they appear in the DVI file.
Show this help page.
Show version information.
Show copyright information.
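The three PAGESPEC forms accepted by -p and -l (count0, =physicalpage and chapter:count0) could be told apart roughly as follows. This is only a sketch of the syntax as described in the help text, not catdvi's actual parser; the type and function names are assumptions, and it does not say how catdvi itself handles signs, whitespace or out-of-range numbers.

```c
#include <stdlib.h>

/* Hypothetical representation of a parsed PAGESPEC. */
enum pagespec_kind { PS_COUNT0, PS_PHYSICAL, PS_CHAPTER_COUNT0 };

struct pagespec {
    enum pagespec_kind kind;
    long chapter;  /* only meaningful for PS_CHAPTER_COUNT0 */
    long page;     /* count0 value or physical page number */
};

/* Returns 0 on success and -1 if s matches none of the three forms.
 * On failure the contents of *out are unspecified. */
static int parse_pagespec(const char *s, struct pagespec *out)
{
    char *end;
    long first;

    if (*s == '=') {                       /* =physicalpage */
        out->page = strtol(s + 1, &end, 10);
        if (end == s + 1 || *end != '\0')
            return -1;
        out->kind = PS_PHYSICAL;
        out->chapter = 0;
        return 0;
    }

    first = strtol(s, &end, 10);
    if (end == s)
        return -1;
    if (*end == '\0') {                    /* count0 */
        out->kind = PS_COUNT0;
        out->chapter = 0;
        out->page = first;
        return 0;
    }
    if (*end != ':')
        return -1;

    s = end + 1;                           /* chapter:count0 */
    out->page = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;
    out->kind = PS_CHAPTER_COUNT0;
    out->chapter = first;
    return 0;
}
```

With this sketch, "12" is read as a count0 value, "=3" as the third physical page, and "2:7" as count0 value 7 in chapter 2; a string such as "2:" is rejected.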
The following output encodings are available: