A cross-platform, open source CLI utility for extracting plain text from many document formats

SILVERCODERS DocToText is a free, open source, multi-platform and powerful command-line utility that converts one or more documents, in a variety of file formats, to plain text.

Supports numerous file formats

The application supports numerous file formats, including Microsoft Word (DOC, DOCX), Microsoft Excel (XLS, XLSX, XLSB), Microsoft PowerPoint (PPT, PPTX), Rich Text Format (RTF), Office Open XML (OOXML), and the OASIS OpenDocument formats for text documents (ODT), spreadsheets (ODS) and presentations (ODP).

In addition, SILVERCODERS DocToText supports OpenDocument graphics (ODG), iWork formats (NUMBERS, PAGES, KEYNOTE), OpenDocument Flat XML formats (FODS, FODP, FODT), email files (EML), HyperText Markup Language (HTML) and Portable Document Format (PDF).
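
The format list above can be used to pre-filter files before handing them to the utility. The following is a minimal Python sketch of that idea. It assumes the format names in this review, lowercased, match the file extensions found on disk; the `is_supported` and `filter_supported` helpers are hypothetical and not part of DocToText.

```python
from pathlib import Path

# Format names taken from the review text, lowercased.
# Assumption: each name corresponds to a file extension on disk.
SUPPORTED_EXTENSIONS = frozenset({
    "doc", "docx", "xls", "xlsx", "xlsb", "ppt", "pptx", "rtf",
    "odt", "ods", "odp", "odg",
    "numbers", "pages", "keynote",
    "fods", "fodp", "fodt",
    "eml", "html", "pdf",
})


def is_supported(path):
    """Return True if the file extension appears in the list above."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


def filter_supported(paths):
    """Keep only the paths whose extension is in the list, preserving order."""
    return [p for p in paths if is_supported(p)]
```

The check is case-insensitive and looks only at the final extension, so it says nothing about whether the file contents are actually valid.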

Command-line options

As mentioned, this is a command-line utility, which means that there is no graphical user interface (GUI) and you can interact with it only through a terminal emulator. After you’ve extracted the binary archive that corresponds to your computer’s hardware architecture, use the “sh” command to run the program and view its command-line options.

From there, the user can tell the program to try parsing the input file as an RTF, ODF, OOXML, XLS, XLSB, iWork, PPT, DOC, HTML, PDF, EML or ODFXML document first. Other options fix corrupted XML files, strip XML tags instead of parsing them, use a specific command to unzip files from archives instead of the built-in decompression utility, and write logs to a specified file.
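
Because the utility is driven entirely from the command line, converting several documents in one go is easy to script. The following Python sketch shows one way to do that. It makes several assumptions that are not confirmed by this review: that the executable is reachable as `doctotext`, that options come before the file path, that the extracted text is written to standard output, and that a non-zero exit code signals failure. No real option names are shown; pass whichever ones the program's own help output lists through `extra_args`.

```python
import subprocess


def build_command(executable, input_path, extra_args=()):
    """Build the argument list: executable, then options, then the file."""
    return [str(executable), *extra_args, str(input_path)]


def convert_all(paths, executable="doctotext", extra_args=(), run=subprocess.run):
    """Run the converter once per file and collect the captured text.

    Assumptions (hypothetical, check against the real program):
    - 'doctotext' is the executable name and is on the PATH
    - the extracted text is written to standard output
    - a non-zero exit code means the conversion failed

    Returns a dict mapping each path to its text, or None on failure.
    """
    results = {}
    for path in paths:
        command = build_command(executable, path, extra_args)
        completed = run(command, capture_output=True, text=True)
        if completed.returncode == 0:
            results[str(path)] = completed.stdout
        else:
            results[str(path)] = None
    return results
```

The `run` parameter exists so that the process launcher can be swapped out, for example to try the wrapper without the utility installed.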

Supported operating systems and platforms

SILVERCODERS DocToText has been designed from the outset as cross-platform software. It has been successfully tested with some of the most popular GNU/Linux distributions, as well as with the Microsoft Windows and Mac OS X operating systems. Both 64-bit and 32-bit hardware platforms are supported at this time.

Last updated on October 10th, 2014
