The Revisionist 0.02b

The Revisionist is a tool for extracting and indexing hidden metadata.

LGPL (GNU Lesser General Public License) 
The Evil Twin
The Revisionist is a tool for extracting and indexing hidden metadata (such as deleted or modified text) from large collections of MS Word files.

It can operate on whole Web sites or on SMB or NFS directories. The Revisionist is handy for pen-testing, or it can be used just to spot embarrassing secrets.

My primary goal is to provide pen-testers and content administrators with a handy tool to detect hidden data in all documents available at a specific location (be it a locally mounted network share, an HTTP site, or whatnot), and to review it all easily.

Right now, the tool only detects and indexes deleted text in documents with "change tracking" enabled. It can also index usernames and hardware addresses embedded in documents (to facilitate external assessment of company structure). Future versions should be able to recover other goodies, too.


To run the tool against a local directory, a mounted SMB or NFS directory, or such, simply issue the following command (after doing 'make', that is):

./therev '' @/path/to/directory

After the tool completes, you should be able to view 'master.html' in the current directory using your favourite browser (Lynx, Netscape, etc.). Cached copies of documents are placed in subdirectories named document.XXXXXX, where each X is a random digit; hence, it is recommended to run the tool in a separate directory.
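
The recommendation to run the tool in a separate directory could be followed with a small wrapper like the sketch below. The function name run_in_workdir and the THEREV variable are hypothetical and not part of The Revisionist. The sketch also assumes that the binary can be started from a directory other than the one it was built in, which the description above does not confirm.

```shell
# Assumption: THEREV holds the absolute path to the built therev binary.
# If it is unset, default to a binary in the directory we start from.
: "${THEREV:=$PWD/therev}"

# Hypothetical helper (not part of The Revisionist): run the tool against a
# local or mounted directory from inside its own working directory, so that
# master.html and the document.XXXXXX cache directories stay together.
run_in_workdir() {
    workdir=$1   # directory that will receive the output
    target=$2    # absolute path of the directory to scan
    mkdir -p "$workdir" || return 1
    # The subshell leaves the caller's working directory unchanged.
    ( cd "$workdir" && "$THEREV" '' "@$target" )
}
```

For example, `run_in_workdir "$HOME/rev-output" /path/to/directory` would leave master.html in the rev-output directory. The target should be an absolute path, because the tool is started after changing into the working directory.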

Note that you may also instruct the tool to look for a specific substring and select only those documents that contain it (strict matching, no regexps available):

./therev 'linux' @/path/to/directory
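
To illustrate what strict substring matching without regexps means in practice, here is a minimal shell sketch. The function contains_literal is hypothetical and only mimics the idea; the tool's own matching may differ in details such as case sensitivity, which the description does not specify.

```shell
# Hypothetical illustration of literal substring matching.
# Returns 0 if the string in $1 contains the string in $2, byte for byte.
contains_literal() {
    case $1 in
        # Quoting "$2" makes the shell treat it literally, so characters
        # such as '.', '*' or '?' carry no special meaning.
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage:
#   contains_literal 'runs on linux' 'linux'   succeeds
#   contains_literal 'abc' 'a.c'               fails: '.' is not a wildcard
```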

To run the program against a specific site or top-level domain, do the following:

./therev ''

Note that 'com', 'gov', '', '' are all valid site names. The first parameter works as in the previous case:

./therev 'homeland security' gov

As a special bonus, when running the script against multilingual sites, you might want to specify a third parameter - the desired language (as a two-letter code: en, pl, etc.). NOTE: DO NOT USE THE LANGUAGE QUALIFIER UNLESS NECESSARY:

./therev 'linux' en

The HTTP search mode uses Google to locate all matching Word documents on a specific site. For a document to be found, it must be indexable (that is, not excluded in robots.txt) and must be among the first 1000 results for that site. If a website has more than 1000 documents, consider running sub-searches with keywords.
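
The sub-search advice could be scripted roughly as follows: one run per keyword, each in its own directory so that the master.html files do not overwrite each other. The function sub_search, the THEREV variable and the run-* directory naming are hypothetical. The sketch assumes the binary can be started from another directory.

```shell
# Assumption: THEREV holds the absolute path to the built therev binary.
: "${THEREV:=$PWD/therev}"

# Hypothetical helper: sub_search SITE KEYWORD...
# Runs one keyword-restricted search per KEYWORD against SITE.
sub_search() {
    site=$1
    shift
    for kw in "$@"; do
        # Derive a safe directory name: every character that is not a
        # letter or digit becomes an underscore.
        dir="run-$(printf '%s' "$kw" | tr -c 'A-Za-z0-9' '_')"
        mkdir -p "$dir" || return 1
        # Each run writes its own master.html inside its directory.
        ( cd "$dir" && "$THEREV" "$kw" "$site" ) || return 1
    done
}
```

For example, `sub_search gov 'homeland security' 'linux'` would create run-homeland_security and run-linux, each with its own index. Keywords that differ only in punctuation map to the same directory name, so pick distinct ones.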

What's New in This Release:

This release was fixed to work with the new Google page layout.
Some other minor fixes were made.

Last updated on January 30th, 2006
