DupeFinder is a simple application for locating, moving, renaming, and deleting duplicate files in a directory structure.
It is well suited both to users who haven't kept their hard drives well organized and need to free up space, and to users who like to keep many backup copies of important data "just in case" something goes wrong.
Although DupeFinder is a fairly small application, it should have all the features you need to clean up and reorganize large directories full of duplicate files:
· Well-designed graphical interface with full tooltip and "What's This?" button support, which helps in an application you probably won't use often
· Fast processing: file extension filtering skips analysis of unwanted files
· View files in external applications by double-clicking
· Rename files in place or move to new locations
· Default settings disallow deletion of all copies of duplicate files to prevent accidental data loss
· Generate simple reports identifying groups of duplicate files for later processing
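The safeguard against deleting every copy of a duplicate can be expressed as a simple check before any file is removed. The following is a minimal sketch, not DupeFinder's actual code: the function name `safe_to_delete` and the `allow_delete_all` flag are hypothetical, standing in for the default setting described above.

```python
# Hypothetical sketch of the "never delete every copy" safeguard.
# `safe_to_delete` and `allow_delete_all` are illustrative names, not DupeFinder's API.

def safe_to_delete(group, selected, allow_delete_all=False):
    """Return True if deleting `selected` would leave at least one file in `group`.

    group            -- paths whose contents are identical
    selected         -- paths the user has marked for deletion
    allow_delete_all -- stands in for a setting that is off by default
    """
    group = set(group)
    selected = set(selected) & group  # ignore paths that are not in this group
    if allow_delete_all:
        return True
    return len(selected) < len(group)


if __name__ == "__main__":
    dupes = ["docs/report.txt", "backup/report.txt", "old/report.txt"]
    print(safe_to_delete(dupes, ["backup/report.txt", "old/report.txt"]))  # True
    print(safe_to_delete(dupes, dupes))  # False
```

Checking against the whole group, rather than file by file, is what makes the rule hold no matter which copies the user happens to select.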
While DupeFinder works well in most cases, there are a few known issues to be aware of. I hope to fix most of the following bugs soon:
· May crash on filenames containing "~" or ":" characters
· May crash if self-referencing symlinks are encountered
· Zero-byte files cannot be deleted
· May not be able to delete files with Unicode characters in their filenames
· Display does not update if identified duplicates are moved, renamed or modified outside DupeFinder
Requirements:
· DupeFinder is built on two primary tools: the Python language and the Qt application toolkit. A Python interpreter and the Qt libraries are included in most desktop Linux, BSD and UNIX distributions. Mac OS X (at least newer versions) includes Python, and Qt is also available for free, though it is not part of a standard install.
· Qt is primarily a C++ toolkit, so the PyQt bindings for Python are also required. PyQt is not installed by default on most Linux and similar distributions, though it is available for all of the systems mentioned.
· Finally, the md5sum utility must be available. This utility is standard on Linux and similar systems, though I've read that on Mac OS X it is called md5 instead. I have not confirmed this, but if so, simply change the single occurrence of md5sum in FindDupFiles.py to md5 to run the app on a Mac. Later versions of DupeFinder may use built-in code to calculate MD5 sums to eliminate this requirement.
· Running DupeFinder on Windows should be possible but probably isn't worth the effort unless most of the components are already installed for other applications. Qt and PyQt for Windows are only available under a commercial license (this will change when Qt 4 is released). Python is a separate install, and an md5sum utility is also needed (one appears to be available from ActiveState). Alternatively, it is probably possible to satisfy all of the dependencies through X11 on Cygwin.
· One more thing: although DupeFinder is intended to be run graphically and interactively, the FindDupFiles.py script can be run standalone from the console. It takes a root search directory followed by any number of file extension filters as command line arguments and outputs the identified duplicate file groups (in no particular order) to STDOUT. This output can be piped to a pager such as less for immediate inspection or redirected straight to a text file using the ">" shell operator (on UNIX-like systems) for logging/reporting.
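To illustrate how such a console tool can work, here is a minimal sketch of a standalone duplicate finder with the same calling convention (root directory, then optional extension filters). It is not the contents of FindDupFiles.py: the script name `find_dupes.py`, the blank-line-separated output format, grouping by file size before hashing, and skipping symlinks are all assumptions made for this example.

```python
#!/usr/bin/env python3
"""Sketch of a console duplicate finder: find_dupes.py ROOT [EXT ...] (hypothetical name).

Prints each group of identical files, one path per line, groups separated by a blank line.
"""
import hashlib
import os
import sys
from collections import defaultdict


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, extensions=()):
    """Return a list of groups (lists of paths) whose file contents are identical."""
    # Normalize filters so "jpg" and ".JPG" both match "photo.jpg".
    exts = tuple("." + e.lower().lstrip(".") for e in extensions)
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):  # symlinked dirs are not followed
        for name in filenames:
            if exts and not name.lower().endswith(exts):
                continue  # extension filter: never read files we were not asked about
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # assumption: skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[md5_of(path)].append(path)
            except OSError:
                continue  # unreadable file: leave it out rather than crash
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups


def main(argv):
    """Print duplicate groups for the given ROOT and optional extension filters."""
    if not argv:
        print("usage: find_dupes.py ROOT [EXT ...]", file=sys.stderr)
        return
    for group in find_duplicates(argv[0], argv[1:]):
        print("\n".join(group))
        print()


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 find_dupes.py ~/photos jpg png | less` pages through the groups, and `python3 find_dupes.py ~/photos jpg png > dupes.txt` saves them as a report. Comparing sizes first means files with a unique size are never read at all, so only plausible duplicates pay the cost of hashing.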
What's New in This Release:
· This release replaces the external md5sum command-line utility with native MD5 digest calculation (using Python's md5 module).
· This improves performance when calculating MD5 digests of small files and removes a cumbersome dependency for Windows users.
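The change can be pictured as swapping one process launch per file for an in-process hash. Below is a minimal sketch contrasting the two approaches; it is not DupeFinder's code, and the function names are hypothetical. It uses `hashlib.md5`, since the standalone `md5` module was deprecated in Python 2.5 and removed in Python 3.

```python
import hashlib
import shutil
import subprocess


def md5_native(path, chunk_size=65536):
    """In-process MD5 hex digest using hashlib; no external tool required."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5_external(path):
    """Older approach: start an md5sum process for each file and parse its output.

    Returns None if md5sum is not on PATH. Assumes an ordinary filename; GNU md5sum
    escapes names containing backslashes or newlines, which changes the output format.
    """
    if shutil.which("md5sum") is None:
        return None
    result = subprocess.run(["md5sum", path], capture_output=True, text=True, check=True)
    return result.stdout.split()[0]  # output looks like "<digest>  <filename>"
```

Both produce the same 32-character hex digest; for example, an empty file hashes to `d41d8cd98f00b204e9800998ecf8427e` either way. For a small file, the time spent starting a separate process can exceed the time spent hashing it, which is why the native approach helps most there.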