The Methabot software is a speed-optimized, scriptable and highly configurable web, FTP and local file system crawler. It supports scripted filetype parsing and a wide variety of customization options, and it is easily configured to fit anyone's particular needs.
With the module system and scripting language, users can take full or partial control of the crawling process and decide how Methabot should store web data, statistics and much more.
Just by running Methabot from the command line, you are able to configure custom filetypes, filtering expressions, behaviour and much more, so you don't have to be a scripter!
Here are some key features of "Methabot":
· It's fast, designed from the ground up with speed optimization in mind.
· User-defined filetype filtering (according to MIME type, file extension or UMEX expression)
· Highly configurable from the command line
· Extensible module system, supporting custom data parsers and filters.
· Simple yet powerful filtering of URLs through UMEX.
· Automated downloading
· Support for automatic cookie handling when running over HTTP
· Reliable, fault-tolerant networking
· Portable, tested with success on 32-bit/64-bit Linux 2.6, 32-bit/64-bit FreeBSD 6.x/7.0, Windows XP and Mac OS X. Should work on almost any Unix-like OS.
Requirements:
· SpiderMonkey headers
· curl and libcurl
What's New in This Release:
· Bugfix: the depth limit was handled incorrectly when external-peek was used.
· Memory usage cleanup fixes
· The dynamic-url option is no longer set to lookup by default, since that setting slows down crawling significantly
· Build system now creates and installs some header files that modules can use when linking
· metha-config tool added
· lmm_mysql moved outside of this package