bot-trap lets your Web site automatically ban bad Web robots (a.k.a. Web spiders) that ignore the robots.txt file.
Well-behaved robots such as Googlebot obey robots.txt and are never caught.
The main advantage over other implementations of this idea is that bot-trap has a manual "unban" feature: humans can unban themselves, but robots cannot.
How It Works:
- You place a small "web-bug" strategically in your web pages. This bug is just a tiny image link that points to /bot-trap/index.php. Normal people don't see this link, but web bots do.
- You create a /robots.txt file that tells web bots not to go to the /bot-trap directory.
When a bad robot visits /bot-trap/index.php anyway, the script adds the robot's IP address to a block list in /.htaccess, and that IP is denied access to the entire site from then on. bot-trap can also email you when this happens.
It is possible for someone to be banned who shouldn't be. Perhaps a previous holder of an IP address in a DHCP pool ran a bad bot, and now the address's new user is banned. Not to worry: the custom "403 Forbidden" page lets any visitor unban themselves by typing a requested word into a form box. Real people can easily do this, but bots can't!
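For illustration, after a bot is trapped the block list in /.htaccess looks roughly like this (Apache 2.2 access-control syntax; the address is a hypothetical example):

```
Order Allow,Deny
Allow from all
# added by bot-trap
Deny from 192.0.2.57
```

Each newly trapped IP gets its own "Deny from" line, and removing a line (or using the unban form) restores access for that address.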
Installation:

1. Unpack the tarball in your web page root directory:
# tar -xzf bot-trap-x.x.tar.gz
2. Either add a line to your root .htaccess file like:
ErrorDocument 403 /bot-trap/forbid.php
or copy the premade one (bot-trap/htaccess-root-example). Note that once an IP is banned it cannot access anything under /, so the 403 page must live in /bot-trap, and /bot-trap/.htaccess must allow everyone in ("Allow from all"). Look at the forbid.php file in the distribution to see how this works, or just use it as-is.
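A minimal /bot-trap/.htaccess that keeps even banned visitors able to reach the 403 page might be (Apache 2.2 access-control syntax):

```
Order Allow,Deny
Allow from all
```

Because this file overrides the deny rules in /.htaccess for the /bot-trap directory only, a banned user can still load forbid.php and unban themselves.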
3. Make sure .htaccess overrides are enabled in your Apache configuration (in particular the "AllowOverride" directive). This allows bot-trap to ban IP addresses using the .htaccess mechanism.
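A sketch of the relevant server-configuration block, assuming a document root of /var/www/html (adjust the path for your site). "Limit" permits the Order/Allow/Deny directives and "FileInfo" permits ErrorDocument:

```
<Directory /var/www/html>
    AllowOverride Limit FileInfo
</Directory>
```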
4. Create an empty file blacklist.dat in your web root directory, and make blacklist.dat, .htaccess, and the bot-trap directory owned by the web-server user with write permission. If your web server runs under a group (such as "www-data" on Debian GNU/Linux), make these files and the directory group-writable.
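This step might be scripted as follows (a sketch: the web root /var/www/html and the group www-data are Debian-style assumptions, so adjust both for your site):

```shell
#!/bin/sh
# Create the block list and make it, /.htaccess, and /bot-trap
# writable by the web server's group.
WEBROOT=/var/www/html   # assumption: your document root
WWWGROUP=www-data       # assumption: your web server's group

touch "$WEBROOT/blacklist.dat"   # empty block list
chgrp "$WWWGROUP" "$WEBROOT/blacklist.dat" "$WEBROOT/.htaccess" "$WEBROOT/bot-trap"
chmod g+w "$WEBROOT/blacklist.dat" "$WEBROOT/.htaccess" "$WEBROOT/bot-trap"
```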
5. Edit bot-trap/settings.php to hold the correct email address(es) to which alerts should be sent.
6. Add "web-bugs" to your main web page to catch the bad bots. This is the XHTML code:
<!-- Bad robot trap: Don't go here or your IP will be banned! -->
<a href="/bot-trap/"><img src="/bot-trap/pixel.gif" border="0"
alt=" " width="1" height="1"/></a>
7. Add the bot-trap directory to your robots.txt file, or copy the example robots.txt file (bot-trap/robots.txt.example) to the root directory.
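The relevant robots.txt entry is simply:

```
User-agent: *
Disallow: /bot-trap/
```

Well-behaved robots will read this and stay out of /bot-trap/; only robots that ignore it will follow the web-bug link and get banned.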
8. Make sure /.htaccess and all other files have the correct permissions and ownership for your site.