I attended WordCamp SF, the WordPress conference, on Saturday, May 1, 2010, in San Francisco, California. It was a great opportunity to meet professionals from the web publishing and software development industries, including Matt Mullenweg, Richard Stallman, and others.
Monthly Archives: May 2010
How to Detect and Block Abusive Web Crawlers
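Ever wondered how to identify the IP addresses with the most hits on your web server? Perhaps you want to find the most active human users of your website(s), or perhaps you want to catch abusive web robots. In either case, the answer lies in your web server's access log file. Here are the Linux/Unix commands I have been using to periodically detect digital culprits or enthusiastic users. The first lists the ten IP addresses with the most requests (it assumes the client IP is the first space-separated field on each log line); the second, run as root, drops all incoming traffic from a given address: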
> cat <access-log-filename> | cut -d" " -f1 | sort -n | uniq -c | sort -rn | head -n 10
> iptables -I INPUT -j DROP -s <ip-address>
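For repeated use, the two commands above can be wrapped in small shell functions. This is only a minimal sketch under a few assumptions: the function names `top_ips` and `block_candidates` are hypothetical, the log is assumed to carry the client IP as its first space-separated field, and the second function only prints the iptables commands instead of running them, so the list can be reviewed first.

```shell
#!/bin/sh

# top_ips LOGFILE [N]
# Print the N client IPs with the most requests (default 10), one
# "count ip" pair per line, busiest first.
# Assumption: the IP is the first space-separated field of each line.
top_ips() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# block_candidates LOGFILE THRESHOLD
# Print (but do not execute) one iptables command for every IP with
# more than THRESHOLD requests. Running the printed commands needs root.
block_candidates() {
    awk '{ print $1 }' "$1" | sort | uniq -c |
        awk -v limit="$2" '$1 > limit { print "iptables -I INPUT -j DROP -s " $2 }'
}
```

For example, `top_ips <access-log-filename> 20` shows the twenty busiest addresses, and `block_candidates <access-log-filename> 5000` prints a DROP rule for each address with more than 5000 hits. The threshold of 5000 is an arbitrary illustration, not a recommendation; a sensible value depends on your traffic.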










