My Pictures from WordCampSF 2010 in San Francisco

Photos: Sam Bazzi & Matt Mullenweg; Matt Mullenweg; WordCampSF Attendees; Matt Mullenweg with WordCampSF Attendees; WordCampSF Attendees; Richard Stallman as Saint IGNUcius; Rinat Tuhvatshin Addressing WordCampSF Attendees; Matt Mullenweg; Raphael Mudge @ WordCampSF 2010; Sam Bazzi & Richard Stallman

I attended WordCampSF, the WordPress conference, on Saturday, May 1, 2010, in San Francisco, California. It was a great opportunity to meet numerous professionals from the Web publishing and software development industries, including Matt Mullenweg, Richard Stallman, and others.

How to Detect and Block Abusive Web Crawlers

By SaM BaZzI, technologist

Ever wondered how to identify the IP addresses with the most hits on your web server? Perhaps you want to find the most active human users of your website(s), or abusive web robots. In either case, the answer lies in your web server's access log file. Here are the Linux/Unix commands I use periodically to detect digital culprits or enthusiastic users:

> cut -d" " -f1 <access-log-filename> | sort | uniq -c | sort -rn | head -n 10

These piped Linux/Unix commands give me the ten IP addresses with the most hits recorded in the access log file, busiest first. They rely on the client IP being the first space-separated field of each log line. I can then run the whois command on each IP to determine whether it belongs to a legitimate visitor (e.g., Google's robots) or not (unwanted crawlers). To block an offending IP, you can use the iptables command as root:

> iptables -I INPUT -j DROP -s <ip-address>
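
If you repeat these steps often, the two commands above could be wrapped in a pair of shell functions. What follows is only a sketch under a few assumptions: the function names hits_per_ip and print_block_rules are made up for illustration, the client IP is taken to be the first field of each log line, and the hit threshold is whatever number you decide is abusive for your site. The second function deliberately prints the iptables commands instead of running them, so you can still check each IP with whois before blocking anything.

```shell
#!/bin/sh

# hits_per_ip LOGFILE
# Print "<count> <ip>" for every client in the log, busiest first.
# Assumes the client IP is the first whitespace-separated field of each line.
hits_per_ip() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2
}

# print_block_rules LOGFILE THRESHOLD
# Print (but do not run) one iptables command for each IP with more than
# THRESHOLD hits. THRESHOLD is a value you choose; there is no universal one.
print_block_rules() {
    hits_per_ip "$1" |
        awk -v max="$2" '$1 > max { print "iptables -I INPUT -j DROP -s " $2 }'
}
```

For example, with a hypothetical log file named access.log, `hits_per_ip access.log | head -n 10` lists the ten busiest IPs, and `print_block_rules access.log 1000` prints a candidate rule for every IP with more than 1000 hits. Review the output, then run the lines you agree with as root.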

Enjoy!