I woke up a little earlier than usual this morning (yes yes, I know some of you are already at work as early as 7am, but I don’t normally even roll out of bed until 745 ok?), and decided that it would only be proper to do something productive with my extra time.
So instead I went about cleaning out the comment-spam on this blog.
It turns out that, even though this wasn’t the most productive way to spend the first hour of the work-week, it was actually a fairly interesting endeavour. I don’t have a whole lot of experience with blog spam, so I was a bit surprised to find that there were about 77,000+ records in my wp_comments database table, upon SSHing into my remote box. A few queries later I discovered that about 65,000+ of those had already been helpfully marked "Unapproved", so it was a simple matter of "DELETE FROM wp_comments WHERE approved=’0′;" to get rid of all 5 dozen thousand of them.
Which of course still left me about 12,000 comments. By my own modest estimates, this blog only has about 1500 real comments, so theoretically I should be able to prune away another 10,500 records right? In practice, that turned out to be a little more complicated than I thought, as there’s no simple way to select 10,500 records if you don’t know what their common attributes are (in case you guys are wondering, these are the spam that got passed the Akismet filter, so they’re slightly harder to heuristically track). So I started selecting random comments 30 records at a time to get an idea of what kind of text I was actually getting hit with. Interestingly, a good 50% of them were variations of:
- Good Site . Nice Work. (2899 comments)
- Very good site (1131)
- Cool site (1164)
- Good site (1139)
- Nice site (1061)
- Good suite (10)
Apparently, even the spam bots think this is an above average blog. I really should be flattered. The query I used to delete these comments used a simple LIKE-based condition: DELETE FROM wp_comments WHERE comment_content LIKE "%Good Site%"; (with the actual text between the "%" being swapped for the relevant wording)
A couple of medically-related words like "phentermine", "tramadol", "viagra" and "xenical" (!) yielded another 500 matches or so. My wp_comments table is now done to about 4,500 records, just 3,000 shy of my original estimate. (Obviously, that’s still off by about 200%, but considering that I pruned away 73,000 comments in the space of about a dozen sql commands, it’s reasonably decent.)