That’s Not Spam: False Positives and Ham
Everyone loves a good comment. Readers benefit from the shared information and authors appreciate the conversation and feedback. But you gotta keep the spam out. Akismet and other anti-spam plugins do an excellent job of automating the process, but it’s a good idea to watch out for false positives: legitimate comments marked as spam. Rescuing ham comments from the spam pile promotes healthy comment threads and improves the quality and reputation of your site. In this DiW post, we explain how WordPress & Akismet deal with spam, discuss anti-spam strategy, and share some ham-saving tips and tricks.
Know thy comments
In WordPress, there are three types of responses: comments, pingbacks, and trackbacks. The status of any given response is either:
- approved – appearing on your site
- spammed – flagged as spam
- moderated – on hold for review
- in the trash – marked for deletion
Theoretically, you’re going to know about approved comments that appear on your site. Likewise, you’ll have a chance to review any moderated comments, and nothing makes it to the trash by accident, so you know about those as well. What you don’t always know about are spam comments flagged as such by a plugin. Some of these are going to be ham, and they can be tricky to spot, especially as the number of spam comments begins to climb.
Out of the box
Out of the box, WordPress doesn’t flag any response as spam, unless you add some phrases to the built-in comment blacklist. Then, any comments matching any phrases in your blacklist are sent to the spam pile. So the key to preventing blacklist ham (mmmm..) is being absolutely sure that you want nothing to do with any comments mentioning “baby uggs” or who knows what.
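To see why blacklist phrases need to be chosen carefully, here is a minimal sketch of phrase matching in Python. It is not WordPress's own code: it assumes a one-phrase-per-line blacklist and a case-insensitive substring match against each comment field, and the function and field names are made up for illustration.

```python
def parse_blacklist(raw: str) -> list[str]:
    """Split a one-phrase-per-line blacklist into trimmed, lowercased phrases."""
    return [line.strip().lower() for line in raw.splitlines() if line.strip()]


def matches_blacklist(comment_fields: dict[str, str], phrases: list[str]) -> bool:
    """Return True if any phrase appears as a substring of any field (case-insensitive)."""
    haystacks = [value.lower() for value in comment_fields.values()]
    return any(phrase in text for phrase in phrases for text in haystacks)


# Hypothetical blacklist and comments, for illustration only.
blacklist = parse_blacklist("baby uggs\n\n  cheap pills  \n")

spam = {"author": "Bob", "content": "Get BABY UGGS here"}
ham = {"author": "Ann", "content": "Great post, thanks!"}
unlucky_ham = {"author": "Cat", "content": "My kid loves her baby uggs, lol"}

print(matches_blacklist(spam, blacklist))
print(matches_blacklist(ham, blacklist))
print(matches_blacklist(unlucky_ham, blacklist))
```

Under these assumptions, the third comment is perfectly legitimate but still lands in the spam pile, which is exactly the kind of blacklist ham worth watching for.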
Akismet & ham stats
It’s easy to stop spam without plugins, but activate Akismet and suddenly you’ve got greater accuracy, better automation, and some incredible-looking statistics. Here are Akismet stats for false positives during the last few months here at DigWP.com:
That’s good news, but don’t be fooled – the number of false positives also depends on you, the user. Seeing few false positives is good news only if you’re actively looking for them; otherwise, who knows how many ham comments have been lost to the spam pile? We check for false positives fairly regularly, so the low numbers are great, as is the decreasing number of spam comments:
This is also a good sign, but it’s still smart to keep an eye on things and rescue as much ham as possible. Back in the day, I really got into analyzing teh spam – digging through the spam bin, looking for patterns, checking sources, and rescuing ham comments from the abyss. It’s fun if you have the time, but these days it’s better to just get it done..
Now that we’ve seen how it works, here are some clues for cleaving through large slabs of spam quickly and effectively..
- Comment text – legit comments tend to look real and stand out among the junk
- Gravatars – usually a good signal of quality, but spammers can haz gravatars too
- Link text – stupid link text is a huge giveaway, like “Baby Ugg Boots” or whatever
- Site URL – anything more than a domain or first-level subdirectory is probably spam
- Excessive links – legitimate comments rarely contain more than one or two links
Here’s a screenshot illustrating some of these aspects of spam. Of course, there are plenty more examples waiting for you in Ye Olde Spam Bin!
Those are the big giveaways, but it’s generally easier and quicker to scan for ham than spam. That is, rather than looking for evidence of spam, scan for signs of legitimacy and quality. A good example is scanning for gravatars – you’re not trying to find the grey mystery man icon, you’re looking for something original, like the flag icon in the previous screenshot. With some repetition, the visual clues gel together and the ham just sort of jumps out at you as you sift through the pile.
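If you ever wanted to automate a first pass of the "scan for ham" idea, a rough sketch in Python might look like the following. Everything here is an assumption for illustration: the `Comment` fields, the `has_custom_gravatar` flag, and the thresholds (two links, one path segment) are simply the rules of thumb from the list above turned into checks. It is not how Akismet or WordPress decides anything, and it assumes site URLs include the `http://` or `https://` scheme.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Comment:
    text: str
    site_url: str = ""
    has_custom_gravatar: bool = False  # hypothetical flag, not a real WordPress field


def count_links(text: str) -> int:
    """Count http(s) URLs in the comment body."""
    return len(re.findall(r"https?://", text, flags=re.IGNORECASE))


def url_depth(url: str) -> int:
    """Count non-empty path segments: 0 for a bare domain, 1 for /blog/, and so on."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


def ham_signals(comment: Comment) -> list[str]:
    """List the signs of legitimacy a comment shows; more signals means more likely ham."""
    signals = []
    if comment.has_custom_gravatar:
        signals.append("custom gravatar")
    if count_links(comment.text) <= 2:
        signals.append("few links")
    if url_depth(comment.site_url) <= 1:
        signals.append("shallow site URL")
    return signals


# Hypothetical comments, for illustration only.
likely_ham = Comment(
    text="Nice tip about the spam bin, thanks!",
    site_url="https://example.com/",
    has_custom_gravatar=True,
)
likely_spam = Comment(
    text="http://a.example http://b.example http://c.example",
    site_url="http://spam.example/shop/boots/baby-ugg.html",
)

print(ham_signals(likely_ham))
print(ham_signals(likely_spam))
```

A list of signals, rather than a single yes/no verdict, keeps a human in the loop: the point is to surface comments worth a second look, not to approve them automatically.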
Wrapping it up..
So what did we learn? Spam is the bad stuff, ham is the good stuff. WordPress doesn’t flag anything as spam by itself unless you add phrases to the comment blacklist. Add a great anti-spam plugin such as Akismet to the mix, and you’ve made your life easier by automating the process. But if you care about your readers and their feedback, you should periodically scan through your spam comments and rescue any false positives. With some repetition, checking your spam and saving ham comments takes only a few minutes, improves the quality of your site, and keeps commenters happy and ready for more.
Why is Conditional CAPTCHA for WordPress rarely mentioned when someone’s trying to tackle the false-positives problem? It’s a really good plugin that deserves more mentions!
Basically, it allows commenters to rescue their own spam-flagged comments by displaying a CAPTCHA form on the next page, after the comment is submitted—and this happens only when Akismet marks a comment as spam. Otherwise, the comment goes through safely without the user having to deal with any CAPTCHA…
Wow, I had no idea.. Thanks for bringing that to our attention – sounds like a good solution for preventing false-positives. And it works with Akismet? Even better.
Yeah, it works with Akismet (and a couple of other spam-protection plugins that have an integration API)! I’m glad you like the plugin!
That’s great! This is going to be incredibly handy for me. Can’t believe I didn’t know about it. Thanks man!
Well, it’s not earth-shattering news or anything, but glad if it helps out!