January 7th, 2008

The Scourge Of Blog Comment Spam

by Scott Karp

Let me begin with an open appeal to Akismet, provider of comment spam protection to Publishing 2.0 and many other blogs run on WordPress: Howard Owens is NOT spam.

Every time Howard Owens leaves a comment on Publishing 2.0, it gets caught in the WordPress Akismet spam filter. Howard tells me this happens to him on most other blogs. Why? Not because Howard is spammy. He leaves great comments, which is why I’m always happy to fish them out of the unspeakable bucket of filth (more on this in a moment) caught by Akismet.

It’s likely because Howard’s blog was hacked by spammers. Not once, but twice. So when Howard enters his blog URL in the comment form, it triggers the spam filter.

Why would a spammer want to hack Howard’s blog — or any blog?

Ah, that gets to the reason why the Akismet comment spam filter comes standard on every WordPress install.

If you’ve ever had to sift through your email spam folder looking for a real message, you probably think you know how bad spam can be. But you haven’t seen spam until you’ve seen blog comment spam.

“Unspeakable” is the best adjective I can use to describe it. Having to sift through the spam in Akismet makes me think of a line from an old Weird Al Yankovic tune — “I’d rather clean all the bathrooms in Grand Central Station with my tongue” (keep in mind, this was back in the early 80s, before GCS was cleaned up).

Most of the spam in my Akismet filter is not safe for work, and I won’t reproduce it here, but here’s a rather mild example to explain why blogs get spammed:

[Image: comments-spam-example.jpg]

Do you remember the days before Google, when you would search for something on AltaVista or Excite and find pages that were filled with the keywords you searched for? Google dealt a mighty blow to this kind of keyword search spam by figuring out a way to rank sites that didn’t depend on keyword density.

But it did not destroy the practice. Rather than use it on their own sites, spammers discovered they could actually do it on other people’s sites.

How? By leaving the spam in a comment.

Looking at the example above, you probably wonder what good that would do the spammer — what reader of Publishing 2.0 would ever click on those links?

But the spam isn’t there for you — it’s there for search engines, which tend to trust content on Publishing 2.0 — including content in the comments.

So if I allowed the comment above on one of my posts, it might cause that post to rank for one of those keywords. The person searching for “sex DVD” or whatever would find my post, search for the text on the page, and click on one of the links — which would be relevant, because that’s what they were searching for.

At least that’s the theory. The comment above is a pretty brutish example. But some are more difficult to catch.

Here’s one that got past Akismet and that I accidentally let through in my rush to moderate a pile of comments.

[Image: comment-spam-example-_2.jpg]

Click on the image above and you’ll discover a unique form of spam on the web: a “blog” that exists for only one purpose, to deliver ads. The blog does nothing but link to other blogs.

With no content of its own, how would anyone discover it? One way is by generating comments to the blogs it links to in the form of trackbacks.

This is one of the reasons why most blogging software automatically puts a rel="nofollow" attribute on links in comments, so that comment spam links don’t influence search rankings.
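To make the rel="nofollow" idea concrete, here is a minimal sketch in Python of how a blog platform might force that attribute onto every link in a submitted comment. It is not how WordPress or any particular blogging software actually does it; the names NofollowRewriter and add_nofollow are invented for this illustration, and a real platform would also sanitize the rest of the markup.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit comment HTML, forcing rel="nofollow" on every <a> tag.

    Hypothetical helper for illustration only. HTML comments and
    doctypes in the input are silently dropped.
    """

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _emit_tag(self, tag, attrs, closing=">"):
        if tag == "a":
            # Discard any rel value the commenter supplied, then add ours.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.parts.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" set on all links."""
    rewriter = NofollowRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.parts)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com/">my site</a>'))
```

A search engine that honors the attribute gives such a link no ranking credit, which removes the payoff for the spammer even when a comment slips past the filter.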

That this spam comment was able to slip past Akismet, while Howard Owens’s comments got caught, is an example of a larger trend on the web: how spam threatens to squeeze out real content. For example, on Publishing 2.0, there are 8,822 real comments. Akismet has caught 362,719 spam comments.

But that’s a post for another day.
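For a sense of scale, the two comment counts quoted above can be turned into a ratio. The short Python sketch below uses only those two figures from the post; the variable names are just for illustration.

```python
real_comments = 8_822      # real comments, as quoted in the post
spam_comments = 362_719    # spam comments caught by Akismet, as quoted in the post

total = real_comments + spam_comments
spam_share = spam_comments / total             # fraction of all comments that are spam
spam_per_real = spam_comments / real_comments  # spam comments for each real one

print(f"Total comments received: {total:,}")
print(f"Spam share of all comments: {spam_share:.1%}")
print(f"Spam comments per real comment: {spam_per_real:.1f}")
```

By these numbers, roughly 41 spam comments arrive for every real one, which puts spam at about 97.6% of everything submitted.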

Comments (14 Responses so far)

  1. [...] Thankfully, Scott Karp, among others, knows I’m not a spammer. But he has had to hassle four or five times recently to fish my comments out of Akismet’s spam bucket. That led to this post. [...]

  2. The Word Verify plug-in for WordPress is your friend. It adds a text captcha to your comments form. I don’t even have Akismet turned on. The text captcha stops the spam without the annoying squiggly-lined image captchas, impossible for any human (let alone bot) to read, that some blogs use. And since it is just a text form, the browser can remember it, so your regular commenters don’t even notice except for the first time.

  3. Scott…

    Please don’t go the Captcha route.

    Add the “Bad Behavior” plugin instead. E-mail me and I’ll send you the line.

    Bad Behavior knocks out nearly all of the automated botspam, leaving just a trickle for Akismet to deal with. With BB running, you’ll not have to wade through ANYTHING to find Howard’s comments. Seriously, your Akismet spam will be a fraction of a percent of what it is now.

  4. I’ll second Bad Behavior. I installed it, as well as reCAPTCHA, after my blog was knocked out by a massive attack of comment spam (more than 20K an hour at the height of it) and it’s done an amazing job of picking the spammers off. Only a few get through and Akismet gets those and, unfortunately, Howard’s comments, but I no longer have to wade through pages and pages of comment spam to find the occasional false positive.

  5. I wonder whether this comment will appear, because Howard Owens and I are in the same boat. Akismet considers me a spammer and has blocked and unblocked me a couple of times. Their support is poor: it doesn’t give the reason for the block, so we are left guessing at the possible cause, and the real reason stays unknown. They asked me to comment on the site called http://www.podz.wordpress.com . Guess what: the comment count in the URL goes into the thousands, but there are only 14 comments on the post. This itself shows how bad Akismet can get.
    Anyway, I was left with no choice other than posting it on my blog too. I wish they would make some changes to their algorithm.

  6. That happened to me. For some reason, Akismet started blocking my comments on WordPress blogs a few months ago. The solution was fairly simple: I went to Akismet.com and asked them to unblock me. Haven’t had a problem since then, but if it happens again, at least it’s fairly easy to get unblocked.

  7. This is a test. Will posting as howard-owens.com (a domain I own and redirect) let me post, or will Akismet still think I’m an evil spammer?

  8. So far, I’ve been moderating comments (I run ExpressionEngine and I don’t believe Akismet exists for that platform; will check in a moment, though), which means reading some pretty silly stuff every day.

    Luckily, I have yet to see the real filth, but it strikes me that most spamming attempts come from what used to be called the Soviet Union…

  9. Hi Erik…

    There is an ExpressionEngine module:

    http://loweblog.com/archive/2006/11/14/akismet-for-expression-engine/

    It has been running for several months, and I don’t see any negative feedback in the comments.

    Good luck.

  10. I hate spam as much as you guys – it sucks. Ike, I hate Captchas too…they are super-annoying. I think most would agree. But I would disagree on Bad Behavior. I tried it for a while and it’s unreliable…I got locked out of my own blog. I’ve read that it blocks out some search engine spiders because it’s not very well written. I just recently got a plugin that is amazing at stopping spambots – WP-SpamFree.

  11. Please don’t go the Captcha route!!!!

  12. I’ve recently installed Akismet for EE but so far it has not caught a single spam, except for the test ones I put through myself with obvious words like Viagra in them. The thing it falls down on is all the new vanity spam stuff where someone replies

    “hey, great post yes I agree with you about this” then leaves a URL. I noticed these get through on Akismet’s website.

    Now on my website 99% of these come from Eastern Bloc countries, Colombia or Asia. So for a start I would like to be able to dump all comments from these countries for manual review.

    Akismet only seems to be as good as its database, and even after I marked a whole load of stuff as spam the same users were still being approved. Very frustrating.

  13. Yes, I agree blog spam should be stopped and new filters should be programmed. We have to stay on top of things; it’s the only way efficient websites will go through. Cheers.

  14. nice post Scott Karp
