Is anyone else tired of being caught in the endless loop of captchas? It can be very hard to read the strange text, especially when it’s full of random letters that look like numbers and vice versa. I really hate it when the letters are so squashed together that you can’t read them. There has to be a better solution!

What is a captcha?

You’ve seen captchas in your internet travels. They are the tests that check whether you are a human rather than a program. Captchas can take the form of a simple math problem or swirly text.

How do you feel about those captcha tests? They make me feel like a site doesn’t trust me. They get between me and my goal, slow me down, and annoy me. I get a feeling that I’m not alone in this.

What is a spambot?

A spambot is a piece of software written with the specific purpose of filling out forms with information that benefits the spambot author. This usually takes the form of comments that contain links which might help their website’s SEO (Search Engine Optimization). A clever author can extract the math problem from your form and calculate an answer. A very clever spam author can also get around image captchas.

So, how do you stop spambots?

The ultimate way to stop spambots is to use the Akismet plugin. Unfortunately, this service now costs money. I didn’t see any mention on their site of price cuts for non-profits. For a small personal blog or a small non-profit, this is cost prohibitive.

It’s time for a creative programming solution! To stop a spambot you have to think like a programmer writing a spambot. The simplest of spambots see a form and fill in every field on the form. So, what’s the solution?

What is a honeypot?

A honeypot is an extra field added to a form that users can’t see because CSS or JavaScript hides it. Honeypots are awesome because they don’t inconvenience users the way a captcha does, and they are a valid tool for thwarting spambots. Basically, a spambot fills in a field that valid users can’t see, which alerts us to its activity. If the honeypot field is filled in, we can confidently reject the form as spam.
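The server-side half of a basic honeypot is a single check. Here is a minimal Python sketch. It assumes the submitted form arrives as a dict of field names to strings. It also assumes the honeypot is a field named `website`, a hypothetical name chosen only for illustration (the techniques below recommend disguising it better).

```python
from typing import Mapping

# Hypothetical honeypot field name, for illustration only.
HONEYPOT_FIELD = "website"


def is_spam(form_data: Mapping[str, str], honeypot_field: str = HONEYPOT_FIELD) -> bool:
    """Return True if the hidden honeypot field came back non-empty."""
    # Real users never see the field, so any value in it came from a bot.
    return bool(form_data.get(honeypot_field, "").strip())
```

For example, `is_spam({"author": "Ann", "comment": "Hi"})` is `False`, while a bot that fills in every field, including `website`, is flagged.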

After the honeypot was invented, the spambot authors got a little smarter. They added some code to detect these hidden fields. If the name of the field is always the same, then the field is really simple to detect.

Tricking Spambots with a Smarter Honeypot

It’s time to step up our game, programmers! Here’s a combination of spam thwarting techniques that makes a great spambot-proof form:

  1. Create a honeypot with the same name as one of the default fields. Make it look legit with a label. If you are using Bootstrap, make it look perfectly legit with a label and an icon. We don’t want to alert the bot in any way that this field is special.
  2. Place the honeypot in the form in a random location, and keep moving it around among the valid fields. We don’t want the spambot writer to simply skip the field at a fixed index.
  3. Rename your default fields to something random. Keep in mind you have to map them back to their proper names on the server side. Once the default fields have random names, the valid fields begin to look like honeypots to the spambot.
  4. Add an expiration to your form. This keeps spambots from reusing the same field names and submitting the form later.
  5. Hide your honeypot. You have to hide the honeypot to keep valid users from filling it out. In my form, I hide the honeypot with JavaScript. It is still valid to hide this field with CSS. If you use CSS, your best bet is to use a class name that contains a random word. For example, if you call the class “hide”, the spambot author will pick it out easily.
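Techniques 1 through 4 above can be sketched in Python as follows. This is a minimal sketch, not the author's plugin. Several names and values are illustrative assumptions:

- the four WordPress-style field names;
- the one-hour expiry;
- the hidden `form_id` input;
- an in-memory dict standing in for session or database storage.

The name mapping is kept on the server rather than in a hidden field, so a bot cannot read it out of the page. Hiding the honeypot (technique 5) belongs in the template and is not shown.

```python
import secrets
import time
from typing import Dict, List, Optional, Tuple

# Assumed WordPress-style comment fields, for illustration only.
REAL_FIELDS = ["author", "email", "url", "comment"]
MAX_AGE_SECONDS = 3600  # technique 4: assumed one-hour expiry

# Stand-in for session or database storage of issued forms.
_issued: Dict[str, dict] = {}


def build_form(now: Optional[float] = None) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (form_id, fields); each field is an (input_name, label) pair."""
    now = time.time() if now is None else now
    # Technique 3: give every real field a random input name.
    mapping = {secrets.token_hex(8): real for real in REAL_FIELDS}
    fields = list(mapping.items())
    # Technique 1: the honeypot borrows a default field's name and label.
    decoy = secrets.choice(REAL_FIELDS)
    # Technique 2: insert it at a random position among the valid fields.
    fields.insert(secrets.randbelow(len(fields) + 1), (decoy, decoy))
    form_id = secrets.token_urlsafe(16)
    _issued[form_id] = {"map": mapping, "honeypot": decoy, "issued": now}
    return form_id, fields


def read_submission(post: Dict[str, str],
                    now: Optional[float] = None) -> Optional[Dict[str, str]]:
    """Return the values under their real names, or None if it looks like spam."""
    now = time.time() if now is None else now
    # pop() makes each form single-use, so a harvested copy cannot be replayed.
    state = _issued.pop(post.get("form_id", ""), None)
    if state is None:
        return None  # unknown or already-used form
    if now - state["issued"] > MAX_AGE_SECONDS:
        return None  # technique 4: the form has expired
    if post.get(state["honeypot"], "").strip():
        return None  # technique 1: the disguised honeypot was filled in
    # Translate the random input names back to the real field names.
    return {real: post.get(name, "") for name, real in state["map"].items()}
```

Because `read_submission` pops the stored state, each rendered form can be submitted only once. A real implementation would issue a fresh form when a legitimate submission fails validation.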


Testing Honeypot Theories

I wrote a WordPress plugin that uses these techniques to test all of the above concepts. The result? Spambots fill out the honeypot every time! This is great because the comments are now automatically marked as spam, which saves me from having to click the spam button every day.

I think the first technique, giving the honeypot the same name as a default field, is key, because WordPress spambots are definitely going to be looking for the 4 common fields that are on every WordPress comment form. This is a common footprint used by comment spammers and spambots. I guess the good news is that when the spambots find the forms on your site, you know your SEO is starting to work.

If you would like a copy of my beta WordPress plugin, let me know. I don’t plan to publicly release it because I’d like to keep it out of the hands of those who would attack it.

Do you have any additional techniques or ideas on how to get around the above techniques? How would you use a honeypot to thwart spambots? Let me know in the comments.

Update (February 05, 2014): This smarter honeypot is available for Django. Ben Timby originally authored it, and I recently had to make a quick code change. Check out django-secureform on GitHub.

Update (March 06, 2014): Let’s test another theory. Will the public release of the WordPress plugin reduce its effectiveness? Check out wp-smart-honeypot on GitHub.

Update (September 29, 2014): There is now a fork of wp-smart-honeypot called tarpit: https://github.com/cferdinandi/tarpit

Image Credit: http://www.jongales.com/

Need an API-based file storage system?

If your IT department is looking for a secure file management platform, consider signing up for SmartFile. If you’re a developer interested in hosting, check out our developer services below!

Feature | SmartFile Developer | Amazon S3 | Rackspace
API Endpoints | 38 | 5 | 14
Free Tier Storage | 100 GB | 5 GB | None
Free Tier Transfer | 200 GB | 15 GB | None
Free Tier API Requests | Unlimited | 20,000 | None
Web Interface | Yes | No | Yes
Live Documentation | Yes | No | No
File Size Limit | None | 5 TB | 5 GB
Inherit FTP Access | Yes | No | No

 

  Store & Retrieve Data
  Host & Serve Media
  Distribute Website Content


About Ryan Johnston

As an Interactive Web Engineer at SmartFile, I mostly work with JavaScript and a little Python. Since we "eat our own dog food" here, I have quite a bit of experience using the SmartFile API that powers the User Interface in the SmartFile Web Application.

33 thoughts on “Captchas Don’t Work: How to Trick Spambots with a Smarter Honeypot”

  1. I’m interested in seeing your plugin. I want to lock down WordPress registration and could use some pointers other than the honeypot.

  2. Hi Ryan, I’d be interested in seeing your plugin! I’m currently working on a campaign site that prompts the user for a phone number, then sends content via SMS. Since it costs about 7 cents to send an SMS, I need to find a way to effectively block bots. Thanks!

  3. Hi Ryan, I am trying to investigate feasibility of using a game-playing captcha alternative like PlayThru from AYAH. It would be great if you can share any pros/cons of using this tool. Any info would be welcome. Thanks.

  4. Aseem,
    I couldn’t find AYAH in a quick Google search. I did try PlayThru based on the demo on their website.

    Pros:
    * Game-based, so I didn’t have to decipher letters obscured to the point of being nearly unreadable by humans.
    * Tested the game on a kindergartener. He was able to complete the games successfully. He actually liked playing them.

    Cons:
    * Took additional time to load.
    * The items were moving and were hard to catch.
    * I was using a laptop touch-pad which is more difficult to drag and drop.
    * Touchscreen devices are becoming more and more common. I didn’t test this, but I assume there will be drag-and-drop issues on touch devices.
    * I noticed a feedback link after I completed a game. Could this drive traffic away from my site? It opens in a new tab, but this could still be a concern. Maybe they provide an option to turn it off?
    * Required me to stop, read directions, and think. I mentioned this in the blog post: my main problem with a captcha is the slowdown and inconvenience. If I can solve things with hidden code such as a clever honey pot, why would I ever inconvenience my users with a captcha (game or words)?

  5. If you are simply hiding the fields using JavaScript or CSS, why couldn’t a bot simply check the CSS properties of the fields and skip the hidden ones?

  6. Hudson,
    I didn’t hide it with CSS because I agree it would be easy to detect a class. A bot could look for a class including hide, hp, or honeypot. I’m hiding it with dynamically generated JavaScript. This is also raw JavaScript rather than jQuery. The tricky part is that I’m using one of the standard field names for the honeypot and random names for the other fields. The honey pot as described in the post basically uses all the anti-spam tricks we could find.

    1. I’d give it absolute positioning to move it out of view, plus a tabindex of -1 (or 10000). You don’t want keyboard-savvy users tabbing to the field by accident and filling it out, although a tabindex like that makes the field easier for a bot to detect. If you use a tabindex in between, add some onfocus JavaScript to jump to the next field, I suppose. Do you think that would be browser-safe? Who disables JavaScript these days? I suppose I have tended to do more server-side protection than client-side protection…

      As someone who’s written many “automated scripts” (not for harm, really), even for complicated sites such as GS1 (they offer no nice API for UPC code assignment [or did not at the time]), I’d say the best way to prevent spambots is to roll your own strategy from the ground up. Use a module as well, but don’t rely solely on a module or whatever framework you’re using. Sure, someone who is targeting your website will be able to figure it out. But the majority of attacks are automated and come from existing code bases made to test and break the most common vulnerabilities, firing off from whatever country. Seriously, just take a look at your auth.log and do some traceroutes.

      If someone really wants to target just your site, there is little you can do to stop them. I mean, one could hire a click farm in Southeast Asia or use Amazon Mechanical Turk to have humans do the spamming instead of robots. But to keep out the script kiddies, limiting requests by IP would most likely suffice. I suppose it really matters what kind of spam they are doing: targeting specific forms or crawling your entire site to spam all over the place.

      Either way- my two cents….

      I’ve found it best, when doing security checks, to let the user/spammer/bot feel as if they are succeeding while you are flagging their IP behind the scenes and not really letting their actions go through. This encourages them to keep trying (since they perceive success) and makes it easier to track their trends, attack vectors, etc. In other words (assuming storage space is not an issue), let them create their spam comment, just flag it as spam in the database, and keep it hidden (or, if you want to get tricky, only show it to the same IP that did the spamming). I suppose it depends on what kind of spam you’re trying to prevent and what kind of spam someone would throw at your form.

      I’ve found particular success using this method to track scammers using stolen credit cards/identities on e-commerce sites. Give them a miscellaneous error of some sort (and tell them to call customer service). If the same user tries several cards with different names in a row, you can flag the cards, shipping addresses, shipping name, username, IP, etc. Then you can track the same scammers across IPs and user accounts. This tracking/flagging does not need to be done live, especially if you plan to retroactively look for all other suspicious flags associated with any of the values mentioned above. But I’ve gotten off topic.

      Great article, good tactics suggested, written clearly enough to appeal to both technical and non-technical audiences. Does SmartFile use a honey pot? Does this form use a honey pot?

      -Jon

      1. All good tips, Jon. SmartFile does use a honeypot on our signup form. I’m not sure what our WordPress uses. It is managed by another department outside of development.

  7. What about browsers that auto-fill common fields, like first_name, email, etc? If your honeypot field takes on one of those names, might it be an issue for some valid users?

  8. Jason,

    The honey pot is removed from the page via JavaScript when the page loads, so autofilling by the browser will not cause problems. If the browser does fill it, it will be removed from the page anyway.

    The valid fields are given random names. Because of that, those fields may not autofill when a user would expect them to. So, it is rather the reverse of the problem you described.

  9. Does the javascript remove the honey pot, or just hide it? If the latter and the browser somehow filled it in, there would be a problem. Also, this solution won’t work for anyone with javascript disabled, but that’s exceedingly rare these days, so it’s probably not a huge issue.

    And actually, now that I think about it, if you’re going to require javascript, why not simply render the entire form using it? This way, the bots wouldn’t even see a form at all, which could potentially prevent that url from becoming a target in the first place.

    Another point—all this assumes that the spambots can’t or don’t run javascript. But is that a realistic assumption these days?

    1. I’m not sure whether disabled JavaScript is really so rare, and I even think it might become more common again. It’s being overused to extreme degrees on the web these days. That leads to a lot of annoyances that make people (like me) want to have it turned off by default. Add to that scripts often failing on mobile devices, or just taking ages to execute, as well as the jumbled mess of JS engines. I take the same stance towards JavaScript as Flash: okay when used in moderation and where it makes sense. But a web design that absolutely *requires* JavaScript for all content and functionality to be accessible is a failure. If JS fails to run and the honey pot field isn’t hidden, that might require some additional consideration. If the entire form doesn’t show up, you’ve lost your visitor for good.

      Your last paragraph is also a good point. The most popular captcha solutions, reCaptcha and SolveMedia, already use JavaScript, and bots are very good at solving at least the former. There’s no technical reason why spambots couldn’t evaluate JavaScript; they usually don’t, purely for performance reasons. If enough JS-based anti-spam methods were to appear, it would likely become worthwhile for them to start evaluating JS. They would attack fewer sites in the same time span, but more successfully.

  10. Hi Ryan,

    I’d love a copy of your WordPress plugin! Have any legitimate comments been marked as spam using this technique?

    Thanks 🙂

  11. The plugin actually just marks them as spam, so an admin can look through the comments just in case. I haven’t seen any legitimate comments there.

  12. Hey Ryan, thanks for the idea. Nice concept, and much easier than those awful CAPTCHA images. Do you have any code examples? I don’t need anything “complete”, just basics so I can wrap my head around the logic. I’ll be using PHP.
    Thanks!

  13. Hi Ryan,
    I am trying to implement a honeypot for forms that use CGI scripts: PHP forms that post to CGI scripts based on Matt’s Script Archive’s formmail.pl. I am wondering how this implementation would be different. Do you have any ideas?
    Thanks!

  14. Yvonne,

    I once wrote a Perl script. Then, I rewrote the same thing in PHP. The PHP script was half as long. That was the last time I wrote a Perl script. This was probably about 2009, which is also about the last time “Matt’s Script Archive’s formmail.pl” was edited.

    The first thing I saw with this script is that it had a hidden input with the recipients. This is a big no-no: anybody could post to this script with their own list of spam addresses. Aside from any security issues with this script, it just might work.

    It looks like you configure a list of form fields. To use these techniques with it:
    1. You would pick a random number between 0 and the size of the array of fields.
    2. Rename all the fields and include some hidden field to decode the field names.
    3. Insert the honeypot into the form in the random position that you determined earlier. Give it one of the legit field names from the array of form fields.
    4. You can put an expiration on the form if it is a simple form. If it is more complex, like a comment on a blog, you may not be able to set an expiration because some people can take hours to craft a crafty comment.
    5. Remove or hide the honeypot with CSS or JavaScript (or both?)

    When the form posts:
    1. If the honeypot has been filled in, report to the spambot that the form has been submitted successfully. (You get a special pass to lie when you lie to spambots.)
    2. If the honeypot is empty or nonexistent, the submission passes. Proceed with normal sanitization and validation of the form.
    3. If the form passes sanitization and validation, then send your email, insert it into the database, or whatever.

    Take a look at this github repo for an untested mock up of some code: https://github.com/freak3dot/smart-honeypot

    1. Hahaha…. 🙂 I once maintained a bunch of Perl scripts I didn’t write…oh, wait, I still do.

      Thank you very much for your thoughts. I’m going to work on this and see where I can get–I’ll come back and let you know how it went.
      Cheers!

  15. We recently implemented a hidden field Honey Pot on our website, how can we verify if it was properly installed?

  16. I usually test by removing the code that hides it. You can do this in Firebug or server-side, whichever is easier for you.
    I submit once with the honey pot filled in. This submission should be flagged as spam or ignored.
    I submit again with everything but the honey pot filled in. This one should follow the normal process.

  17. Thanks for these hints! Captchas really need to disappear. As automatic solvers got better, their creators thought they needed to make them ever more difficult, and now they’re at a point where I’m pretty sure the spam bots have a better success rate than me. It rarely takes me fewer than 2 or 3 tries to successfully decipher one these days, while from trying out one captcha solver myself, I can confirm reports that they easily achieve a hit rate of 70%. Captchas officially fail at their task of telling humans from programs.

  18. I’m trying to get spambots to hit a test page. I just want to study which validation methods work and which don’t. I know there’s no foolproof protection against spambots and that they are getting really smart now, but I want to study which methods are more effective than others 🙂 Maybe I can also block those spammers in the future.

    http://www.rvaleriano.com/fight-spam/

  19. gin,

    I don’t think it matters which is more effective. We never want to inconvenience our users. Forms on websites could be for signup, cart processing, contact, request a quote, and more. Any of these actions are valid conversions that we want our users to complete. If we put a hurdle in their way, the conversion rate is likely to go down.

    Ryan Johnston

  20. Hi!
    Really interesting approach. Loved reading it.
    I’m putting up a website using WordPress, but it is still on localhost. I want to give this plugin a try, so I installed it, but I am not able to test it until I put the website online.
    Can you let me know if there is anything else I need to do besides installing the plugin?
    And do you know if Akismet works the same way? Can I have both plugins up and running?
    Really enjoyed finding out about this concept and finding it for free here at SmartFile. Thanks.

  21. Hey Rodrigo,

    Thanks for stopping by and leaving a comment.

    The only other thing you should do after installing is check your comments page for spam comments. I haven’t seen any false positives yet, but it is worth checking. Empty the spam regularly, as thousands of spam comments in the database may slow down your WordPress site.

    I’m not sure anyone has tested this with Akismet. I stopped using Akismet when they started to charge. As I understand it, Akismet runs comments through spam filters based on the content of the comment. In theory, running both should work without conflict. If you try it, let us know how it works out.

    Ryan

    1. I sure will let you know, Ryan. Thanks for clarifying that.
      Just one more thing: does this WordPress plugin also apply automatically to other forms, or only to the comment form? If it can, do I need to do anything to extend it to other forms?
      I cannot tell if it is already there, but I have the Visual Form Builder plugin installed and was checking for a hidden field in it. Is this from the honeypot?

      If not, how can I apply it to this other form too? VFB has a simple text captcha (math), and I doubt that it will be enough.

      1. The WordPress plugin only works on the comment form. You’ll have to write PHP to extend it to other forms and/or VFB. An easier solution might be to ask the VFB author to implement a honeypot and point him or her here.

  22. I think it’s a good idea. You could make it impossible to tell which form element is the honey pot, but that doesn’t eliminate the threat entirely. Firstly, if you just used simple CSS classes with display:none or visibility:hidden, it could be easily detected. One way to thwart that could be to cover the input with another div or element instead of hiding it. That way the input itself doesn’t show any properties of invisibility; it’s just covered up. I’m sure there is a way to render an image of the web page and pinpoint which elements are hidden, but it would take a very determined programmer or hacker to come up with that.
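Several replies above describe server-side tactics that complement the honeypot:

- Ryan’s posting steps for Yvonne: tell the bot the form was submitted successfully even when the honeypot is filled in.
- Jon’s suggestion to limit requests by IP.
- Jon’s suggestion to store spam flagged, visible only to the IP that posted it.

Here is a minimal in-memory Python sketch combining them. The class, the field names `website` and `comment`, and the limit of 5 submissions per 60 seconds are all hypothetical choices. A real site would keep comments and counters in a database or cache.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional

HONEYPOT_FIELD = "website"  # hypothetical honeypot field name
RATE_LIMIT = 5              # hypothetical: max submissions per IP per window
WINDOW_SECONDS = 60.0       # hypothetical window length
OK_MESSAGE = "Thanks, your comment was submitted."


@dataclass
class Comment:
    ip: str
    text: str
    is_spam: bool


class CommentHandler:
    def __init__(self) -> None:
        self.comments: List[Comment] = []
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _rate_limited(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget attempts that have slid out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        # Record this attempt; too many in the window means throttle.
        hits.append(now)
        return len(hits) > RATE_LIMIT

    def submit(self, ip: str, post: Dict[str, str],
               now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if self._rate_limited(ip, now):
            return "Too many submissions, please try again later."
        # Keep every comment, but flag honeypot hits as spam instead of rejecting.
        spam = bool(post.get(HONEYPOT_FIELD, "").strip())
        self.comments.append(Comment(ip, post.get("comment", ""), spam))
        # Same reply either way, so the bot believes it succeeded.
        return OK_MESSAGE

    def visible_to(self, viewer_ip: str) -> List[str]:
        """Comment texts a viewer sees; spam is shown only to the IP that posted it."""
        return [c.text for c in self.comments
                if not c.is_spam or c.ip == viewer_ip]
```

To test it the way reply 16 describes, submit once with the honeypot filled in and once without. Both submissions get the same success message, but only the first is stored as spam.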
