Pastebin fights the spam!
A few people have emailed me recently disappointed by the level of spam postings on pastebin.com. I’ve never really understood why spammers bother, but as they are bothering in increasing numbers it was time to take some action.
Last night I built in some spam filtering which has caught hundreds of posts since going live. I also added a “report spam” link which has flagged over 500 posts in the past 20 hours. By iteratively tweaking the spam filter to identify the legitimately flagged posts, I’ve been able to quickly delete a lot of older spam posts.
Hopefully this will make pastebin look like a well-tended garden rather than a run-down wasteland! Comments welcome…

1Slepp
wrote on 22 August 2007 at 1:03
Though we all love the spam… Oh come on.. you know you do.. :>
Anyhow, Paul, check out http://www.projecthoneypot.org.. It is quite effective, and you can redistribute it with your code as well (though I think an end user has to do a bit to get it going again, like getting their own account). It could complement the new methods you implemented yourself.
2HD
wrote on 22 August 2007 at 18:18
“SPAM SPAM SPAM SPAM…..” – Oh the wonderful Monty Python tune!
Well done that man!
3Anonymous
wrote on 27 August 2007 at 6:45
Easy way to kill most spam on the spot:
- rename the name/link/etc. fields to something weird
- put this in your form:
(Leave these fields blank!)
- put this in the site stylesheet:
div.trap { display: none; visibility: hidden }
Then when processing the form, if there’s anything in the ‘name’ or ‘link’ fields, drop the post. This method alone has cut my spam by about 90-95%. I have no captcha, and no wonky heuristics that are eventually bound to flag legitimate data as spam at some point or another. Just a couple of innocent-looking fields that spambots fill in because they look important.
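The server-side half of the trap described above can be sketched roughly as follows. This is only an illustration in Python, not the code any commenter actually runs: the decoy field names `name` and `link` come from the comment, while the function name and the renamed real fields in the usage note are assumptions.

```python
# Decoy ("honeypot") fields: they sit inside the hidden div.trap block,
# so a human never sees them and leaves them empty. Bots that fill in
# every field they find give themselves away.
HONEYPOT_FIELDS = ("name", "link")


def is_probable_bot(form: dict[str, str]) -> bool:
    """Return True if any hidden decoy field contains non-blank text."""
    return any(form.get(field, "").strip() for field in HONEYPOT_FIELDS)
```

A handler would call `is_probable_bot(submitted_form)` before saving and silently drop the post when it returns True; the real author and URL would arrive under the renamed fields (for example a hypothetical `x7_author`).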
4Anonymous
wrote on 27 August 2007 at 6:48
… oh, ffs, it stripped out the html. pretend these parens are angle brackets.
(div class=”trap”)
(Leave these fields blank!)
5Anonymous
wrote on 27 August 2007 at 6:49
Goddamnit.
I give up.
6lordelph
wrote on 27 August 2007 at 20:39
Hah, it’s OK, I understood. I should try it, though I fear it would only work for a short period
7Slepp
wrote on 29 August 2007 at 17:32
In response to lordelph & Anonymous, it actually works for a very long period.. Still working for me. It even works with (input type=”hidden”) for some of the absolutely stone stupid bots.
8lordelph
wrote on 30 August 2007 at 8:05
Thanks for all the comments, I’ve continued to tweak it and the amount of spam (and spam reports) has fallen dramatically. Will keep an eye on it!
9Joel "Jaykul" Bennett
wrote on 31 August 2007 at 18:37
Ok, so please forgive the misplacement of this comment, but I can’t find anywhere else to put this. Anyway, I understand PasteBin is GeSHi based, so I thought I’d contribute a PowerShell syntax …
http://huddledmasses.org/jaykul/powershell-highlighting-for-geshi/
10Ozh
wrote on 2 September 2007 at 13:50
Implement Akismet ? *Could* be buggy for pasting code, but could work as well
11msg
wrote on 5 September 2007 at 16:54
How about a picture? Maybe you should add some kind of engine here that generates images with words or totally random letters to re-type? 90% of the spam-bots would be gone.
12Rick
wrote on 5 September 2007 at 17:58
msg: I’m strongly against pictures, I really dislike filling in those things and it would at least drive me away from this site. The hidden form fields work with most bots, and if that’s not enough then a bit of javascript (perhaps in combination with ajax) is sufficient to kill most spambots.
13Noccy
wrote on 5 September 2007 at 19:02
Off-topic, but a request indeed. How about being able to add something to the command line to highlight specific lines? like http://pastebin.com/abcacbacb@5,16-32,79-150 or so (to highlight lines 5, 16-32, and 79-150). Would be awesome
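As a rough illustration of the URL idea above, a line specification such as `5,16-32,79-150` could be parsed like this. The `@` separator is taken from the example URL in the comment; the function names are made up for this sketch and nothing like it is known to exist in pastebin.

```python
def parse_highlight_spec(spec: str) -> set[int]:
    """Turn '5,16-32,79-150' into the set of line numbers to highlight."""
    lines: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty pieces such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                start, end = end, start  # accept reversed ranges like 32-16
            lines.update(range(start, end + 1))
        else:
            lines.add(int(part))
    return lines


def split_paste_path(path: str) -> tuple[str, set[int]]:
    """Split '/abcacbacb@5,16-32' into the paste id and its highlighted lines."""
    paste_id, _, spec = path.lstrip("/").partition("@")
    return paste_id, parse_highlight_spec(spec)
```

A path without an `@` part simply yields an empty set, so existing links would keep working unchanged.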
14lordelph
wrote on 5 September 2007 at 19:08
@msg: The anti-spam measures are working very well, but I’ve had one report of a legitimate post getting flagged as spam, so I will either relax things a little or add a CAPTCHA only for those posts with a spam smell to them!
@Noccy: interesting, but would anyone really use it? I’ve got an idea for making the existing line highlighting features easier to use though…
15Vinyanov
wrote on 8 September 2007 at 19:53
(Not reading the previous comments, sorry, it’s just a quick note). A few moments ago I boldly clicked on a Spam report link and seemingly reported valid code. Let me explain:
You know, as a webmaster I have a slightly daring attitude when speaking of web forms. I often click on things just to see if they offer any confirmation or how they deal with invalid input. And, to my regret, your markup has not offered me any confirmation prompt, so that I could verify my decision.
Could you possibly add an … for your anti-spam links? Hope that helps the site.
16Vinyanov
wrote on 8 September 2007 at 19:55
Oops, an overactive code parser. Let’s try again: Could you possibly add an a onclick=confirm(”Sure?”) href=http:// … for your anti-spam links? Hope that helps the site.
17lordelph
wrote on 9 September 2007 at 19:52
The “report spam” link is immediate by design, to encourage its use. False positives are relatively rare and are ignored.
18Selig
wrote on 10 September 2007 at 0:04
I agree with the hidden fields system. Other people have used it in their comment systems, and it got rid of most of their spam. The spam bots don’t think to check the CSS or sometimes even the field type. I strongly suggest this over CAPTCHA, as CAPTCHA can irritate users more than the benefit of the spam bots being deterred.
19Bigbossbunny
wrote on 12 September 2007 at 16:57
BOO to spammers… great job
20MrLight
wrote on 14 September 2007 at 1:00
To stop spam on a site. Just post a banner saying no spam allowed.
21lordelph
wrote on 15 September 2007 at 9:00
MrLight, if only it were that easy!
22Gargantua
wrote on 23 September 2007 at 14:27
I don’t know if I’m the only person saying this, but what exactly DEFINES spam? It could be people just posting things to transfer them elsewhere…
23Anonymous
wrote on 24 September 2007 at 11:21
Before I put the spam filter on, there were hundreds of posts which were just lists of links and keywords for typical spam enterprises, submitted multiple times.
If you didn’t see it before, trust me, it is pretty obvious when a post is spam!
For those posts flagged with the “Report spam” feature, I read them and think “could someone conceivably want to send this to another individual for comment or review?”. If I see a new pattern emerging, I tweak the automated filter appropriately.
I’ve only had one report of a false positive so far, but I’m happy to hear of such incidents…
24sysprv
wrote on 30 September 2007 at 18:52
Hey hey hey
at least for private domains… The really simple (and therefore scriptable) page structure (form) is…delicious.
I use pastebin with cURL scripts… To upload settings etc. from different computers. Please don’t restrict it too much
By the way, is there a way to retrieve posts that have fallen off the “Recent Posts” box?
Thank you for this service
25v2k
wrote on 9 October 2007 at 22:17
Props to pastebin.
26Nicolas
wrote on 13 October 2007 at 20:21
Report spam link is totally wrong for a simple reason: it’s a link. A GET request shouldn’t cause an action. If the request has a side effect, use POST (so same goes for the delete link). I can easily see a crawling bot marking all posts as spam.
Hmm actually no… Because it’s even worse: Why The Hell Javascript? What good reason do you have to block people from reporting spam when using lynx from a headless machine?
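The GET-versus-POST point in this comment can be illustrated with a small sketch. It is not how pastebin.com works (the site deliberately acts on a plain link); the action names, the `flagged` set and the function are all assumptions made for the example.

```python
from http import HTTPStatus

# Hypothetical actions that change server state and so should not be
# triggered by a "safe" method such as GET.
STATE_CHANGING_ACTIONS = {"report_spam", "delete"}

flagged: set[str] = set()  # paste ids reported as spam


def handle(method: str, action: str, paste_id: str) -> HTTPStatus:
    """Apply a state-changing action only when it arrives as a POST."""
    if action in STATE_CHANGING_ACTIONS and method.upper() != "POST":
        # A crawler blindly following links issues GETs and ends up here.
        return HTTPStatus.METHOD_NOT_ALLOWED
    if action == "report_spam":
        flagged.add(paste_id)
    return HTTPStatus.OK
```

With this shape, a report form submitted by a plain HTML button would still work in a text browser with no javascript, while a bot walking every link on the page would flag nothing.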
27lordelph
wrote on 14 October 2007 at 9:03
Totally wrong eh? Sorry about that. Thanks for the feedback though!
28Nicolas
wrote on 15 October 2007 at 5:33
http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Safe_methods
29lordelph
wrote on 15 October 2007 at 9:56
Thanks again. But I really did want it to work the way it does. Sorry it annoys you so much though.
30kato
wrote on 23 November 2007 at 14:02
When will the anti-spam markup be available for download?
31kato
wrote on 27 November 2007 at 19:30
any chance I could get a download of the new code so I don’t have to write it myself? My pastebin is overrun with spam : (
32Cesar Rodas
wrote on 24 December 2007 at 15:50
Why not add a CAPTCHA filter? That would stop most spammers…
33Internet Expert
wrote on 26 December 2007 at 18:16
Everybody hates CAPTCHAS. I think an agglomeration of inconspicuous methods (to the normal user) would be best.
For instance, setting a cookie and ensuring that the user sends it back will ensure that at least a somewhat functional browser is being used. (Not many bots will use cookies). If no cookie is sent back along with the paste form, you can display a kind red message to the user to enable cookies. (Cookies are necessary on an ever-increasingly dynamic Web!)
Not forgetting the hidden input methods too, and slightly medieval methods such as blacklisting on multiple detections and heuristic confirmations.
As pastebin software becomes more popular, more spammers will tailor their bots to specifically target it and bypass your specific anti-spam techniques. This is where CAPTCHA must simply come in, until better methods of differentiating a human brain and electronic processor are developed.
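The cookie round-trip suggested in this comment might look something like the sketch below. The cookie name `pb_probe` and both function names are invented for illustration; only the standard-library calls are real.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "pb_probe"  # hypothetical cookie name


def issue_probe(issued: set[str]) -> str:
    """Create a random token to send as a cookie alongside the paste form."""
    token = secrets.token_hex(16)
    issued.add(token)
    return token


def cookie_came_back(cookie_header: str, issued: set[str]) -> bool:
    """Check that the Cookie header of the form submission carries our token."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(COOKIE_NAME)
    return morsel is not None and morsel.value in issued
```

When `cookie_came_back` returns False, the site could show the "please enable cookies" message mentioned above instead of rejecting the post outright.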
34Michael Scherer
wrote on 12 January 2008 at 19:06
I think spammers post links in order to increase their Google rank?
35anon
wrote on 12 March 2008 at 22:31
I can’t post my logs any more, always flagged.
36lordelph
wrote on 12 March 2008 at 23:15
Can you post a few sample lines here, and I’ll tweak the spam detection…
37Eero
wrote on 13 March 2008 at 6:38
“I can’t post my logs any more, always flagged.”
Same here, I can’t use my pastebin anymore. It says only: “Sorry, your post tripped our spam filter – let us know if you think this could be improved”
38Eero
wrote on 14 March 2008 at 6:26
Thanks, now it’s normal again.
39Johnny
wrote on 28 March 2008 at 0:03
Have a check box that is “check this box if this is a spam post”. In the code make the checkbox look important and as though it must be clicked to post and bots will check that field and get flagged (or even better, banned).
40lordelph
wrote on 28 March 2008 at 0:47
Something like that already occurs. Most of the “spam” filtering is more about filtering posts I don’t want to be hosting.
41Seal
wrote on 11 April 2008 at 9:21
I’ve found that when you put your comments on a separate page it seems to dramatically reduce the amount of comment spam, maybe because the comment page has a lower or zero page rank. It takes a bit away from the whole ‘flow’ of the blog but you gotta weigh up the pros and cons
42Adam Higerd
wrote on 11 April 2008 at 16:16
Someone mentioned a honeypot earlier, as well as the problems with making the “report spam” and “delete” links GET requests instead of POST requests.
A simple solution is to make a “honeypot” anchor link in your invisible DIV. If the honeypot link gets followed by a “user” (of course, the text of the link should indicate that you shouldn’t click it, but this would only be visible to non-CSS browsers) then that’s an indication that you’re looking at a crawler that should be temporarily banned/ignored.
Meanwhile, to make sure that search engines work, make sure that the honeypot script (as well as the scripts that manage marking spam and deleting posts!) is listed in your robots.txt so that well-behaved crawlers know to ignore it.
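The honeypot link plus robots.txt arrangement described in this comment can be sketched as below. The paths `/trap.php`, `/report-spam` and `/delete` are placeholders rather than real pastebin URLs, and the ban list is just an in-memory set for the sake of the example.

```python
from urllib.robotparser import RobotFileParser

HONEYPOT_PATH = "/trap.php"  # hypothetical target of the hidden link

# robots.txt telling well-behaved crawlers to stay away from the trap
# and from the links that change state.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap.php
Disallow: /report-spam
Disallow: /delete
"""

banned: set[str] = set()  # client addresses that followed the hidden link


def polite_crawler_may_fetch(path: str) -> bool:
    """Would a crawler that obeys robots.txt request this path?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)


def record_hit(client_ip: str, path: str) -> None:
    """Ban any client that requests the honeypot despite robots.txt."""
    if path == HONEYPOT_PATH:
        banned.add(client_ip)
```

Search engines that honour robots.txt never reach the trap, so anything recorded in `banned` is either a badly behaved crawler or a person who ignored the warning text of the link.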
43Ben
wrote on 14 May 2008 at 19:37
Yay I just spammed – Try and stop me now ! Muwhahaha
44Xrvel
wrote on 4 July 2008 at 4:50
I like your pastebin. I use it often, but I hate the spam. Why don’t you use captcha? It’s easy to implement.
45Russ
wrote on 9 July 2008 at 18:59
I like the pastebin the way it is. Mostly I deal with people asking for help in IRC while I’m at work. If I have a minute I help people out.
I wouldn’t take the time to deal with a captcha if it was implemented. If anything I would perhaps change the code so the captcha appears iff the first spam method triggers.
46Ralf
wrote on 25 August 2008 at 0:21
Hi there,
am I missing anything? I just downloaded and installed pastebin from http://pastebin.com/pastebin.tar.gz just to realize that it hardly looks different from my older version.
I can’t see any means of spam detection or how spam could be avoided in pastebin-0.60.
Which version of pastebin are you talking about and if it’s not 0.60, where can I get the source?
Regards
Ralf
47lordelph
wrote on 2 September 2008 at 17:58
I haven’t packaged the latest release, will try to rectify that in the next few days….
48run088
wrote on 3 September 2008 at 22:06
I keep getting tripped as spam but my post is not spam. What’s the problem?
49Jack
wrote on 19 September 2008 at 22:55
It would be great if you could upgrade the version, lordelph.
50Axel Werner
wrote on 18 February 2009 at 10:23
I don’t know why.. but your crappy spam filter really is annoying sometimes.. I wanted to post the contents of a Linux file and some console output and your darn spam filter denied my post for being SPAM.
51lordelph
wrote on 24 February 2009 at 4:14
I can only improve it if you send me a sample of the ham. If it looks anything like a list of email addresses, that is the most likely thing to give it a high score.
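To give a feel for why logs and mail headers can trip a rule like this, here is a purely hypothetical sketch of an "email list" signal. It is not the actual pastebin filter, whose rules are not published in this thread; the regular expression and the function name are assumptions.

```python
import re

# Deliberately loose pattern for things that look like email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def email_line_ratio(text: str) -> float:
    """Fraction of non-blank lines that contain an email-like token."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if EMAIL_RE.search(line))
    return hits / len(lines)
```

A scraped address list scores close to 1.0 on a measure like this, but so would a mail server log where every line names a sender, which is the kind of false positive reported in the comments here.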
52Robert Spencer
wrote on 5 March 2009 at 10:47
My boss asked me to paste the headers of a SPAM mail so we could nail them together, but I can’t. I see that for some odd reason you’re filtering on email addresses, which would mean that I can’t post headers or server logs.
Is there any way to get around that without having to change my post? I really need it to be exactly the same as I received it. Please.
53Konstantinos Togias
wrote on 2 April 2009 at 20:34
Hi, I am using pastebin 0.60 on a server of mine, and recently had problems with spam too. Is there an updated packaged version with spam control yet? The link to download from pastebin.com still gives 0.60.
54spirit
wrote on 2 July 2009 at 15:57
Would it be possible to have the new package with the spam fighting rules please?