Many of my readers know by this point that I have an aversion to spam. Add to that a strange sense of humor, and a little bit of sadness at the way the internet has been colonized by corporations – even to the point that companies we all pay for access to the internet want to charge extra fees for using particular sites, see Comcast – and we have Alexander the Great.
I’ve been thinking about moving my blog from WordPress to a self-hosted ASP.NET server. There are several reasons to do this, but many of them come down to the fact that I pay the mortgage with my job as a .NET programmer. Version 3.5 is out in production, and I haven’t had much hands-on experience with LINQ yet. My own blog system will take some work, but it will also give me a chance to experiment with a lot of new concepts and make myself more valuable in the market. But I digress. The real problem, in my mind anyway, is creating a spam filter that’s anywhere near as good as Akismet, which comes built in to WP. I publish any comment that makes it past the spam filter (I believe in free speech), so I need my own, and it has to be good.
I’m thinking the easiest way to do this is a rule-based approach. I can keep a list of prohibited words, give each of them a point value, then add up the points for every violation; a comment that scores too high gets treated as spam. Here are the words I’ve been thinking about banning:
- p0rn (with a zero)
- pen1s (with a one)
- click (apparently this kills 80% of spam and only catches 1.2% of non-spam)
- porn, etc., spelled correctly with letters
- offer or offers
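To make the point idea concrete, here is a rough sketch of how the scoring might look. It's in Python rather than C# purely to keep it short. The words come from the list above, but the point values, the threshold, and the names `RULES`, `SPAM_THRESHOLD`, `spam_score`, and `is_spam` are all made up for illustration, not something I've settled on.

```python
import re

# Hypothetical point values for each prohibited word (invented for this sketch).
RULES = {
    "p0rn": 5,
    "pen1s": 5,
    "porn": 4,
    "click": 2,
    "offer": 1,
    "offers": 1,
}

# Hypothetical cutoff: a comment scoring this much or more is held as spam.
SPAM_THRESHOLD = 5


def spam_score(comment):
    """Add up the points for every prohibited word found in the comment."""
    # Lowercase the text and split it into runs of letters and digits,
    # so "Click!" and "click" both match the same rule.
    words = re.findall(r"[a-z0-9]+", comment.lower())
    return sum(RULES.get(word, 0) for word in words)


def is_spam(comment):
    """Return True when the total score reaches the threshold."""
    return spam_score(comment) >= SPAM_THRESHOLD


print(spam_score("Click here for a special offer"))  # 3 (click = 2, offer = 1)
print(is_spam("Nice post, thanks for sharing"))      # False
```

A word is counted every time it appears, so a comment that repeats "click" five times scores higher than one that says it once. Whether that's the right behavior is one of the things I'd want to test against real comments.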
That’s what I’ve got so far. This isn’t a terribly well planned post, really more of a brainstorm. I would love to invite readers to share their thoughts on the matter, and, ultimately, I plan to open source what I come up with. I firmly believe that ASP.NET is a better platform in most ways than PHP, but, sadly, there’s far less open code available for it. A problem to be solved!
Here are some links to spam research, in case anybody else is interested in tackling the problem. Even if it’s already been done before, I think this is a valuable learning experience. Like a muscle, the brain works best when it works often.