April 10, 2008

Many of my readers know by this point that I have an aversion to spam.  Add to that a strange sense of humor, and a little bit of sadness at the way the internet has been colonized by corporations – even to the point that companies we all pay for access to the internet want to charge extra fees for using particular sites, see Comcast – and we have Alexander the Great.

I’ve been thinking about moving my blog from WordPress to a self-hosted ASP.NET server.  There are several reasons to do this, but many of them come down to the fact that I pay the mortgage with my job as a .NET programmer.  Version 3.5 is out in production, and I haven’t had much hands-on experience with LINQ yet.  My own blog system will take some work, but it will also give me a chance to experiment with a lot of new concepts and make myself more valuable in the market.  But I digress.  The real problem, in my mind anyway, is to create a spam filter that’s anywhere near as good as Akismet, which comes built into WordPress.  I publish any comment that makes it past the spam filter (I believe in free speech), so I need my own, and it has to be good.

I’m thinking the easiest way to do this is a rule-based approach.  I can keep a list of prohibited words, give each of them a point value, then add up the points for every violation in a comment and flag any comment whose total goes over some threshold.  I’ve been thinking about which words to ban.
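To make the scoring idea concrete, here is a rough sketch.  It’s in Python rather than C# only to keep it short, and the word list, the point values, and the threshold are all placeholders I made up for illustration, not a tested rule set.

```python
import re

# Hypothetical banned words and point values, for illustration only.
BANNED_WORDS = {
    "viagra": 5,
    "casino": 3,
    "mortgage": 2,
}

# Assumed cutoff: a comment scoring this much or more is treated as spam.
SPAM_THRESHOLD = 5


def spam_score(comment, weights=BANNED_WORDS):
    """Add up the points for every banned word that appears in the comment."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", comment.lower())
    # Each occurrence counts, so repeating a banned word raises the score.
    return sum(weights.get(word, 0) for word in words)


def is_spam(comment, threshold=SPAM_THRESHOLD):
    """Flag the comment when its total score reaches the threshold."""
    return spam_score(comment) >= threshold
```

With these made-up values, a comment that mentions both "viagra" and "casino" gets flagged, while one that mentions "mortgage" once does not.  Exact word matching like this is easy to dodge with creative misspellings, so it is only a starting point.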

That’s what I’ve got so far.  This isn’t a terribly well planned post, really more of a brainstorm.  I would love to invite readers to share their thoughts on the matter, and, ultimately, I plan to open source what I come up with.  I firmly believe that ASP.NET is a better platform in most ways than PHP, but, sadly, there’s far less open code available for it.  A problem to be solved!

Here are some links to different spam research, in case anybody else is interested in tackling the problem.  Even if it’s already been done before, I think this is a valuable learning experience.  Like a muscle, the brain works best when it works often.


