Alexander The Great

June 12, 2008

Is Spam the Dominant Species?

Filed under: Evolution,Science,Software — alexanderthegreatest @ 12:09 am

A friend told me we should round up all the spammers, and throw them off the Golden Gate Bridge, down to the sharks below. Trouble is, others will take their place. Spammers, sadly, aren’t a hereditary breed – it’s a learned behavior. (Almost Lamarckian!)

Even if you don’t agree about where spammers come from, we’ll have to agree there are too many of them. Spam is a very successful meme, a unit of cultural information that’s better than most at copying itself. In the realm of intellectual selection, spam is to be found far and wide in the meme pool. Spam is maybe a parasite working on (or against?) the get rich quick meme – if people stopped wanting a quick and easy buck, spam would vanish overnight.

What's this rubbish about it being the "dominant species" though? We eradicated smallpox, a more difficult and more important thing than going to the moon, and we're losing the war against spam. It's beating us. America gave fire water to the "Indians" to take their land – now, in some places, native casinos are using greed to take modern culture's money. Spam is doing much the same thing.

Spam is a concept, an idea: that by producing a lot of useless drivel, a person can strike internet riches. It comes in a few varieties, from the email sitting in your box selling you viagra and mortgages, to the affiliate and "search engine friendly" links in forums and blogs. It's PayPerPost, where a blogger can beat the 1849 gold rush by telling you how wonderful a sponge and a bank account are. It's Digital Pointless, where you can buy other people's Wikipedia and eBay accounts. Fine, that's what spam is, but what are we? We're its hapless accomplices: machines, some of us, that spam uses to copy itself.

All of this is Darwinian. If you have variation (spam, job, investment, invention), heredity (new spam is very much like old spam, but refined in its sales pitch or its delivery) and selection (spam filters, forum moderators, people seeing through it), you have evolution. This works in biology (genes), and it works in ideas (memes). If you have a struggle for existence among things that copy themselves, the one that's better at making copies will come to dominate, to fill its world. Ladies and gentlemen, this is exactly what spam is doing – a digital thing filling its internet world. One of the dominant species in the meme pool.
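Just to make that recipe concrete, here is a toy sketch in C# (entirely my own invention, with arbitrary numbers, nothing measured about real spam): things that copy themselves with slight variation, in a world with limited room, end up dominated by the better copiers.

    // Toy model of the variation / heredity / selection recipe above.
    // Purely illustrative: each "meme" is just a number for how well it copies itself.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class MemePool
    {
        static void Main()
        {
            Random rng = new Random();
            // A pool of 100 memes with random copying ability between 0 and 1.
            List<double> pool = Enumerable.Range(0, 100)
                                          .Select(i => rng.NextDouble())
                                          .ToList();

            for (int generation = 0; generation < 50; generation++)
            {
                List<double> offspring = new List<double>();
                foreach (double meme in pool)
                {
                    // Heredity with variation: copies resemble the parent, plus a small mutation.
                    int copies = (int)Math.Round(meme * 3); // better copiers leave more copies
                    for (int c = 0; c < copies; c++)
                    {
                        double child = meme + (rng.NextDouble() - 0.5) * 0.05;
                        offspring.Add(Math.Max(0.0, Math.Min(1.0, child)));
                    }
                }
                // Selection: the world only has room (and attention) for 100 memes.
                pool = offspring.OrderBy(m => rng.Next()).Take(100).ToList();
            }

            Console.WriteLine("Average copying ability after 50 generations: {0:F2}", pool.Average());
        }
    }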

Spam

This proves our point – bad spam, the least fit, failing in the struggle for existence.

April 15, 2008

Great C# Resource!

Filed under: Software — alexanderthegreatest @ 11:10 am

Microsoft has long operated like Reagan's Evil Empire. I'm sure everybody reading this blog knows the story of MS-DOS, which isn't all that unlike the story of Manhattan being sold for $24. Mr Gates (who stepped down from the helm years ago, but will remain the symbol of Microsoft's market dominance for years to come) is an historic philanthropist but also a shrewd businessman.

So it's interesting to watch this turnaround. Windows is facing intense pressure from all sides. Google is a media darling and suitor to the throne; we've already seen Gmail face off against Exchange Server, and Word and Excel surrounded by Google Docs and Spreadsheets, and the more formidable Open Office. Vista has become the new Windows ME. Rumors of a Google OS are frightening Microsoft, Linux looks just like Windows, making the transition from a $200 operating system to a free one all the easier, and Mac's Boot Camp means all the more choice for the consumer. Choice has traditionally been an enemy of Windows – having computer users by the proverbial throat is one of the main reasons most people use Microsoft software.

Pride notwithstanding, the people at Microsoft would be fools not to recognize the situation they find themselves in.  IBM’s example makes the situation all the more relevant.  Microsoft has enjoyed a long reign at the top of their game, and it seems to be all downhill from here.

So they're trying to "change", to open up.  Microsoft has blogs galore – not just ones people can create on a subdomain (a la WordPress or Blogspot) but ones coming from their brass.  Microsoft publishes betas, or, as they like to call them, CTP (Community Technology Preview) releases.  Minimally stripped-down versions of their almost-flagship Visual Studio are available as free downloads, with free licenses, for anybody who wants one.  Are you a C++ programmer, ideally from the *nix world?  Come on over, the grass is green on the default XP wallpaper.  Set up shop, have a free compiler, and sell your wares.

This is a bright strategy.  At present, in the Spring of 2008, Windows honestly is the best platform for general use.  This is quickly changing, but the fact is that most of the software available for all systems is for Windows, and more to the point, most of the software a non-savvy family or company would want is for Windows.  Why else would Microsoft give away much of a product (Visual Studio) that runs anywhere from $50 (old version with no frills) to several thousand?  Why else would they use betas, abundant white papers, MSDN?  They want people writing Windows software, keeping their operating system afloat.  The more people who write exclusively for Windows, the more reason other people have for running Windows.

And all of this brings us full circle back to the title of this post.  One of the people on the C# design team has a blog about using C#.  This includes how-to code samples (MD5 a string, improve hash table perf against structs), explanations of why certain features do and don't exist in the language, and so on.  All C# developers will benefit from reading this blog.
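For a flavor of the kind of sample I mean, here is a quick sketch of "MD5 a string" in C#. It is my own rough version, not code copied from that blog:

    // Quick sketch: hash a string with MD5 and return lowercase hex.
    using System;
    using System.Security.Cryptography;
    using System.Text;

    static class Md5Example
    {
        public static string Md5Hex(string input)
        {
            using (MD5 md5 = MD5.Create())
            {
                byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
                StringBuilder sb = new StringBuilder(hash.Length * 2);
                foreach (byte b in hash)
                    sb.Append(b.ToString("x2")); // two lowercase hex digits per byte
                return sb.ToString();
            }
        }

        static void Main()
        {
            Console.WriteLine(Md5Hex("hello world"));
        }
    }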

April 10, 2008

Spam Words

Filed under: Software,spam — alexanderthegreatest @ 8:24 am

Many of my readers know by this point that I have an aversion to spam.  Add to that a strange sense of humor, and a little bit of sadness at the way the internet has been colonized by corporations – even to the point that the companies we pay for internet access want to charge extra fees for using particular sites (see Comcast) – and we have Alexander the Great.

I've been thinking about moving my blog from WordPress to a self-hosted ASP.NET server.  There are several reasons to do this, but many of them come down to the fact that I pay the mortgage with my job as a .NET programmer.  Version 3.5 is out in production, and I haven't had much hands-on experience with LINQ yet.  My own blog system will take some work, but it will also give me a chance to experiment with a lot of new concepts and make myself more valuable in the market.  But I digress; the real problem, in my mind anyway, is to create a spam filter that's anywhere near as good as Akismet, which comes built into WP.  I publish any comment that makes it past the spam filter (I believe in free speech), so I need my own, and it has to be good.

I'm thinking the easiest way to do this is a rules-based approach.  I can have a list of prohibited words, give each of them a point value, then add up all the points for violations.  I've been thinking about which words to ban.

That’s what I’ve got so far.  This isn’t a terribly well planned post, really more of a brainstorm.  I would love to invite readers to share their thoughts on the matter, and, ultimately, I plan to open source what I come up with.  I firmly believe that ASP.NET is a better platform in most ways than PHP, but, sadly, there’s far less open code available for it.  A problem to be solved!
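To make the brainstorm a little more concrete, here is a rough first cut at the scoring idea in C#. The banned words and point values below are placeholders I made up for illustration, not a real list:

    // Rough first cut at a rules-based comment filter. The banned words and
    // point values are placeholders; the real list is the hard part.
    using System;
    using System.Collections.Generic;

    class CommentSpamFilter
    {
        // Each prohibited word carries a point value.
        private static readonly Dictionary<string, int> BannedWords =
            new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "viagra", 10 },
            { "mortgage", 5 },
            { "casino", 5 },
            { "cheap", 2 }
        };

        private const int RejectThreshold = 10;

        public static bool LooksLikeSpam(string comment)
        {
            int score = 0;
            // Split on whitespace and tally points for every banned word we hit.
            foreach (string word in comment.Split(' ', '\t', '\r', '\n'))
            {
                string trimmed = word.Trim(',', '.', '!', '?', ';', ':');
                int points;
                if (BannedWords.TryGetValue(trimmed, out points))
                    score += points;
            }
            return score >= RejectThreshold;
        }

        static void Main()
        {
            Console.WriteLine(LooksLikeSpam("Cheap viagra and mortgage refinancing!")); // True
            Console.WriteLine(LooksLikeSpam("Nice post, I agree about LINQ."));         // False
        }
    }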

Here are some links to different spam research, in case anybody else is interested in tackling the problem.  Even if it’s already been done before, I think this is a valuable learning experience.  Like a muscle, the brain works best when it works often.

April 8, 2008

Will Google Kill Microsoft?

Filed under: Modern Life,Software — alexanderthegreatest @ 11:20 am

Will Google kill Microsoft? Will Microsoft even be killed at all? The Redmond giant is a lot like the Roman Empire: too big for its britches, and lately on a downward slope. They've extended XP's shelf life several times now, due to, among other things, a user petition signed by many thousands of Vista non-users.

Google has been applying pressure for some time now. Between mail and calendar, Gmail is aimed directly at Exchange Server. It's even available on whatever domain you might choose, as part of Google Applications.

And, while speaking of their apps, Google Docs is now available offline.

The rumors of a Google Phone seem to have died off, but I'm still hearing about a Google Operating System.  This may or may not ever happen, but I'd bet it's holding some people back from "up"grading to Vista.  Their desktop search application puts them on the desktop.  They have a significant amount of public good will, both from the quality of most of their products, and from their protection of their customers against the Bush administration's domestic spying programs.  (Although, in truth, there are questions about Google's data retention versus privacy, and about how differently they respond to the US and Chinese governments.  Some people are even comparing them to Microsoft!)

March 14, 2008

Linux Looks Just Like Windows

Filed under: Software — alexanderthegreatest @ 9:30 pm

Here are some of the operating systems Firefox has been ported to. Xandros Linux has the most realistic version of the Windows taskbar, while Mandrake has a half-transparent Start Menu like Vista's.

Funny how these different Linux distros saw fit to borrow the Windows GUI pieces that never found their way into Mac OS X. Instead of a global menu that's shared (and heavily altered) between the operating system and running applications, a hideable taskbar is a wonderful – and clearly understandable – thing. Note the positions of the close, minimize, and maximize buttons.

Windows Server 2003 (NT 5.2)

Firefox on Windows 2003 Server

Ubuntu Linux 5.10

Mandrake Linux

Firefox on Mandrake Linux

Xandros Linux

Firefox on Xandros Linux

March 13, 2008

SQL Danger

Filed under: Evolution,Programming,Software — alexanderthegreatest @ 3:53 pm

SQL Server supports a cooperative, non-preemptive threading model in which the threads voluntarily yield execution periodically, or when they are waiting on locks or I/O. The CLR supports a preemptive threading model. If user code running inside SQL Server can directly call the operating system threading primitives, then it does not integrate well into the SQL Server task scheduler and can degrade the scalability of the system. The CLR does not distinguish between virtual and physical memory, but SQL Server directly manages physical memory and is required to use physical memory within a configurable limit.

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/denet9/html/d280d359-08f0-47b5-a07e-67dd2a58ad73.htm

If you’re scratching your head, thank your lucky stars you don’t need to understand this gibberish. I’ve been having to focus on SQL Server and .NET Integration, also known as SQL CLR integration. A lot of people have made fairly bad choices in the very, very recent past. Just because you can write your stored procedures now in Visual Basic doesn’t mean you should. Obviously it cuts both ways, and that doesn’t mean you shouldn’t – the trouble seems to come when people can’t decide whether they should or not.

Having to help guide that decision, I found the first paragraph above, which offers some frightening guidance on the matter. At first glance it seems to suggest never using .NET, but that isn't really the case. What it really says is that long-running code that might interfere with SQL's thread scheduling can be bad. Code that spawns new threads really shouldn't be hosted in SQL. PInvoke calls can hurt.

But XML processing is an example of something that does none of those things. And while SQL has good XML support, for some operations .NET is better. Also, the set-based nature of SQL compared to the procedural and object-oriented nature of C# means you have better control over caching in .NET, so by porting this type of sproc, you can parse an XML document in memory once, instead of once for every time you need to access it.
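As a sketch of what that kind of port might look like, here is a minimal CLR stored procedure that loads the document once and then queries it repeatedly. The procedure name, XPath, and XML shape are invented for illustration, not taken from any real schema:

    // Sketch of a CLR stored procedure that parses an XML document once and
    // queries the in-memory DOM several times. ParseOrderTotals and the
    // /Orders/Order paths are hypothetical.
    using System;
    using System.Data.SqlTypes;
    using System.Xml;
    using Microsoft.SqlServer.Server;

    public partial class StoredProcedures
    {
        [SqlProcedure]
        public static void ParseOrderTotals(SqlXml ordersXml)
        {
            // Parse once...
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(ordersXml.Value);

            // ...then hit the in-memory document as many times as we like,
            // instead of re-shredding the XML for every access.
            decimal total = 0;
            foreach (XmlNode node in doc.SelectNodes("/Orders/Order/Amount"))
                total += decimal.Parse(node.InnerText);

            int count = doc.SelectNodes("/Orders/Order").Count;

            SqlContext.Pipe.Send(string.Format("{0} orders totaling {1}", count, total));
        }
    }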

After all, it's not Sybase in 1997!

March 7, 2008

Hungarian Notation

Filed under: Evolution,Programming,Software — alexanderthegreatest @ 3:44 pm

Hungarian Notation is a naming convention whose main rule is that the prefix in a variable name should be longer than the name of the variable itself. For example, should you find yourself needing a string called Foo, you might call the variable that refers to it as gpnzstrFoo. You would know this because

  • g = global variable
  • p = pointer, because in C++ everything is a pointer
  • nz = null terminated (sometimes sz)
  • str = string value
  • Foo = the actual name of the variable

This is a mouthful and a lot to memorize. Further, it means that when you refactor your code and change the type (you might encapsulate string handling into a SuperString class, as a schoolboy example), you have to either change the name of the variable, which happens in O(n) time, with n being the number of uses, or live with a prefix that no longer tells the truth. Modern development environments offer refactoring services to safely accomplish this, but they also provide other services that make HN redundant and unnecessary.
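A contrived before-and-after, using the schoolboy SuperString example from above, shows the renaming tax. Both classes here are hypothetical:

    // Contrived illustration of the renaming problem. SuperString is the
    // hypothetical wrapper class from the paragraph above.
    class SuperString
    {
        private readonly string value;
        public SuperString(string value) { this.value = value; }
        public override string ToString() { return value; }
    }

    class Before
    {
        // Hungarian: the prefix encodes "global, pointer-ish, null-terminated string".
        static string gpnzstrFoo = "hello";
    }

    class After
    {
        // Change the type and the old name lies, so every one of its n uses must
        // be edited - or you keep a prefix that no longer tells the truth.
        static SuperString foo = new SuperString("hello");
    }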

Hungarian gives an illuminating view of the history of software development, and of Windows in particular. In the old days one would pull down the Windows “header file” (#include “windows.h”) and delve into its contents, using Microsoft as an example of successful large scale development efforts done right. In fact Microsoft is often credited with birthing Hungarian, in order to make scrolling work in Word and Excel. Today, like Windows 3.1, HN is relegated to the scrap heap of history.

Here is the part that might make fellow web site producers call me a spammer – I'm going to copy and paste, from a treatise on Hungarian Notation, some text which explains the failings of this convention much better than I ever could. (Borrowing liberally from other sources, sometimes called "scraping", indicates a lazy webmaster and is typically associated with made-for-AdSense spam blogs. In this case, however, I'm simply trying to point my dear readers to a helpful resource, for those of you who are interested in this topic.)

My problem with Hungarian Notation is more fundamental and stylistic – I think it encourages sloppy, sprawling, poorly decomposed code and careless, ill-coordinated maintenance. Simply put, if your namespace is so polluted that you need a cheap trick like HN to keep track of your variables, you’ve made a terrible mistake somewhere. You should never have so many variables and constants visible at one time; if you do, then you need to review either your data structures, or your functions.

This is especially true for code written under either the OO or FP methodologies, as a primary goal in each is to isolate (in OO) or eliminate (in FP) those variables that are not immediately needed.

HN notation also presents a problem in the case of OO, in that it interacts very poorly with polymorphism. It is actually undesirable to know the class of an object in many instances, but at the same time marking it with the parent class tag is misleading and contradicts the goal of HN. As for marking globals as separate from locals, why on earth do you have any globals in the first place? 😉 — JayOsako

I agree that HungarianNotation is bad because it’s a crutch for overly large namespaces. That’s why I use single character variable names to force myself to write clear code. — EricUlevik

My Response: 26 variables? At once? How can you manage that?

Seriously, HungarianNotation is only a symptom of a larger problem. Too many programmers see the solution for excessive state to be more state; they end up with variables to track variables to track variables, all bound up with arcane rules for naming that are supposed to be indicative of their meaning (which it rarely is). The point isn’t so much that you have to limit the number of variables, but rather that large proliferations of variables in the local namespace is a sign of a design mistake. Why would they all be visible at once? Aren’t there some abstractions they can bound into? Why do you need this particular bit of state visible here, anyway?

The rule should be simple: if you don’t need to see it, you shouldn’t be able to. Period.

Like VisualTools, HungarianNotation is not SoftwareEngineering, its TheIllusionOfSoftwareEngineering. It’s an easy and comfortable fix that looks good because you can DoItNow? and you don’t have to do all the hard thinking that comes from trying to do the RightThing. The time you ‘save’ using it will be more than spent later trying to maintain the resulting morass. — JayOsako

This is very well put, and pretty much sums up my feelings on the matter. — KevlinHenney

Any interested programmers are encouraged to read the original page. This bit cracked me up, since we've mentioned the historical implications of the subject:

What I think you really want is for your interactive development environment to give you a rollover (ToolTip?, for you Windows junkies, or minibuffer message for you Emacs junkies) that shows you the declaration of this variable at every use point when your cursor goes over it. — RusHeywood

Visual Slick Edit (Windows) does this.

VC6 has something approaching this. They copied it from VB – when you’re typing code to call a method, it pops up with a prompt window showing the names of the arguments. It’s not perfect, but it’s a step in the right direction. — RogerLipscombe

March 6, 2008

An Idiot Misunderstands SQL

Filed under: Modern Life,Programming,Science,Software — alexanderthegreatest @ 3:44 pm

Since my Digital Point – Misunderstood post has proven such a hit, it occurs to me to share a hysterical post I found on SQL.NET. This idiot is writing “Death to TSQL” because of horrible SQL coding skills and because the man doesn’t seem to understand that SQL is not ASP. (Yes, you read that correctly – SQL and ASP Ancient.)

The article concludes that TSQL should never be used for stored procedures, that procs should never (".EVER.") use temp tables, and that the CLR (Common Language Runtime – aka .NET) should be used for 1980s-style programming. Quoth the raven: "In this case, it's so overwhelmingly better than TSQL that I cannot recommend to anyone coding for SQL 2005 to use TSQL for stored procedures, even CRUD procs."

CRUD = Create, Read, Update, Delete. This went out of style by 1990. A client application should not be allowed to perform primitive operations like these, for any number of reasons. Performance – we want chunky, not chatty, interprocess protocols. Simplicity and maintainability – the client application should not have to understand the full implementation in the database. This should be hidden from (and for) them. (This is half the reason we use sprocs in the first place!) Data quality and integrity – if the network connection is severed midway through a run, while half the creates have run and some of the updates, the data is left in an indeterminate state.
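To put "chunky, not chatty" in code, here is a sketch of the kind of client call I mean: one round trip to a stored procedure that owns the whole unit of work. The procedure name and its parameters are made up for illustration:

    // Sketch of a "chunky" call: one round trip to a stored procedure that owns
    // the whole unit of work. usp_PlaceOrder and its parameters are hypothetical.
    using System.Data;
    using System.Data.SqlClient;

    class OrderClient
    {
        public static void PlaceOrder(string connectionString, int customerId, string itemsXml)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand("usp_PlaceOrder", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@CustomerId", customerId);
                cmd.Parameters.AddWithValue("@ItemsXml", itemsXml);

                conn.Open();
                // One call; the proc does its creates and updates inside one
                // transaction, so a dropped connection can't leave half the work done.
                cmd.ExecuteNonQuery();
            }
        }
    }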

To stack the deck, Payton Byrd makes some shocking and unforgivable errors:

  1. T-SQL code designed to be as resource intensive and slow as possible.  Note the cursor use.  The test code comes with this warning, "This may not be the most efficient way to do this, but I don't specialize in TSQL stored procs", which should disqualify the results for any thinking person.
  2. One reason the CLR code is assumed to perform faster than the TSQL code is because the .NET procedure uses a StringBuilder (see the sketch after this list).  In a related article, our anti-hero tells us that if the test data set were larger, the difference would be more pronounced, because of this.  A StringBuilder is a hack, a workaround for .NET, because the CLR handles strings in a funny way.  When your code creates and then alters a string, both copies live on in memory, because (due to threading) strings are immutable.  Always creating and repointing references is so tiresome for the host computer that a class was written to give programmers an entirely different way of concatenating strings.  SQL does not suffer this limitation.  One of the key assumptions holding up the conclusion (aka guess) giving this article its reason to be is fundamentally flawed.
  3. Wild conclusions: "The performance results are absolutely stunning, the CLR stored procedure is 14 times faster than the TSQL stored procedure."
  4. More profound confusion “I’m willing to do more testing, but I think this pretty well proves the point that you can get much better performance from the CLR than from TSQL when you step beyond the simplest of stored procedures.”   Apparently the meaning of complex was lost on this writer.  Anybody who’s ever done any database work knows that complex != writing loop code for a set based language to put commas between values.
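For what it's worth, here is the difference item 2 is talking about, sketched in plain C#: repeated concatenation builds a brand new string on every pass, while StringBuilder appends into one buffer.

    // The difference item 2 refers to: += on a string allocates a new string
    // each pass, while StringBuilder grows one internal buffer.
    using System.Text;

    class ConcatDemo
    {
        static string WithStrings(string[] values)
        {
            string result = "";
            foreach (string v in values)
                result += v + ",";        // each += builds a whole new string
            return result;
        }

        static string WithBuilder(string[] values)
        {
            StringBuilder sb = new StringBuilder();
            foreach (string v in values)
                sb.Append(v).Append(','); // appends in place
            return sb.ToString();
        }
    }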

I'm not writing this to make fun of an incompetent developer.  I'm writing to point out flaws of logic that seem all too common these days.  The internet seems to have created a world where everyone is expected (even if only by themselves) to be an expert in some area.  When a supposed expert doesn't understand something, the rule this day and age seems to be that one should make assumptions and run with them.  As Tripy would say, do your programming first, then, when there's no time left, do your analysis.

An even better way to cause yourself grief is to test an edge case and then assume the general world is described by some crazy scheme.  If I wrote that there’s no reason ever to drive anywhere because I took a train yesterday, arrived at the station a moment before the vessel departed, then found I saved 5 minutes by not parking, people would rightly assume I had lost my wits.

The Texas Sharpshooter Fallacy describes exactly this.  Payton's test case addresses a tremendously rare (and poorly coded) situation, one that might never be encountered by a senior database programmer.  From this highly specific test he makes sweeping general conclusions.  A test was engineered to provide support for a conclusion that had already been drawn.  The test tells us nothing, except that we should avoid Payton's blog if we aim to learn.

March 5, 2008

The Best Captcha I’ve Ever Seen

Filed under: Modern Life,Software — alexanderthegreatest @ 10:49 am

Captcha = Computer Anal Probe To (confuse) Computers & Humans Alike

Forget squiggly letters!

Captchas are those stupid forms you have to answer when you get a new email address. Most of them are like trying to read the newspaper after a modest dose of LSD. I like how this one gets creative with it.

Some people use a newer (and weaker, it would seem) type of captcha, where the reader has to perform a menial task in order to proceed. Math is a pretty common challenge, although it's usually more trivial than this!

The problem, dear reader, is this. Spammers began to take advantage of every free service on the internet in one form or another. Blogs are targets for comment spam, designed to bring people and search engines to another site. Wikipedia uses them to keep people from doing the same when adding references. Hotmail uses them to prevent the proliferation of viagra in your inbox. The solution, for the past 10 years or more, has been to "challenge" readers to prove their humanity.

I'm tripping!!!

Remember eating magic mushrooms (with chocolate or ice cream to disguise the taste!) and seeing things like this? The gradients in this picture are designed to stand up to a particular attack. Software can examine the image pixel by pixel and look for the ones that don't match the background color. But a gradient means there is no single background color! Even the appearance of a background is skewed in a way that makes me feel dizzy.
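Here is a naive sketch of the pixel-by-pixel attack that gradient is meant to defeat. The "top-left corner is the background" assumption is mine, and it is exactly the assumption a gradient background breaks:

    // Naive sketch of the attack described above: walk the image and flag any
    // pixel that differs from an assumed flat background color.
    using System.Drawing; // requires a reference to System.Drawing.dll

    class CaptchaScan
    {
        static bool[,] FindForeground(string path)
        {
            using (Bitmap bmp = new Bitmap(path))
            {
                Color background = bmp.GetPixel(0, 0); // assume the corner is background
                bool[,] isText = new bool[bmp.Width, bmp.Height];

                for (int x = 0; x < bmp.Width; x++)
                    for (int y = 0; y < bmp.Height; y++)
                        isText[x, y] = bmp.GetPixel(x, y).ToArgb() != background.ToArgb();

                return isText;
            }
        }
    }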

But making a computer "read" a captcha the way a person would is only one way to break the things. Spammers have long known an easier way, because spammers are a lazy people. They prefer to download a collection of pornography, then upload it into a script running a membership-only porn site. Memberships are given away for free, but activating one requires solving a captcha. See where this is going? Web surfers who want porn do the "hard" work of solving the puzzles. Spammers record each answer along with its image in a database, and the next time their software is challenged with the same image, they pull the answer from the database. This is less work (meaning more ROI) than even writing a script to solve the simple math problems (usually something like 4 + 3) described earlier.

What all this means is that captchas are broken, so don’t rely on them if you need security!

March 3, 2008

SQL Server Deployment Checklist

Filed under: Science,Software — alexanderthegreatest @ 4:37 pm

John Hicks (of MSDN Blogs) has posted a very interesting article on SQL Server deployments that raised my eyebrows. It begins with the text "Every once in a while, a process gets too complex for humans to manage consistently."

The article goes into great detail about the steps that should be taken to performance optimize a new node in a SQL Server cluster, or to prepare a new server that's being migrated to. This is extremely high scale stuff here, probably boring to mom-and-pop shops that need to store and instantly access less than perhaps 20 million records. (PHP users will find this article endlessly tiresome, because PHP users turn their noses up at any system meant to serve 25 or more concurrent users. Such is the demand of "enterprise level" evil corporate programming, a need of monsterish slave masters to rob the noble developers of the world of their identities and very souls. I'm not sure where this prejudice comes from, but it's inescapable at any number of web forums.) The paper describes physical tuning only, completely ignoring the logical domain.

Some highlights

  1. Set your disc allocation unit size to 64 KB to match the size of an extent (eight 8 KB pages).
  2. Set network packet size to 8 KB to take better advantage of large and few TCP packets. This is known as jumbo framing.
  3. Create a TEMPDB data file for each CPU core on the server.
  4. Set MAXDOP to correspond to your disc subsystem hardware. (I don't understand why – maybe someone can explain this to me in the comment stream. Shouldn't these settings be independent from one another? Disc-bound transactions should allow the threads to block and be preempted by CPU-bound transactions, no?)
  5. Use TraceFlags to control lock escalation – DANGER, WILL ROBINSON!
