Alexander The Great

December 8, 2008

Html Agility Pack

Filed under: Programming — alexanderthegreatest @ 10:42 pm

Has anyone else used this? I’m teetering between its given name and Html Agitation Pack. Written by an MSDN guru and moved to CodePlex, this is the XmlDocument of the web.

As an aside, people who use open source languages like Perl and Ruby will be shocked to learn how difficult it is for Microsoft developers to work with HTML programmatically. We can consume XML very quickly, so long as it’s well formed, but any error in the markup renders the whole document unreadable. Microsoft’s design goal was never to guess at the developer’s intent, so anything the least bit ambiguous is an exception. The Agility Pack is an open source library that parses HTML and makes the guesses Microsoft was unwilling to make outside of IE.

I’m finding it slow. The software has trouble with certain encodings and, worse, it throws stack overflow exceptions! That suggests it leans far too heavily on recursion, so a deeply nested (or badly broken) document means a very deep call stack. Generally a loop (sometimes with a stack or a queue) will fix the problem, but the offending code is very hard to track down in such a large code base.
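To make the loop-with-a-stack idea concrete, here is a minimal sketch in C#. It does not use the Agility Pack’s own types: the Node class and the TreeWalker.PreOrderNames method are hypothetical stand-ins for a parsed element and a walk over it. Pending nodes live in a Stack<T> on the heap rather than on the call stack, so nesting depth is limited by memory instead of by the thread’s stack size.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node standing in for a parsed HTML element;
// this is NOT the Html Agility Pack's HtmlNode type.
public sealed class Node
{
    public string Name { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

public static class TreeWalker
{
    // Depth-first, pre-order walk that keeps pending nodes on an explicit
    // Stack<T> instead of recursing, so a deeply nested document cannot
    // exhaust the call stack.
    public static List<string> PreOrderNames(Node root)
    {
        var names = new List<string>();
        var pending = new Stack<Node>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            names.Add(current.Name);

            // Push children in reverse so the first child is popped first,
            // matching the order a recursive walk would visit them.
            for (int i = current.Children.Count - 1; i >= 0; i--)
            {
                pending.Push(current.Children[i]);
            }
        }
        return names;
    }
}
```

Swapping the Stack<T> for a Queue<T> turns the same loop into a breadth-first walk, which is the "sometimes a queue" case.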

Still, this hasn’t stopped others from finding creative uses for the library. The page localizer is a fascinating example. And here’s a converter, allowing LINQ over web pages!


April 15, 2008

Great C# Resource!

Filed under: Software — alexanderthegreatest @ 11:10 am

Microsoft has long operated like Reagan’s Evil Empire. I’m sure everybody reading this blog knows the story of MS-DOS, which isn’t all that unlike the story of Manhattan being sold for $24. Mr Gates (who stepped down from the helm years ago, but will remain the symbol of Microsoft’s market dominance for years to come) is an historic philanthropist but also a shrewd businessman.

So it’s interesting to watch the tables turn. Windows is facing intense pressure from all sides. Google is a media darling and suitor to the throne: we’ve already seen Gmail face off against Exchange Server, and Word and Excel surrounded by Google Docs and Spreadsheets, not to mention the more formidable OpenOffice. Vista has become the new Windows ME. Rumors of a Google OS are frightening Microsoft, Linux looks just like Windows (making the transition from a $200 operating system to a free one all the easier), and Apple’s Boot Camp means all the more choice for the consumer. Choice has traditionally been an enemy of Windows; having computer users by the proverbial throat is one of the main reasons most people use Microsoft software.

Pride notwithstanding, the people at Microsoft would be fools not to recognize the situation they find themselves in. IBM’s own fall from dominance makes the lesson all the more relevant. Microsoft has enjoyed a long reign at the top of its game, and it seems to be all downhill from here.

So they’re trying to “change”, to open up. Microsoft has blogs galore, not just ones people can create on a subdomain (à la WordPress or Blogspot) but ones written by their brass. Microsoft publishes betas, or as they like to call them, CTPs (Community Technology Previews). Lightly stripped-down versions of Visual Studio, very nearly their flagship product, are available as free downloads (the Express Editions), with free licenses, for anybody who wants one. Are you a C++ programmer, ideally from the *nix world? Come on over; the grass is green on the default XP wallpaper. Set up shop, have a free compiler, and sell your wares.

This is a smart strategy. At present, in the spring of 2008, Windows honestly is the best platform for general use. This is quickly changing, but the fact is that most of the software available for any system is for Windows, and more to the point, most of the software a non-savvy family or company would want is for Windows. Why else would Microsoft give away much of a product (Visual Studio) that runs anywhere from $50 (an old version with no frills) to several thousand dollars? Why else would they publish betas, abundant white papers, and MSDN? They want people writing Windows software, keeping their operating system afloat. The more people who write exclusively for Windows, the more reason other people have to run Windows.

And all of this brings us full circle back to the title of this post. One of the people on the C# design team has a blog about using C#. It includes how-to code samples (MD5 a string, improve hash table performance with structs), explanations of why certain features do and don’t exist in the language, and so on. All C# developers will benefit from reading this blog.
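Since MD5-ing a string came up, here is a minimal C# sketch of the usual approach with System.Security.Cryptography. It is my own illustration, not code from that blog; the StringHashing class name is hypothetical, and it assumes UTF-8, since hashing the same text under a different encoding gives a different digest. MD5 is fine for checksums and cache keys, but it is broken for security purposes such as password storage.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StringHashing
{
    // Returns the MD5 digest of a string as 32 lowercase hex characters.
    // Assumes UTF-8 encoding of the input text.
    public static string Md5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));

            // Two hex digits per byte, lowercase.
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("x2"));
            }
            return sb.ToString();
        }
    }
}
```

For example, Md5Hex("abc") gives the RFC 1321 test vector 900150983cd24fb0d6963f7d28e17f72.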

March 19, 2008

C# and VB.NET Stored Procedures Suck

Filed under: Programming — alexanderthegreatest @ 1:59 pm

Somebody found the blog you’re reading by searching Google for evidence that “c# stored procedures suck”. (Probably this post, possibly this one.) That isn’t exactly true: C# procedures by themselves can be a good thing, under the rarest of circumstances. The pain comes when people use them in all the billions of ways they aren’t suited to. On the other hand, VB sprocs always suck. 🙂

First, you don’t work with them the same way you work with normal procedures. At best you can think of them as encrypted procs, and that’s if you don’t have to deploy them. To deploy one, you compile it from the command line (CSC.EXE /t:library your_csharp_code_file.cs), and then you have to write SQL pointing to a location in the file system! First you run a CREATE ASSEMBLY [SysName] FROM [File Path To DLL] query, and then you create your procedure with AS EXTERNAL NAME, pointing to the fully qualified name of the method in the code. Don’t forget to make the method public and static and decorate it with the SqlProcedure attribute. And don’t even think about lying when you mark a function as deterministic (IsDeterministic on the SqlFunction attribute): SQL Server will believe you!!!

If you could type your C# or VB code into SQL Server Management Studio, hit F5, have it compiled and deployed to the current database, and later read it back with sp_helptext, .NET procs wouldn’t hurt the maintainability of your code. As it is, they do. Think of the poor DBA trying to track down the cause of bad performance when code they have no knowledge of, or view into, is involved.

SQL Server has its own memory management, its own threading and scheduling, and its own locking, and it talks to the operating system. It’s a system. You can tap into it with the Common Language Runtime, but the CLR works in a very different way, and the interop sometimes adds significant overhead. The managed heap is there in all cases, but God help you if your code uses threads or calls Win32 directly.

When .NET Procs Don’t Suck

It seems like the answer would be “when you don’t know T-SQL very well”, but that’s how production systems end up limping along. The CLR is absolutely not a replacement for T-SQL. It’s an add-on for some very specific scenarios.

String Manipulation

As a programmer at a consulting firm, I can’t even count how many times I’ve seen Transact-SQL being used to parse a comma-delimited list. This is atrocious. The code will make you vomit, and the server isn’t much happier; it’s like asking Babe Ruth to throw your dog a ball.

.NET code is a lot more elegant at this kind of task, and it’s optimized for this sort of thing. Not for strings in particular, actually: .NET strings are immutable, which makes them safe to share between threads but means every modification allocates a new string. But logic and object manipulation are the domain of programming languages, not of Structured Query Language and its extensions.
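For comparison with the T-SQL version, here is roughly what splitting a comma-delimited list of IDs looks like in plain C#. This is a sketch under stated assumptions: the DelimitedList class and ParseIdList method are hypothetical, the entries are assumed to be integers, and it is shown as an ordinary method rather than wired up as a SQL CLR function (that would also need the SqlFunction attribute and the deployment steps described earlier).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DelimitedList
{
    // Splits a comma-delimited list such as "3, 14,,15" into integers,
    // trimming whitespace around each entry and skipping empty entries.
    // Throws FormatException if a non-empty entry is not a valid integer.
    public static List<int> ParseIdList(string csv)
    {
        var ids = new List<int>();
        if (string.IsNullOrEmpty(csv)) return ids;

        foreach (string part in csv.Split(','))
        {
            string trimmed = part.Trim();
            if (trimmed.Length == 0) continue;
            ids.Add(int.Parse(trimmed, NumberStyles.Integer, CultureInfo.InvariantCulture));
        }
        return ids;
    }
}
```

Parsing with the invariant culture keeps the result independent of the server’s regional settings.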


The CSV string should really be XML, whether you use regular SQL or decide to write in Java Sharp #. That way you don’t have to worry about character limits (especially in Unicode!), and you don’t have to write a parser.

SQL Server 2000 and 2005 support XML, using Microsoft’s XML DOM (MSXML). Purely from a SQL code standpoint, there’s an easy way and a cumbersome way to do this. You can grab the values directly with XPath, or you can use OPENXML and maintain a handle to the parsed document (obtained from sp_xml_preparedocument). Unless you use the second method, you’re forcing SQL Server to parse the XML over and over again, and even then, reusing the handle is only realistic within a single batch. The same goes for FOR XML AUTO in SELECT queries: it uses the DOM to build the XML result.

Of course, .NET has a different and more granular caching mechanism in general. A developer can parse XML once, store some things for later use, and get to everything in its time. The code can also work with the object data before generating XML from it. Any of these things, done correctly, can improve scalability by reducing SQL Server overhead, particularly the overhead of reparsing XML. Let the database server do what it’s good at, but take non-RDBMS work somewhere else.
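Here is a minimal sketch of the parse-once idea using LINQ to XML (System.Xml.Linq). The OrderDocument class and the <order>/<item> layout are hypothetical; the point is that XDocument.Parse runs once, in the constructor, and every later question is answered from the same in-memory tree rather than by re-parsing the text. Whether this beats the T-SQL XML features for a given workload is something to measure, not assume.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical wrapper around an XML parameter shaped like:
// <order><item id="1" price="2.50" qty="2"/>...</order>
public sealed class OrderDocument
{
    private readonly XDocument _doc;

    public OrderDocument(string xml)
    {
        // The text is parsed exactly once; both queries below reuse this tree.
        _doc = XDocument.Parse(xml);
    }

    // Reads every item's id attribute (throws if an id is missing).
    public List<int> ItemIds() =>
        _doc.Root.Elements("item").Select(e => (int)e.Attribute("id")).ToList();

    // Sums price * qty across all items, using decimal to avoid rounding drift.
    public decimal Total() =>
        _doc.Root.Elements("item")
            .Sum(e => (decimal)e.Attribute("price") * (int)e.Attribute("qty"));
}
```

For the two-item example in the comment above (2.50 × 2 plus 1.25 × 4), Total() returns 10.00.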

Custom Data Types

I don’t have much to say on this one myself, but Microsoft has an in-depth code sample showing how to work with spatial data using the SQL CLR.

In Conclusion

Use SQL CLR procs, be they in C# or VB, very sparingly. But don’t hesitate to use them when they’re the best tool in the kit. Microsoft put more arrows in our quiver, but we should still use them carefully, and have the right one on hand when we need to slay those dragons.

March 13, 2008

SQL Danger

Filed under: Evolution,Programming,Software — alexanderthegreatest @ 3:53 pm

SQL Server supports a cooperative, non-preemptive threading model in which the threads voluntarily yield execution periodically, or when they are waiting on locks or I/O. The CLR supports a preemptive threading model. If user code running inside SQL Server can directly call the operating system threading primitives, then it does not integrate well into the SQL Server task scheduler and can degrade the scalability of the system. The CLR does not distinguish between virtual and physical memory, but SQL Server directly manages physical memory and is required to use physical memory within a configurable limit.


If you’re scratching your head, thank your lucky stars you don’t need to understand this gibberish. I’ve had to focus on SQL Server and .NET integration, also known as SQL CLR integration. A lot of people have made fairly bad choices in the very, very recent past. Just because you can now write your stored procedures in Visual Basic doesn’t mean you should. Of course, it cuts both ways: that doesn’t mean you shouldn’t, either. The trouble seems to come when people can’t decide whether they should or not.

While helping to guide that decision, I found the first paragraph above, which has some frightening guidance on the matter. At first glance it seems to say never use .NET, but that isn’t really the case. What it really says is that long-running code that might interfere with SQL Server’s thread scheduling can be bad, that code which spawns new threads really shouldn’t be hosted in SQL Server, and that P/Invoke calls into native code can hurt.

But XML processing is an example of something that does none of those things. And while SQL Server has good XML support, .NET is better for some operations. Also, the set-based nature of SQL, compared to the procedural and object-oriented nature of C#, means you have better control over caching in .NET. By porting this type of sproc, you can parse an XML document in memory once, instead of once every time you need to access it.

After all, it’s not Sybase in 1997!

March 7, 2008

Hungarian Notation

Filed under: Evolution,Programming,Software — alexanderthegreatest @ 3:44 pm

Hungarian Notation is a naming convention whose main rule is that the prefix in a variable name should be longer than the name of the variable itself. For example, should you find yourself needing a string called Foo, you might call the variable that refers to it gpnzstrFoo. You would know this because:

  • g = global variable
  • p = pointer, because in C++ everything is a pointer
  • nz = null terminated (sometimes sz)
  • str = string value
  • Foo = the actual name of the variable

This is a mouthful and a lot to memorize. Further, it means that when you refactor your code and change the type (you might encapsulate string handling in a SuperString class, as a schoolboy example), you have to either rename the variable, which takes O(n) time where n is the number of uses, or live with a name that lies about its type. Modern development environments offer refactoring services to do the rename safely, but they also provide other services that make HN redundant and unnecessary.

Hungarian gives an illuminating view of the history of software development, and of Windows in particular. In the old days one would pull down the Windows header file (#include <windows.h>) and delve into its contents, using Microsoft as an example of large-scale development done right. In fact, Microsoft is often credited with birthing Hungarian, in order to make scrolling work in Word and Excel. Today, like Windows 3.1, HN has been relegated to the scrap heap of history.

Here is the part that might make fellow web site producers call me a spammer: I’m going to copy and paste, from a treatise on Hungarian Notation, some text that explains the drawbacks of this convention much better than I ever could. (Borrowing liberally from other sources, sometimes called “scraping”, indicates a lazy webmaster and is typically associated with made-for-AdSense spam blogs. In this case, however, I’m simply trying to point my dear readers to a helpful resource, for those of you who are interested in this topic.)

My problem with Hungarian Notation is more fundamental and stylistic – I think it encourages sloppy, sprawling, poorly decomposed code and careless, ill-coordinated maintenance. Simply put, if your namespace is so polluted that you need a cheap trick like HN to keep track of your variables, you’ve made a terrible mistake somewhere. You should never have so many variables and constants visible at one time; if you do, then you need to review either your data structures, or your functions.

This is especially true for code written under either the OO or FP methodologies, as a primary goal in each is to isolate (in OO) or eliminate (in FP) those variables that are not immediately needed.

HN notation also presents a problem in the case of OO, in that it interacts very poorly with polymorphism. It is actually undesirable to know the class of an object in many instances, but at the same time marking it with the parent class tag is misleading and contradicts the goal of HN. As for marking globals as separate from locals, why on earth do you have any globals in the first place? 😉 — JayOsako

I agree that HungarianNotation is bad because it’s a crutch for overly large namespaces. That’s why I use single character variable names to force myself to write clear code. — EricUlevik

My Response: 26 variables? At once? How can you manage that?

Seriously, HungarianNotation is only a symptom of a larger problem. Too many programmers see the solution for excessive state to be more state; they end up with variables to track variables to track variables, all bound up with arcane rules for naming that are supposed to be indicative of their meaning (which it rarely is). The point isn’t so much that you have to limit the number of variables, but rather that large proliferations of variables in the local namespace is a sign of a design mistake. Why would they all be visible at once? Aren’t there some abstractions they can bound into? Why do you need this particular bit of state visible here, anyway?

The rule should be simple: if you don’t need to see it, you shouldn’t be able to. Period.

Like VisualTools, HungarianNotation is not SoftwareEngineering, it’s TheIllusionOfSoftwareEngineering. It’s an easy and comfortable fix that looks good because you can DoItNow and you don’t have to do all the hard thinking that comes from trying to do the RightThing. The time you ‘save’ using it will be more than spent later trying to maintain the resulting morass. — JayOsako

This is very well put, and pretty much sums up my feelings on the matter. — KevlinHenney

Any interested programmers are encouraged to read the original page. This bit cracked me up, since we’ve mentioned the historical implications of the subject:

What I think you really want is for your interactive development environment to give you a rollover (ToolTip, for you Windows junkies, or minibuffer message for you Emacs junkies) that shows you the declaration of this variable at every use point when your cursor goes over it. — RusHeywood

Visual Slick Edit (Windows) does this.

VC6 has something approaching this. They copied it from VB – when you’re typing code to call a method, it pops up with a prompt window showing the names of the arguments. It’s not perfect, but it’s a step in the right direction. — RogerLipscombe
