An aside: people who use open source languages like Perl and Ruby will be shocked to know how difficult it is for Microsoft developers to work with HTML programmatically. We’re able to consume XML very quickly, so long as it’s well formed, but any error in the markup renders the whole document unreadable. Microsoft’s design goal was to never guess at the developer’s intent, so anything the least bit ambiguous is an exception. The Html Agility Pack is an open source library for parsing HTML and making the guesses MS was unwilling to make outside IE.
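To make the strict-versus-forgiving difference concrete, here is a minimal sketch in Python rather than .NET, using only the standard library. It is not the Agility Pack's API, just an illustration of the same idea: a strict XML parser rejects markup with an unclosed tag, while a lenient HTML parser takes what it can get. The names `strict_parse_ok`, `TagCollector`, and `lenient_parse` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <b> tag is never closed, so this is not well-formed XML.
MALFORMED = "<p>Hello <b>world</p>"


def strict_parse_ok(markup):
    """Return True if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Any well-formedness error rejects the whole document.
        return False


class TagCollector(HTMLParser):
    """Lenient parser that records start tags and text as it sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def lenient_parse(markup):
    """Parse markup without insisting on well-formedness."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.text)


if __name__ == "__main__":
    print(strict_parse_ok(MALFORMED))  # the XML parser refuses the document
    print(lenient_parse(MALFORMED))    # the HTML parser still yields tags and text
```

The strict parser gives an all-or-nothing answer, which is the behaviour I was complaining about. The lenient one never complains, but it also leaves it to the caller to decide what the broken markup was supposed to mean.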
I’m finding it slow. The software has trouble with certain encodings and, worse, it throws stack overflow exceptions! This suggests it leans far too heavily on recursion. Generally, replacing the recursion with a loop (sometimes with an explicit stack or queue) fixes the problem, but the offending code is very hard to track down in such a large code base.
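The recursion-to-loop fix looks roughly like this. Again this is a Python sketch with a hypothetical `Node` class, not code from the library: the recursive walk uses one call frame per nesting level, so a deeply nested document exhausts the call stack, while the iterative version keeps its pending nodes in an ordinary list.

```python
class Node:
    """Hypothetical tree node standing in for a parsed markup element."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def count_recursive(node):
    """Count nodes recursively; call depth grows with nesting depth."""
    return 1 + sum(count_recursive(child) for child in node.children)


def count_iterative(root):
    """Count nodes with an explicit stack; call depth stays constant."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.children)
    return count


def build_chain(depth):
    """Build a degenerate tree in which each node has exactly one child."""
    root = Node("n0")
    current = root
    for i in range(1, depth):
        child = Node(f"n{i}")
        current.children.append(child)
        current = child
    return root


if __name__ == "__main__":
    deep = build_chain(10_000)
    print(count_iterative(deep))  # works regardless of nesting depth
    try:
        count_recursive(deep)
    except RecursionError:
        print("recursive version ran out of stack")
```

Python reports the failure as a catchable `RecursionError`, which is friendlier than a stack overflow in .NET, but the cure is the same: move the bookkeeping from the call stack onto the heap.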
Still, this hasn’t stopped others from finding creative uses for the library. The page localizer is a fascinating example. And here’s a converter, allowing LINQ over web pages!