This will be of limited interest to most readers, but it’s still worth covering, if only in academic terms. Brian Keever has written a post on XML Purification, that is, turning non-compliant XML into well-formed XML.
Extensible Markup Language is the metalanguage of the computing gods. Oracle and SQL Server have proprietary binary formats that are clearly superior in raw speed and, in some instances, file size, but they are a nightmare for compatibility. It is no surprise that different technologies have different pros and cons; they had different design goals in the first place.
The law of the land, as delivered to us by Microsoft (and, to be fair, by the XML specification itself, which treats any well-formedness error as fatal), is that imperfect XML is unreadable. Not even a single byte of data may be taken from an XML document with even the slightest flaw. The reasoning makes enough sense: “software cannot be responsible for guessing at a developer’s intentions.” Part of the hype of XML is universal compatibility, though. In fact, XML is built on Unicode (parsers must accept both UTF-8 and UTF-16) to allow for internationalization. Even so, something as innocent as an accented vowel, saved in an encoding that does not match the one the document declares, can destroy the ability to read the document.
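To make the accented-vowel case concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not taken from Mr Keever's post; the sample document and the helper name `try_parse` are made up for illustration. It feeds the parser the same bytes twice, changing only the declared encoding.

```python
import xml.etree.ElementTree as ET

# The word "café", with the é stored as the single Latin-1 byte 0xE9.
body = b"<word>caf\xe9</word>"


def try_parse(document: bytes) -> str:
    """Return the root element's text, or a note describing the parse failure."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError as err:
        return f"rejected: {err}"


# Declared as UTF-8, the lone 0xE9 byte is malformed, so the parser
# refuses the whole document rather than guessing.
as_utf8 = try_parse(b'<?xml version="1.0" encoding="UTF-8"?>' + body)

# Declared as ISO-8859-1, the very same bytes are perfectly well-formed.
as_latin1 = try_parse(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body)

print(as_utf8)
print(as_latin1)
```

The markup is identical in both cases; only the mismatch between the bytes and the declared encoding decides whether any data can be read at all.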
Thus, Mr Keever has delivered unto us a way to fix XML that has valid markup but illegal characters.
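For a rough idea of what such a fix can look like, here is a small Python sketch. It is not Mr Keever's code, and it may well differ from his approach: it simply assumes that "illegal characters" means anything outside the `Char` production of the XML 1.0 specification and deletes those characters before parsing. The function name `strip_illegal_xml_chars` and the sample string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Anything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Delete characters that XML 1.0 forbids anywhere in a document."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Valid markup, but with a NUL and a vertical tab embedded in the text.
raw = "<note>caf\u00e9\x00 menu\x0b</note>"

cleaned = strip_illegal_xml_chars(raw)
root = ET.fromstring(cleaned)

print(root.text)
```

Note that the accented é survives, since it is a legal character once the text is properly decoded; only the control characters are removed. Deleting characters is the bluntest option, and replacing them with a placeholder would be an equally reasonable design choice.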