Never Assume Anything (aka, XML woes)

Chalk another one up to carelessness. While working on a project, I assumed that the XML files I was working with were set up correctly. A dangerous assumption, I might add, as there were no schemas (or DTDs, for that matter) to describe the XML expected by the system. I should add that the XML in question is completely custom and completely insane, but going further would require a <rant /> tag. 🙂

I figured I would help matters by creating a schema. I first attempted this using XSD.exe, but it is so focused on DataSets that duplicate element names made it choke. I changed a couple of element names, regenerated the schema, then opened the XSD and changed the names back. Sure, it is a really crude way to do this, but I was not really in the mood to write a bunch of XSD by hand (laziness, I assume).

I ran the XSD against valid targets for the system. It worked. I ran it against the XML I had created, which was edited from a previous run. It worked fine. Then I attempted to run that XML in the system and … well, let’s just say, I smelled smoke and I have little hair left on top of my head (Editor’s note: Greg was already fairly bald, so this is just a bad attempt at humor). After a bit of investigation, I found the following.

  1. The XML in question had NEVER worked. It was dead code that appeared to be doing something useful, but was really a ruse.
  2. The XML in question used invalid operators for a published W3C schema. Not sure what the developer was thinking on this one. NOTE, however, that this did not blow up the engine, as other "valid" examples have this illegal tag, as well.
  3. The XSD was incorrect for the rules. This one is actually on Microsoft’s nickel, to an extent, as it did not recognize the booleans. The other half was on the original developer, who decided "yes" and "no" were nice boolean values.
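To make the boolean problem in item 3 concrete, here is a minimal Java sketch. The system in this story is not Java, and the `enabled` element and both schemas are made up for illustration. The W3C xs:boolean type accepts only true, false, 1, and 0, so "yes" fails against it, while a schema that spells out yes/no as an enumeration accepts it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class YesNoFlagDemo {
    // Hypothetical schema: types the flag as a standard xs:boolean.
    static final String BOOLEAN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='enabled' type='xs:boolean'/>"
      + "</xs:schema>";

    // Hypothetical schema: models the flag as the strings the format really uses.
    static final String YES_NO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='enabled'>"
      + "<xs:simpleType><xs:restriction base='xs:string'>"
      + "<xs:enumeration value='yes'/><xs:enumeration value='no'/>"
      + "</xs:restriction></xs:simpleType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true when the XML text validates against the XSD text.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a programming error, not a validation result.
            throw new IllegalStateException("Schema does not compile", e);
        }
        try {
            // With no custom error handler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<enabled>yes</enabled>";
        System.out.println("xs:boolean schema accepts it: " + isValid(BOOLEAN_XSD, xml));
        System.out.println("yes/no schema accepts it:     " + isValid(YES_NO_XSD, xml));
    }
}
```

If the format cannot be changed, writing the enumeration into the schema at least documents what the system really expects.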

Which leads me to my point:

Best Practices

  • If you are working with a custom XML format, as soon as it is stable (i.e., prior to anyone else consuming the XML), create a way to validate it. A schema is preferred, but DTDs work with older systems. The point is to make sure you have a way to distinguish good from bad in your system. Too many people consider this an OPTIONAL step; it should be MANDATORY.
  • If you are working with well-known XML formats, use the schema provided. If there is no schema, create one. You can’t get there from here if you do not have a proper map. Make sure you have the map before you start the trip.
  • In your application, run a schema check before attempting to consume the XML. If the XML is invalid for your system, you will save quite a few cycles if you bomb out at the front of your process rather than halfway in. In addition, you can easily determine that the error was invalid XML and save the maintenance guy a lot of trouble hacking code to find the error.
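
The last point can be sketched roughly as follows, again in Java with the standard javax.xml.validation API. The `job` schema, the `SchemaGate` class, and the `consume` method are hypothetical stand-ins for whatever your system actually uses. The idea is to collect every schema problem, with line numbers, and refuse to start processing if there are any.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaGate {
    // Hypothetical schema for illustration: a <job> with one required <name> child.
    static final String JOB_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='job'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }

    // Returns every validation problem found; an empty list means the XML is valid.
    static List<String> findProblems(Schema schema, String xml) {
        List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* warnings are ignored here */ }
            public void error(SAXParseException e) { problems.add(describe(e)); }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            problems.add(describe(e)); // malformed XML stops the parse
        } catch (SAXException e) {
            problems.add(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    // Entry point of the (hypothetical) process: validate first, work second.
    static String consume(String xml) {
        List<String> problems = findProblems(compile(JOB_XSD), xml);
        if (!problems.isEmpty()) {
            throw new IllegalArgumentException("Invalid XML, refusing to process: " + problems);
        }
        // Real processing would start here and only ever see schema-valid input.
        return "processed";
    }

    public static void main(String[] args) {
        System.out.println(consume("<job><name>nightly</name></job>"));
        System.out.println(findProblems(compile(JOB_XSD), "<job><title>oops</title></job>"));
    }
}
```

Failing at the gate means the error message says "invalid XML" outright, instead of surfacing as a null reference or a bad cast three layers deeper.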
