The Myth of XML Purity?
Here's a hypothetical. Say there is a client I'm working with that needs to return Valid XML from their system. They've given me XML Schemas and said they are representative of the XML returned. Since Valid implies Well-Formed, sounds good.
Then someone mentions, "oh, well, we can't guarantee that there won't be some < or > or & in the element content. But, that's no problem, right?"
I said, "Well, then technically you are not sending us XML. If you can't escape (or CDATA) out the stray content with < >, then you're not even returning less-than/greater-than delimited files. What if I gave you content like this "123123324","2003-04-05","Scott ",Hans,"elman","Portland?" We have to agree on some fundamentals here. The XML 1.0 spec (and all tools based on it) is very specific." (They won't even CDATA the stuff)
The response? "Well, that's a purist's viewpoint."
I guess I got too mired in the Judeo-Christian Ethic of "Thou shalt not return malformed XML."
QUESTION: What level of Dante's Inferno would I be relegated to if I pre-process this XML-y (pronounced: 'smelly') to make it well-formed?
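For a sense of scale, the escaping being asked for in the conversation above is small. Here is a minimal Python sketch of both options, entity escaping and CDATA, using only the standard library. The helper names, the `note` element, and the sample text are made up for illustration; nothing here is the client's actual system.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escaped_element(tag, text):
    """Wrap text in <tag>, replacing &, < and > with entity references."""
    return f"<{tag}>{escape(text)}</{tag}>"


def cdata_element(tag, text):
    """Wrap text in <tag> as CDATA; a literal "]]>" is split across two sections."""
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return f"<{tag}><![CDATA[{safe}]]></{tag}>"


raw = "Portland & Denver, 3 < 5 > 2"  # hypothetical element content

# Both forms parse, and the parser hands back the original text.
assert ET.fromstring(escaped_element("note", raw)).text == raw
assert ET.fromstring(cdata_element("note", raw)).text == raw

# The unescaped form is rejected outright.
try:
    ET.fromstring(f"<note>{raw}</note>")
except ET.ParseError as err:
    print("not well-formed:", err)
```

Either way the consumer gets the same characters back, which is the whole argument for doing it at the source.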
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
I know that doesn't fly, because they are the customer, but maybe it will help illustrate the point.
Sadly, it's probably going to come down to who's paying whom.
"Given enough time and resources, anything is possible."
Actually, I think that preprocessing is the correct way to handle it, Scott. Some integration partners (or customers) just use a text file that has some structure and lots of angle brackets. It is always a good thing to stop and look around at the state of the whole business community (non-computer industry). We are spoiled in many ways: we have been living and breathing XML, Web Services, WS-I-[insert random letters], etc. over the past few years.
I try to rationalize using these technologies *correctly* myself, internal to my application, versus how someone else extends and interfaces with it in a less than desirable fashion. If they do not play nice, they lose the obvious advantages. How people run fiber optic between Portland and Denver is not how you have to connect a phone to the wall. People make trade-offs all the time that make no sense to me in many ways, but they do to them.
I try to put my preprocessors, adaptors etc in place to isolate me from their less pure methodologies and move on so I can sleep well.
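A preprocessor of that kind might look something like the sketch below. It is a heuristic, not a parser, and it rests on assumptions that are not from the original discussion: the element names are known in advance (say, from the schema), and a `<` only counts as markup when it opens one of those names, a declaration, or a comment. `KNOWN_TAGS` and the sample record are hypothetical. A bare `>` is left alone, since XML 1.0 tolerates it in character data except as part of `]]>`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names; in practice these would come from the schema.
KNOWN_TAGS = ("patient", "name", "note")

# "&" that does not begin a named or numeric character reference.
STRAY_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# "<" that does not begin a known start/end tag, a "<?...?>" or a "<!...>".
_names = "|".join(re.escape(t) for t in KNOWN_TAGS)
STRAY_LT = re.compile(r"<(?![?!]|/?(?:%s)[\s/>])" % _names)


def preprocess(text):
    """Escape stray & and < so the result has a chance of being well-formed."""
    text = STRAY_AMP.sub("&amp;", text)
    return STRAY_LT.sub("&lt;", text)


dirty = ("<patient><name>Smith & Sons</name>"
         "<note>weight < 200 & height > 5</note></patient>")

root = ET.fromstring(preprocess(dirty))
print(root.findtext("name"))
print(root.findtext("note"))
```

The obvious weakness is content that happens to look like a known tag: the sketch cannot tell that apart from real markup, which is exactly why escaping at the source is the better fix.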
They ask and we do. (And they pay, of course)
Just don't let any documentation (or notes or minutes) refer to the data as XML. It's malformed data, or unclean data or non-compliant XML-like data -- whatever, but it's not XML, and it's not even XML-y (that sounds too official)(although it is a cute name, I'll give you that). This data, however, is *not* XML. It never was XML. It just looked a little like XML.
But smile and nod and be happy to pre-process it. They're just the client, they're not expected to know what XML is. That's your job. They came close, sure, but they missed it.
Clients often create much worse things than a few unescaped brackets... by comparison, what they're giving you is gold!
lb
At least you got a schema; we have "mystery XML" where I'm working. We never know what's going to come down the pipe. Last week they decided to substitute the patient's record number for the patient's gender. So what we thought was a simple trinary value (yes, I said trinary: "Male", "Female" and "Unknown", I kid you not) turned into a mishmash. Had we a schema to validate against, we could have said "well, that's not a value we're familiar with, so reject it." Instead, the automated feed created a duplicate row for every patient in the database. And they laughed when I suggested an hourly transaction log backup. Thank god I didn't listen to them.
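To make the "reject it" idea concrete, here is a small hand-rolled check in Python. It stands in for what an enumeration in a real schema would enforce and is not schema validation itself. The three allowed values come from the story above; the `feed`, `patient`, `id` and `gender` element names and the sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ALLOWED_GENDERS = {"Male", "Female", "Unknown"}


def split_valid_records(xml_text):
    """Return (accepted, rejected) lists of <patient> elements."""
    root = ET.fromstring(xml_text)
    accepted, rejected = [], []
    for patient in root.iter("patient"):
        gender = (patient.findtext("gender") or "").strip()
        if gender in ALLOWED_GENDERS:
            accepted.append(patient)
        else:
            rejected.append(patient)
    return accepted, rejected


# Hypothetical feed: the second record carries a record number as its gender.
feed = ("<feed>"
        "<patient><id>1</id><gender>Female</gender></patient>"
        "<patient><id>2</id><gender>123456</gender></patient>"
        "</feed>")

accepted, rejected = split_valid_records(feed)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Rejected records can be logged and held back instead of being loaded, so a bad feed never reaches the database.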
Dammit, we need to make people take XML seriously. Otherwise we're back in the world of half-baked HTML and "whatever IE allows"...
His comment is a cop-out. If it's defined a certain way, I think you can reasonably expect it to be that way. He might as well tell you that up is down. Why can't he just escape it?
"Here ya go. Here's a cheese burger".
"I asked for tofu. I'm a vegetarian."
"Close enough. Cheese burger has lettuce, a vegetable. Thus it's vegetarian food."
"It's not the same! A cheese burger has cow meat. It's NOT vegetarian food!"
"Well, that's a purist's viewpoint."