Subtle Behaviors in the XML Serializer can kill
Dan Maharry is/was having a heck of a time with the XmlSerializer after upgrading an application from .NET 1.1 to .NET 2.0.
Given this XSD/schema:
<element name="epp" type="epp:eppType" />
<complexType name="eppType">
  <choice>
    <element name="hello" />
    <element name="greeting" type="epp:greetingType" />
  </choice>
</complexType>
The .NET 1.1 framework serializes a greeting element thusly (actually by incorrect and lucky behavior in the 1.x serializer):
<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <greeting>
    <svID>Test</svID>
    <svDate>2006-05-04T11:01:58.1Z</svDate>
  </greeting>
</epp>
But although it seemed to be fine initially in .NET 2.0, he started getting this instead:
<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <hello d2p1:type="greetingType" xmlns:d2p1="http://www.w3.org/2001/XMLSchema-instance">
    <SvID>Test</SvID>
    <svDate>2006-05-04T10:55:07.9Z</svDate>
  </hello>
</epp>
Dan worked with MS Support and filed a bug in the Product Feedback labs and attached an example if you'd like to download it.
Unfortunately, this isn't a bug. The elements in the original schema are ordered so that the generated XmlElement attributes stack in the same order, and that order produces the wrong semantics:
[System.Xml.Serialization.XmlTypeAttribute(Namespace = "urn:ietf:params:xml:ns:epp-1.0", TypeName = "eppType")]
[System.Xml.Serialization.XmlRootAttribute("epp", Namespace = "urn:ietf:params:xml:ns:epp-1.0", IsNullable = false)]
public class EppType
{
    private object item;

    [System.Xml.Serialization.XmlElementAttribute("hello", typeof(object))]
    [System.Xml.Serialization.XmlElementAttribute("greeting", typeof(GreetingType))]
    public object Item
    {
        get
        {
            return this.item;
        }
        set
        {
            this.item = value;
        }
    }
}
The problem is that the semantics of the schema and the resulting XmlSerializer attributes say "This object can be either an object or a GreetingType." Well, a GreetingType IS an object, so the 2.0 serializer happily obliges.
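To make that concrete, here's a tiny sketch of a first-match type test. This is NOT the serializer's actual code; `ChoiceDemo`, `PickElementName`, and the empty `GreetingType` stand-in are names I made up for illustration. It only shows why testing for `object` before `GreetingType` can never reach the second mapping.

```csharp
using System;

// Hypothetical, empty stand-in for the generated GreetingType.
public class GreetingType { }

public static class ChoiceDemo
{
    // Illustrative model only: walk the (name, type) mappings in order
    // and return the first element name whose type accepts the item.
    public static string PickElementName(object item, string[] names, Type[] types)
    {
        for (int i = 0; i < names.Length; i++)
        {
            if (types[i].IsInstanceOfType(item))
            {
                return names[i];
            }
        }
        throw new InvalidOperationException("No mapping matches " + item.GetType());
    }
}
```

With the mappings in schema order, `PickElementName(new GreetingType(), new[] { "hello", "greeting" }, new[] { typeof(object), typeof(GreetingType) })` returns "hello", because every GreetingType is also an object. Swap the two entries and the same call returns "greeting".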
Reversing those two lines in the XSD and regenerating the CS file with XSD.EXE expresses the correct intent: "This object can be a GreetingType or any other kind of object." With that change, the expected (original) output is achieved. If Dan can't change the original schema (which is likely wrong), then he'll have to change the generated code to get the semantics he wants. Not a bad thing, actually. I did the same thing with the code generated from the OFX schemas.
Using a previously published tip called HOW TO: Debug into a .NET XmlSerializer Generated Assembly I add an app.config with these lines:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.diagnostics>
    <switches>
      <add name="XmlSerialization.Compilation" value="1" />
    </switches>
  </system.diagnostics>
</configuration>
And check the contents of the Temp Directory by going Start|Run and typing in "%temp%" and pressing enter. I then sort by Date Modified.
I run the test program twice, once the original way and once with the lines reversed (my "fix") and diff the generated .cs files in BeyondCompare.
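If you'd rather not click around in Explorer, a few lines of code can do the same "sort by date" step. This is a minimal sketch; `TempFiles` and `NewestGeneratedSources` are names I invented, and it assumes the kept serializer sources land in the temp directory as .cs files, as described above.

```csharp
using System;
using System.IO;

public static class TempFiles
{
    // Returns the .cs files sitting directly in the temp directory,
    // newest first (the same view as sorting by Date Modified).
    public static FileInfo[] NewestGeneratedSources()
    {
        DirectoryInfo temp = new DirectoryInfo(Path.GetTempPath());
        FileInfo[] files = temp.GetFiles("*.cs");
        Array.Sort(files, (a, b) => b.LastWriteTimeUtc.CompareTo(a.LastWriteTimeUtc));
        return files;
    }
}
```

Calling `TempFiles.NewestGeneratedSources()` right after each run of the test program should put the freshly generated file at index 0, assuming nothing else is writing .cs files to the temp directory at the same time.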
You can see from the picture above exactly where the difference is, in the middle of a series of if/elseifs that basically are saying "what kind of object is this?"
The XmlSerializer is glorious and wonderful until it's totally not. I know that's not going to make Dan or his team feel better, but hang in there, it gets better the more you use it.
UPDATE: Dan has an interesting update that points out that the order of the attributes generated isn't regular, nor is the order they come back via reflection. James weighs in as well. My solution worked because there were only two attributes. Nutshell - order matters, but it's not regular.
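If you want to see what order reflection hands the attributes back on your own machine, here's a small sketch. The `EppType` and `GreetingType` below are cut-down stand-ins for the generated classes, and `AttributeOrder` is a made-up helper name. I'm deliberately not showing expected output, since the whole point is that the order isn't something to rely on.

```csharp
using System;
using System.Reflection;
using System.Xml.Serialization;

// Hypothetical, empty stand-in for the generated GreetingType.
public class GreetingType { }

// Cut-down version of the generated class, attributes in schema order.
public class EppType
{
    private object item;

    [XmlElement("hello", typeof(object))]
    [XmlElement("greeting", typeof(GreetingType))]
    public object Item
    {
        get { return this.item; }
        set { this.item = value; }
    }
}

public static class AttributeOrder
{
    // Returns the element names in whatever order reflection reports them.
    public static string[] ElementNamesAsReflected()
    {
        PropertyInfo prop = typeof(EppType).GetProperty("Item");
        object[] attrs = prop.GetCustomAttributes(typeof(XmlElementAttribute), false);
        string[] names = new string[attrs.Length];
        for (int i = 0; i < attrs.Length; i++)
        {
            names[i] = ((XmlElementAttribute)attrs[i]).ElementName;
        }
        return names;
    }
}
```

Print the result of `AttributeOrder.ElementNamesAsReflected()` with the attributes in both source orders and compare it against what you typed.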
I'm not defending the XmlSerializer folks, although it may sound like I am. James says "it looks like a bug to me." Personally I think it's less a bug and more a complex and poorly documented edge case that highlights the fundamental differences between the XML type system and the CLR type system. At the edges, it's dodgy at best.
I think where we're all getting nailed here is that the XSD Type System can represent things that the CLR Type System can't. Full stop.
In Schema, xs:choice is a complex thing, much like unions in C. The XmlSerializer chooses to present xs:choice as an Object that you have to downcast yourself. The mapping is uncomfortable at best. However, beyond this one uncomfortable type mapping, there are structures you can represent in Schema that simply have no parallel in the CLR, and the mappings won't ever be 100%. This is just what happens when translating between type systems. The same thing happened (and still happens) for years with nullable DB columns as simple types got translated into the CLR and we leaned on IsDBNull. With the XmlSerializer they introduced the whole pattern of a parallel field with a "Specified" suffix.
In this instance, if it were me using this schema and dealing with these documents I'd switch over to implementing IXmlSerializable. IXmlSerializable provides coverage for the final few percent that the XmlSerializer doesn't provide. It doesn't solve the problem of mapping between type-systems, but it at least puts YOU in control of the decisions being made.
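Here's roughly what that could look like for the write side. Treat it as a sketch under assumptions, not a drop-in replacement for the generated code: the `GreetingType` below has only the two fields visible in the sample output, the UTC date formatting is my choice, `Demo.Serialize` is a made-up helper, and I've left reading unimplemented to keep it short.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical stand-in: only the two fields seen in the sample output.
public class GreetingType
{
    public string svID;
    public DateTime svDate;
}

[XmlRoot("epp", Namespace = EppType.Ns)]
public class EppType : IXmlSerializable
{
    public const string Ns = "urn:ietf:params:xml:ns:epp-1.0";

    public object Item;

    public XmlSchema GetSchema()
    {
        return null;
    }

    // We decide which element to emit; no attribute ordering involved.
    public void WriteXml(XmlWriter writer)
    {
        GreetingType greeting = this.Item as GreetingType;
        if (greeting != null)
        {
            writer.WriteStartElement("greeting", Ns);
            writer.WriteElementString("svID", Ns, greeting.svID);
            // Assumption: dates are written as UTC.
            writer.WriteElementString("svDate", Ns,
                XmlConvert.ToString(greeting.svDate, XmlDateTimeSerializationMode.Utc));
            writer.WriteEndElement();
        }
        else
        {
            // Anything that is not a GreetingType becomes an empty <hello/>.
            writer.WriteStartElement("hello", Ns);
            writer.WriteEndElement();
        }
    }

    public void ReadXml(XmlReader reader)
    {
        // Left out of this sketch. A real implementation has to consume
        // the entire <epp> element, including its end tag.
        throw new NotSupportedException("Reading is not part of this sketch.");
    }
}

public static class Demo
{
    // Serializes an EppType to a string using the regular XmlSerializer.
    public static string Serialize(EppType epp)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(EppType));
        using (StringWriter sw = new StringWriter())
        {
            serializer.Serialize(sw, epp);
            return sw.ToString();
        }
    }
}
```

The trade-off is that you now own the element names and the formatting by hand, which is the point: the decision about greeting versus hello is a plain type check you can read, instead of something inferred from attribute order.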
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
I filed a bug on this when I first tripped over it, but it wasn't 'til beta2, and it was too late by then.
http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=3b9ee554-d097-433a-a062-7fc2986c779b
I haven't played with the Indigo/WCF XML serialization layer much yet, but hopefully it has better support for nullables. :)
Looks like IXmlSerializable is the answer here...
XmlSchemaProviderAttribute:
http://shrinkster.com/fdz
IXmlSerializable:
http://shrinkster.com/fdx
http://shrinkster.com/fdy
Type Matching for Xml and .NET:
http://shrinkster.com/fdv