Optimizing Away Repeat XML Namespace Declarations with DataContractSerializer
For performance reasons, DataContractSerializer can’t always determine ahead of time which XML namespaces a serialized instance will use, so it declares namespaces on the elements that need them. As a result, the same XML namespace can end up declared over and over again when a single declaration would do. This is particularly painful because XML namespaces tend to be very long: in the worst cases, namespace declarations can make up the majority of the serialized instance and noticeably hurt performance. Here’s a way to make sure that doesn’t happen to you, although it takes a little tinkering.
Suppose you have the following DataContracts:
[DataContract(Namespace = "https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/")]
public class A
{
    [DataMember]
    public B B;

    public A()
    {
        B = new B();
    }
}

[DataContract(Namespace = "https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/")]
public class B
{
    [DataMember]
    public string s = "foo";
}
If you try serializing an array of five A’s like this:
var o = new A[5];
for (int i = 0; i < 5; i++)
{
    o[i] = new A();
}
var ser = new DataContractSerializer(o.GetType());
var writer = new XmlTextWriter(Console.Out) { Formatting = Formatting.Indented };
ser.WriteObject(writer, o);
you’ll get the following XML:
<ArrayOfProgram.A xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
</ArrayOfProgram.A>
Notice how B’s namespace (the xmlns:d3p1 declaration) is declared five times, once on every B element, creating a lot of bloat when it could be declared just once at the top level. To fix this, you can use the following code:
ser.WriteStartObject(writer, o);
writer.WriteAttributeString("xmlns", "p", null, "https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/");
ser.WriteObjectContent(writer, o);
ser.WriteEndObject(writer);
instead of
ser.WriteObject(writer, o);
We first write the start of the root element, then declare the long namespace with the prefix “p” while the root’s start tag is still open, write the object’s contents, and finally write the end of the object. The result is much more compact XML on the wire, equivalent to the XML we generated earlier:
<ArrayOfProgram.A xmlns:p="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/" xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
</ArrayOfProgram.A>
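A quick way to confirm the savings is to serialize the same five-element array both ways and count how often the inner namespace URI appears in the output. The sketch below mirrors the A and B contracts from the top of the post but, as an assumption, swaps in short made-up namespaces (http://example.com/outer and http://example.com/inner) and writes through XmlWriter.Create into a StringBuilder instead of XmlTextWriter; NamespaceDemo, Serialize, and CountOccurrences are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Same shape as the post's contracts; the short namespaces are made up for this sketch.
[DataContract(Namespace = "http://example.com/outer")]
public class A
{
    [DataMember] public B B = new B();
}

[DataContract(Namespace = "http://example.com/inner")]
public class B
{
    [DataMember] public string s = "foo";
}

public static class NamespaceDemo
{
    public const string InnerNs = "http://example.com/inner";

    // Serializes five A instances; optionally binds prefix "p" to InnerNs on the root element.
    public static string Serialize(bool declareUpFront)
    {
        var items = new A[5];
        for (int i = 0; i < items.Length; i++)
        {
            items[i] = new A();
        }

        var ser = new DataContractSerializer(typeof(A[]));
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, new XmlWriterSettings { Indent = true }))
        {
            if (declareUpFront)
            {
                ser.WriteStartObject(writer, items);
                // The root start tag is still open, so this declaration lands on the root.
                writer.WriteAttributeString("xmlns", "p", null, InnerNs);
                ser.WriteObjectContent(writer, items);
                ser.WriteEndObject(writer);
            }
            else
            {
                ser.WriteObject(writer, items);
            }
        } // Disposing the writer flushes everything into sb.
        return sb.ToString();
    }

    // Counts non-overlapping occurrences of text in xml (ordinal comparison).
    public static int CountOccurrences(string xml, string text)
    {
        int count = 0;
        for (int i = xml.IndexOf(text, StringComparison.Ordinal); i >= 0;
             i = xml.IndexOf(text, i + text.Length, StringComparison.Ordinal))
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string verbose = Serialize(declareUpFront: false);
        string compact = Serialize(declareUpFront: true);
        Console.WriteLine($"WriteObject:       {verbose.Length} chars, inner URI x{CountOccurrences(verbose, InnerNs)}");
        Console.WriteLine($"Declared up front: {compact.Length} chars, inner URI x{CountOccurrences(compact, InnerNs)}");
    }
}
```

With the prefix declared up front, the inner URI should appear only once, in the root’s xmlns:p attribute. The prefix the serializer generates on its own in the other case (d3p1 in the output above) may differ between framework versions, so the sketch counts URIs rather than prefixes.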
All of the repeated namespace declarations are gone, replaced by a single declaration at the top. Of course, it’s also possible to integrate this kind of serialization into WCF. Create a serializer that inherits from XmlObjectSerializer and delegates all of its methods to a DataContractSerializer, except that it also declares the additional namespaces at the top level. Then create a behavior that derives from DataContractSerializerOperationBehavior and overrides CreateSerializer to return the XmlObjectSerializer you just created, and use that behavior in place of the default DataContractSerializerOperationBehavior on each operation.
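Here is a minimal sketch of that WCF wiring, assuming .NET Framework WCF with System.ServiceModel and System.Runtime.Serialization referenced. PrefixingSerializer, PrefixingOperationBehavior, and Install are hypothetical names, and the prefix-to-namespace map is whatever you pass in. Rather than constructing its own DataContractSerializer, the behavior wraps the serializer returned by base.CreateSerializer, so WCF’s usual settings (known types, MaxItemsInObjectGraph, and so on) should still apply.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Description;
using System.Xml;

// Hypothetical wrapper: delegates everything to an inner serializer, but declares a fixed
// set of prefix -> namespace URI bindings on the root element it writes.
public class PrefixingSerializer : XmlObjectSerializer
{
    private readonly XmlObjectSerializer inner;
    private readonly IDictionary<string, string> prefixes;

    public PrefixingSerializer(XmlObjectSerializer inner, IDictionary<string, string> prefixes)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
        this.prefixes = prefixes ?? throw new ArgumentNullException(nameof(prefixes));
    }

    public override void WriteStartObject(XmlDictionaryWriter writer, object graph)
    {
        inner.WriteStartObject(writer, graph);
        // The root start tag is still open, so these become top-level declarations.
        foreach (KeyValuePair<string, string> pair in prefixes)
        {
            writer.WriteAttributeString("xmlns", pair.Key, null, pair.Value);
        }
    }

    public override void WriteObjectContent(XmlDictionaryWriter writer, object graph) =>
        inner.WriteObjectContent(writer, graph);

    public override void WriteEndObject(XmlDictionaryWriter writer) =>
        inner.WriteEndObject(writer);

    public override object ReadObject(XmlDictionaryReader reader, bool verifyObjectName) =>
        inner.ReadObject(reader, verifyObjectName);

    public override bool IsStartObject(XmlDictionaryReader reader) =>
        inner.IsStartObject(reader);
}

// Hypothetical behavior: lets WCF build its usual serializer, then wraps it.
public class PrefixingOperationBehavior : DataContractSerializerOperationBehavior
{
    private readonly IDictionary<string, string> prefixes;

    public PrefixingOperationBehavior(OperationDescription operation, IDictionary<string, string> prefixes)
        : base(operation)
    {
        this.prefixes = prefixes;
    }

    public override XmlObjectSerializer CreateSerializer(
        Type type, string name, string ns, IList<Type> knownTypes) =>
        new PrefixingSerializer(base.CreateSerializer(type, name, ns, knownTypes), prefixes);

    public override XmlObjectSerializer CreateSerializer(
        Type type, XmlDictionaryString name, XmlDictionaryString ns, IList<Type> knownTypes) =>
        new PrefixingSerializer(base.CreateSerializer(type, name, ns, knownTypes), prefixes);

    // Replaces the stock behavior on every operation of an endpoint.
    // Call this before the ServiceHost or ChannelFactory is opened.
    public static void Install(ServiceEndpoint endpoint, IDictionary<string, string> prefixes)
    {
        foreach (OperationDescription operation in endpoint.Contract.Operations)
        {
            var replacement = new PrefixingOperationBehavior(operation, prefixes);
            var existing = operation.Behaviors.Find<DataContractSerializerOperationBehavior>();
            if (existing != null)
            {
                // Carry over the two settings most likely to have been customized.
                replacement.MaxItemsInObjectGraph = existing.MaxItemsInObjectGraph;
                replacement.IgnoreExtensionDataObject = existing.IgnoreExtensionDataObject;
                operation.Behaviors.Remove(existing);
            }
            operation.Behaviors.Add(replacement);
        }
    }
}
```

Because the declarations are written from WriteStartObject, every WriteObject overload (including the one WCF calls for each message part) picks them up. Pick prefixes that don’t collide with ones the serializer writes itself, such as i. Install copies only MaxItemsInObjectGraph and IgnoreExtensionDataObject from the behavior it replaces; other settings, such as a custom DataContractFormatAttribute, would need to be carried over separately.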
Oh, and if you get a chance, take a look at the trailer for the Office 2010 movie everyone's so excited about.
Comments
Anonymous
August 12, 2009
Thanks for this article, it was very helpful for stripping out namespaces! Here is the code I'm using for a specialized DataContractSerializerExt, as you suggest at the end.
[code]
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

public class DataContractSerializerExt : XmlObjectSerializer
{
    private readonly DataContractSerializer serializer;
    readonly List<string> namespaces = new List<string>();

    public DataContractSerializerExt(Type type)
    {
        serializer = new DataContractSerializer(type);
        InitNamespaces(type, null);
    }

    public DataContractSerializerExt(Type type, IEnumerable<Type> knownTypes)
    {
        serializer = new DataContractSerializer(type, knownTypes);
        InitNamespaces(type, knownTypes);
    }

    public DataContractSerializerExt(Type type, string rootName, string rootNamespace)
    {
        serializer = new DataContractSerializer(type, rootName, rootNamespace);
        InitNamespaces(type, null);
    }

    public DataContractSerializerExt(Type type, XmlDictionaryString rootName, XmlDictionaryString rootNamespace)
    {
        serializer = new DataContractSerializer(type, rootName, rootNamespace);
        InitNamespaces(type, null);
    }

    public DataContractSerializerExt(Type type, string rootName, string rootNamespace, IEnumerable<Type> knownTypes)
    {
        serializer = new DataContractSerializer(type, rootName, rootNamespace, knownTypes);
        InitNamespaces(type, knownTypes);
    }

    public DataContractSerializerExt(Type type, XmlDictionaryString rootName, XmlDictionaryString rootNamespace, IEnumerable<Type> knownTypes)
    {
        serializer = new DataContractSerializer(type, rootName, rootNamespace, knownTypes);
        InitNamespaces(type, knownTypes);
    }

    public DataContractSerializerExt(Type type, IEnumerable<Type> knownTypes, int maxItemsInObjectGraph, bool ignoreExtensionDataObject, bool preserveObjectReferences, IDataContractSurrogate dataContractSurrogate)
    {
        serializer = new DataContractSerializer(type, knownTypes, maxItemsInObjectGraph, ignoreExtensionDataObject, preserveObjectReferences, dataContractSurrogate);
        InitNamespaces(type, knownTypes);
    }

    public DataContractSerializerExt(Type type, string rootName, string rootNamespace, IEnumerable<Type> knownTypes, int maxItemsInObjectGraph, bool ignoreExtensionDataObject, bool preserveObjectReferences, IDataContractSurrogate dataContractSurrogate)
    {
        serializer = new DataContractSerializer(type, rootName, rootNamespace, knownTypes, maxItemsInObjectGraph, ignoreExtensionDataObject, preserveObjectReferences, dataContractSurrogate);
        InitNamespaces(type, knownTypes);
    }

    public DataContractSerializerExt(Type type, XmlDictionaryString rootName, XmlDictionaryString rootNamespace, IEnumerable<Type> knownTypes, int maxItemsInObjectGraph, bool ignoreExtensionDataObject, bool preserveObjectReferences, IDataContractSurrogate dataContractSurrogate)
    {
        serializer = new DataContractSerializer(type, rootName, rootNamespace, knownTypes, maxItemsInObjectGraph, ignoreExtensionDataObject, preserveObjectReferences, dataContractSurrogate);
        InitNamespaces(type, knownTypes);
    }

    private void InitNamespaces(Type mainType, IEnumerable<Type> knownTypes)
    {
        List<Type> types = new List<Type>();
        types.Add(mainType);
        if (knownTypes != null)
        {
            types.AddRange(knownTypes);
        }

        List<Assembly> assemblies = new List<Assembly>();
        foreach (Type type in types)
        {
            if (!assemblies.Contains(type.Assembly))
            {
                assemblies.Add(type.Assembly);
            }
        }

        // Add contract namespaces defined at the assembly level.
        // ContractNamespaceAttribute allows multiple instances per assembly, so read them all
        // (Attribute.GetCustomAttribute would throw AmbiguousMatchException if there were several).
        foreach (Assembly assembly in assemblies)
        {
            foreach (ContractNamespaceAttribute contractNamespaceAttribute in
                     Attribute.GetCustomAttributes(assembly, typeof(ContractNamespaceAttribute)))
            {
                if (contractNamespaceAttribute.ContractNamespace != null &&
                    !namespaces.Contains(contractNamespaceAttribute.ContractNamespace))
                {
                    namespaces.Add(contractNamespaceAttribute.ContractNamespace);
                }
            }
        }

        // Add contract namespaces defined at the type level.
        foreach (Type type in types)
        {
            Attribute attr = Attribute.GetCustomAttribute(type, typeof(DataContractAttribute));
            if (attr != null)
            {
                DataContractAttribute dataContractAttribute = (DataContractAttribute)attr;
                if (dataContractAttribute.Namespace != null &&
                    !namespaces.Contains(dataContractAttribute.Namespace))
                {
                    namespaces.Add(dataContractAttribute.Namespace);
                }
            }
        }
    }

    public override void WriteObject(Stream stream, object graph)
    {
        XmlDictionaryWriter writer = XmlDictionaryWriter.CreateTextWriter(stream, Encoding.UTF8, false);
        WriteObject(writer, graph);
        writer.Flush();
    }

    public override void WriteObject(XmlDictionaryWriter writer, object graph)
    {
        WriteStartObject(writer, graph);
        // Skip the first namespace, assumed to be the root's default namespace (plain xmlns,
        // no prefix), and declare the remaining ones with the prefixes x, x1, x2, ...
        for (int i = 1; i < namespaces.Count; i++)
        {
            string aliasNamespace = "x" + ((i > 1) ? (i - 1).ToString() : "");
            writer.WriteAttributeString("xmlns", aliasNamespace, null, namespaces[i]);
        }
        WriteObjectContent(writer, graph);
        WriteEndObject(writer);
    }

    public override void WriteObject(XmlWriter writer, object graph)
    {
        WriteObject(XmlDictionaryWriter.CreateDictionaryWriter(writer), graph);
    }

    public override void WriteStartObject(XmlDictionaryWriter writer, object graph)
    {
        serializer.WriteStartObject(writer, graph);
    }

    public override void WriteObjectContent(XmlDictionaryWriter writer, object graph)
    {
        serializer.WriteObjectContent(writer, graph);
    }

    public override void WriteEndObject(XmlDictionaryWriter writer)
    {
        serializer.WriteEndObject(writer);
    }

    public override object ReadObject(XmlDictionaryReader reader, bool verifyObjectName)
    {
        return serializer.ReadObject(reader, verifyObjectName);
    }

    public override bool IsStartObject(XmlDictionaryReader reader)
    {
        return serializer.IsStartObject(reader);
    }
}
[/code]
Anonymous
December 23, 2009
Great article! Referring to the namespace by its prefix can make things easier. It's a good namespace design approach.
Anonymous
February 24, 2010
I'm having a similar problem, but it's the "http://www.w3.org/2001/XMLSchema" namespace that the DataContractSerializer is repeating. Unfortunately, I don't always know all of the types I need to use (and prefer to specify the KnownType attribute on the DataContracts), so manually adding all of the namespaces to the top of the XML document won't work for me. Instead, I want to keep the existing namespaces at the top and just add the "http://www.w3.org/2001/XMLSchema" namespace. Any ideas how one would do this?
Anonymous
March 01, 2010
Hi MattO, the solution offered here should still work for you. You don't need to declare all of the namespaces at the top; feel free to declare only the ones you want. Namespaces you do declare will always get written out at the top level, and any namespaces the serializer needs that you don't declare yourself will still be written out by the serializer wherever they're needed.
Anonymous
April 26, 2010
I have this problem now for very large object graphs in WCF data contracts. How can I make WCF do something similar?
Anonymous
May 14, 2012
This works great. After reading this article (blogs.msdn.com/.../wcf-extensibility-ioperationbehavior.aspx), I was able to implement my custom XmlObjectSerializer on the server. Now, does anybody know how to get this working in Silverlight? It seems that DataContractSerializerOperationBehavior is not available.
Anonymous
May 16, 2012
Never mind, found this: blogs.msdn.com/.../wcf-extensibility-custom-serialization-in-silverlight.aspx