OutOfMemoryExceptions while remoting very large datasets
When you have to pass an object back and forth between processes or application domains, you have to serialize it into some type of stream that can be understood by both the client and the server.
The bigger and more complex the object gets, the more expensive it is to serialize, both CPU-wise and memory-wise, and if the object is big and complex enough you can easily run into out-of-memory exceptions during the actual serialization process... and that is exactly what happened to one of my customers...
They had to pass very large datasets back and forth between the UI layer and the data layer, and these datasets could easily get up to a couple of hundred MB in size. When they passed the datasets back they would get OutOfMemoryExceptions in stacks like this one... in other words, they would get OOMs while serializing the dataset to pass it back to the client...
0454f350 773442eb [HelperMethodFrame: 0454f350]
0454f3a8 793631b3 System.String.GetStringForStringBuilder(System.String, Int32, Int32, Int32)
0454f3d0 79363167 System.Text.StringBuilder..ctor(System.String, Int32, Int32, Int32)
0454f3f8 793630cc System.Text.StringBuilder..ctor(System.String, Int32)
0454f408 651eadee System.Data.DataSet.SerializeDataSet(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext, System.Data.SerializationFormat)
0454f448 651eaa5b System.Data.DataSet.GetObjectData(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext)
0454f458 7964db64 System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitSerialize(System.Object, System.Runtime.Serialization.ISurrogateSelector, System.Runtime.Serialization.StreamingContext, System.Runtime.Serialization.Formatters.Binary.SerObjectInfoInit, System.Runtime.Serialization.IFormatterConverter, System.Runtime.Serialization.Formatters.Binary.ObjectWriter)
0454f498 793ba2bb System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.Serialize(System.Object, System.Runtime.Serialization.ISurrogateSelector, System.Runtime.Serialization.StreamingContext, System.Runtime.Serialization.Formatters.Binary.SerObjectInfoInit, System.Runtime.Serialization.IFormatterConverter, System.Runtime.Serialization.Formatters.Binary.ObjectWriter)
0454f4c0 793b9cef System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(System.Object, System.Runtime.Remoting.Messaging.Header[], System.Runtime.Serialization.Formatters.Binary.__BinaryWriter, Boolean)
0454f500 793b9954 System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(System.IO.Stream, System.Object, System.Runtime.Remoting.Messaging.Header[], Boolean)
0454f524 6778c0b0 System.Runtime.Remoting.Channels.BinaryServerFormatterSink.SerializeResponse(System.Runtime.Remoting.Channels.IServerResponseChannelSinkStack, System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Channels.ITransportHeaders ByRef, System.IO.Stream ByRef)
0454f57c 6778bb0f System.Runtime.Remoting.Channels.BinaryServerFormatterSink.ProcessMessage(System.Runtime.Remoting.Channels.IServerChannelSinkStack, System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Channels.ITransportHeaders, System.IO.Stream, System.Runtime.Remoting.Messaging.IMessage ByRef, System.Runtime.Remoting.Channels.ITransportHeaders ByRef, System.IO.Stream ByRef)
0454f600 67785616 System.Runtime.Remoting.Channels.Tcp.TcpServerTransportSink.ServiceRequest(System.Object)
0454f660 67777732 System.Runtime.Remoting.Channels.SocketHandler.ProcessRequestNow()
0454f690 677762a2 System.Runtime.Remoting.Channels.RequestQueue.ProcessNextRequest(System.Runtime.Remoting.Channels.SocketHandler)
0454f694 67777693 System.Runtime.Remoting.Channels.SocketHandler.BeginReadMessageCallback(System.IAsyncResult)
0454f6c4 7a569ca9 System.Net.LazyAsyncResult.Complete(IntPtr)
0454f6fc 7a56a46e System.Net.ContextAwareResult.CompleteCallback(System.Object)
0454f704 79373ecd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0454f71c 7a56a436 System.Net.ContextAwareResult.Complete(IntPtr)
0454f734 7a569bed System.Net.LazyAsyncResult.ProtectedInvokeCallback(System.Object, IntPtr)
0454f764 7a61062d System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
0454f79c 79405534 System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
0454f93c 79e7c74b [GCFrame: 0454f93c]
My gut feeling was that they were SOL. I know that serialization is very memory-expensive and that the resulting serialized XML strings can get enormous, so I wasn't very surprised, especially knowing how large their datasets were.
I am not a data access guru, but I have seen this type of issue enough times that I knew what the recommendation should be.
1. Re-think the architecture... What are you using these datasets for? Who will be browsing through hundreds of MBs of data anyway? (And this still holds true: in most cases where this much data is involved, only a very small part of it is needed, and if that is the case then only that small piece should be handled, i.e. filter out what you need and leave the rest.)
2. Re-consider passing this data through remoting/web services/out-of-proc session state or whatever it might be. Once you start serializing and deserializing this amount of data you are treading on thin ice when it comes to the scalability of your application, both performance-wise and memory-wise. Again, this still holds true: if the dataset itself is 100 MB, you will only be able to have a handful of concurrent requests before you run out of memory for the datasets alone.
3. If you really, really, really need this much data and this architecture, you need to start thinking about moving to 64-bit. Even there you need to make sure you have enough RAM and disk space to back the memory you're using, and you still need to be careful, because the more memory you use, the longer it will take to perform full garbage collections.
We discussed a couple of options like bringing back partial datasets, chunking it up, but still most of it was a no-go.
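For reference, the "chunking it up" option could look roughly like the sketch below. This is only an illustration, not what the customer had: GetPage is a hypothetical helper, and the table, column names and page size are made up. It copies a window of rows into a fresh DataSet with the same schema, so that each remoting call would only serialize one page.

```csharp
using System;
using System.Data;

static class DataSetPaging
{
    // Hypothetical helper: copies rows [offset, offset + count) of 'source'
    // into a new DataSet that has the same schema as the source table.
    public static DataSet GetPage(DataTable source, int offset, int count)
    {
        DataTable page = source.Clone(); // Clone() copies the schema only, no rows
        int end = Math.Min(offset + count, source.Rows.Count);
        for (int i = offset; i < end; i++)
        {
            page.ImportRow(source.Rows[i]);
        }
        DataSet ds = new DataSet();
        ds.Tables.Add(page);
        return ds;
    }

    static void Main()
    {
        // Made-up sample data, 25 rows
        DataTable dt = new DataTable("Users");
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("UserName", typeof(string));
        for (int i = 0; i < 25; i++)
        {
            dt.Rows.Add(i, "user" + i);
        }

        // Walk the table in pages of 10 rows
        for (int offset = 0; offset < dt.Rows.Count; offset += 10)
        {
            DataSet chunk = GetPage(dt, offset, 10);
            Console.WriteLine("offset {0}: {1} rows", offset, chunk.Tables[0].Rows.Count);
        }
    }
}
```

The trade-off is more round trips and a client that has to stitch the pages back together, which is one reason this kind of change can be a no-go for an existing application.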
Debugging
I created a very small remoting sample with just one method that returns a very large dataset (you can find the code for the sample at the bottom of this post), just to see how much memory we were actually using for the serialization (the dataset itself was 102 MB).
I attached to the remoting server with windbg, loaded sos (.loadby sos mscorwks), and then set a breakpoint on mscorwks!WKS::gc_heap::allocate_large_object so that I could record the size of the allocation (?@ebx) and the stack (!clrstack) every time we allocated a large object (I figured this was enough for a rough estimate).
0:004> x mscorwks!WKS*allocate_large*
79ef212d mscorwks!WKS::gc_heap::allocate_large_object = <no type information>
0:004> bp 79ef212d "?@ebx;!clrstack;g"
Lo and behold, the last attempted allocation before the OOM was a whopping 1 142 400 418 bytes (~1 GB!!!! for a 100 MB dataset)
Evaluate expression: 1142400418 = 4417a5a2
OS Thread Id: 0x128c (4)
ESP EIP
0454f350 79ef212d [HelperMethodFrame: 0454f350]
0454f3a8 793631b3 System.String.GetStringForStringBuilder(System.String, Int32, Int32, Int32)
0454f3d0 79363167 System.Text.StringBuilder..ctor(System.String, Int32, Int32, Int32)
0454f3f8 793630cc System.Text.StringBuilder..ctor(System.String, Int32)
0454f408 651eadee System.Data.DataSet.SerializeDataSet(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext, System.Data.SerializationFormat)
0454f448 651eaa5b System.Data.DataSet.GetObjectData(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext)
...
When you try to allocate an object like that, it needs to be allocated in one contiguous chunk. Since it is larger than the size of an LOH (large object heap) segment, we will try to create a segment the size of the object, and in my case I just didn't have 1 GB of free space in my virtual memory in one large chunk, so the allocation fails with an OOM.
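As a small aside, here is a sketch of how you can see from managed code that big allocations take the large object path. It assumes the desktop CLR, where objects of roughly 85,000 bytes or more are placed on the LOH and GC.GetGeneration reports them as generation 2. The array sizes are arbitrary and the class name is made up.

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        byte[] small = new byte[1000];    // well below the LOH threshold
        byte[] large = new byte[100000];  // above the threshold, so it is a large object

        // A freshly allocated small object normally starts out in generation 0,
        // while a large object is reported as belonging to generation 2.
        Console.WriteLine("small: gen {0}", GC.GetGeneration(small));
        Console.WriteLine("large: gen {0}", GC.GetGeneration(large));
    }
}
```

Each of these objects needs one contiguous block of virtual memory, which is why a single 1 GB string buffer is so hard to satisfy in a 32-bit process.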
Fine, what did I learn from this? Well, I just confirmed what I already knew: serialization is very expensive. In fact, in my case I had to allocate 1 GB to serialize 100 MB, so a factor of 10, and that is not even all... if I had been successful in allocating this, I would still have had to allocate some more intermediate strings in the neighborhood of a couple of hundred MB, so all in all it seemed like an insurmountable task to serialize a dataset this big.
Solutions
I mentioned a few earlier, which basically boil down to: don't serialize datasets this big, and if you must, then go to 64-bit.
I remembered though, that on 1.1 there was an article that had some suggestions on how to optimize the serialization by creating dataset surrogates, i.e. wrapper classes that performed their own serialization rather than using the standard one that remoting uses. https://support.microsoft.com/kb/829740
I knew things had changed in 2.0 so that article was no longer applicable, but I didn't really know what had changed, so I went on an internet search and found this article, which turned out to explain a lot of good stuff about serialization of datasets.
https://msdn.microsoft.com/en-us/magazine/cc163911.aspx
The article suggests that you should change the serialization method if you need to remote very large datasets. I did this by adding one single line to the remoting server, before returning the dataset
ds.RemotingFormat = SerializationFormat.Binary;
Then I re-ran the test and didn't get the OOM. Not only that, but when I ran it through the debugger with the same breakpoint, instead of the 1 GB allocation I ended up with five 240 KB allocations and one 225 KB allocation used for the serialization (not counting any non-large objects). Memory-wise, that is an improvement of roughly 100 000% for one extra line in your code; that's a little bit hard to beat :)
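If you want to get a feel for the difference without setting up remoting, a minimal sketch like the one below serializes the same DataSet with both RemotingFormat settings and prints the size of the resulting stream. It assumes the .NET Framework (BinaryFormatter is what the remoting binary sink uses in the stacks above), the table layout and row count are made up, and it only measures the size of the output, not the peak memory used while serializing.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class RemotingFormatDemo
{
    // Builds a made-up table similar in spirit to the sample at the bottom of the post
    static DataSet BuildDataSet(int rows)
    {
        DataTable dt = new DataTable("Users");
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("FirstName", typeof(string));
        dt.Columns.Add("LastName", typeof(string));
        for (int i = 0; i < rows; i++)
        {
            dt.Rows.Add(i, "Jane", "Doe");
        }
        DataSet ds = new DataSet();
        ds.Tables.Add(dt);
        return ds;
    }

    // Serializes the DataSet into memory with the given RemotingFormat
    // and returns the number of bytes written
    static long SerializedSize(DataSet ds, SerializationFormat format)
    {
        ds.RemotingFormat = format;
        using (MemoryStream ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, ds);
            return ms.Length;
        }
    }

    static void Main()
    {
        DataSet ds = BuildDataSet(10000);
        Console.WriteLine("Xml:    {0} bytes", SerializedSize(ds, SerializationFormat.Xml));
        Console.WriteLine("Binary: {0} bytes", SerializedSize(ds, SerializationFormat.Binary));
    }
}
```

With SerializationFormat.Xml (the default) the DataSet hands the formatter its contents as XML strings, which is where the huge StringBuilder in the stack comes from. To see the allocation pattern itself you still need the debugger breakpoint described above.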
Have a good one,
Tess
Sample code used for this post
Server:
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;
using System.Data;
namespace MyServer
{
    class Program
    {
        static void Main(string[] args)
        {
            MyServer();
        }
        static void MyServer()
        {
            Console.WriteLine("Remoting Server started...");
            // listen on TCP port 1234 and expose DataLayer as a SingleCall object
            TcpChannel tcpChannel = new TcpChannel(1234);
            ChannelServices.RegisterChannel(tcpChannel, false);
            Type commonInterfaceType = Type.GetType("MyServer.DataLayer");
            RemotingConfiguration.RegisterWellKnownServiceType(commonInterfaceType, "DataLayerService", WellKnownObjectMode.SingleCall);
            Console.WriteLine("Press ENTER to quit");
            Console.ReadLine();
        }
    }
    public interface DataLayerInterface
    {
        DataSet GetDS(int rows);
    }
    public class DataLayer : MarshalByRefObject, DataLayerInterface
    {
        public DataSet GetDS(int rows)
        {
            //populate a table with dummy user rows
            DataTable dt = new DataTable();
            DataRow dr;
            DataColumn dc;
            dc = new DataColumn("ID", typeof(Int32));
            dc.Unique = true;
            dt.Columns.Add(dc);
            dt.Columns.Add(new DataColumn("FirstName", typeof(string)));
            dt.Columns.Add(new DataColumn("LastName", typeof(string)));
            dt.Columns.Add(new DataColumn("UserName", typeof(string)));
            dt.Columns.Add(new DataColumn("IsUserAMemberOfTheAdministratorsGroup", typeof(string)));
            DataSet ds = new DataSet();
            ds.Tables.Add(dt);
            for (int i = 0; i < rows; i++)
            {
                dr = dt.NewRow();
                dr["id"] = i;
                dr["FirstName"] = "Jane";
                dr["LastName"] = "Doe";
                dr["UserName"] = "jd";
                dr["IsUserAMemberOfTheAdministratorsGroup"] = "No";
                dt.Rows.Add(dr);
            }
            ds.RemotingFormat = SerializationFormat.Binary; //<-- this line makes a world of difference
            return ds;
        }
    }
}
Client:
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;
using System.Data;
using MyServer;
namespace Client
{
    class Program
    {
        static void Main(string[] args)
        {
            TcpChannel tcpChannel = new TcpChannel();
            ChannelServices.RegisterChannel(tcpChannel, false);
            Type requiredType = typeof(DataLayerInterface);
            DataLayerInterface remoteObject = (DataLayerInterface)Activator.GetObject(requiredType, "tcp://localhost:1234/DataLayerService");
            DataSet ds = remoteObject.GetDS(600000);
            Console.WriteLine("Number of rows in ds: " + ds.Tables[0].Rows.Count.ToString());
            Console.ReadLine();
        }
    }
}
Comments
Anonymous
September 02, 2008
The comment has been removed
Anonymous
September 02, 2008
i didn't encountered it yet.thanks for sharing this.
Anonymous
September 06, 2008
Tess I have been starting to get familiar with Windbg and I have been trying to understand the advtange of this over the vs.net debugger. All I read is this is powerful than vs.net debugger mainly for its ability for kernel mode debugging. Say I have a fully managed application is this going to be useful at all? Also can you tell any other differences btw windbg and vs.net debugger? thanks
Anonymous
September 07, 2008
Indeed Tess, 100K % improvement is a nice record to set. I have seen some pretty dramatic performance improvements from small changes but this one gets the gold medal I guess. And as you and other commenters have said already ... having datasets that big moving other the network is rather scary. Besides memory usage, think about LAN / WAN load with a couple dozens concurrent users. I guess unless you need to do SETI type processing on very large data collections, there is no case to make for such an architecture. Just out of curiosity, what was the customer's reason for wanting to keep the huge datasets instead of breaking down their API to be less chunky ?
Anonymous
September 07, 2008
.NET ProMesh.NET v2.0 RC1 is out SqlClient Timeouts Revealed OutOfMemoryExceptions while remoting v...
Anonymous
September 07, 2008
.NET ProMesh.NET v2.0 RC1 is out SqlClient Timeouts Revealed OutOfMemoryExceptions while remoting very
Anonymous
September 12, 2008
The comment has been removed
Anonymous
September 15, 2008
Here are of some of the reader emails I got this week and my answers to them... How do I troubleshoot
Anonymous
March 25, 2009
Hi Tess, I've a question about this tool WinDbg. I've developed a .net remoting application and sometimes I get an error about sockets connections. So I would like to trace the Socket. But I'm using a remote Windows Services. Thanks
Anonymous
April 03, 2009
A few weeks back me and Micke (one of our Architect Evangelists) had a session at TechDays where we talked
Anonymous
April 29, 2009
Sweeeeeeet! This issue just cropped up in a production application being used by a technician out in the field. If it wasn't resolved today, he wouldn't have been able to use the application. You are a life saver! Thank you!
Anonymous
May 11, 2009
I have put together a quick and dirty debug diag script for troubleshooting .net memory leaks. (attached
Anonymous
May 27, 2009
The comment has been removed
Anonymous
May 27, 2009
with that amount of data you might have other issues including mem issues that could manifest themselves as OOMs depending on how you handle the resulting data and how many people are on the server doing this simultaneously etc... but the issue described above only applies to datasets that are serialized (with remoting or webservices for example)
Anonymous
July 22, 2009
Hi, I m facing one Prob.. Plz help me out.... Query : I m working on one application which is already developed. Application was storing the Image file in SQL Server DB. my role is to provide one fuctionality that will download the image on client machine. When i m filling the dataset , Its become very bulcky n Memoryoverload exception is coming. If i am going every now & then on DB, Performance is not degrading. I also tried to take the data in chops & process but performance is not proper. Kindly suggest. Thanks, Rashesh Gandhi
Anonymous
August 24, 2009
Excellent, finally i know how to set the bp when clr allocates virtual memory.
Anonymous
March 11, 2010
The same but different - It would seem that when visual studio saves these large dataset definitions to disk, it encounters the same problem. I have a dataset design I can no longer open without running out of memory and vs crashing. At least I think this must be the problem at this stage. About to split the beast up and see if the problem is resolved.
Anonymous
October 14, 2010
Hi , Finally i got the solution. Using Mutlithreading paradigm, we can fix this issue. I have created multiple thread , each is assigned a batch size of (Say for example :) 100 . This approached had fixed my issue. Sorry for late update. Regards, Rashesh Gandhi
Anonymous
January 18, 2011
Tess, I am SO glad I found your blog. Your suggestion of setting the RemotingFormat has been a lifesaver! Jim Black
Anonymous
January 19, 2011
We are using .net 3.5 sp1 with remoting. In server config, we configured to use binary format for Http channel. But when serialize dataset it still using xml formatter ??? Due to that we see outofmemory exception when we use large datasets?? How can I set globally, so that I datasets serialization will use Binray?? <channels> <channel ref="http"> <serverProviders> <formatter ref="Binary" /> </serverProviders> </channel> </channels>
Anonymous
June 16, 2011
Awesome this blog just made my day :)