Removing XML elements from an input document (with large message support)

In my last post I talked about taking an XML message and stripping out elements. That post used a standard MemoryStream object, and a reader asked me to also show how the pipeline component could be built to handle streaming of large messages. Let's take a look at what those modifications look like.

When dealing with large incoming messages you don't always want to load the entire message into memory, which is what a MemoryStream does: it holds its whole contents in memory. BizTalk includes a set of objects in Microsoft.BizTalk.Streaming.dll (located in the GAC) that provide a means of handling large messages. Within this assembly is the VirtualStream class (its source code can be found in the SDK, in the VirtualStream.cs file under the \Program Files\Microsoft BizTalk Server 2006\SDK\Samples\Pipelines\ArbitraryXPathPropertyHandler directory). This object keeps the stream data in memory up to a threshold (4MB by default); once that threshold is reached, the additional data is written to a temporary file on the hard drive.
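
To make the overflow idea concrete, here is a simplified sketch in plain .NET. SpillingBuffer is a hypothetical class written only for illustration; it is not the BizTalk VirtualStream or its source, and it only supports appending writes. Data stays in a MemoryStream until a write would push it past the threshold; at that point this sketch copies the buffered bytes into a temp file from Path.GetTempFileName() and sends all further writes there.

```csharp
using System;
using System.IO;

// Hypothetical illustration of "memory first, temp file after the threshold".
// This is NOT BizTalk's VirtualStream: it only supports appending writes.
public sealed class SpillingBuffer : IDisposable
{
    private readonly int _threshold;
    private Stream _store = new MemoryStream();
    private bool _onDisk;

    public SpillingBuffer(int thresholdBytes)
    {
        _threshold = thresholdBytes;
    }

    public bool IsOnDisk => _onDisk;
    public long Length => _store.Length;

    public void Write(byte[] buffer, int offset, int count)
    {
        // Data stays in memory up to the threshold; the first write that would exceed it triggers the spill.
        if (!_onDisk && _store.Length + count > _threshold)
        {
            // Copy what is already buffered into a temp file, then keep writing there.
            var file = new FileStream(
                Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
                FileShare.None, 4096, FileOptions.DeleteOnClose); // file is deleted when disposed
            _store.Position = 0;
            _store.CopyTo(file);
            _store.Dispose();
            _store = file;
            _onDisk = true;
        }
        _store.Write(buffer, offset, count);
    }

    public void Dispose() => _store.Dispose();
}
```

With a threshold of 4 bytes, writing 3 bytes keeps the data in memory and writing 2 more moves it to disk. The real VirtualStream also supports reading and seeking (the Execute method below rewinds it with Position = 0), which this sketch leaves out.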

If you decide that this functionality is what you are looking for, you need to keep a couple of things in mind. Since the BizTalk host instances will be writing files to the hard drive, make sure that the account each host instance runs under has the security privileges to write to its %temp% folder, on every BizTalk server that runs that host instance. Also, by default the %temp% folder is on the C:\ drive, so decide whether that is where you want the temp files to be created. You will most likely want to configure your servers to put it on a different drive, since C:\ is typically configured with a small amount of storage space. Finally, make sure that the new %temp% location is not being backed up.
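
One way to confirm the permissions up front is a small probe that creates, writes and deletes a file in the temp folder. TempFolderCheck is a hypothetical helper, not part of BizTalk, and it assumes (as described above) that the overflow file ends up under the process's temp path. To tell you anything useful it has to run under the same account as the host instance.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of BizTalk: checks that the current account
// can create, write and delete a file in the temp folder it resolves.
public static class TempFolderCheck
{
    public static bool CanWriteToTemp(out string tempFolder)
    {
        // On Windows this comes from the process's TMP or TEMP environment variable
        // (with fallbacks if neither is set).
        tempFolder = Path.GetTempPath();
        try
        {
            string probe = Path.GetTempFileName(); // creates an empty file in the temp folder
            File.WriteAllText(probe, "probe");
            File.Delete(probe);
            return true;
        }
        catch (IOException)
        {
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

The tempFolder value also shows which folder that account actually resolves %temp% to, which helps when deciding whether to move it off the C:\ drive.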

So, the first thing we need to do is add a reference to Microsoft.BizTalk.Streaming.dll. Because it is located in the GAC, you will not be able to browse to it when adding the reference in Visual Studio. Instead, copy the .dll out of the GAC to another location on your hard drive and add the reference to that copy. You can do this from a command prompt by navigating to its location (on my machine it was C:\WINDOWS\assembly\GAC_MSIL\Microsoft.BizTalk.Streaming\3.0.1.0__.......).
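
If you would rather script the copy than do it by hand, a small helper like the following could be dropped into any console utility. GacCopy and its methods are hypothetical names. It searches the Microsoft.BizTalk.Streaming folder under GAC_MSIL recursively, so you don't have to type the version and public key token subfolder name yourself.

```csharp
using System.IO;

// Hypothetical helper for copying an assembly out of the GAC folder structure
// so Visual Studio can reference the copy.
public static class GacCopy
{
    // Searches gacFolder and all its subfolders (one per version/public key token) for fileName.
    public static string[] FindInGac(string gacFolder, string fileName)
    {
        return Directory.GetFiles(gacFolder, fileName, SearchOption.AllDirectories);
    }

    // Copies sourceDll into destinationFolder (created if needed) and returns the new path.
    public static string CopyOut(string sourceDll, string destinationFolder)
    {
        Directory.CreateDirectory(destinationFolder);
        string target = Path.Combine(destinationFolder, Path.GetFileName(sourceDll));
        File.Copy(sourceDll, target, true); // overwrite an older copy if one exists
        return target;
    }
}
```

For example, call FindInGac(@"C:\WINDOWS\assembly\GAC_MSIL\Microsoft.BizTalk.Streaming", "Microsoft.BizTalk.Streaming.dll") and pass the result to CopyOut along with a destination such as C:\References (an example path). If more than one version is installed, the search returns one path per version, so check which one you want before copying.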

Instead of using a MemoryStream, we will use the VirtualStream object. When we new up the object we are presented with 6 constructor overloads. We want the overload that accepts a VirtualStream.MemoryFlag. The options of this enum are OnlyToDisk, OnlyInMemory and AutoOverFlowToDisk, and we want AutoOverFlowToDisk. If you want to change the memory threshold, use this overload instead: VirtualStream(int bufferSize, VirtualStream.MemoryFlag).
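
The three MemoryFlag options differ only in where the data is allowed to live. The following hypothetical model spells out that rule as this post describes it. OverflowMode and OverflowPolicy are illustration names, not BizTalk types, the 4MB default is the one quoted above (taken here as 4 * 1024 * 1024 bytes), and this is not VirtualStream's actual code.

```csharp
using System;

// Hypothetical model of the three modes described in this post;
// these are illustration names, not the BizTalk VirtualStream.MemoryFlag enum.
public enum OverflowMode { OnlyToDisk, OnlyInMemory, AutoOverflowToDisk }

public static class OverflowPolicy
{
    // 4MB, the default threshold quoted for VirtualStream, taken as 4 * 1024 * 1024 bytes.
    public const int DefaultThreshold = 4 * 1024 * 1024;

    // True if a payload of 'length' bytes would be held (at least partly) in a temp file.
    public static bool UsesDisk(OverflowMode mode, long length, int threshold = DefaultThreshold)
    {
        switch (mode)
        {
            case OverflowMode.OnlyToDisk:
                return true;
            case OverflowMode.OnlyInMemory:
                return false;
            case OverflowMode.AutoOverflowToDisk:
                return length > threshold; // anything up to the threshold stays in memory
            default:
                throw new ArgumentOutOfRangeException(nameof(mode));
        }
    }
}
```

Passing a different threshold to UsesDisk plays the role of the bufferSize argument in the second overload.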

So, the Execute method now looks like this:

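        public …... Execute(……….)
        {
            try
            {
                IBaseMessagePart bodyPart = inmsg.BodyPart;
                // AutoOverFlowToDisk: keep data in memory up to the threshold, then use a temp file
                VirtualStream vs = new VirtualStream(VirtualStream.MemoryFlag.AutoOverFlowToDisk);

                if (bodyPart != null)
                {
                    Stream originalStream = bodyPart.GetOriginalDataStream();

                    if (originalStream != null)
                    {
                        XmlTextReader Xtr = new XmlTextReader(originalStream);
                        // The writer now targets the VirtualStream instead of a MemoryStream
                        XmlTextWriter Xtw = new XmlTextWriter(vs, Encoding.UTF8);

                        // Element-stripping logic (elided); Xtw must be flushed before vs is rewound below
                        …..
                        …..
                        …..
                    }

                    // Rewind so the next component reads the new body from the beginning
                    vs.Position = 0;
                    bodyPart.Data = vs;
                    // Register vs with the resource tracker so the pipeline can dispose of it when finished
                    pc.ResourceTracker.AddResource(vs);
                }

                return inmsg;
            }
            …..
        }
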
The lines that changed from the MemoryStream version are the ones that create and use the VirtualStream (vs).
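
The element-stripping body is elided above. For readers who want a self-contained starting point, here is one forward-only way to strip elements without loading the whole document. It is a sketch under a few assumptions and not the code from the previous post: it swaps the XmlTextReader/XmlTextWriter pair for XmlReader.Create/XmlWriter.Create, matches elements by local name only (ignoring namespaces), and does not handle DTDs. XmlStripper and StripElements are hypothetical names. Because it writes to any Stream, the output could be the VirtualStream from Execute.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlStripper
{
    // Hypothetical helper: copies input to output node by node, dropping every
    // element (and its whole subtree) whose local name is in namesToRemove.
    public static void StripElements(Stream input, Stream output, ISet<string> namesToRemove)
    {
        var writerSettings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            CloseOutput = false                 // leave the output stream open for the caller
        };

        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && namesToRemove.Contains(reader.LocalName))
                {
                    reader.Skip(); // moves past the element's end tag without writing anything
                }
                else
                {
                    WriteShallowNode(reader, writer);
                    reader.Read();
                }
            }
        } // disposing the writer flushes it into the output stream
    }

    // Writes only the current node (not its children) to the writer.
    private static void WriteShallowNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                bool isEmpty = reader.IsEmptyElement;
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (isEmpty) writer.WriteEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            // XmlDeclaration is skipped because the writer emits its own declaration;
            // other node types (e.g. DocumentType) are not handled in this sketch.
        }
    }
}
```

The key call is reader.Skip(), which jumps past an unwanted element and everything under it, so none of it is ever written. Every other node is copied one at a time rather than with XmlWriter.WriteNode, because WriteNode copies a whole subtree, including any nested elements you meant to drop.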