Choking on (Very Large) XML Files

You probably don't know much about what I actually do at Microsoft (yet), so now would be a good time to mention that I typically support e-mail hosters who use Microsoft Hosted Messaging and Collaboration (HMC). These types of customers tend to have a lot (and I mean a lot) of Exchange configuration objects: several thousand each of address lists, GALs, OABs and accepted domains, for instance.

A typical XML file created by ExchDump (https://www.microsoft.com/downloads/details.aspx?familyid=d88b807d-964e-4bf8-9344-754892e9f637&displaylang=en) for one of these hosters might be 500MB+, while the HTML file will be almost as large and completely indigestible--if somehow you have enough system resources to open it in Internet Explorer.

I set out to break the XML into sections that I could actually use to gain some useful information. Naturally I turned to PowerShell. The first thing I tried was to load the file by foolishly casting the contents of the file to System.Xml.XmlDocument. After waiting several minutes I realized what a dumb move that was. Thank you CTRL+C. From that point the task became all about text parsing.

Param ( $FilePath = "C:\Data" )

# Folder that will receive one subfolder per object class
$SubPath = "$FilePath\ExchDumpObjects"

If (-not (Test-Path $SubPath)) {
    mkdir $SubPath
}

$ObjectString = @()

# Stream the dump line by line instead of loading the whole file into memory
Get-Content "$FilePath\ExchDump_*.xml" |
ForEach-Object {
    if ($_.Trim() -eq "<ADSI-Object>") {
        # Start of a new object: reset the line buffer and the metadata
        $ObjectString = @()
        $ObjectClass = ""
        $ObjectName = ""
        $ObjectString += $_
    }
    elseif ($_.Trim() -eq "</ADSI-Object>") {
        # End of the object: write the buffered lines to their own file
        $ObjectString += $_
        $TickCount = (Get-Date).Ticks
        If (-not (Test-Path "$SubPath\$ObjectClass")) {
            mkdir "$SubPath\$ObjectClass"
        }
        $ObjectString | Out-File "$SubPath\$ObjectClass\$($ObjectClass)_$ObjectName.$TickCount.xml"
    }
    else {
        # Inside an object: keep buffering, and pick up the class and name
        $ObjectString += $_
        If ($_ -match "ADSI_Obj_Class") {
            $ObjectClass = $_.Trim().Replace("<ADSI_Obj_Class>", "")
            $ObjectClass = $ObjectClass.Replace("</ADSI_Obj_Class>", "").Trim()
        }
        ElseIf ($_ -match "ADSI_Obj_Name") {
            $ObjectName = $_.Trim().Replace("<ADSI_Obj_Name>", "")
            $ObjectName = $ObjectName.Replace("</ADSI_Obj_Name>", "")
            $ObjectName = $ObjectName.Replace("CN=", "").Trim()
        }
    }
}
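
One thing the script does not guard against is an object name that contains characters which are not legal in a file name. I have not confirmed that ExchDump ever emits such names, so treat this as a precaution rather than a known bug. Here is a minimal sketch; ConvertTo-SafeFileName is a hypothetical helper name, and replacing with an underscore is an arbitrary choice:

```powershell
# Hypothetical helper (not part of the original script): swaps any character
# that the current OS rejects in file names for an underscore.
function ConvertTo-SafeFileName {
    param([string]$Name)

    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    $chars = $Name.ToCharArray() | ForEach-Object {
        if ($invalid -contains $_) { '_' } else { $_ }
    }
    -join $chars
}

# Example call with a made-up name that contains a slash
ConvertTo-SafeFileName "Sales/EMEA"
```

If you wanted to use it, the natural place would be on $ObjectName just before the Out-File call.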

 

To run the script, do one of two things: either place your XML file in C:\Data, the default folder where the script looks for it, or pass the folder that contains the file as the FilePath argument. Either way, the script picks up files named ExchDump_*.xml in that folder.

When you’re done, you’ll have a subfolder named ExchDumpObjects in the same folder as the ExchDump output. Inside that folder will be a subfolder for each class of AD object that ExchDump gathered, and inside those, one XML file for each object that ExchDump wrote out. For instance, if I wanted to find a particular OAB object named “Demo OAL”, I would look for a file like “C:\Data\ExchDumpObjects\msExchOAB\msExchOAB_Demo OAL.*.xml”, where the wildcard stands for the tick count that the script appends. You might find that ExchDump outputs the same object multiple times, which leaves several files that differ only in that tick count. I’m not sure why that is, but the objects tend to be for the most part identical when that happens.
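
If you want to check whether those repeated copies really are identical, something along these lines should do it. This is a sketch under a few assumptions: Get-ExchDumpDuplicate is a hypothetical helper name, it relies on the Name.TickCount.xml naming that the script produces, and it needs Get-FileHash, which only exists in PowerShell 4.0 and later.

```powershell
# Hypothetical helper: groups the per-object files by everything before the
# tick-count suffix and reports the names that were written more than once.
function Get-ExchDumpDuplicate {
    param([string]$ClassFolder)

    Get-ChildItem -Path $ClassFolder -Filter *.xml |
        Group-Object { $_.Name -replace '\.\d+\.xml$', '' } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            # One distinct hash across the group means the copies are identical
            $hashes = $_.Group |
                ForEach-Object { (Get-FileHash -LiteralPath $_.FullName).Hash } |
                Sort-Object -Unique
            [PSCustomObject]@{
                Name      = $_.Name
                Copies    = $_.Count
                Identical = (@($hashes).Count -eq 1)
            }
        }
}

# Example (path taken from the text above):
# Get-ExchDumpDuplicate "C:\Data\ExchDumpObjects\msExchOAB"
```

Any row where Identical is False is worth opening by hand to see what changed between the copies.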

 

Once you’ve found the object you’re looking for, you may want to use PowerShell to parse it as well. This is sort of tricky because the XML which ExchDump uses isn’t the absolute best of formats. Here’s a snippet you can use for getting attributes out of those files:

 

[PS] $FP = "C:\Data\ExchDumpObjects\msExchOAB\msExchOAB_Demo OAL.*.xml"

[PS] $OAB = [xml] (type $FP)

[PS] $Atr = $oab."ADSI-Object".attribute | ?{$_.innertext -like "*offlineABContainers*"}

[PS] $Atr.attrib_val

  "CN=Demo AL,CN=All Address Lists,CN=Address Lists Container,CN=hmc,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=fabrikam,DC=net"

[PS] $Atr = $oab."ADSI-Object".attribute | ?{$_.innertext -like "*offlineABServer*"}

[PS] $Atr.attrib_val

  "CN=OAB01,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=hmc,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=fabrikam,DC=net "

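One catch with the snippet above: if the wildcard in $FP matches more than one file (which happens when ExchDump wrote the object several times), type hands all of them to the [xml] cast as one blob and the cast fails because there is more than one root element. Here is a sketch of a small wrapper that reads only the first match. Get-ExchDumpAttribute is a hypothetical name, and it uses the same InnerText wildcard match as the snippet, so it carries the same looseness.

```powershell
# Hypothetical helper wrapping the lookup shown above. It reads only the first
# file that matches the wildcard so the [xml] cast sees a single root element.
function Get-ExchDumpAttribute {
    param([string]$PathPattern, [string]$AttributeName)

    $file = Get-ChildItem -Path $PathPattern | Select-Object -First 1
    if (-not $file) { return }

    $doc = [xml](Get-Content -LiteralPath $file.FullName)
    $doc.'ADSI-Object'.attribute |
        Where-Object { $_.InnerText -like "*$AttributeName*" } |
        ForEach-Object { $_.attrib_val }
}

# Example (path and attribute name taken from the snippet above):
# Get-ExchDumpAttribute "C:\Data\ExchDumpObjects\msExchOAB\msExchOAB_Demo OAL.*.xml" "offlineABServer"
```

Picking the first match is a judgment call on my part; since the copies tend to be identical it usually doesn't matter which one you read.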
 

(As a side note, yes, a lot of my scripts will be just as esoteric as this one. It’s super fantastic if you find yourself in the same exact predicament as me, but that’s not very likely unless you are supporting a very large e-mail hoster. The same sort of technique may apply to other scenarios where you have very large XML files, but for the most part this is a niche script.)
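
For those other scenarios, if the large file is well-formed XML, a streaming reader is an alternative to matching lines of text. To be clear, this is a different technique from the script above and not something I have run against ExchDump output; it is a sketch that assumes the file parses cleanly and has no DOCTYPE. Read-XmlElement is a hypothetical name.

```powershell
# Sketch only: streams matching elements with System.Xml.XmlReader instead of
# matching lines of text. Assumes the whole file is well-formed XML.
function Read-XmlElement {
    param([string]$Path, [string]$ElementName)

    # XmlReader resolves relative paths against the process directory,
    # so hand it a fully qualified path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = [System.Xml.XmlReader]::Create($fullPath)
    try {
        $null = $reader.MoveToContent()
        while (-not $reader.EOF) {
            if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element -and
                $reader.Name -eq $ElementName) {
                # ReadOuterXml emits the element's markup and advances past it,
                # so no extra Read() is needed in this branch
                $reader.ReadOuterXml()
            }
            else {
                $null = $reader.Read()
            }
        }
    }
    finally {
        $reader.Close()
    }
}

# Example (file name is a placeholder):
# Read-XmlElement "C:\Data\SomeLargeFile.xml" "ADSI-Object" | ForEach-Object { ([xml]$_).DocumentElement.Name }
```

Each string that comes out is one complete element, so it can be cast to [xml] on its own without ever holding the whole document in memory.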