Slow AppFabric demo part one
Back in the early days of AppFabric, when it was still just Dublin and Velocity, a demo was given to the management team that started with a simple xamlx workflow. You can probably imagine the concern as a room full of managers silently waited for this very simple workflow to come up. The reports vary on how long it actually took to get a response from the server: anywhere from 40 seconds to a couple minutes. Repeating the test showed it was more like 30 seconds. Still, that can be a really uncomfortable silence when it's your product and you're getting close to a release date. I got the chance to do the performance investigation to find out why it took so long to respond and what action items to take from there.
The demo was hosted on a virtual machine running Windows Server 2008 R2 and was run in a "cold startup" configuration. This means that the machine was recently rebooted and no programs were run before the demo. Since we were working with the .Net 4.0 framework, which hadn't yet been released, it was unlikely that any of the programs loaded on startup used the .Net 4.0 CLR. The reason this is important is because the framework assemblies are shared. So the first process to load them will pay the price for all the disk access while subsequent processes can usually share the memory. Also, being on W2K8 R2 means that there is no SuperFetch like there would be on Windows 7 client.
The next strike against this virtual machine was that the .Net 4.0 framework had been installed but the assemblies had not yet been NGEN'd. If you are unfamiliar with NGEN, each assembly is basically compiled to match the processor architecture. If you do not NGEN an assembly, then the CLR will use JIT at runtime. For throughput scenarios, this is not usually a problem because once the JIT is finished, you'll never have that cost again for that code. But for cold latency, it is a problem. As part of a push to improve the speed at which the .Net framework is installed, NGEN is not performed during the installation of the framework. Instead, each assembly is "scheduled" for an NGEN. This is handled by a Windows service with the display name "Microsoft .NET Framework NGEN v4.0.30319_X86". It waits until the machine is idle and NGENs the assemblies in the background. So, instead of loading NGEN assemblies, it loads GAC assemblies. Besides not having to perform JIT, another benefit of NGEN is that the pages are ordered in such a way that the most commonly used code is placed earlier in the assembly and it can therefore cut down on the number of page faults and working set size. This is not something that happens for every assembly though, only the .Net framework assemblies.
To force NGEN to compile the assemblies, simply use the ngen.exe tool in the framework directory (under \windows\Microsoft.Net) by running "ngen update". Before performing "ngen update", the cold startup scenario took about 28 seconds. Afterwards, it took 21 seconds. There was also a significant chunk of time spent loading assemblies referenced in the root web.config and machine.config. To reduce noise, I cut out the ones that were not essentially to the demo. This is not something I would recommend for most users as the downsides are usually greater than the upsides. But, in this case I had reduced the 21 seconds down to 18.5 seconds.
To find out where these 18.5 seconds were spent, I took an ETW trace. ETW is my favorite performance tool because not only do you get a system-wide view of what's going on, you can also pick from all kinds of providers to focus on information you want to gather. Since Dublin uses IIS and WCF/WF, I used those providers and collected stack traces (something I'll talk about in a later post). To take the trace, I used xperf, which is now part of the Windows SDK, and included the following providers:
Microsoft-Windows-IIS-W3SVC {05448E22-93DE-4A7A-BBA5-92E27486A8BE}
Microsoft-Windows-Application Server-Applications {C651F5F6-1C0D-492E-8AE1-B4EFD7C9D503}
The "Server-Applications" provider is the one used for WCF/WF events. With latency scenarios, it is very easy to see where the time is spent as the ETW events can demarcate individual areas of execution.
This graph is what you would see by using xperfview (or xperf /view). The blue dots indicate activity in IIS, the red dots are WCF/WF. It is pretty clear from this graph where the handoffs are between IIS and WCF, as well as that the majority of time is spent in WCF. We can use the xperf tool to check the stack traces during each time period. Let's first look at the time when IIS is starting up, before it hands off to WCF. Here are the stacks and inclusive times during that time period:
13% | | => webengine4.dll!GetIsapiProcessHost
10% | | |clr.dll!ClrCreateManagedInstance
3% | | ||clr.dll!EnsureEEStarted
3% | | || clr.dll!EEStartup
2% | | || clr.dll!EEStartupHelper
2% | | || Other
7% | | ||clr.dll!TypeName::GetTypeUsingCASearchRules
7% | | | ...clr.dll!TypeName::GetTypeWorker
6% | | | clr.dll!LoadDomainAssembly
6% | | | => clr.dll!AssemblySpec::LoadDomainAssembly
43% | | => System.Web.ni.dll!System.Web.Hosting.ApplicationManager::CreateAppDomainWithHostingEnvironment
2% | | |System.Web.ni.dll!System.Web.Configuration.WebConfigurationHost::OpenConfiguration
2% | | | System.Configuration.ni.dll!System.Configuration.Internal.InternalConfigConfigurationFactory::System.Configuration.Internal.IInternalConfigConfigurationFactory.Create
2% | | | System.Configuration.ni.dll!System.Configuration.Configuration::.ctor
2% | | | => System.Configuration.ni.dll!System.Configuration.Internal.InternalConfigRoot::GetConfigRecord
2% | | | => ...System.Configuration.ni.dll!System.Configuration.BaseConfigurationRecord::Init
2% | | | => Other
3% | | |mscorlib.ni.dll!System.AppDomain::CreateDomain
3% | | | ...clr.dll!AppDomainNative::CreateDomainHelper
3% | | | Other
35% | | |clr.dll!TransparentProxyStub_CrossContext
35% | | ...clr.dll!CrossDomainChannel::MarshalAndCall_Wrapper
34% | | clr.dll!DispatchCall
34% | | ...clr.dll!CallDescrWorker
3% | | |mscorlib.ni.dll!System.AppDomain::InternalCreateInstanceWithNoSecurity
3% | | | ...mscorlib.ni.dll!System.Activator::CreateInstance
2% | | | mscorlib.ni.dll!System.Reflection.RuntimeAssembly::InternalLoad
2% | | | => ...clr.dll!AssemblySpec::LoadDomainAssembly
31% | | |System.Web.ni.dll!System.Web.Hosting.HostingEnvironment::Initialize
31% | | System.Web.ni.dll!System.Web.Hosting.HostingEnvironment::Initialize
3% | | |System.Web.ni.dll!System.Web.Configuration.HttpConfigurationSystem::EnsureInit
3% | | | Other
25% | | |System.Web.ni.dll!System.Web.HttpRuntime::HostingInit
3% | | ||System.Web.ni.dll!System.Web.HttpRuntime::GetInitConfigSections
2% | | || System.Web.ni.dll!System.Web.Configuration.RuntimeConfig::GetAppLKGConfig
2% | | || System.Web.ni.dll!System.Web.CachedPathData::GetConfigPathData
2% | | || => System.Web.ni.dll!System.Web.CachedPathData::GetConfigPathData
2% | | || => Other
19% | | ||System.Web.ni.dll!System.Web.Compilation.BuildManager::InitializeBuildManager
19% | | || System.Web.ni.dll!System.Web.Compilation.BuildManager::Initialize
5% | | || |System.Web.ni.dll!System.Web.Compilation.MultiTargetingUtil::EnsureFrameworkNamesInitialized
2% | | || ||System.Web.ni.dll!System.Web.Compilation.MultiTargetingUtil::InitializeKnownAndLatestFrameworkNames
2% | | || || Microsoft.Build.Utilities.v4.0.ni.dll!Microsoft.Build.Utilities.ToolLocationHelper::GetSupportedTargetFrameworks
2% | | || || Other
2% | | || ||Other
13% | | || |System.Web.ni.dll!System.Web.Compilation.BuildManager::CheckTopLevelFilesUpToDate
12% | | || System.Web.ni.dll!System.Web.Compilation.BuildManager::CheckTopLevelFilesUpToDate2
11% | | || System.Web.ni.dll!System.Web.Configuration.CompilationSection::get_RecompilationHash
11% | | || => System.Web.ni.dll!System.Web.Compilation.CompilationUtil::GetRecompilationHash
11% | | || => System.ni.dll!System.CodeDom.Compiler.CompilerInfo::get_IsCodeDomProviderTypeValid
11% | | || => ...clr.dll!AssemblySpec::LoadDomainAssembly
15% | |webengine4.dll!W3_MGD_APP_CONTEXT::InitializeMgdApplication
15% | ...System.Web.ni.dll!System.Web.Hosting.PipelineRuntime::InitializeApplication
15% | System.Web.ni.dll!System.Web.HttpApplicationFactory::GetSpecialApplicationInstance
15% | => System.Web.ni.dll!System.Web.HttpApplication::InitSpecial
15% | => System.Web.ni.dll!System.Web.HttpApplication::RegisterEventSubscriptionsWithIIS
12% | |System.Web.ni.dll!System.Web.HttpApplication::GetModuleCollection
12% | | ...System.Web.ni.dll!System.Web.Compilation.BuildManager::GetType
12% | | mscorlib.ni.dll!System.Type::GetType
12% | | ...clr.dll!AssemblySpec::LoadDomainAssembly
5% | | |clr.dll!LoadDomainAssemblyHelper
5% | | clr.dll!AppDomain::LoadDomainAssembly
16% |ntoskrnl.exe!KiPageFault
16% | ntoskrnl.exe!MmAccessFault
3% | |ntoskrnl.exe!KiCheckForKernelApcDelivery
3% | | ntoskrnl.exe!KiDeliverApc
3% | | ntoskrnl.exe!EtwpStackWalkApc
2% | | => ntoskrnl.exe!EtwpTraceStackWalk
2% | | => ...ntoskrnl.exe!RtlpWalkFrameChain
2% | | => ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks
12% | |ntoskrnl.exe!MiIssueHardFault
12% | ntoskrnl.exe!IoPageRead
As you can see above, there are many areas where assemblies are loaded and lots of time spent there. Virtual machines have slow disk access and this is to be expected. We can also see some page faults going on during this time period, which is also normal. In the next post, I'll cover more of this scenario.
Comments
Anonymous
January 19, 2011
I see managed call-stacks in Xperf. Can Xperf decode managed call-stacks? I thought it couldn't.Anonymous
January 19, 2011
This one doesn't work with xperf available to the public. And i believe that's the reason Vance Morrison built PerfMonitor. msdn.microsoft.com/.../gg490356.aspxAnonymous
January 20, 2011
You are correct! My mistake. Managed stack analysis in xperf hasn't come out yet. Instead of using xperf, you should be able to get everything from VS2010's performance analyzer. I deleted the original response since it would just be confusing.