May 2015
Volume 30 Number 5
Visual Studio 2015 - Analyze Performance While Debugging in Visual Studio 2015
By Charles Willis and Dan Taylor | May 2015
Many developers spend the majority of their time getting an app to function correctly and less time focused on app performance. While Visual Studio has included profiling tools for quite some time, they were a separate set of tools to learn, and many developers never took the time to learn and use them when performance issues arose.
This article will introduce the new Diagnostic Tools debugger window in Visual Studio 2015. It will also describe how to use it to analyze performance as a regular part of your debugging workflow. I’ll first provide an overview of the debugger’s features and capabilities and then go on a deep dive walk-through. I’ll show you how to use PerfTips to time sections of code between breakpoints and steps, how to use the Diagnostic Tools window to monitor CPU and memory, and how to take snapshots to drill deep into memory growth and leaks.
The features in this article are available for debugging most managed and native projects. Microsoft is continually adding support for more project types and debugging configurations. For up-to-date information about currently supported features, check out the blog post on the Diagnostic Tools window at aka.ms/diagtoolswindow. A separate article in this issue, “Use IntelliTrace to Diagnose Issues Faster,” explains how you can use IntelliTrace within the Diagnostic Tools window to quickly identify the root causes of bugs in your code.
Performance While Debugging
Instead of running a full profiling tool, you might take one or more of the following steps:
- Insert code into an app (such as System.Diagnostics.Stopwatch) to measure how long it takes to run between various points, iteratively adding stopwatches as needed to narrow down the hot path (see the Stopwatch sketch after this list).
- Step through code to see if any particular step “feels slow.”
- Hit the Break All (“pause”) button at random points to get a feel for how far execution has progressed. In some circles this is referred to as “poor man’s sampling.”
- Over-optimize code without measuring performance at all, sometimes by applying a set of performance best practices across the entire code base.
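The first approach typically looks something like the following. This is a minimal sketch of the manual Stopwatch pattern; the method and message names are illustrative, not taken from the sample app discussed later:
// Manual timing with System.Diagnostics.Stopwatch; each Debug.WriteLine
// reports how long the preceding section took. Method names are illustrative.
using System.Diagnostics;
public static class TimingExample
{
    public static void ProcessPictures()
    {
        var stopwatch = Stopwatch.StartNew();
        LoadPictures();                 // first section under suspicion
        Debug.WriteLine("LoadPictures: {0} ms", stopwatch.ElapsedMilliseconds);
        stopwatch.Restart();
        ApplyFilters();                 // next section
        Debug.WriteLine("ApplyFilters: {0} ms", stopwatch.ElapsedMilliseconds);
    }
    static void LoadPictures() { /* ... */ }
    static void ApplyFilters() { /* ... */ }
}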
These practices are typically inaccurate, a poor use of time, or both. That’s why there are now performance tools in the debugger. They will help you understand your app’s performance during normal debugging.
Diagnostic Tools Window
The primary difference you’ll notice when debugging code in Visual Studio 2015 is the new Diagnostic Tools window that will appear, as shown in Figure 1. These Diagnostic Tools present information in two complementary ways. They add graphs to the timeline in the upper half of the window, and provide detailed information in the tabs on the bottom.
Figure 1 The New Diagnostic Tools Window in Visual Studio 2015
In Visual Studio 2015, you’ll see three tools in the Diagnostic Tools window: Debugger (includes IntelliTrace), Memory Usage and CPU Usage. You can enable or disable the CPU Usage and Memory Usage tools by clicking on the Select Tools dropdown. The Debugger tool has three tracks that show Break Events, Output Events and IntelliTrace Events.
Break Events History and PerfTips
The Break Events let you see how long each section of code took to run. The rectangles represent the duration from when the app started or resumed execution until the Debugger made it pause (see Figure 2).
Figure 2 Break Events and PerfTips
The beginning of the rectangle indicates where you started running the app by using the Continue (F5), Stepping (F10, F11, Shift+F11) or Run-to-cursor (Ctrl+F10) commands. The end of the rectangle indicates where the app stopped because it hit a breakpoint, completed a step or because you used Break All.
The duration of the most recent Break Event is also displayed in the editor, at the end of the current line of code. These in-code annotations are called PerfTips, and they let you keep an eye on performance without taking your eyes off your code.
In the details table below the graph, you can also see the history and duration of the Break Events and PerfTips in a tabular format. If you have IntelliTrace, additional events are displayed in the table. You can also set the filter to show only Debugger events, limiting the table to the history of Break Events.
CPU and Memory Analysis
The timeline automatically selects time ranges as you set breakpoints and step. When you hit a breakpoint, the time range selection resets to show only the most recent Break Event; as you step, the selection expands to include the latest Break Event. You can override the automatic time range selection by clicking on a Break Event rectangle or by clicking and dragging on the timeline.
The time range selection lets you correlate ranges on the CPU Usage and Memory Usage graphs so that you can understand the CPU and memory characteristics of specific sections of code. The graphs continue to update while the app is running, which lets you keep an eye on CPU and memory as you interact with your app. You can switch to the Memory Usage tab to take snapshots and view a detailed breakdown of memory usage.
IntelliTrace Performance Insights
IntelliTrace (not available in the Visual Studio Community edition) lets you gain more insight into performance when debugging managed code. IntelliTrace adds two tracks to the Debugger Events timeline: the Output track and the IntelliTrace track. These events include the information shown in the Output window, plus additional events collected by IntelliTrace, such as Exceptions, ADO.NET and so on. The events in these tracks are also shown in the Debugger Events table.
You can relate IntelliTrace events to CPU Usage spikes and Memory Usage graphs. The time stamps show you how long various actions in your app take. For example, you can add Debug.WriteLine statements in your code and use the timestamps on the Output events to see how long it takes to run from one statement to the next.
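For example, a pair of statements like the following (with an illustrative method name, not one from the sample app) will each produce a timestamped Output event, and the difference between the two timestamps is the time spent in the awaited call:
// Each Debug.WriteLine produces an Output event with a timestamp in the
// Diagnostic Tools window; ApplyFilterAsync is an illustrative name.
Debug.WriteLine("Starting filter pass");
await ApplyFilterAsync(image);
Debug.WriteLine("Finished filter pass");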
Improve Performance and Memory
Now that you’ve seen the window’s features, let’s dive into practical uses of the tools. For this section, we’ll walk through solving a set of performance problems in a sample app called PhotoFilter. The app downloads pictures from the cloud and loads pictures from the user’s local pictures library so the user can view them and apply image filters.
If you want to follow along, download the source code from aka.ms/diagtoolswndsample. You’ll see different numbers than those shown here, because performance differs from machine to machine and even from run to run.
Slow Application Startup
When you start debugging the PhotoFilter application, you’ll find it takes a long time for the application to start and the pictures to load. This is an obvious problem.
When you’re debugging an app’s functional issues, you often form a hypothesis and start debugging based on that. In this case, you could hypothesize pictures are slow to load, and look for a good place to set a breakpoint and test that hypothesis. The LoadImages method is a great place to do that.
Set a breakpoint at the start and end of the LoadImages function (as shown in the code in Figure 3) and start debugging (F5). When the code hits the first breakpoint, press Continue (F5) again to run to the second breakpoint. Now there are two break events in the Debugger Events timeline.
Figure 3 The LoadImages Method
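The full method isn’t reproduced here, but based on the calls this walkthrough times, LoadImages is roughly shaped like the following sketch (the property names and exact signatures are assumptions, not the sample’s actual code):
// Rough shape of LoadImages, inferred from the steps timed below;
// the real PhotoFilter code may differ in its details.
private async Task LoadImages()
{
    // First breakpoint: start of the method
    await GetImagesFromCloud();            // download the picture list and thumbnails
    var images = LoadImagesFromDisk();     // load pictures from the local library
    var sorted = (from image in images     // the LINQ query timed below
                  orderby image.DateTaken  // DateTaken is an assumed property name
                  select image).ToList();
    // ... bind the sorted results to the UI ...
    // Second breakpoint: end of the method
}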
The first Break Event shows the application ran for only 274 ms before hitting the first breakpoint. The second shows it took 10,476 ms to run the code in the LoadImages function before hitting the second breakpoint. You can also see the same value in the elapsed-time PerfTip in the code. So you’ve narrowed the issue down to the LoadImages function.
To get more information and time how long each line takes, restart debugging so you hit the first breakpoint again. This time, step through each line of code in the method to see which lines are taking the longest. From the PerfTips and duration of the debug break events, you can see GetImagesFromCloud takes 7,290 ms, LoadImagesFromDisk takes 736 ms, the LINQ query takes 1,322 ms and the rest complete in less than 50 ms each.
The timing for all lines is shown in Figure 4. The line numbers shown represent the line where each break event ended, so line 52 shows how long it took to step over line 51. Now drill further into the GetImagesFromCloud method.
Figure 4 The Debugger Events Table Shows the Elapsed Time for Each Step
The GetImagesFromCloud method performs two logically separate actions, as shown in Figure 5. It downloads the list of pictures from the server and the thumbnails for each picture synchronously (one at a time). You can time these two actions by unsetting your existing breakpoints and placing new ones on the following lines:
line 63: HttpClient client = new HttpClient();
line 73: foreach (var image in pictureList)
line 79: }
Figure 5 The GetImagesFromCloud Method (Top) and Improved Code (Bottom)
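Figure 5 isn’t reproduced here, but judging from the breakpoint locations above and the improved code shown later, the original thumbnail loop awaits each download before starting the next, roughly like this sketch (not the sample’s exact code):
// Download thumbnails one at a time (the slow version the breakpoints measure);
// reconstructed from the breakpoint locations and the improved code below.
foreach (var image in pictureList)
{
    string fileName = image.Thumbnail;
    string imageUrl = ServerUrl + "/Images/" + fileName;
    await DownloadImageAsync(new Uri(imageUrl), folder, fileName);
}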
Restart the debugging process and wait until the app hits the first breakpoint. Then let the app run (by pressing F5 to continue) to the second breakpoint; this times retrieving the list of pictures from the cloud. Then let the app run to the third breakpoint to measure downloading the thumbnails from the cloud. The PerfTips and Break Events tell you it took 565 ms to get the list of pictures and 6,426 ms to download the thumbnails. The performance bottleneck is in the thumbnail download.
When you look at the CPU Usage graph (shown in Figure 6) as the method retrieves the list of images, you can see CPU usage is relatively high. The graph is fairly flat while the thumbnails download, which indicates this process spends a long time waiting on network I/O.
Figure 6 The CPU Usage Graph Indicates Delayed Network I/O
To minimize the time spent waiting on round-trips between the client and server, start all the thumbnail download operations at once as .NET tasks (System.Threading.Tasks.Task) and then await their completion with Task.WhenAll. Replace lines 73 to 79 (from the code in Figure 5) with the following code:
// Download thumbnails
var downloadTasks = new List<Task>();
foreach (var image in pictureList)
{
string fileName = image.Thumbnail;
string imageUrl = ServerUrl + "/Images/" + fileName;
downloadTasks.Add(DownloadImageAsync(new Uri(imageUrl), folder, fileName));
}
await Task.WhenAll(downloadTasks);
When you time this new version, you can see it takes only 2,424 ms to run. That’s about a four-second improvement.
Debug Memory Growth and Leaks
If you looked at the Memory Usage graph while diagnosing the slow startup, you might have noticed a sharp increase in memory usage as the app started. The list of thumbnails is a virtualized list, and only one full-size image is displayed at a time. One of the virtues of a virtualized list is that it only loads the content displayed on the screen, so you wouldn’t expect many thumbnails in memory at once.
To get to the root cause of this issue, you must first find where in the code the memory growth occurs. Then, take snapshots before and after the growth. Compare those snapshots, and you’ll find the object types that are contributing most to the growth in memory.
The Memory Usage graph shows a high-level view of how your application is using memory. It tracks the Private Bytes performance counter for your app’s process. Private Bytes is a measure of the total amount of memory allocated to a process, excluding memory shared with other processes. It includes the managed heap, native heap, thread stacks and other memory (such as private sections of loaded .dll files).
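For reference, in a desktop .NET app you could read the same counter yourself; the Memory Usage graph simply samples it for you while you debug. A minimal sketch:
// Private Bytes for the current process, as reported by the OS; this is the
// same counter the Memory Usage graph samples for you while debugging.
using System.Diagnostics;
long privateBytes = Process.GetCurrentProcess().PrivateMemorySize64;
Debug.WriteLine("Private Bytes: {0:N0} bytes", privateBytes);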
When developing a new app or diagnosing an issue with an existing one, unexpected growth on the Memory Usage graph will often be the first indication you have code that isn’t behaving as intended. Watching the graph, you can use Debugger features such as breakpoints and stepping to narrow down the code path of interest. You can determine from the line numbers and durations shown in the Debugger Events tab back in Figure 4 that the line responsible for the unexpected growth was line 52, the LoadImagesFromDisk method call.
Taking a snapshot will often be the next step in pinpointing unexpected memory usage. In the Memory Usage tab, click the Take Snapshot button to generate a snapshot of the heap. You can take snapshots at breakpoints or while the app is running.
If you know which line of code caused the memory usage spike, then you have an idea where to take the first snapshot. Set a breakpoint on the LoadImagesFromDisk method and take a snapshot when your code reaches the breakpoint. This snapshot serves as the baseline.
Next, step over the LoadImagesFromDisk method, and take another snapshot. Now, by diffing the snapshots, you’ll be able to see which managed types have been added to the heap as a result of the function call you stepped over. The graph again shows the memory utilization spike you’re investigating (as shown in Figure 7). You can also see by hovering over the graph that the memory usage was 47.4 MB. It’s a good idea to make a mental note of that number, so you can later verify your fix has had a meaningful impact.
Figure 7 There Is a Noticeable Spike in Memory Usage
The details view displays a brief overview of each snapshot. The overview includes the snapshot’s sequential number, the running time (in seconds) when the snapshot was taken, the size of the heap, and the number of live objects on the heap. Subsequent snapshots will also display the change in size and object counts from the previous snapshot.
The process of taking a snapshot enumerates only those objects still live on the heap. In other words, if an object is eligible for garbage collection, it’s not included in the snapshot. This way, you don’t need to worry about when a collection last ran. The data in every snapshot is as if a garbage collection just occurred.
The size of the heap displayed in the snapshot overview will be lower than the Private Bytes displayed in the Memory Usage graph. While Private Bytes measures all types of memory allocated by your process, the snapshot shows the size of all live objects on the managed heap. If you see a large increase in the Memory Usage graph, but growth in the managed heap doesn’t account for most of it, the growth is occurring elsewhere in memory.
From the snapshot overview, you can open the Heap View and investigate the contents of the heap by type. Click on the diff link in the Objects (Diff) column for your second snapshot to open the Heap View in a new tab. Clicking that link will sort the types in the Heap View by the number of new objects created since the previous snapshot. That puts the types in which you’re interested at the top of the table.
The Heap View Snapshot (see Figure 8) has two main sections: the Object Type table in the top pane and the References Graph in the lower pane. The Object Type table shows the name, number and size of each object type when the snapshot was taken.
Figure 8 The Heap View Snapshot in Diff Mode
Several of the types in the Heap View are from the framework. If you have Just My Code enabled (the default), these are types that either reference types in your code or are referenced by types in your code. Using this view, you can recognize a type from your code near the top of the table: PhotoFilter.ImageItem.
In Figure 8, you can see the Count Diff column shows 137 new ImageItem objects created since the previous snapshot. The top five new object types all have the same number of new objects, so these are likely related.
Let’s look at the second pane, the References Graph. If you expect a type to have been cleaned up by the garbage collector, but it still shows up in the Types table, Paths to Root can help you track down what is holding the reference. Paths to Root is one of the two views in the References graph. Paths to Root is a bottom-up tree showing the complete graph of types rooting your selected type. An object is rooted if another live object is holding a reference. Unnecessary rooted objects are often the cause of memory leaks in managed code.
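A classic example of an unnecessarily rooted object (illustrative, not taken from the PhotoFilter sample) is a subscriber that never unsubscribes from a static event. The static event’s invocation list keeps every subscriber reachable, and Paths to Root would show the chain from the subscriber back to that static root:
// A common rooting pattern (illustrative, not from PhotoFilter): every
// ImageViewer that subscribes to the static event stays reachable from a
// static root, so the garbage collector can never collect it.
using System;
public static class ThemeManager
{
    public static event EventHandler ThemeChanged;
}
public class ImageViewer
{
    public ImageViewer()
    {
        // Roots this instance via ThemeManager.ThemeChanged's invocation list
        ThemeManager.ThemeChanged += OnThemeChanged;
    }
    private void OnThemeChanged(object sender, EventArgs e) { /* ... */ }
    // Without a matching unsubscribe (ThemeManager.ThemeChanged -= OnThemeChanged),
    // Paths to Root would show this instance held alive by the static event.
}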
Referenced Types, the other view, is the opposite. For the type selected in the Object Type table, this view shows which other types your selected type is referencing. This information can be useful in determining why objects of the selected type are holding on to more memory than expected. This is useful in the current investigation because the types may be using more memory than expected, but they aren’t outliving their usefulness.
Select the PhotoFilter.ImageItem row in the Object Type table. The References graph will update to show the graph for ImageItem. In the Referenced Types view, you can see the ImageItem objects are retaining a total of 280 String objects, and 140 each of three framework types: StorageFile, StorageItemThumbnail and BitmapImage.
Total Size makes it look as if String objects are contributing the most to the increase in memory retained by ImageItem objects. Focusing on the Total Size Diff column makes a lot of sense, but that number alone won’t lead you to the root cause. Some framework types, like BitmapImage, hold only a small amount of memory on the managed heap, while the decoded image data they represent lives on the native heap; they’re icebergs, with most of their real cost hidden from the managed heap view. The number of BitmapImage instances is the more telling clue. Remember the list of thumbnails in PhotoFilter is virtualized, so it should load these images on demand and make them available for garbage collection when it’s done. However, it looks as if all thumbnails are loading in advance. Combined with what you now know about BitmapImage objects being icebergs, continue the investigation focused on those.
Right-click on PhotoFilter.ImageItem in the References Graph, and select Go To Definition to open the source file for ImageItem in the editor. ImageItem defines a member field, m_photo, that’s a BitmapImage, as shown in Figure 9.
Figure 9 Code Referencing the m_photo Member Field
The first code path that references m_photo is the get method of the Photo property, which is databound to the thumbnail ListView in the UI. It looks like the BitmapImage is being loaded (and thus decoded on the native heap) on demand.
The second code path referencing m_photo is the LoadImageFromDisk function, which is on the app’s startup path. It gets called for each image as the app starts up, effectively preloading all the BitmapImage objects. This works against the virtualized ListView, because all the memory is allocated up front whether or not the ListView is displaying the image’s thumbnail. This pre-loading algorithm won’t scale well: the more images you have in your Pictures library, the higher the startup memory cost. On-demand loading of the BitmapImage objects is the more scalable solution.
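Figure 9 isn’t reproduced here, but from the description above, the relevant parts of ImageItem look roughly like the following sketch (the ImagePath property and the exact construction of the BitmapImage are assumptions, not the sample’s actual code):
// Rough shape of the two code paths described above (a sketch, not the
// sample's exact code). BitmapImage is Windows.UI.Xaml.Media.Imaging.BitmapImage.
public class ImageItem
{
    private BitmapImage m_photo;
    public string ImagePath { get; set; }   // assumed property name
    // Databound to the thumbnail ListView: loads on demand (the scalable path)
    public BitmapImage Photo
    {
        get
        {
            if (m_photo == null)
            {
                m_photo = new BitmapImage(new Uri(ImagePath));
            }
            return m_photo;
        }
    }
    // Called for every image on the startup path: preloads all BitmapImages,
    // defeating the virtualized ListView (these are the lines to comment out)
    public void LoadImageFromDisk()
    {
        // ...
        m_photo = new BitmapImage(new Uri(ImagePath));
    }
}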
After stopping the debugger, comment out lines 81 and 82 in LoadImageFromDisk, which load the BitmapImage instance. To verify you’ve fixed the memory issue without breaking the app’s functionality, rerun the same experiment.
Press F5, and you’ll see on the graph that total memory usage is now only 26.7 MB (see Figure 10). Take another set of snapshots before and immediately after the call to LoadImagesFromDisk and then diff them. You’ll see there are still 137 ImageItem objects, but no BitmapImage objects (see Figure 11). The BitmapImages will load on demand once you let the app continue starting up.
Figure 10 The Memory Graph After Fixing the References Issue
Figure 11 The References Graph After Fixing the Memory Issue
As mentioned earlier, this debugger-integrated tool also supports taking snapshots of the native heap, or both the managed and native heaps simultaneously. The heap you profile is based on which debugger you’re using:
- The managed-only debugger only takes snapshots of the managed heap.
- The native-only debugger (the default for native projects) only takes snapshots of the native heap.
- The mixed-mode debugger takes snapshots of both the managed and native heap.
You can adjust this setting on the Debug page of your project’s properties.
When to Run the Tools Without Debugging
It’s important to mention there’s extra overhead introduced when you measure performance with the Debugger. The main class of overhead comes from the fact that you’re usually running a Debug build of the app. The app you release to users will be a Release build.
In a Debug build, the compiler keeps the executable as close to the original source code as possible in terms of structure and behavior. Everything should work as you’d expect while debugging. On the other hand, a Release build tries to optimize code for performance in ways that degrade the debugging experience. Some examples include inlining function calls and constants, removing unused variables and code paths, and storing variable information in ways the Debugger may not be able to read.
All of this means CPU-intensive code can sometimes run significantly slower in Debug builds. Non-CPU-intensive operations such as disk I/O and network calls will take the same amount of time. There’s usually not a large difference in memory behavior, meaning leaked memory will leak and inefficient memory use will still show as a large increase in both cases.
The other overhead class to consider is added by the Debugger when it’s attached to the target app. The Debugger intercepts events such as module loads and exceptions. It also does other work required to let you set breakpoints and steps. Visual Studio does its best to filter this type of overhead out of the performance tools, but there will still be a small amount of overhead.
If you see an issue in the Release build of an app, it will almost always reproduce in the Debug build, but not necessarily the other way around. For this reason, the Debugger-integrated tools are meant to help you proactively discover performance issues during development. If you find an issue in a Debug build, you can flip to a Release build to see if the issue affects the Release build as well. However, you may decide to go ahead and fix the issue in the Debug build if you decide it’s good preventative performance work (that is, fixing the issue reduces the chances of encountering a performance issue later), if you determine the issue is non-CPU intensive (disk or network I/O), or if you would like to speed up the Debug build so your app is snappier during development.
When a performance issue is reported in a Release build, you want to ensure you can reproduce it and verify you’ve fixed the problem. The best way to do this is to flip your build configuration to Release and run the tools without the Debugger, in an environment that closely matches the reported issue.
If you’re trying to measure how long an operation takes, the Debugger-integrated tools will only be accurate to within tens of milliseconds because of that small amount of overhead. If you need a higher level of accuracy, running the tools without the Debugger is a better choice.
Wrapping Up
You can try out the new Diagnostic Tools debugger window in Visual Studio 2015 by downloading the Visual Studio 2015 RC. Using these new integrated debugging tools can help you improve performance as you debug your apps.
Dan Taylor is a program manager on the Visual Studio Diagnostics team and has been working on profiling and diagnostic tools for the past two years. Before joining the Visual Studio Diagnostics team, Taylor was a program manager on the .NET Framework team and contributed to numerous performance improvements to the .NET Framework and the CLR.
Charles Willis has been a program manager on the Visual Studio Diagnostics team since joining Microsoft last year.
Thanks to the following Microsoft technical experts for reviewing this article: Andrew Hall, Daniel Moth and Angelos Petropoulos
Angelos Petropoulos is a Senior Program Manager on the Visual Studio team. After getting his Master’s in Object-Oriented Software Engineering, he worked as an IT consultant in the UK. After moving to the US, he joined the diagnostic tools team in Visual Studio and is now the Program Manager for IntelliTrace.
Daniel Moth is a Principal Program Manager on the Visual Studio team at Microsoft.