Improving performance in Hilo (Windows Store apps using C++ and XAML)
From: Developing an end-to-end Windows Store app using C++ and XAML: Hilo
The Hilo C++ team spent time learning what works and what doesn't when building a fast and fluid app. We identified areas of the app where we needed to improve perceived performance and areas where we had to improve actual performance. Here are some tips and coding guidelines for creating a well-performing, responsive app.
Download
After you download the code, see Getting started with Hilo for instructions.
You will learn
- The differences between performance and perceived performance.
- Recommended strategies when profiling an app.
- Tips that help create a fast and fluid app.
- How to keep the app's UI responsive by running compute-intensive operations on a background thread.
Applies to
- Windows Runtime for Windows 8
- Visual C++ component extensions (C++/CX)
- XAML
Improving performance with app profiling
Users have a number of expectations for apps. They want immediate responses to touch, clicks, gestures, and key-presses. They expect animations to be smooth and fluid, and they never want to wait for the app to catch up with them.
Performance problems show up in various ways. They can reduce battery life, cause panning and scrolling to lag behind the user’s finger, or even make the app appear unresponsive for a period of time. One technique for determining where code optimizations have the greatest effect in reducing performance problems is to perform app profiling.
The profiling tools for Windows Store apps let you measure, evaluate, and target performance-related issues in your code. The profiler collects timing info for apps by using a sampling method that collects CPU call stack info at regular intervals. Profiling reports display info about the performance of your app and help you navigate through the execution paths of your code and the execution cost of your functions so that you can find the best opportunities for optimization. For more info see How to profile Visual C++, Visual C#, and Visual Basic code in Windows Store apps on a local machine. To see how to analyze the data returned from the profiler see Analyzing performance data for Visual C++, Visual C#, and Visual Basic code in Windows Store apps.
Optimizing performance is more than just implementing efficient algorithms. Performance can also be thought of as the user’s perception of how the app performs while they use it. The user’s app experience can be separated into three categories: perception, tolerance, and responsiveness.
- Perception. A user’s perception of performance can be defined as how favorably they recall the time it took to perform their tasks within the app. This perception doesn't always match reality. Perceived performance can be improved by reducing the amount of time between activities that the user needs to perform to accomplish their task in an app.
- Tolerance. A user’s tolerance for delay depends on how long the user expects an operation to take. For example, a user might find cropping an image intolerable if the app becomes unresponsive during the cropping process, even for a few seconds. A user’s tolerance for delay can be increased by identifying areas of your app that require substantial processing time, and limiting or eliminating user uncertainty during those scenarios by providing a visual indication of progress. In addition, async APIs can be used to avoid blocking the UI thread and making the app appear frozen.
- Responsiveness. Responsiveness of an app is relative to the activity being performed. To measure and rate the performance of an activity, there must be a time interval to compare it against. The Hilo team used the heuristic that if an activity takes longer than 500ms, the app might need to provide feedback to the user in the form of a visual indication of progress.
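The 500 ms heuristic can be expressed as a small timing helper. The following is a minimal sketch in portable standard C++, not code from Hilo. The names kFeedbackThreshold, NeedsProgressFeedback, and MeasureActivity are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical threshold taken from the 500 ms heuristic described above.
constexpr std::chrono::milliseconds kFeedbackThreshold{500};

// Returns true when a measured activity ran longer than the threshold,
// meaning the app should consider showing a visual indication of progress.
inline bool NeedsProgressFeedback(std::chrono::milliseconds elapsed)
{
    assert(elapsed.count() >= 0);
    return elapsed > kFeedbackThreshold;
}

// Times a callable with a monotonic clock and returns the elapsed time.
template <typename Activity>
std::chrono::milliseconds MeasureActivity(Activity&& activity)
{
    const auto start = std::chrono::steady_clock::now();
    activity();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
}
```

During development, a helper like this could wrap a candidate activity to flag the ones that cross the threshold. It complements the profiler and doesn't replace it.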
Profiling tips
When profiling your app, follow these tips to ensure that reliable and repeatable performance measurements are taken:
- Windows 8 runs on a wide variety of devices, and performance measurements taken on a single device won't always reflect the performance characteristics of other form factors.
- Make sure the machine that is capturing performance measurements is plugged in, rather than running from a battery. Many systems conserve power when running from a battery, and so operate differently.
- Make sure that the total memory utilization on the system is less than 50%. If it’s higher, close apps until you reach 50% to make sure that you're measuring the impact of your app, rather than other processes.
- When remotely profiling an app, it’s recommended that you interact with your app directly on the remote device. While you can interact with your app via Remote Desktop Connection, it can significantly alter the performance of your app and the performance data that you collect. For more info, see How to profile Visual C++, Visual C#, and Visual Basic code in Windows Store apps on a remote machine.
- To collect the most accurate performance results, profile a Release build of your app. See How to: Set Debug and Release Configurations.
- Avoid profiling your app in the simulator because the simulator can distort the performance of your app.
Other performance tools
In addition to using profiling tools to measure app performance, the Hilo team also used the Windows Reliability and Performance Monitor (perfmon). Perfmon can be used to examine how programs you run affect your computer’s performance, both in real time and by collecting log data for later analysis. The Hilo team used this tool for a general diagnosis of the app’s performance. For more info about perfmon, see Windows Reliability and Performance Monitor.
Performance tips
The Hilo team spent time learning what works and what doesn't when building a fast and fluid app. Here are some points to remember.
- Keep the launch times of your app fast
- Emphasize responsiveness in your apps by using asynchronous API calls on the UI thread
- Use thumbnails for quick rendering
- Prefetch thumbnails
- Trim resource dictionaries
- Optimize the element count
- Use independent animations
- Use parallel patterns for heavy computations
- Be aware of the overhead for type conversion
- Use techniques that minimize marshaling costs
- Keep your app’s memory usage low when suspended
- Minimize the amount of resources your app uses by breaking down intensive processing into smaller operations
Keep the launch times of your app fast
Defer loading large in-memory objects while the app is activating. If you have large tasks to complete, provide a custom splash screen so that your app can accomplish these tasks in the background.
Emphasize responsiveness in your apps by using asynchronous API calls on the UI thread
Don’t block the UI thread with synchronous APIs. Instead, use asynchronous APIs or call synchronous APIs in a non-blocking context. In addition, move intensive processing operations to a thread pool thread. This is important because users will most likely notice delays longer than 100 ms. Break intensive processing operations down into a series of smaller operations so that the UI thread can listen for user input in between.
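As a rough illustration of moving work off the calling thread and splitting it into small steps, here is a sketch in portable standard C++. It uses std::async as a stand-in. A Windows Store app would more likely use PPL tasks or the Windows Runtime async APIs, and this is not how Hilo implements it. The function name SumInBackground and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sums `data` on a worker thread, one chunk at a time. After each chunk it
// publishes progress, and before each chunk it checks `cancel`, so the caller
// can show progress or stop the work without ever blocking on it.
// The caller must keep `data`, `processed`, and `cancel` alive until the
// returned future is ready.
inline std::future<long long> SumInBackground(const std::vector<int>& data,
                                              std::size_t chunkSize,
                                              std::atomic<std::size_t>& processed,
                                              const std::atomic<bool>& cancel)
{
    assert(chunkSize > 0);
    return std::async(std::launch::async, [&data, chunkSize, &processed, &cancel] {
        long long total = 0;
        for (std::size_t begin = 0; begin < data.size() && !cancel.load(); begin += chunkSize)
        {
            const std::size_t end = std::min(begin + chunkSize, data.size());
            total = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                    data.begin() + static_cast<std::ptrdiff_t>(end),
                                    total);
            processed.store(end);  // progress the UI could poll or display
        }
        return total;  // partial sum if the work was cancelled
    });
}
```

The calling thread stays free while the worker runs. It can read processed to update a progress indicator, or set cancel to true if the user navigates away.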
Use thumbnails for quick rendering
The file system and media files are an important part of most apps, and also one of the most common sources of performance issues. File access is traditionally a key performance bottleneck for apps that display gallery views of files, such as photo albums. Accessing images can be slow, because it takes memory and CPU cycles to store, decode, and display the image.
Instead of scaling a full size image to display as a thumbnail, use the Windows Runtime thumbnail APIs. The Windows Runtime provides a set of APIs backed by an efficient cache that allows the app to quickly get a smaller version of an image to use for a thumbnail.
Prefetch thumbnails
As well as providing APIs for retrieving thumbnails, the Windows Runtime also includes a SetThumbnailPrefetch method in its API. This method specifies the thumbnail to retrieve for each file or folder based on the purpose of the thumbnail, its requested size, and the desired behavior to use to retrieve the thumbnail image.
In Hilo, the FileSystemRepository class queries the file system for photos that meet specific date criteria, and returns any photos that meet those criteria. The CreateFileQuery method uses the SetThumbnailPrefetch method to return thumbnails for the files in the query result set.
FileSystemRepository.cpp
inline StorageFileQueryResult^ FileSystemRepository::CreateFileQuery(IStorageFolderQueryOperations^ folder, String^ query, IndexerOption indexerOption)
{
    // items (the file type filter entries) is defined outside this snippet.
    auto fileTypeFilter = ref new Vector<String^>(items);
    auto queryOptions = ref new QueryOptions(CommonFileQuery::OrderByDate, fileTypeFilter);
    queryOptions->FolderDepth = FolderDepth::Deep;
    queryOptions->IndexerOption = indexerOption;
    queryOptions->ApplicationSearchFilter = query;
    // Ask the system to prefetch picture-view thumbnails for the files in the result set.
    queryOptions->SetThumbnailPrefetch(ThumbnailMode::PicturesView, 190, ThumbnailOptions::UseCurrentScale);
    queryOptions->Language = CalendarExtensions::ResolvedLanguage();
    return folder->CreateFileQueryWithOptions(queryOptions);
}
In this case, the code prefetches thumbnails that display a preview of each photo, up to 190 pixels wide, and increases the requested thumbnail size based upon the pixels per inch (PPI) of the display. Using the SetThumbnailPrefetch method can result in improvements of 70% in the time taken to show a view of photos from the user’s Pictures library.
Trim resource dictionaries
App-wide resources should be stored in the Application object to avoid duplication, but resources specific to single pages should be moved to the resource dictionary of the page.
Optimize the element count
The XAML framework is designed to display thousands of objects, but reducing the number of elements on a page will make your app render faster. You can reduce a page’s element count by avoiding unnecessary elements, and collapsing elements that aren't visible.
Use independent animations
An independent animation runs independently from the UI thread. Many of the animation types used in XAML are composed by a composition engine that runs on a separate thread, with the engine’s work being offloaded from the CPU to the GPU. Moving animation composition to a non-UI thread means that the animation won’t jitter or be blocked by the app working on the UI thread. Composing the animation on the GPU greatly improves performance, allowing animations to run at a smooth and consistent frame rate.
You don’t need additional markup to make your animations independent. The system determines when it's possible to compose the animation independently, but there are some limitations for independent animations. Here are some common problems.
- Animating the Height and Width properties of a UIElement results in a dependent animation because these properties require layout changes that can only be accomplished on the UI thread. To have a similar effect to animating Height or Width, you can animate the scale of the control instead.
- If you set the CacheMode property of an element to BitmapCache then all animations in the visual subtree are run dependently. The solution is to simply not animate cached content.
- The ProgressRing and ProgressBar controls have infinite animations that can continue running even if the control isn't visible on the page, which may prevent the CPU from going into low power or idle mode. Set the ProgressRing::IsActive and ProgressBar::IsIndeterminate properties to false when they aren’t being shown on the page.
Hilo uses the ObjectAnimationUsingKeyFrames type, which is an independent animation.
Use parallel patterns for heavy computations
If your app performs heavy computations, it's very likely that you need to use parallel programming techniques. There are a number of well-established patterns for effectively using multicore hardware. Parallel Programming with Microsoft Visual C++ is a resource for some of the most common patterns, with examples that use PPL and the Asynchronous Agents Library. See Concurrency Runtime for comprehensive documentation of the APIs, along with examples.
Hilo contains some compute-intensive operations for manipulating images. For these operations we used parallel programming techniques that take advantage of the computer's parallel processing hardware. For more info, see Adapting to async programming, Using parallel programming and background tasks, and Async programming patterns in C++ in this guide.
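To make the idea of a data-parallel image operation concrete, here is a minimal sketch in portable standard C++. It splits a pixel buffer into disjoint slices and processes each slice on its own std::thread. This is a stand-in for a PPL parallel loop and is not Hilo's image code. The function name BrightenParallel and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Adds `delta` to every 8-bit pixel value, saturating at 255. The buffer is
// split into contiguous, non-overlapping slices, and each slice is handled by
// its own thread, so no two threads ever write the same element.
inline void BrightenParallel(std::vector<std::uint8_t>& pixels, int delta, unsigned workerCount)
{
    assert(workerCount > 0);
    assert(delta >= 0 && delta <= 255);
    const std::size_t sliceSize = (pixels.size() + workerCount - 1) / workerCount;
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w)
    {
        const std::size_t begin = std::min(pixels.size(), w * sliceSize);
        const std::size_t end = std::min(pixels.size(), begin + sliceSize);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&pixels, delta, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
            {
                pixels[i] = static_cast<std::uint8_t>(std::min(255, pixels[i] + delta));
            }
        });
    }
    for (auto& worker : workers) worker.join();  // wait for every slice to finish
}
```

A reasonable workerCount is the value of std::thread::hardware_concurrency(), falling back to 1 when it reports 0. A library-provided parallel loop normally handles that partitioning for you.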
Be aware of the overhead for type conversion
In order to interact with Windows Runtime features you sometimes need to create data types from the Platform and Windows namespaces. In some cases, creating objects of these types incurs overhead for type conversion. Hilo performs type conversion at the ABI to minimize this overhead. See Writing modern C++ code in this guide for more info.
Use techniques that minimize marshaling costs
If your code communicates with languages other than C++ and XAML, you can incur costs for marshaling data across runtime environments. Hilo interacts only with C++ and XAML, so this wasn't a consideration for us. See Writing modern C++ code in this guide for more info.
Keep your app’s memory usage low when suspended
When your app resumes from suspension, it reappears nearly instantly. But when your app restarts from termination, it may take longer to appear. Therefore, preventing your app from being terminated when it’s suspended is a technique for managing the user’s perception and tolerance of app responsiveness. This can be accomplished by keeping your app’s memory usage low when suspended.
When your app begins the suspension process, it should free any large objects that can be easily rebuilt on resume. This helps to keep your app’s memory footprint low, and reduces the likelihood that the OS will terminate your app after suspension. For more info see Handling suspend, resume and activation in this guide.
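The release-and-rebuild idea can be sketched as follows in portable standard C++. The class RebuildableBuffer and its member names are hypothetical, and the zero-filled buffer stands in for whatever data the app can cheaply recreate. The sketch assumes the app calls ReleaseForSuspend from its own suspending handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for a large buffer that is easy to rebuild
// (for example, decoded preview pixels).
class RebuildableBuffer
{
public:
    explicit RebuildableBuffer(std::size_t byteCount) : m_byteCount(byteCount)
    {
        assert(byteCount > 0);
    }

    // Returns the buffer, rebuilding it on first use after a release.
    const std::vector<std::uint8_t>& Get()
    {
        if (m_bytes.empty())
        {
            m_bytes.assign(m_byteCount, 0);  // stand-in for the real rebuild work
        }
        return m_bytes;
    }

    // Call when the app is suspending. Swapping with an empty vector hands
    // the allocation to the temporary, which frees it immediately.
    void ReleaseForSuspend()
    {
        std::vector<std::uint8_t>().swap(m_bytes);
    }

    // Bytes currently reserved by the buffer.
    std::size_t BytesHeld() const { return m_bytes.capacity(); }

private:
    std::size_t m_byteCount;
    std::vector<std::uint8_t> m_bytes;
};
```

Because Get rebuilds lazily, the resume path needs no special handling. The cost of rebuilding is paid only if the data is actually used again.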
Minimize the amount of resources your app uses by breaking down intensive processing into smaller operations
Windows has to accommodate the resource needs of all Windows Store apps by terminating suspended apps to allow other apps to run. A side effect of this is that if your app requests a large amount of memory, other apps might be terminated, even if the app then frees that memory soon after requesting it. Be a good citizen so that the user doesn’t begin to attribute any perceived latencies in the system to your app. You can do this by breaking down intensive processing operations into a series of smaller operations.
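One way to keep peak memory requests small is to reuse a fixed-size scratch buffer rather than allocating one buffer for the whole job. The following sketch, in portable standard C++, illustrates that under stated assumptions. ProcessInChunks and ChunkedResult are hypothetical names, and the per-item work is a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ChunkedResult
{
    long long checksum;          // sum of all generated values
    std::size_t peakBufferSize;  // largest number of elements held at once
};

// Processes `itemCount` hypothetical items using a scratch buffer of at most
// `chunkSize` elements, instead of one buffer sized for the whole job.
inline ChunkedResult ProcessInChunks(std::size_t itemCount, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    ChunkedResult result{0, 0};
    std::vector<int> scratch;
    scratch.reserve(std::min(itemCount, chunkSize));
    for (std::size_t begin = 0; begin < itemCount; begin += chunkSize)
    {
        const std::size_t end = std::min(itemCount, begin + chunkSize);
        scratch.clear();  // reuse the same allocation for every chunk
        for (std::size_t i = begin; i < end; ++i)
        {
            scratch.push_back(static_cast<int>(i % 7));  // stand-in for real per-item work
        }
        for (int value : scratch) result.checksum += value;
        result.peakBufferSize = std::max(result.peakBufferSize, scratch.size());
    }
    return result;
}
```

The peak allocation is bounded by chunkSize no matter how large itemCount grows, which is the property that matters when the system is deciding whether to terminate suspended apps.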