Remote Tools Profiler (Compact 2013)

3/26/2014

This tool collects information that you can use to assess the performance of an OS, drivers, and applications on a device.

How to Use

To use the Remote Tools Profiler

  1. Connect to the device. For more information, see Connecting to a Target Device.

  2. To start the profiler, do one of the following:

    • In Platform Builder, on the Tools menu, point to Remote Tools, and then click Profiler.

    • At a command prompt, type the following command.

      RemoteToolsShell ceprofiler.cetool <optional path of log file>
      
  3. In the Remote Tools Shell dialog box, do any of the following, and then click Start to start profiling:

    Use this

    To do this

    Collection Mode

    Select from the following modes:

    Instrumented / Call-attributed profiling

    In this mode, the profiler can capture every function entry and exit in your instrumented modules. It can tell you the exact timing and total number of calls, and show complete call trees. This mode generates rich data.

    However, because this profiler mode records two data points for every function call, it can create resource overhead. Do not use this profiler mode if you cannot rebuild the modules that you want to measure.

    Application-level sampling with callstacks

    In this mode, the profiler sets up a high-priority thread that wakes up periodically and calls GetThreadCallStack on the thread that it preempted.

    Kernel-level sampling (Monte Carlo) with single stack frames

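    In this mode, the profiler uses profiler interrupts in the kernel and records the program counter of the interrupted code at each interrupt, so each sample contains a single stack frame.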

    Sampling Options

    Choose your option values, or do nothing to accept the default settings.

    Single Stack Frame

    Determines how much call stack information the profiler collects.

    • When this option is selected, the top frame of the call stack is collected. This frame belongs to the thread that was running when the profiler sample was collected.
      This method requires less overhead than collecting the whole call stack. This method uses interrupts to record the program counter. It collects samples in intervals as small as several hundred microseconds.
    • When this option is not selected, the whole call stack of the executing thread is collected.
      This method requires more overhead than collecting a single stack frame. This method uses a high-priority thread that records the call stack at set intervals. It uses GetThreadCallStack to collect samples in intervals as small as 1 millisecond.

    For more information about the effects of this setting on the sampling interval, see Sampling Interval in this table.

    Include System Stack Frames

    Enabled by default.

    Determines whether the profiler includes or excludes stack frames that are not part of the running application.

    • When this option is selected, the profiler includes the whole call stack of the thread.
    • When this option is not selected, the profiler only includes the part of the stack that pertains to the process that owns the thread.
      This mode does not record the stack frames of system functions that are called from outside the running application.

    Flush Thread Prio

    Specifies the priority of the device-side thread that streams data from the device to the development computer.

    The default priority is 248. This default priority is equal to THREAD_PRIORITY_TIME_CRITICAL. For more information, see SetThreadPriority.

    Sampling Interval

    Specifies the interval between profiler samples. The default interval depends on the Single Stack Frame setting, because collecting more than a single stack frame requires more overhead. For the default, minimum, and maximum values for each setting, see Single Stack Frame Selected and Single Stack Frame not Selected at the end of this table.

    Profiler Thread Prio

    Specifies the priority of the device-side thread that captures profiling samples.

    The default priority is 247, the lowest real-time thread priority.

    This setting is ignored when Single Stack Frame is selected, because a thread is not used to profile.

    RAM Buffer Size

    Specifies the size of the device-side RAM buffer that is used to temporarily hold data before it is streamed to the development computer.

    The default size is 1 MB. The minimum is 8 KB and the maximum is 1 GB.

    Note
    If the buffer size is set too small, it could lead to buffer overflows and data loss. Some profiler samples might not be recorded.

    Single Stack Frame Selected

    The default sampling interval is 200 microseconds. The minimum and maximum are determined by OEM adaptation layer (OAL) settings. For more information, see Implementing Profiler Timer Interrupts in the OAL.

    Single Stack Frame not Selected

    The default sampling interval is 5000 microseconds. The minimum is 1000 microseconds. There is no maximum.
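To get a feel for how the Sampling Interval setting translates into data volume, the following Python sketch converts an interval into a sample rate and an approximate sample count. It is illustrative arithmetic only and is not part of the profiler. The function and constant names are hypothetical; the default and minimum values are the ones listed in this table.

```python
# Illustrative arithmetic only; these names are hypothetical and are not a profiler API.

# Default sampling intervals from this topic, in microseconds.
DEFAULT_INTERVAL_US = {
    True: 200,    # Single Stack Frame selected (interrupt-based sampling)
    False: 5000,  # Single Stack Frame not selected (thread-based sampling)
}

# Documented minimum interval when Single Stack Frame is not selected.
MIN_THREAD_INTERVAL_US = 1000


def samples_per_second(interval_us):
    """Return how many samples per second a given interval produces."""
    if interval_us <= 0:
        raise ValueError("interval must be positive")
    return 1_000_000 / interval_us


def expected_samples(duration_s, single_stack_frame, interval_us=None):
    """Estimate the number of samples collected over duration_s seconds."""
    if interval_us is None:
        interval_us = DEFAULT_INTERVAL_US[single_stack_frame]
    if not single_stack_frame and interval_us < MIN_THREAD_INTERVAL_US:
        raise ValueError("minimum interval for full call stacks is 1000 microseconds")
    return int(duration_s * samples_per_second(interval_us))
```

With the default settings, a 10-second scenario yields roughly 25 times more samples when Single Stack Frame is selected than when it is not, which is worth keeping in mind when you choose a RAM Buffer Size.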

To export Remote Tools Profiler results in .csv or XML format

  1. In the left pane of the profiler, select a profiler node.

  2. From the File menu, choose Export Profiler Results.

  3. Select the reports that you want to export.

    Each report will be exported as a separate file to the target directory.

  4. In the Export Report dialog box, in the Prefix reports with box, enter a prefix to add onto each file name for recordkeeping purposes.

  5. In the Exported report location box, enter the path to the target directory where the exported reports will be located.

  6. In the Exported report format box, choose either .csv or XML format.

  7. Click Export.

To check for data loss

  1. To determine whether data was lost, examine the profiler logs in the Marks view, which displays periodic data loss counters.

    1. To access this view, in the Remote Tools Shell window, click the Marks button.
      • If no data was lost, these counters display zero.
      • If data was lost, these counters display nonzero values.
  2. To determine whether data was lost on the device, inspect the Data Lost (KB) counter in the Collect New Data view of your profiler data.

    • If no data was lost, the counter displays zero.
    • If data was lost, the counter displays a nonzero value.

To add instrumentation using the compiler

  Note

    You can add instrumentation to the modules that you are interested in by using the compiler or the Platform Builder Build tool. This procedure describes the compiler method. For more information, see "Collecting Call-Attributed Data" in the Remarks section.

  1. To compile your code for profiling, do one of the following:

    • On x86 microprocessors, add instrumentation to the application with CallCAP probes by compiling with the /Gh compiler option.
    • On other microprocessors, depending on the type of profiling probe that you want to use, use one of the following options:
      • To add instrumentation to the application by using FastCAP probes, use the /fastcap compiler option.
      • To add instrumentation to the application by using CallCAP probes, use the /callcap compiler option.

    For more information about the two types of profiling probes, see "Characteristics of FastCAP and CallCAP Probes" in the Remarks section later in this topic.

    To provide the most accurate profiling data, we recommend that you set the optimization options that you will use in the application build you plan to release.

  2. To enable the profiler to properly resolve function names, you must explicitly set the name of the program database (.pdb) file by using the /Fd compiler option.
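The option selection in this procedure can be summarized as a small helper. The following Python sketch is illustrative only: the function name instrumentation_flags, its parameters, and the file name myapp.pdb are hypothetical. It only returns the option strings named in this topic and does not invoke the compiler.

```python
# Hypothetical helper; it only assembles the compiler options described in this procedure.

def instrumentation_flags(cpu, probe="callcap", pdb_path="myapp.pdb"):
    """Return the compiler options for call-attributed profiling.

    cpu      -- target CPU family, for example "x86" or "ARM"
    probe    -- "callcap" or "fastcap"
    pdb_path -- placeholder name for the program database (.pdb) file
    """
    cpu = cpu.lower()
    probe = probe.lower()
    if probe not in ("callcap", "fastcap"):
        raise ValueError("probe must be 'callcap' or 'fastcap'")

    if cpu == "x86":
        # FastCAP probes are not available on x86; CallCAP probes use /Gh there.
        if probe == "fastcap":
            raise ValueError("FastCAP probes are not available on x86")
        probe_flag = "/Gh"
    else:
        probe_flag = "/" + probe  # /callcap or /fastcap

    # /Fd explicitly names the .pdb file so that function names can be resolved.
    return [probe_flag, "/Fd" + pdb_path]
```

For example, instrumentation_flags("x86") returns the /Gh option together with an /Fd option for the placeholder .pdb name. Add the optimization options that you use for your release build alongside these.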

To add instrumentation using the Platform Builder Build tool

  Note

    You can add instrumentation to the modules that you are interested in by using the compiler or the Platform Builder Build tool. This procedure describes the Build tool method. For more information, see "Collecting Call-Attributed Data" in the Remarks section.

  1. To open a Command Prompt window, click the Build menu, and then click Open Release Directory in Build Window.

  2. To add instrumentation to the application by using CallCAP probes, type the following command at the command prompt.

    set WINCECALLCAP=1
    

    - or -

    To add instrumentation to the application by using FastCAP probes, type the following command at the command prompt.

    set WINCEFASTCAP=1
    

    Note

    FastCAP probes are not available on x86 microprocessors.

    For more information about the two types of profiling probes, see "Characteristics of FastCAP and CallCAP Probes" in the Remarks section later in this topic.

  3. At the command prompt, type: set WINCEREL=1

    If you set the WINCEREL environment variable, the Build tool copies the built application to the release directory.

  4. At the command prompt, type: set RELEASETYPE=local

  5. Navigate to the directory that contains the sources file for the application that you want to add instrumentation to.

  6. At the command prompt, type: build -c

    At the end of the build process, the Command Prompt window displays a "BUILD: Done" message, followed by a message that reports the number of compiled files.

    The file name and file name extension of the built application are determined by the values that you specified in the sources file in the directory that contains the application source code.
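If you script this procedure instead of typing each command, the environment settings from steps 2 through 4 can be collected in one place. The following Python sketch is a hypothetical helper, not a Platform Builder feature. It only builds a dictionary that contains the variables named in this procedure and does not run the Build tool.

```python
import os

# Hypothetical helper; the variable names and values come from this procedure.

def instrumented_build_env(probe, base_env=None):
    """Return a copy of the environment with the instrumentation variables set.

    probe    -- "callcap" or "fastcap" (FastCAP is not available on x86)
    base_env -- environment to extend; defaults to the current process environment
    """
    variables = {"callcap": "WINCECALLCAP", "fastcap": "WINCEFASTCAP"}
    if probe not in variables:
        raise ValueError("probe must be 'callcap' or 'fastcap'")

    env = dict(os.environ if base_env is None else base_env)
    env[variables[probe]] = "1"   # step 2: choose the probe type
    env["WINCEREL"] = "1"         # step 3: copy the built application to the release directory
    env["RELEASETYPE"] = "local"  # step 4
    return env
```

One possible use is to pass the returned dictionary as the env argument of subprocess.run when you start build -c from the directory that contains the sources file. This assumes that the script itself runs inside the Platform Builder build window so that the rest of the build environment is already set.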

To collect call-attributed data

  1. Have your instrumented modules ready for profiling, and make sure that the Remote Tools Profiler is open in the Remote Tools Shell.

  2. In the left pane of the Remote Tools Shell, select the Collect New Data view.

  3. In the right pane, select Instrumented / Call-Attributed profiling.

  4. To start profiling, click Start.

  5. Perform the device scenario that you want to collect data about.

  6. To stop profiling, click Stop.

    The profiler stops and displays the results in the Analyze Collected Data view.

Remarks

The Remote Tools Profiler is a plug-in based on the Remote Tools Framework and it runs in the Remote Tools Shell.

Data loss can occur if the profiler collects data on the device side faster than the data can be transported to the development computer side. This situation should be rare, but it might occur if the data transport is very slow.

It can also occur during instrumented, or call-attributed, profiling if a thread with greater priority than the data flush thread logs a large amount of data in a very short time.

  • Data lost during a sampled profiling session can distort the profiler results.
    If a small percentage of the data is lost, the profiler results will still be fairly accurate.
    If a large percentage of the data is lost, the results might be very different from the actual behavior of the system being profiled.
  • Even a small amount of lost data during an instrumented, or call-attributed, profiling session can catastrophically alter the remaining data.
    Losing parts of the function call trace results in inaccurate call trees. In this case, the call tree might be missing some functions that were called, or it might display some functions as calling other functions that they never actually call.
    If your instrumented, or call-attributed, profiler log file is missing any data, you may not be able to trust any of the results in the log.

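To avoid data loss, use the Flush Thread Prio option to raise the priority of the profiler flush thread so that it runs at a greater priority than the system threads that are logging data. Because a RAM buffer that is too small can also cause data loss, consider increasing the RAM Buffer Size value as well.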
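The relationship between the logging rate, the flush rate, and the RAM buffer size can be illustrated with a toy model. The following Python sketch is a simplified simulation, not the profiler's actual buffering logic: the function name, the fixed time step, and the byte counts are all hypothetical.

```python
# Toy model only; the real profiler's buffering and flushing behavior is not shown here.

def simulate_data_loss(buffer_bytes, produced_per_tick, drained_per_tick, ticks):
    """Return the number of bytes dropped over a number of equal time steps.

    buffer_bytes      -- capacity of the device-side RAM buffer
    produced_per_tick -- bytes that the profiled threads log in each step
    drained_per_tick  -- bytes that the flush thread streams out in each step
    """
    buffered = 0
    lost = 0
    for _ in range(ticks):
        buffered += produced_per_tick
        if buffered > buffer_bytes:
            # Data that does not fit in the buffer is dropped.
            lost += buffered - buffer_bytes
            buffered = buffer_bytes
        # The flush thread removes data only when it gets to run.
        buffered = max(0, buffered - drained_per_tick)
    return lost
```

In this model, a larger buffer absorbs a short burst without loss, but if data is produced faster than it is drained for long enough, any buffer eventually overflows. A flush thread that is starved by higher-priority threads corresponds to a small drained_per_tick value.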

Collecting Call-Attributed Data

To collect call-attributed data by using the Remote Tools Profiler, add instrumentation to the modules that you are interested in. You do not need to add instrumentation to all modules to collect data from profiling probes.

Note

The profiler does not collect data for fibers. When a thread converts to a fiber, the profiler stops data collection for the thread. The profiler continues to collect data for other threads.

The profiler uses the QueryPerformanceCounter function from the board support package (BSP) to obtain time stamp information. If the resolution of the time stamp information that the BSP provides is too coarse, performance timing might not be accurate, especially for an application that runs for a short period of time.
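The effect of counter resolution on timing accuracy comes down to simple arithmetic. The following Python sketch assumes that you know the counter frequency, for example as reported by QueryPerformanceFrequency on the device. The function names and the example frequencies are hypothetical and do not describe any particular BSP.

```python
# Illustrative arithmetic only; the frequencies used with these helpers are examples.

def tick_resolution_us(counter_frequency_hz):
    """Return the smallest time difference, in microseconds, that the counter can express."""
    if counter_frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1_000_000 / counter_frequency_hz


def ticks_to_microseconds(start_ticks, end_ticks, counter_frequency_hz):
    """Convert a pair of counter readings into elapsed microseconds."""
    return (end_ticks - start_ticks) * 1_000_000 / counter_frequency_hz
```

For instance, a counter that ticks 1,000 times per second cannot distinguish durations shorter than 1,000 microseconds, so a function that runs for 200 microseconds is recorded as taking either zero ticks or one full tick.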

The profiler supports the following combinations of instrumented and non-instrumented modules:

  • Instrumented .exe file and instrumented DLL files.
  • Instrumented .exe file and non-instrumented DLL files.
  • Non-instrumented .exe file and instrumented DLL files.

Note

Symbol information for a module that is dynamically loaded and unloaded in a process during a profiler run can become unavailable when another module is mapped into the same address space in the process.

There are two methods that you can use to insert profiling probes and add data collection control functions to your code.

  • Use the compiler that ships with Windows Embedded Compact to add instrumentation to the module.
  • In Platform Builder, set an environment variable and then use the Platform Builder Build tool to add instrumentation to the module.

For more information about the two types of profiling probes, see "Characteristics of FastCAP and CallCAP Probes" below.

Characteristics of FastCAP and CallCAP Probes

The Remote Tools Profiler supports FastCAP and CallCAP probes, which you can use to add instrumentation to your code.

  • A FastCAP probe is inserted immediately before each function call and immediately after each function return, at the call site in the calling function.

  • A CallCAP probe is inserted immediately after each function call and immediately before each function return, inside the definition of the called function.

    Note

    FastCAP instrumentation is not supported on x86 microprocessors.

The following table displays a comparison of the two kinds of probes.

FastCAP Probes: set WINCEFASTCAP=1
CallCAP Probes: set WINCECALLCAP=1

FastCAP Probes: Unsupported on x86 microprocessors.
CallCAP Probes: No CPU restrictions.

FastCAP Probes: A probe is inserted immediately before each function call and immediately after each function return.
CallCAP Probes: A probe is inserted immediately after each function call and immediately before each function return.

FastCAP Probes: The entry call from a non-instrumented module to an instrumented module is not captured.
CallCAP Probes: The entry call from a non-instrumented module to an instrumented module is captured.

FastCAP Probes: Does not capture a call or callback from a function pointer.
CallCAP Probes: If a function call requires the data for the function to be paged in from the hard disk, probes attribute the time taken for the associated page fault to the function making the call.

FastCAP Probes: Captures a call to a non-instrumented module.
CallCAP Probes: Does not capture a call to a non-instrumented module.

FastCAP Probes: Large instrumented executable, because probes are inserted at every instance of a profiled function.
CallCAP Probes: Minimal effect on the size of the instrumented executable, because probes are in the function definition.
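The capture rules in this comparison follow from where each kind of probe sits: FastCAP probes are at the call site in the caller, and CallCAP probes are inside the called function. The following Python sketch encodes those rules as a small decision function. It is a simplified model for reasoning about which calls appear in a call tree, not profiler code. The function name and parameters are hypothetical, and the treatment of function pointers under CallCAP is an assumption of the model.

```python
# Simplified model of the comparison above; not part of the profiler.

def call_is_captured(probe, caller_instrumented, callee_instrumented,
                     via_function_pointer=False):
    """Return True if the model predicts that a call shows up in the profile.

    probe                -- "fastcap" or "callcap"
    caller_instrumented  -- the calling module was built with probes
    callee_instrumented  -- the called module was built with probes
    via_function_pointer -- the call is made through a function pointer
    """
    if probe == "fastcap":
        # Probes surround the call site, so only the caller's build matters.
        # Calls and callbacks through function pointers are not captured.
        return caller_instrumented and not via_function_pointer
    if probe == "callcap":
        # Probes are in the function definition, so only the callee's build matters.
        # Assumption: a call through a function pointer still reaches the probe.
        return callee_instrumented
    raise ValueError("probe must be 'fastcap' or 'callcap'")
```

For example, a call from a non-instrumented .exe file into an instrumented DLL is predicted to be captured with CallCAP probes but not with FastCAP probes, which matches the entry-call row of the comparison.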

See Also

Other Resources

Remote Tools