# Server-side performance queries
Good rendering performance on the server is critical for stable frame rates and a good user experience. It's important to monitor performance characteristics on the server carefully and optimize where necessary. Performance data can be queried through dedicated API functions.
The model input data has the biggest impact on rendering performance. You can tweak the input data as described in Configuring the model conversion.
Client-side application performance might be a bottleneck, too. For an in-depth analysis of client-side performance, it's recommended to take a performance trace.
## Client/server timeline
Before going into detail regarding the various latency values, it's worthwhile to have a look at the synchronization points between client and server on the timeline:
The illustration shows how:
- a pose estimate is kicked off by the client at a constant 60-Hz frame rate (every 16.6 ms)
- the server then starts rendering, based on the pose
- the server sends back the encoded video image
- the client decodes the image, performs some CPU and GPU work on top of it and then displays the image
## Frame statistics queries
Frame statistics provide some high-level information for the last frame, such as latency. The data provided in the FrameStatistics structure is measured on the client side, so the API is a synchronous call:
```csharp
void QueryFrameData(RenderingSession session)
{
    FrameStatistics frameStatistics;
    if (session.GraphicsBinding.GetLastFrameStatistics(out frameStatistics) == Result.Success)
    {
        // do something with the result
    }
}
```

```cpp
void QueryFrameData(ApiHandle<RenderingSession> session)
{
    FrameStatistics frameStatistics;
    if (session->GetGraphicsBinding()->GetLastFrameStatistics(&frameStatistics) == Result::Success)
    {
        // do something with the result
    }
}
```
The retrieved FrameStatistics object holds the following members:
Member | Explanation |
---|---|
LatencyPoseToReceive | Latency from camera pose estimation on the client device until a server frame for this pose is fully available to the client application. This value includes network roundtrip, server render time, video decode, and jitter compensation. See interval 1 in the illustration above. |
LatencyReceiveToPresent | Latency from availability of a received remote frame until the client app calls PresentFrame on the CPU. See interval 2 in the illustration above. |
LatencyPresentToDisplay | Latency from presenting a frame on the CPU until display lights up. This value includes client GPU time, any frame buffering performed by the OS, hardware reprojection, and device-dependent display scan-out time. See interval 3 in the illustration above. |
TimeSinceLastPresent | The time between subsequent calls to PresentFrame on the CPU. Values greater than the display duration (for example 16.6 ms on a 60-Hz client device) indicate issues caused by the client application not finishing its CPU workload in time. |
VideoFramesReceived | The number of frames received from the server in the last second. |
VideoFrameReusedCount | Number of received frames in the last second that were used on the device more than once. Non-zero values indicate that frames had to be reused and reprojected either due to network jitter or excessive server rendering time. |
VideoFramesSkipped | Number of received frames in the last second that were decoded, but not shown on display because a newer frame has arrived. Non-zero values indicate that network jittering caused multiple frames to be delayed and then arrive on the client device together in a burst. |
VideoFramesDiscarded | Very similar to VideoFramesSkipped, but the reason for being discarded is that a frame came in so late that it can't even be correlated with any pending pose anymore. If this discarding happens, there's some severe network contention. |
VideoFrameMinDelta | Minimum amount of time between two consecutive frames arriving during the last second. Together with VideoFrameMaxDelta, this range gives an indication of jitter caused either by the network or video codec. |
VideoFrameMaxDelta | Maximum amount of time between two consecutive frames arriving during the last second. Together with VideoFrameMinDelta, this range gives an indication of jitter caused either by the network or video codec. |
The sum of all latency values is typically much larger than the available frame time at 60 Hz. This is OK, because multiple frames are in flight in parallel, and new frame requests are kicked off at the desired frame rate, as shown in the illustration. However, if latency becomes too large, it affects the quality of the late-stage reprojection and may compromise the overall experience.
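For example, with hypothetical values of 50 ms for LatencyPoseToReceive, 5 ms for LatencyReceiveToPresent, and 20 ms for LatencyPresentToDisplay, the total latency of roughly 75 ms spans four to five 16.6-ms frame periods, so about four to five frames are in flight at any time.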
VideoFramesReceived, VideoFrameReusedCount, and VideoFramesDiscarded can be used to gauge network and server performance. A combination of a low VideoFramesReceived value and a high VideoFrameReusedCount value can indicate network congestion or poor server performance. A high VideoFramesDiscarded value also indicates network congestion.
Lastly, TimeSinceLastPresent, VideoFrameMinDelta, and VideoFrameMaxDelta give an idea of the variance of incoming video frames and local present calls. High variance means an unstable frame rate.
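As a minimal sketch of how these counters might be interpreted in practice, the following C# snippet extends the GetLastFrameStatistics query shown above with a few simple checks. The function name, the thresholds, and the use of Unity's Debug.LogWarning are illustrative assumptions, not values or patterns prescribed by the API:

```csharp
void CheckVideoFrameHealth(RenderingSession session)
{
    FrameStatistics stats;
    if (session.GraphicsBinding.GetLastFrameStatistics(out stats) != Result.Success)
    {
        return;
    }

    // Few received frames combined with many reused frames hints at network
    // congestion or a server that can't keep up with rendering.
    // The thresholds below are arbitrary example values.
    if (stats.VideoFramesReceived < 55 && stats.VideoFrameReusedCount > 5)
    {
        UnityEngine.Debug.LogWarning("Possible network congestion or slow server rendering.");
    }

    // Discarded frames arrived too late to be matched with a pending pose,
    // which points to severe network contention.
    if (stats.VideoFramesDiscarded > 0)
    {
        UnityEngine.Debug.LogWarning($"{stats.VideoFramesDiscarded} video frames discarded in the last second.");
    }

    // A large spread between the min and max frame delta indicates jitter
    // caused by the network or the video codec.
    var jitter = stats.VideoFrameMaxDelta - stats.VideoFrameMinDelta;
    UnityEngine.Debug.Log($"Frame delta spread in the last second: {jitter}");
}
```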
None of the values above gives a clear indication of pure network latency (the red arrows in the illustration), because the exact time that the server is busy rendering needs to be subtracted from the roundtrip value LatencyPoseToReceive. The server-side portion of the overall latency is information that's unavailable to the client. However, the next section explains how this value is approximated through extra input from the server and exposed through the NetworkLatency value.
## Performance assessment queries
Performance assessment queries provide more in-depth information about the CPU and GPU workload on the server. Since the data is requested from the server, querying a performance snapshot follows the usual async pattern:
```csharp
async void QueryPerformanceAssessment(RenderingSession session)
{
    try
    {
        PerformanceAssessment result = await session.Connection.QueryServerPerformanceAssessmentAsync();
        // do something with result...
    }
    catch (RRException ex)
    {
        // handle the error, for example by logging ex.Message
    }
}
```

```cpp
void QueryPerformanceAssessment(ApiHandle<RenderingSession> session)
{
    session->Connection()->QueryServerPerformanceAssessmentAsync([](Status status, PerformanceAssessment result) {
        if (status == Status::OK)
        {
            // do something with result...
        }
    });
}
```
Unlike the FrameStatistics object, the PerformanceAssessment object contains server-side information:
Member | Explanation |
---|---|
TimeCPU | Average server CPU time per frame in milliseconds |
TimeGPU | Average server GPU time per frame in milliseconds |
UtilizationCPU | Total server CPU utilization in percent |
UtilizationGPU | Total server GPU utilization in percent |
MemoryCPU | Total server main memory utilization in percent |
MemoryGPU | Total dedicated video memory utilization in percent of the server GPU |
NetworkLatency | The approximate average roundtrip network latency in milliseconds. In the illustration above, this value corresponds to the sum of the red arrows. The value is computed by subtracting actual server rendering time from the LatencyPoseToReceive value of FrameStatistics . While this approximation isn't accurate, it gives some indication of the network latency, isolated from the latency values computed on the client. |
PolygonsRendered | The number of triangles rendered in one frame. This number also includes triangles that are culled later during rendering. That means this number doesn't vary much across different camera positions, but performance can vary drastically depending on the triangle culling rate. |
PointsRendered | The number of points in point clouds rendered in one frame. Same culling criteria as mentioned above for PolygonsRendered apply here. |
To help you assess the values, each portion comes with a quality classification such as Great, Good, Mediocre, or Bad. This assessment metric provides a rough indication of the server's health, but it shouldn't be seen as absolute. For example, assume you see a 'mediocre' score for the GPU time. It's considered mediocre because it's close to the limit of the overall frame time budget. In your case, however, it might still be a good value, because you're rendering a complex model.
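Because the assessment is computed on the server and the query is asynchronous, it's usually not something you'd request every frame. The following sketch polls it roughly once per second from a Unity script; the class name, the field names, and the one-second interval are illustrative assumptions, and the snippet presumes the session is already connected:

```csharp
using Microsoft.Azure.RemoteRendering;
using UnityEngine;

public class ServerPerformancePoller : MonoBehaviour
{
    // Assumed to be assigned by whichever code created and connected the session.
    public RenderingSession Session;

    private PerformanceAssessment _lastAssessment;
    private float _secondsSinceLastQuery;

    async void Update()
    {
        if (Session == null)
        {
            return;
        }

        _secondsSinceLastQuery += Time.deltaTime;
        if (_secondsSinceLastQuery < 1.0f)
        {
            return; // poll roughly once per second
        }
        _secondsSinceLastQuery = 0.0f;

        try
        {
            _lastAssessment = await Session.Connection.QueryServerPerformanceAssessmentAsync();
            // _lastAssessment can now be inspected, for example TimeGPU or NetworkLatency.
        }
        catch (RRException)
        {
            // the query failed, for example because the connection dropped
        }
    }
}
```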
## Statistics debug output
ServiceStatistics is a C# convenience class that wraps both the frame statistics and performance assessment queries and provides functionality to return statistics as aggregated values or as a pre-built string. The following code is the easiest way to show server-side statistics in your client application.
```csharp
ServiceStatistics _stats = null;

void OnConnect()
{
    _stats = new ServiceStatistics();
}

void OnDisconnect()
{
    _stats = null;
}

void Update()
{
    if (_stats != null)
    {
        // update once a frame to retrieve new information and build average values
        _stats.Update(Service.CurrentActiveSession);

        // retrieve a string with relevant stats information
        InfoLabel.text = _stats.GetStatsString();
    }
}
```
The code above populates the text label with a formatted, multi-line summary of the current statistics.
The GetStatsString API formats all values into a single string, but each individual value can also be queried programmatically from the ServiceStatistics instance.
There are also variants of the members, which aggregate the values over time. See members with suffix *Avg, *Max, or *Total. The member FramesUsedForAverage indicates how many frames have been used for this aggregation.
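As a rough sketch of how such aggregated values might be read back: FramesUsedForAverage is named above, but the LatencyPoseToReceiveAvg member used below is only a hypothetical example of an *Avg variant, so check the ServiceStatistics class for the exact member names.

```csharp
void LogAggregatedStats(ServiceStatistics stats)
{
    // FramesUsedForAverage is documented above; LatencyPoseToReceiveAvg is a
    // hypothetical example of an aggregated member and may be named differently.
    UnityEngine.Debug.Log($"Averaged over {stats.FramesUsedForAverage} frames, " +
                          $"pose-to-receive latency: {stats.LatencyPoseToReceiveAvg} ms");
}
```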
## API documentation
- C# RenderingConnection.QueryServerPerformanceAssessmentAsync()
- C++ RenderingConnection::QueryServerPerformanceAssessmentAsync()