Mean Opinion Score and Metrics

03/23/2012

Topic Last Modified: 2009-03-04

Monitoring Server reports several different measures of voice quality to monitor the Quality of Experience (QoE) that is being delivered to end users. This section explains how voice quality is measured and the different scales that Monitoring Server uses.

The basis of all measures of voice quality is subjective testing. The perceived quality of speech depends on human perception, so it is inherently subjective. There are several different methodologies for subjective testing. Most voice quality measures are based on an absolute category rating (ACR) scale.

In an ACR subjective test, a statistically significant number of people rate their QoE on a scale of 1 (bad) to 5 (excellent). The average of these ratings is called the mean opinion score (MOS). The resulting MOS depends on the range of experiences to which the group was exposed and on the type of experience being rated. As a result, MOS values from different tests cannot be compared unless the test conditions are the same.
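As a minimal illustration of this definition, the following Python sketch computes a MOS as the arithmetic mean of individual ACR ratings. The function name and the sample ratings are hypothetical; the labels for 2, 3, and 4 (Poor, Fair, Good) follow the ITU-T P.800 ACR scale rather than this article.

```python
from statistics import mean

# ACR listening-quality scale; 1 and 5 match the text, 2-4 follow ITU-T P.800.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings: list[int]) -> float:
    """Return the MOS: the arithmetic mean of individual ACR ratings (1-5)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in ACR_LABELS:
            raise ValueError(f"rating {rating} is outside the 1-5 ACR scale")
    return float(mean(ratings))


# Hypothetical ratings from eight listeners for one test condition.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
print(f"MOS = {mean_opinion_score(ratings):.2f}")  # MOS = 4.00
```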

Because it is impractical to conduct subjective tests of voice quality for a live communication system, the Office Communications Server solution generates ACR MOS values by using advanced algorithms to objectively predict the results of a subjective test. Two classes of MOS values are used: listening quality MOS (MOS-LQ) and conversational quality MOS (MOS-CQ).

MOS-LQ is the most commonly used MOS value within the Voice over IP (VoIP) industry. It measures the quality of audio for listening purposes only. MOS-LQ does not take into account any bidirectional effects, such as delay and echo.

MOS-CQ takes into account listening quality in each direction, as well as the bidirectional effects.

Office Communications Server makes use of both narrowband (8 kHz sample rate) and wideband (16 kHz sample rate) audio codecs. To provide consistency when measuring MOS-LQ, all MOS-LQ values are reported on a wideband MOS-LQ scale instead of the traditional narrowband MOS-LQ scale that other systems use.

The difference between the wideband and narrowband MOS-LQ scales is the range of experiences presented to the listeners in the subjective test. For narrowband MOS-LQ, the group hears speech encoded only with narrowband codecs, so the listeners hear no audio frequency content above 4 kHz. For wideband MOS-LQ, the group hears speech encoded with both narrowband and wideband codecs. Because listeners prefer the additional audio frequency content that wideband audio can represent, narrowband codecs score lower on a wideband MOS-LQ scale than on a narrowband MOS-LQ scale. For example, G.711 is typically cited as having a narrowband MOS-LQ score of approximately 4.1, but when compared to wideband codecs on a wideband MOS-LQ scale, G.711 may score only approximately 3.6.

Metrics Descriptions

The unified communications (UC) solution provides four different MOS values:

Listening MOS
Sending MOS
Network MOS
Conversational MOS

Listening MOS

Listening MOS is a prediction of the wideband MOS-LQ of the audio stream that is played to the user. This value takes into consideration the audio fidelity and distortion, and speech and noise levels. From this data, it predicts how a large group of users would rate the quality of the audio they hear.

The Listening MOS varies depending on:

The codec used.
Whether the codec is wideband or narrowband.
The characteristics of the audio capture device used by the person speaking (that is, the person sending the audio).
Any transcoding or mixing that occurred.
Defects from packet loss or packet loss concealment.
The speech level and background noise of the person speaking (that is, the person sending the audio).

Due to the large number of factors that influence this value, it is most useful to view the Listening MOS statistically rather than for a single call.
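To illustrate what viewing the Listening MOS statistically can mean in practice, the sketch below summarizes per-call values as a distribution (mean, median, and 10th percentile) instead of judging a single call. The per-call values are hypothetical, and the choice of summary statistics is an assumption, not a description of what Monitoring Server reports.

```python
from dataclasses import dataclass
from statistics import mean, median, quantiles


@dataclass
class MosSummary:
    calls: int
    mean_mos: float
    median_mos: float
    p10_mos: float  # 10th percentile: how the worst-performing calls fared


def summarize_mos(values: list[float]) -> MosSummary:
    """Summarize per-call MOS values as a distribution rather than judging one call."""
    if len(values) < 2:
        raise ValueError("need at least two calls to summarize")
    # With n=10, quantiles() returns the nine decile cut points, so index 0 is
    # the 10th percentile; method="inclusive" keeps it within the observed range.
    p10 = quantiles(values, n=10, method="inclusive")[0]
    return MosSummary(len(values), mean(values), median(values), p10)


# Hypothetical Listening MOS values for eight calls; the single poor call (2.2)
# pulls down the 10th percentile but does not move the median.
per_call = [3.9, 4.1, 2.2, 3.8, 4.0, 3.7, 4.2, 3.9]
summary = summarize_mos(per_call)
print(
    f"{summary.calls} calls: mean {summary.mean_mos:.2f}, "
    f"median {summary.median_mos:.2f}, 10th percentile {summary.p10_mos:.2f}"
)
```

The same summary applies equally to Sending MOS and Conversational MOS, which the text also recommends viewing statistically.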

Sending MOS

Sending MOS is a prediction of the wideband MOS-LQ of the audio stream that is being sent from the user prior to being encoded and sent to the network. This value takes into consideration the speech and noise levels of the user along with any distortions, and from this data predicts how a large group of users would rate the audio quality they hear.

The Sending MOS varies depending on:

The characteristics of the audio capture device used by the person sending the audio.
The speech level and background noise of the person sending the audio (that is, the person who is speaking).

Due to the large number of factors that influence this value, it is most useful to view the Sending MOS statistically rather than by using a single value.

Network MOS

Network MOS is a prediction of the wideband MOS-LQ of the audio that is played to the user. This value takes into consideration only the codec used and network factors such as packet loss, packet reordering, packet errors, and jitter.

The difference between Network MOS and Listening MOS is that the Network MOS considers only the impact of the network on the listening quality, whereas Listening MOS also considers the payload (that is, speech level, noise level). This makes Network MOS useful for identifying network conditions impacting the audio quality being delivered.

For each codec, there is a maximum possible Network MOS that represents the best possible listening quality (MOS-LQ) under perfect network conditions. The following table shows the codec typically used in each scenario and the corresponding maximum Network MOS.

Table 1. Typical Codecs Used in Scenarios with Maximum Network MOS

Scenario	Codec	Maximum Network MOS
UC-UC call	RTAudio WB	4.10
UC-UC call	RTAudio NB	2.95
Conference call	Siren	3.72
UC-PSTN call	RTAudio NB	2.95
UC-PSTN call	G.711	3.61

Because the maximum Network MOS varies by scenario (that is, because different codecs are used), it is usually more useful to look at the average degradation of the Network MOS during the call. The average degradation can be broken down into how much is due to network jitter and how much is due to packet loss. For very small degradations, the cause of the degradation may not be available.
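As a worked illustration of why degradation is easier to compare across scenarios than the raw value, the following sketch measures Network MOS samples against the codec's maximum from Table 1. Defining degradation as the maximum Network MOS minus the observed value, clamped at zero and averaged over the call, is an assumption made for this example; the function name and sample values are hypothetical, and the split into jitter and packet-loss components is not modeled.

```python
from statistics import mean

# Maximum Network MOS per codec, taken from Table 1.
MAX_NETWORK_MOS = {
    "RTAudio WB": 4.10,
    "RTAudio NB": 2.95,
    "Siren": 3.72,
    "G.711": 3.61,
}


def average_degradation(codec: str, nmos_samples: list[float]) -> float:
    """Average amount by which Network MOS fell below the codec's maximum.

    Assumption: degradation per sample = max NMOS for the codec - observed
    NMOS, clamped at zero, then averaged over the call.
    """
    if not nmos_samples:
        raise ValueError("no Network MOS samples for this call")
    max_nmos = MAX_NETWORK_MOS[codec]
    return mean(max(0.0, max_nmos - sample) for sample in nmos_samples)


# Two hypothetical calls with the same raw Network MOS of 2.80:
print(round(average_degradation("RTAudio NB", [2.80]), 2))  # 0.15
print(round(average_degradation("RTAudio WB", [2.80]), 2))  # 1.3
```

The same raw value of 2.80 is close to the best possible result for a narrowband RTAudio call but a substantial degradation for a wideband RTAudio call, which is why degradation is the more informative figure when calls use different codecs.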

Conversational MOS

Conversational MOS is a prediction of the narrowband MOS-CQ of the audio stream that is played to the user. This value takes into consideration the listening quality of the audio played and sent across the network, the speech and noise levels for both audio streams, and echoes. It represents how a large group of people would rate the quality of the connection for holding a conversation.

The Conversational MOS varies depending on the same factors as Listening MOS, as well as the following:

Echo.
Network delay.
Delay due to jitter buffering.
Delay due to devices.

Due to the large number of factors that influence this value, it is most useful to view the Conversational MOS statistically rather than by using a single value.

Interpreting the MOS Metrics

The wide set of MOS and associated metrics provides a rich view into the QoE being delivered to end users and can be used to identify a wide range of issues. The basic approach to using the MOS metrics to identify issues that affect quality is to compare the current MOS metrics either against previously known good states or against similar conditions. Combined with filtering on different locations, time periods, call types, and so on, this narrows down the possible root cause, which can then be investigated further by using the detailed metrics or other troubleshooting tools.
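The comparison approach described above can be sketched as a filter followed by a difference of means. The CallRecord fields, the location and week values, and the mos_shift helper are all hypothetical and do not reflect the Monitoring Server database schema or its reports.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class CallRecord:
    # Hypothetical per-call fields; not the Monitoring Server schema.
    location: str
    call_type: str
    week: int
    network_mos: float
    sending_mos: float


def mos_shift(
    records: list[CallRecord],
    metric: Callable[[CallRecord], float],
    in_scope: Callable[[CallRecord], bool],
    baseline_weeks: range,
    current_week: int,
) -> float:
    """Current mean of `metric` minus its mean over a known-good baseline period.

    Only records accepted by `in_scope` (location, call type, ...) are used.
    A negative result means quality dropped relative to the baseline.
    """
    scoped = [r for r in records if in_scope(r)]
    baseline = [metric(r) for r in scoped if r.week in baseline_weeks]
    current = [metric(r) for r in scoped if r.week == current_week]
    if not baseline or not current:
        raise ValueError("need calls in both the baseline and the current period")
    return mean(current) - mean(baseline)


records = [
    CallRecord("Building 1", "UC-UC", 1, 3.9, 3.8),
    CallRecord("Building 1", "UC-UC", 2, 3.8, 3.7),
    CallRecord("Building 1", "UC-UC", 3, 3.2, 3.8),
    CallRecord("Building 2", "UC-UC", 3, 3.9, 3.9),
]
shift = mos_shift(
    records,
    metric=lambda r: r.network_mos,
    in_scope=lambda r: r.location == "Building 1" and r.call_type == "UC-UC",
    baseline_weeks=range(1, 3),
    current_week=3,
)
print(f"Network MOS shift for Building 1: {shift:+.2f}")  # -0.65
```

Passing the metric and the filter as functions keeps one helper usable for any MOS value and any combination of location, time period, and call type.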

The following are a few examples of issues and how they can be identified through analysis of the metrics.

LAN Congestion

As a local area network (LAN) becomes more congested with traffic, the packet loss rate and the amount of jitter increase for calls that pass through the LAN. This increase in packet loss and jitter is reflected in lower Network MOS values and higher average degradation for these calls. The QoE Trend reports show the decline in Network MOS over the past several weeks and can be used to identify the LAN that is exhibiting signs of congestion. The call list report for calls on that LAN will show higher degradation, jitter, and packet loss when compared to calls made before the LAN was congested, or to calls made on similar uncongested networks.
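A minimal sketch of this kind of trend analysis, assuming per-call Network MOS values tagged with a LAN identifier and a week number: it averages Network MOS per LAN per week and flags LANs whose most recent weeks fall noticeably below their own earlier weeks. The tuple layout, LAN names, the two-week window, and the 0.3 drop threshold are all hypothetical choices, not values used by Monitoring Server.

```python
from collections import defaultdict
from statistics import mean


def weekly_network_mos(calls):
    """calls: iterable of (lan, week, network_mos) -> {lan: [weekly means in week order]}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for lan, week, nmos in calls:
        buckets[lan][week].append(nmos)
    return {
        lan: [mean(weeks[w]) for w in sorted(weeks)]
        for lan, weeks in buckets.items()
    }


def possibly_congested(calls, recent_weeks=2, min_drop=0.3):
    """Return {lan: drop} for LANs whose recent weekly NMOS fell by at least min_drop."""
    flagged = {}
    for lan, weekly in weekly_network_mos(calls).items():
        if len(weekly) <= recent_weeks:
            continue  # not enough history to compare against
        earlier = mean(weekly[:-recent_weeks])
        recent = mean(weekly[-recent_weeks:])
        if earlier - recent >= min_drop:
            flagged[lan] = earlier - recent
    return flagged


calls = [
    ("LAN-A", 1, 3.6), ("LAN-A", 2, 3.5), ("LAN-A", 3, 3.0), ("LAN-A", 4, 2.9),
    ("LAN-B", 1, 3.6), ("LAN-B", 2, 3.5), ("LAN-B", 3, 3.6), ("LAN-B", 4, 3.5),
]
for lan, drop in possibly_congested(calls).items():
    print(f"{lan}: weekly Network MOS down {drop:.2f}")
```

This compares each LAN against its own history; comparing against similar uncongested networks would instead use other LANs' recent averages as the baseline.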

Bad Devices or Device Drivers

The audio quality of a call is affected by the microphone device and associated driver used to capture the audio from the person speaking. If a new device is used, or a new device driver is deployed, that captures audio at lower quality, this is reflected in a lower Sending MOS. Using a device report, you can compare the Sending MOS for these devices to other devices and against historical data to isolate a problematic device or device driver, which can then be addressed. Note that a problematic device or driver can be identified only if it is deployed and used enough to generate sufficient data for analysis. A single, rarely used problematic device will likely not be identifiable by using this report.
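The device comparison described above, including the caveat about sufficient data, can be sketched as a grouping step with a minimum call count. The (device, Sending MOS) tuple layout, the device names, and the min_calls threshold are hypothetical; an actual device report may group and filter differently.

```python
from collections import defaultdict
from statistics import mean


def device_report(calls, min_calls=30):
    """Group per-call Sending MOS by device and rank devices from worst to best.

    calls: iterable of (device_key, sending_mos); device_key can be a device
    name or a (device, driver_version) tuple. Devices with fewer than
    min_calls calls are left out because their averages are unreliable.
    """
    by_device = defaultdict(list)
    for device, sending_mos in calls:
        by_device[device].append(sending_mos)
    rows = [
        (device, len(values), mean(values))
        for device, values in by_device.items()
        if len(values) >= min_calls
    ]
    return sorted(rows, key=lambda row: row[2])  # lowest mean Sending MOS first


# Hypothetical data: a driver update for "Headset X" and a rarely used webcam.
calls = (
    [(("Headset X", "1.0"), 3.9)] * 40
    + [(("Headset X", "2.0"), 3.1)] * 35
    + [(("Webcam Y", "5.2"), 3.6)] * 3  # too few calls to judge
)
for device, count, avg in device_report(calls):
    print(device, count, f"{avg:.2f}")
```

Keying on (device, driver_version) lets a regression introduced by a driver update show up as a separate, lower-ranked row, while the rarely used webcam is excluded, mirroring the limitation noted above.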
