Mean Opinion Score (MOS) is a widely adopted metric used to evaluate voice call quality in VoIP systems. Typically ranging from 1 (bad) to 5 (excellent), MOS provides a simple yet effective way to measure how end-users perceive voice communication quality.

Network MOS

Network MOS is estimated from RTP Control Protocol (RTCP) statistics collected during a call, such as jitter, packet loss, and round-trip delay, combined with the codec in use. It provides a quick estimate of quality based on network conditions alone.
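The article does not say which formula turns these statistics into a score. One common approach is a simplified form of the ITU-T G.107 E-model, sketched below as an illustration only. The function names, the jitter-buffer heuristic (twice the measured jitter), and the default codec parameters (`ie=0.0`, `bpl=25.1`, roughly G.711 with packet loss concealment) are assumptions for this example, not Sipfront's implementation.

```python
def r_factor_to_mos(r):
    """Map an E-model R-factor to a MOS estimate (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    return max(1.0, mos)  # the polynomial dips slightly below 1 for tiny R


def estimate_network_mos(one_way_delay_ms, jitter_ms, loss_pct,
                         ie=0.0, bpl=25.1):
    """Estimate MOS from delay, jitter, and packet loss.

    ie and bpl are codec-specific impairment parameters; the defaults
    are illustrative assumptions, not measured values.
    """
    # Assumption: the jitter buffer adds about twice the measured jitter.
    effective_delay = one_way_delay_ms + 2 * jitter_ms

    # Simplified delay impairment: a gentle slope, steeper past ~177 ms.
    delay_impairment = 0.024 * effective_delay
    if effective_delay > 177.3:
        delay_impairment += 0.11 * (effective_delay - 177.3)

    # Effective equipment impairment under random packet loss.
    loss_impairment = ie + (95 - ie) * loss_pct / (loss_pct + bpl)

    r = 93.2 - delay_impairment - loss_impairment
    return r_factor_to_mos(r)


if __name__ == "__main__":
    # One-way delay is often approximated as half the RTCP round-trip time.
    print(round(estimate_network_mos(40, 10, 1.0), 2))
```

Because the inputs are purely network figures, the estimate stays the same whether or not the audio itself is distorted, which is the core limitation of this approach.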

Advantages:

  • Simple, lightweight implementation
  • Real-time quality estimates
  • Effective for continuous monitoring and rapid troubleshooting

Limitations:

  • Indirect approximation; does not measure actual perceived audio quality
  • Cannot detect audio-specific issues like echo or distortion
  • Requires RTCP to be enabled and supported by every element along the call path, which often isn’t the case
  • When intermediate elements (e.g., PBXs, SBCs, transcoding devices) terminate the media stream, RTCP reports describe only individual call legs rather than the full end-to-end path

Audio MOS Using Reference Audio

Audio MOS involves direct comparison between original (reference) audio files and the degraded recordings captured from actual calls, providing a more accurate assessment of perceived voice quality.

Commonly used algorithms for this approach include PESQ, POLQA, and ViSQOL.

PESQ (Perceptual Evaluation of Speech Quality)

PESQ (ITU-T P.862) is the older, widely adopted method and is effective at detecting distortion and the effects of packet loss. However, it is limited to narrowband and wideband audio and can incur significant licensing costs.

POLQA (Perceptual Objective Listening Quality Analysis)

POLQA (ITU-T P.863) covers a broader range of codecs, including wideband and super-wideband, which makes it well suited to modern voice services such as VoLTE. Its downsides are higher computational requirements and expensive licensing.

ViSQOL (Virtual Speech Quality Objective Listener)

ViSQOL is an open-source alternative preferred at Sipfront, particularly effective for evaluating modern wideband and super-wideband codecs.

Why ViSQOL is Preferred at Sipfront:

  • Open-source and cost-effective
  • High accuracy with modern wideband and super-wideband codecs
  • Efficient processing, suitable for real-time and automated monitoring
  • Actively maintained and improved by the community
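For the automated monitoring mentioned above, ViSQOL can be driven from a script. The following is a minimal sketch, assuming a locally built `visqol` command-line binary is available on the PATH. The binary location, the helper names, and the file names are hypothetical; the flags shown are the ones documented in the ViSQOL README.

```python
import subprocess


def build_visqol_command(reference_wav, degraded_wav, binary="visqol"):
    """Build the argument list for one ViSQOL comparison in speech mode."""
    return [
        binary,
        "--reference_file", reference_wav,
        "--degraded_file", degraded_wav,
        "--use_speech_mode",
    ]


def run_visqol(reference_wav, degraded_wav, binary="visqol"):
    """Run ViSQOL and return its raw text output.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        build_visqol_command(reference_wav, degraded_wav, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file names: the injected reference and the recording.
    print(run_visqol("reference.wav", "recorded.wav"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when file names contain spaces.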

How ViSQOL Evaluates Speech Quality:

ViSQOL compares a reference audio file (the input file we inject on one side of the call) to a potentially degraded recording (the output file we record at the other side of the call) using a spectro-temporal analysis. It measures similarity in time-frequency representations (spectrograms) between the original and received audio, assessing how closely the two signals match perceptually. By calculating similarity through a metric called the Neurogram Similarity Index Measure (NSIM), ViSQOL effectively captures common VoIP impairments such as jitter, packet loss concealment, and temporal warping. This method allows ViSQOL to closely approximate human subjective perception, especially for degradations introduced by packet-based voice transmission.
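To make the similarity idea concrete, here is a deliberately simplified sketch of an NSIM-style score. It compares two small spectrogram patches (rows as frequency bands, columns as time frames) using an intensity term and a structure term. ViSQOL itself computes the measure over local windows on aligned patches and then maps the result to a MOS value; the global computation, the function name `nsim_global`, and the stabilising constants `c1` and `c3` are assumptions made for illustration.

```python
from math import sqrt


def nsim_global(ref, deg, c1=1e-4, c3=1e-4):
    """Simplified, global NSIM-style similarity of two equal-size patches.

    Returns a value close to 1.0 for identical patches and lower values
    as the degraded patch diverges from the reference.
    """
    r = [v for row in ref for v in row]
    d = [v for row in deg for v in row]
    if not r or len(r) != len(d):
        raise ValueError("patches must be non-empty and the same size")

    n = len(r)
    mu_r = sum(r) / n
    mu_d = sum(d) / n
    var_r = sum((v - mu_r) ** 2 for v in r) / n
    var_d = sum((v - mu_d) ** 2 for v in d) / n
    cov = sum((a - mu_r) * (b - mu_d) for a, b in zip(r, d)) / n

    # Intensity term: do the two patches have similar average energy?
    intensity = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    # Structure term: do the time-frequency patterns move together?
    structure = (cov + c3) / (sqrt(var_r) * sqrt(var_d) + c3)
    return intensity * structure


if __name__ == "__main__":
    reference = [[0.1, 0.8, 0.3], [0.2, 0.9, 0.4]]
    # Same patch with the middle time frame zeroed, as if a packet was lost.
    degraded = [[0.1, 0.0, 0.3], [0.2, 0.0, 0.4]]
    print(nsim_global(reference, reference))
    print(nsim_global(reference, degraded))
```

A missing time frame lowers both the intensity and the structure term, which is how this kind of measure reacts to packet-level damage that network statistics alone only hint at.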

Conclusion

Selecting the appropriate MOS measurement method is crucial for ensuring accurate VoIP quality assessment. While Network MOS offers rapid insights based on network statistics, Audio MOS measured through actual recordings better reflects user perception. Sipfront favors ViSQOL due to its ideal combination of accuracy, efficiency, and cost-effectiveness, empowering businesses with precise, actionable insights to deliver optimal voice quality.
