IoSR : QESTRAL

Project Background

The spatial quality of audio content delivery systems is becoming increasingly important as service providers attempt to deliver enhanced experiences of spatial immersion and naturalness in audio-visual applications such as virtual reality, telepresence, home cinema, games and communications products. At the low end of the spatial quality range, mobile and telecoms companies are increasingly interested in the spatial aspect of product sound quality. Here, simple stereophony over two loudspeakers, or over headphones connected to a PDA, mobile phone or MP3 player, is increasingly typical, and binaural spatial audio is soon to become a common feature in mobile devices. At the high end, there is substantial research into spatial audio content authoring, coding and delivery, incorporating MPEG-4 scene encoding and wavefield synthesis rendering techniques involving hundreds of loudspeakers. In the middle range, home cinema with 5.1-channel surround sound is one of the largest growth areas in consumer electronics, bringing enhanced spatial sound quality into a large number of homes. Home computer systems are also increasingly equipped with surround sound replay, and recent multimedia players incorporate multichannel surround sound streaming capabilities.

The research trend is increasingly towards separating the rendering format from the method of coding or representation. This suits scalable coding environments involving multiple data-rate delivery mechanisms (e.g. digital broadcasting, internet, mobile communications) and enables spatial audio content to be authored once but replayed in many different forms. The range of spatial qualities that may be delivered to the listener will therefore be wide, and severe compromises in spatial quality may be encountered, particularly under the most band-limited delivery conditions or with basic rendering devices. (Recent encoding standards such as MPEG-4, for example, incorporate scalable, parametric and scene-based coding modes for delivering spatial audio content over media with a wide range of data bandwidths, from high-rate wired links and physical media to mobile and internet communications where delivery bandwidth is highly restricted.) Encoding and rendering processes can lead to spatial quality degradations, including changes in source-related attributes such as perceived location, width, distance and stability, and changes in environment-related attributes such as envelopment, spaciousness and width. Under conditions of extreme restriction, major changes in spatial resolution or dimensionality may be experienced (e.g. when downmixing from many loudspeaker channels to two). All of these reduce overall spatial fidelity. Recent experiments involving multivariate analysis of audio quality show that in home entertainment applications spatial quality accounts for a significant proportion of overall quality (as much as 30%).

The important research question that arises in relation to future spatial audio delivery systems is how to evaluate spatial audio quality. One possible answer is to use formal subjective tests, but these are time-consuming and expensive. It would therefore be beneficial to employ an algorithm that could predict spatial quality on the basis of measured comparisons between a reference reproduction and one that may have been impaired by coding or other forms of audio processing. No such system has yet been developed. The current model for evaluating perceived audio quality (ITU-R BS.1387) does not take into account the contribution of spatial quality to the overall user experience, concentrating instead on coding distortion, noise and bandwidth degradations. It considers only monophonic signal characteristics, using a relatively simple weighting process to combine the results from the left and right channels of a stereophonic signal. It does not allow for multichannel audio signals or more sophisticated spatial rendering formats, where the differences in spatial quality between reference and impaired versions could be considerable despite highly similar signal characteristics. The research community has begun to recognise this problem in evaluating spatial audio coding systems, and has become more interested in developing algorithms that evaluate spatial factors.
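The limitation described above can be illustrated with a small Python sketch. It is not an implementation of ITU-R BS.1387 or of any QESTRAL metric: the per-channel level comparison, the inter-channel correlation measure, the test signals and all function names are assumptions chosen only to show how two stereo signals can look alike channel by channel while differing in a spatially relevant cue.

```python
import math


def rms(samples):
    """Root-mean-square level of one channel."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def channel_averaged_level_difference(reference, test):
    """Toy per-channel measure (hypothetical): absolute RMS level
    difference in dB for each channel, averaged over the channels."""
    diffs = [abs(20.0 * math.log10(rms(t) / rms(r)))
             for r, t in zip(reference, test)]
    return sum(diffs) / len(diffs)


def interchannel_correlation(stereo):
    """Normalised zero-lag cross-correlation between left and right."""
    left, right = stereo
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den


# Assumed test signals: 1 kHz tones, 48 kHz sample rate, 100 full cycles.
n, fs, f = 4800, 48000, 1000
sine = [math.sin(2 * math.pi * f * i / fs) for i in range(n)]
cosine = [math.cos(2 * math.pi * f * i / fs) for i in range(n)]

reference = (sine, cosine)      # channels 90 degrees apart: uncorrelated
impaired = (sine, list(sine))   # channels identical: fully correlated

level_diff = channel_averaged_level_difference(reference, impaired)
icc_reference = interchannel_correlation(reference)
icc_impaired = interchannel_correlation(impaired)

print("per-channel level difference (dB):", level_diff)
print("inter-channel correlation, reference:", icc_reference)
print("inter-channel correlation, impaired:", icc_impaired)
```

In this sketch the per-channel level difference is essentially zero, while the inter-channel correlation moves from approximately 0 to approximately 1. A measure that inspects each channel in isolation would therefore report no change, even though the relationship between the channels, which is the kind of information a spatial quality predictor would need, has changed completely.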