Background

Home entertainment systems, broadcasting services and networked information services are increasingly reliant upon digital storage and communications media. While such media continue to increase in capacity and transfer rate, there remains a need to optimise the usage of such capacity in the light of the service requirements of the system. Multichannel surround sound is becoming an important element in modern consumer entertainment systems, but it places considerably greater demands on storage capacity and data bandwidth than conventional stereo sound. Examples of recent products involving multichannel or 3D audio are the DVD-Audio and Super Audio CD delivery media, virtual home theatre (VHT) systems involving 3D sound algorithms, consumer multichannel sound systems and virtual reality displays. This is currently a major growth area in consumer and professional audio. Now that audio systems are capable of reproducing three-dimensional sound fields, surrounding and immersing the listener (in contrast to more conventional monophonic and two-channel stereophonic modes, which restrict reproduction to a limited spatial field), new challenges face the designer and assessor of products and programmes.

For broadcasters using digital multiplexes, demands for improved audio features (e.g. improved sound quality or surround sound provision) have to be weighed against demands for an increased number of services. Similarly, in future consumer systems involving audio-on-demand or home entertainment networks, a certain fixed data bandwidth must be shared between information types (including sound, picture and data services). Some form of algorithmic or human arbitration is almost certainly required to determine what capacity or bandwidth is allocated to each service, leading to a need for information about the relative subjective importance of aspects of media quality. There is also a requirement for 'graceful degradation' of service elements in the most subjectively benign fashion when managing information capacity. In the design of consumer entertainment systems, the engineer is faced with numerous potential 'quality trade-offs' that will affect the way in which the product or service is perceived by the consumer, and that have a direct impact on the cost of manufacture. These trade-offs are almost certainly context- and content-dependent.

Recently standardised low-bit-rate coding structures for audio and video, such as MPEG-4 (Kunz & Brandenburg 1999), include options for scalable encoding, whereby various quality levels and numbers of sound channels can be encoded in a single bitstream and decisions about the appropriate quality level and decoder implementation can be made downstream. Whilst the theoretical structures are in place for such standards, there are many issues of implementation still to be addressed. There is currently little experimental subjective data on which to base decisions about scalability in multichannel audio codecs of this kind.

Unless commercial decisions involving the above issues are to be arbitrary and uninformed, there is a pressing need for more detailed research into the subjective effects of service quality trade-offs. In the broader context, this requires detailed studies of multi-modal interaction between information types. Many such studies are designed to assess the effects of quality interactions between sound and vision in terms of factors such as synchronisation and the effect of sound quality on perceived picture quality. In the more specific context of this application, the concern is primarily to investigate the effects of controlled degradations in sound quality on the subjective quality ratings given by subjects, and to determine how multichannel sound reproduction may be 'scaled', or intentionally reduced in quality, with minimum overall subjective effect for maximum saving in service bandwidth or product/system cost.