Research in Immersive Acoustics – B. JUANG – Georgia Tech

Immersive Acoustics – Multi-channel Signal Processing for Next-generation Teleconferencing & Communication

Two important advances have taken place in telecommunications in recent years. One is the ubiquity of packet networks; the other is the exponential growth in data transfer rates afforded by fiber-optic broadband networks. While the widely popular Internet has taken advantage of these advances, the full potential of broadband packet networks is yet to be realized. These two advances offer communication engineers tremendous opportunities to transform traditional telephony into tele-collaboration networks that support multi-dimensional information sharing and make full use of human capabilities in binaural hearing and binocular vision to maximize joint productivity. Broadband packet networks provide an information transport mechanism that is free of the rigid notion of a traditional voice circuit and is ready to support dynamically allocated multi-channel tele-collaboration applications.

The challenge in remote multi-dimensional information sharing is to build an advanced teleconferencing environment that reconstructs the far-end acoustic and visual scene at the near end, so that conferencing participants can maintain the sense of interaction, keeping track of who is speaking and what has been said and done, as if all the collaborators were in the same room. The recent success in developing a stereophonic echo cancellation algorithm for hands-free teleconferencing indicates that spatialization of sound (and immersive acoustics for a complete acoustic environment) is imperative for a much enhanced conferencing experience and productivity. Building upon this past success, our current research aims at generalizing the previous result to multi-channel, multi-party communications, beyond an elementary point-to-point scenario, with an additional challenge in the networking area to ensure a high level of quality of service.

This project is organized to address the technical issues around multi-channel signal processing and communication for tele-collaboration, with emphasis on multi-source (e.g., multiple talkers) and multi-channel (e.g., multiple microphone inputs) information processing for echo control, source tracking, ambient interference suppression, and spatial sound reconstruction. It also extends the current advance in point-to-point stereophonic teleconferencing to a multi-party scenario involving more than two participating conferencing sites. Technical innovations and merit in this research comprise four major components:

System and signal gain plan analysis related to immersive acoustics;
Generalization of multi-channel audio and acoustic signal processing based on the multi-input-multi-output (MIMO) system formulation;
Incorporation of source separation and tracking algorithms and introduction of “sound objects” in multi-channel echo cancellation and acoustic experience reconstruction for conferencing; and
Multi-party communication protocol design for integration with object-based acoustics, to enhance the visual effects and to ensure quality of service for tele-collaboration.
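
As an illustration of the MIMO formulation named in the second component, the following minimal sketch models each microphone signal as the sum of every source signal convolved with a source-to-microphone impulse response. The function name mimo_mix, the two-talker, two-microphone layout, and all impulse-response values are assumptions made for this example; they are not taken from the project.

```python
def mimo_mix(sources, responses):
    """Mix S source signals into M microphone signals (MIMO convolution).

    sources:   list of S lists of samples, all of the same length N
    responses: responses[m][s] is the impulse response (list of taps)
               from source s to microphone m (hypothetical values)
    returns:   list of M lists of length N (output truncated to N samples)
    """
    n_samples = len(sources[0])
    mics = []
    for mic_responses in responses:
        y = [0.0] * n_samples
        for x, h in zip(sources, mic_responses):
            # y[n] += sum_k h[k] * x[n - k], skipping samples before time 0
            for n in range(n_samples):
                for k, tap in enumerate(h):
                    if n - k >= 0:
                        y[n] += tap * x[n - k]
        mics.append(y)
    return mics


# Two talkers, each emitting a single impulse at a different time.
talkers = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0]]

# Assumed impulse responses: rooms[m][s] maps talker s to microphone m.
rooms = [[[1.0, 0.5], [0.0, 0.3]],
         [[0.2], [1.0]]]

mic_signals = mimo_mix(talkers, rooms)
```

In this picture, echo cancellation, source separation, and source tracking all amount to estimating or inverting parts of the response matrix from the observed microphone signals.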

Our research will lead to next-generation teleconferencing and tele-collaboration systems, bringing broad impacts on many fronts. The new set of technologies has the potential to shift the paradigm of telecommunications from traditional telephony to a new mode of communication involving multi-dimensional information sharing that requires high-quality sound, acoustics and visual effects to work with the natural human capability in binaural hearing and stereo vision. Figure 1 depicts such a paradigm change, from traditional telephony to a multi-channel, multimodal collaborative conferencing scenario. Used in education, multi-channel information sharing is not only beneficial but imperative for distance learning to be effective. Multi-channel signal processing that enables multi-phonic acoustic echo cancellation and sound spatialization for hands-free teleconferencing will greatly boost collaboration productivity among conferencing participants. With the recent growth in the use of teleconferencing, which is at times considered the next killer application for broadband packet networks, the new set of technologies will help materialize or further drive the broadband revolution by providing truly beneficial applications to the user. Also, in light of recent security concerns, this new set of technologies will provide a sensible alternative to travel without compromising productivity.

Fig. 1 A multi-channel network for multimedia, multi-modal collaboration and interactions

A number of technical challenges need to be tackled to realize this vision of multi-dimensional, multi-modal information sharing. These include:

A stereophonic acoustic echo control and cancellation algorithm that achieves reasonable reduction in acoustic echo (15-20 dB echo return loss) to support stereo teleconferencing with spatialized audio output. (A real-time demonstration is available.)
Generalization of stereophonic teleconferencing to multi-channel teleconferencing for 3-D effects.
Synthetic stereophonic and multi-phonic reconstruction of room acoustics to support multi-party communications.
Real-time multi-camera image and video capturing, reconstruction and synthesis.
Multi-channel source localization and talker tracking. (A single-source localization and tracking system demonstration is available.)
Multi-channel source separation. (A demonstration of a 2-channel case under a benign condition is available.)
Multi-sensor, multi-channel acoustic field modeling and reconstruction.
Wideband speech (50-7000 Hz) and audio (audible spectrum) coding.
Integration with packet networks.
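
To make the echo-reduction figure in the first item concrete, the sketch below shows one way such a number can be measured in simulation. It is not the project's stereophonic algorithm: it swaps in a plain single-channel normalized LMS (NLMS) adaptive filter, which avoids the additional difficulties of the stereo case. The echo path, noise level, filter length, step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Single-channel NLMS: estimate the echo of far_end in mic, subtract it."""
    w = [0.0] * num_taps      # adaptive estimate of the echo path
    buf = [0.0] * num_taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo
        e = d - y                                    # residual after cancellation
        norm = sum(b * b for b in buf) + eps
        w = [wi + step * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


def power_ratio_db(before, after):
    """Ratio of average powers in dB (echo reduction when before=mic, after=residual)."""
    p_before = sum(v * v for v in before) / len(before)
    p_after = sum(v * v for v in after) / len(after)
    return 10.0 * math.log10(p_before / p_after)


def simulate(num_samples=4000, seed=0):
    """Run the canceller on synthetic data; all signal parameters are assumed."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1]   # hypothetical loudspeaker-to-microphone response
    far_end = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mic = []
    for n in range(num_samples):
        echo = sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        mic.append(echo + rng.gauss(0.0, 0.01))      # echo plus weak ambient noise
    residual, weights = nlms_echo_canceller(far_end, mic)
    half = num_samples // 2                          # measure after convergence
    return power_ratio_db(mic[half:], residual[half:]), weights


reduction_db, estimated_path = simulate()
print(f"echo reduction: {reduction_db:.1f} dB")
```

This synthetic setup uses a white-noise far-end signal, a three-tap echo path, and almost no ambient noise, so the reduction it reports is not comparable to what a real room with speech signals and two correlated loudspeaker channels would yield.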