A Multi-modal Embodied Interactive Desk

This research introduces a novel multi-modal, multi-user interface, the Magic Music Desk (MMD), which employs the principles of embodied interaction and emphasizes social interaction between users. Embodied computing is a next-generation computing paradigm that combines elements of ubiquitous computing, tangible interfaces and interaction, and social computing. It moves the computer interface away from the traditional keyboard and mouse and into the environment, supporting more interactive behavior.

The objective of this research is to develop new interaction modalities in the context of mixed reality. This work provides ubiquitous computing in a physical environment and tangible interaction (including tangible mixed reality), and emphasizes real-time social computing. A novel combination of multiple interface modalities has been developed using speech recognition, hand gesture recognition, sound, and visual mixed reality (MR) technologies. A new mode of interaction called “What You Say is What You See” (WYSWYS) is implemented in this system, in addition to novel speech, music, visual, and sound interactions with an emphasis on social interaction between users.


Figure 1: Music composition using the Magic Music Desk.



Figure 2: Visualizing speech (WYSWYS).

The WYSWYS modality visualizes speech as a 3D word flowing down to the desk from the speaker’s mouth. This allows a new type of social interaction between multiple users, even when they speak different languages. Thus, a multi-cultural social interaction, where words are understood as 3D visual objects, can be realised. Additionally, the system allows the user to interact with the objects using both speech and hands. With these two modalities, users can move a virtual object around the desk, or pick it up and rotate it as if it were resting on the user’s hand. Furthermore, to make the MR system more immersive and to enhance the feeling of presence, 3D sound and music are applied according to the movement of the augmented virtual objects. Not only is the human-computer interaction multi-modal, but the interaction between humans is also multi-modal and multi-cultural with this system.
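As a rough illustration of sound that follows a moving virtual object, the sketch below computes left and right channel gains from an object's position on the desk plane. It is not the MMD implementation: the function name `stereo_gains`, the two-channel output, the constant-power pan law, and the inverse-distance attenuation are all assumptions chosen for illustration, and a real 3D audio engine would do considerably more.

```python
import math

def stereo_gains(obj_xy, listener_xy=(0.0, 0.0), ref_dist=1.0):
    """Return (left, right) gains for a source at obj_xy on the desk plane.

    Hypothetical helper: the listener faces the +y direction, +x is to
    the listener's right, and distances are in arbitrary desk units.
    """
    dx = obj_xy[0] - listener_xy[0]
    dy = obj_xy[1] - listener_xy[1]
    dist = math.hypot(dx, dy)

    # Inverse-distance attenuation, clamped so the gain never exceeds 1.
    atten = ref_dist / max(dist, ref_dist)

    # Azimuth is 0 straight ahead and positive to the right.
    azimuth = math.atan2(dx, dy)
    # Sources behind the listener are clamped to the nearest side.
    azimuth = max(-math.pi / 2, min(math.pi / 2, azimuth))

    # Constant-power pan: map [-pi/2, pi/2] onto a pan angle in [0, pi/2].
    pan = (azimuth + math.pi / 2) / 2
    return atten * math.cos(pan), atten * math.sin(pan)
```

Calling this function each time the object moves and applying the returned gains to that object's audio stream makes the sound appear to track the object across the desk.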

The system is implemented in the form of a Magic Music Desk. It enables the user to import and control virtual instruments or players simply by using speech commands. When different combinations of objects are imported onto the desk, different music can be heard, as if the music had been “composed” by arranging the instruments and players. The sound emanating from each virtual player is fully spatialised in the 3D environment. Figure 1 shows two users enjoying the Magic Music Desk. When a virtual instrument is introduced onto the desk, it generates 3D sound as if the sound were coming from the instrument’s position. Throughout the whole procedure, speech commands remain available for the user to interact with the objects, for example to move, rotate, zoom, or even delete them.
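The speech-driven control described above can be sketched as a small command dispatcher that runs on the text produced by a speech recognizer. Everything here is hypothetical: the `Desk` and `VirtualObject` classes, the "verb object arguments" grammar, and the numeric arguments are assumptions for illustration, since the actual MMD command vocabulary is not specified here.

```python
from dataclasses import dataclass

@dataclass
class VirtualObject:
    """Hypothetical state of one virtual instrument or player."""
    name: str
    x: float = 0.0
    y: float = 0.0
    angle: float = 0.0   # degrees
    scale: float = 1.0

class Desk:
    """Holds the objects on the desk and applies recognized commands."""

    def __init__(self):
        self.objects = {}

    def handle(self, utterance: str) -> bool:
        """Apply one recognized utterance; return True if it was understood."""
        words = utterance.lower().split()
        if len(words) < 2:
            return False
        verb, name, args = words[0], words[1], words[2:]

        if verb == "import":
            # Importing an object that already exists leaves it unchanged.
            self.objects.setdefault(name, VirtualObject(name))
            return True

        obj = self.objects.get(name)
        if obj is None:
            return False  # command refers to an object not on the desk

        try:
            if verb == "move" and len(args) == 2:
                obj.x, obj.y = float(args[0]), float(args[1])
            elif verb == "rotate" and len(args) == 1:
                obj.angle = (obj.angle + float(args[0])) % 360
            elif verb == "zoom" and len(args) == 1:
                obj.scale *= float(args[0])
            elif verb == "delete" and not args:
                del self.objects[name]
            else:
                return False
        except ValueError:
            return False  # an argument was not a number
        return True
```

For example, feeding the utterances "import guitar", "move guitar 0.5 -0.25", and "rotate guitar 90" to `Desk.handle` in turn would place a guitar on the desk, reposition it, and turn it by 90 degrees. Returning a boolean lets the caller give feedback when a command is not understood.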

The speech command can be visualized by showing virtual words flowing from the user’s mouth. As a result, human-to-human communication can remain natural even in multi-cultural interaction. Figure 2 shows an example of a user speaking in Mandarin. The word is converted to a 3D virtual character which floats from the user’s mouth down to the table and “splashes down”, causing the object that was uttered to appear. This allows users to experience an enjoyable, cross-cultural, and highly visual interaction.

Contact Person: Dr AD Cheok
Tel: 6874 6850
Fax: 6779 1103
Email: eleadc@nus.edu.sg