Enabling computers to understand human communicative behaviour requires interpreting speech, hand gestures, facial expressions and other body-language cues. Far less progress has been made in the automatic recognition of non-verbal communicative behaviour than in automatic speech recognition, since non-verbal gestures vary widely across cultures and individuals. Sign language (SL), by contrast, is as precise and rich as spoken language and incorporates multiple elements of body language: hand gestures, facial expressions, and head and body movements. Research in automatic SL recognition therefore provides a structured, well-defined framework for computer understanding of non-verbal communication.

The main aim of our project is to develop a system that can understand SL sentences by recognizing hand gestures, body movements and facial expressions, integrate them to capture the complete meaning, and translate the recognized meaning into English text or speech. Figure 1 illustrates the experimental setup, showing a test subject wearing a pair of Cybergloves with tracker receivers mounted on both hands. The acquired data are multi-modal: the Cybergloves capture finger-configuration data, electromagnetic trackers capture hand, head and body movement data, and digital video cameras capture facial features and expressions. These data streams must be synchronized and then processed to recognize the meaning.
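One simple way to bring glove, tracker and video-derived streams onto a common timeline is to resample each timestamped stream by linear interpolation over the interval that all streams cover. The sketch below assumes that every sample carries a timestamp from a shared reference clock and treats each stream as a single scalar channel; the function names, stream names and target rate are hypothetical, and this is not necessarily how the project synchronizes its data.

```python
import math
from bisect import bisect_right


def resample(times, values, query_times):
    """Linearly interpolate a scalar signal sampled at strictly increasing
    `times` onto `query_times`; queries outside the recorded range are
    clamped to the first or last sample."""
    out = []
    for t in query_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            k = bisect_right(times, t)  # now times[k-1] <= t < times[k]
            t0, t1 = times[k - 1], times[k]
            w = (t - t0) / (t1 - t0)
            out.append(values[k - 1] + w * (values[k] - values[k - 1]))
    return out


def synchronize(streams, rate_hz):
    """Resample every stream onto one shared clock at `rate_hz`.

    `streams` maps a (hypothetical) stream name to a (times, values) pair,
    with timestamps in seconds from a common reference clock (assumption).
    The shared clock spans only the interval covered by all streams.
    """
    start = max(ts[0] for ts, _ in streams.values())
    end = min(ts[-1] for ts, _ in streams.values())
    # Small epsilon guards against floating-point round-off in the count.
    n = int(math.floor((end - start) * rate_hz + 1e-9)) + 1
    clock = [start + k / rate_hz for k in range(n)]
    aligned = {name: resample(ts, vs, clock) for name, (ts, vs) in streams.items()}
    return clock, aligned


if __name__ == "__main__":
    streams = {
        # One glove joint angle and one tracker coordinate (made-up values).
        "glove_joint": ([0.0, 0.1, 0.2, 0.3], [0.0, 1.0, 2.0, 3.0]),
        "tracker_x": ([0.05, 0.25], [10.0, 30.0]),
    }
    clock, aligned = synchronize(streams, rate_hz=20)
    for t, g, x in zip(clock, aligned["glove_joint"], aligned["tracker_x"]):
        print(f"{t:.2f}s  glove={g:.2f}  tracker={x:.2f}")
```

For multi-channel data such as per-joint glove angles, the same `resample` call would be applied to each channel separately. Linear interpolation is only one reasonable choice; faster or noisier streams might call for filtering before resampling.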


Figure 1: Experimental setup.

The hand gestures in SL consist of basic lexical signs overlaid with grammatical inflections. Our work focuses on the often-neglected aspect of grammatical inflections, which are expressed as systematic spatial and temporal variations in the movement of the hand gesture. Our approach recognizes both the basic meaning and the inflections of hand gestures simultaneously, using the probabilistic framework of Bayesian Networks (BN). We model the temporal and spatial movement aspects that exhibit systematic variation (such as movement size, orientation and speed profile) as channels of information with distinct classes. The BN models the dependencies between all the gesture information channels and the basic and inflected meanings. This approach uses training data efficiently and manages the inherent complexity of the information. Experiments on a synthetic vocabulary that exhibits the same types of systematic variation as SL have yielded encouraging results. A system trained on data from 8 subjects achieved 85.0% accuracy on test data from those same subjects. In signer-adaptation experiments, where a small amount of data from an unseen signer is used to adapt a trained system, recognition accuracy reached 88.5% (a 75.7% reduction in error compared with the unadapted model). We are working with members of the Deaf & Hard-of-Hearing Federation (Singapore) to obtain experimental data from actual sign communication.
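To illustrate how a Bayesian network can recognize a basic sign and its inflection jointly, the sketch below performs exact inference by enumeration over a toy network. It assumes a hypothetical structure in which the lexical sign and the inflection are hidden parent nodes, the handshape channel depends only on the sign, and the movement-size and speed-profile channels depend only on the inflection. The sign glosses, channel classes and all probabilities are invented for illustration and are not the project's model.

```python
from itertools import product

# Hypothetical vocabulary: glosses and inflection names are illustrative only.
SIGNS = ["GIVE", "LOOK"]
INFLECTIONS = ["none", "continuous", "intensive"]

# Uniform priors over the two hidden nodes (assumption).
PRIOR_SIGN = {s: 1 / len(SIGNS) for s in SIGNS}
PRIOR_INFL = {i: 1 / len(INFLECTIONS) for i in INFLECTIONS}

# P(handshape class | sign): finger configuration depends on the lexical sign.
P_HANDSHAPE = {
    "GIVE": {"flat-O": 0.8, "V": 0.2},
    "LOOK": {"flat-O": 0.1, "V": 0.9},
}

# P(movement-size class | inflection)
P_SIZE = {
    "none": {"small": 0.2, "medium": 0.7, "large": 0.1},
    "continuous": {"small": 0.3, "medium": 0.5, "large": 0.2},
    "intensive": {"small": 0.1, "medium": 0.2, "large": 0.7},
}

# P(speed-profile class | inflection)
P_SPEED = {
    "none": {"even": 0.6, "repeated": 0.2, "tense": 0.2},
    "continuous": {"even": 0.1, "repeated": 0.8, "tense": 0.1},
    "intensive": {"even": 0.1, "repeated": 0.1, "tense": 0.8},
}


def posterior(obs):
    """Exact posterior P(sign, inflection | obs): enumerate every joint
    assignment of the two hidden nodes, multiply the CPT entries for the
    observed channel classes, then normalize."""
    joint = {}
    for sign, infl in product(SIGNS, INFLECTIONS):
        joint[(sign, infl)] = (
            PRIOR_SIGN[sign]
            * PRIOR_INFL[infl]
            * P_HANDSHAPE[sign][obs["handshape"]]
            * P_SIZE[infl][obs["size"]]
            * P_SPEED[infl][obs["speed"]]
        )
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


def recognize(obs):
    """Return the most probable (sign, inflection) pair."""
    post = posterior(obs)
    return max(post, key=post.get)


if __name__ == "__main__":
    obs = {"handshape": "flat-O", "size": "large", "speed": "tense"}
    print(recognize(obs))  # ('GIVE', 'intensive')
```

Because each observed channel in this toy network has a single parent, the posterior factorizes into separate sign and inflection terms. A network that also models dependencies between channels, as described above, would add further conditional tables (for example, orientation conditioned on both sign and inflection) without changing the enumeration procedure, although enumeration grows with the number of hidden-node combinations.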

Non-manual signals, including facial expressions and head and body movements, convey grammatical information in SL. These signals can, for example, change the meaning of a sentence from an assertion to a question or even a negation. We have so far experimented with recognizing six different types of facial expressions as used in SL and obtained 85-95% recognition accuracy. Figure 2 shows the three upper-face and three lower-face expressions. These facial expressions can be more subtle than those displayed by hearing people and often involve simultaneous head movement, which makes recognition a challenging problem.

Figure 2: Six types of facial expressions used in SL.

The final step of this work will be to integrate the results from recognition of basic lexical meaning with the results from recognition of grammatical inflections and non-manual signals. The results of this work will be significant for the development of computer-based translation and dialogue systems for communication between deaf and hearing people; video communication between deaf people through signing avatars; general non-keyboard-based computer interfaces; computer-based tutoring; and perceptual user interfaces (smart rooms, wearable and ubiquitous computing).


Contact Person: Assoc Prof. Surendra Ranganath
Telephone: 6874 6538
Facsimile: 6779 1103
Email: elesr@nus.edu.sg