Vision and Image Processing Research 

             Computer vision, as the name suggests, is the technology by which computers interpret images.  Images, whether frames of a video sequence or scans of still photographs, are essentially arrays of numbers.  In a greyscale image, for example, every location is represented by an intensity value.  So, for a greyscale image of size 640x480 stored at one byte per pixel, all the computer "sees" is a data set of 640x480 bytes.  The task of a vision researcher is to design algorithms, or procedures, that extract scene semantics from these numbers.  Image processing, on the other hand, implies image-to-image transformation.  Examples include enhancement of image quality, compression and restoration.  Although closely related to computer vision, image processing is a vast research area in its own right, and it overlaps considerably with the early processing stages of a vision system.
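
The distinction drawn above can be sketched in a few lines of Python. This is only an illustration, not a method used by the group: the synthetic ramp image and the helper names `stretch_contrast` and `fraction_bright` are assumptions made for the example. The first function is an image-to-image transformation (image processing); the second reduces the image to a single descriptive number, a toy stand-in for extracting scene semantics.

```python
# A greyscale image is a grid of intensity values, one byte (0-255) per pixel.
WIDTH, HEIGHT = 640, 480

# Hypothetical synthetic image: a horizontal ramp from black (left)
# to mid-grey (right), stored row by row.
image = bytearray(
    (x * 128) // WIDTH for _y in range(HEIGHT) for x in range(WIDTH)
)


def stretch_contrast(pixels):
    """Image processing: return a new image rescaled to the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image, nothing to stretch
        return bytearray(pixels)
    return bytearray((p - lo) * 255 // (hi - lo) for p in pixels)


def fraction_bright(pixels, threshold=128):
    """Toy 'vision' step: summarise the image as the share of bright pixels."""
    return sum(1 for p in pixels if p >= threshold) / len(pixels)


enhanced = stretch_contrast(image)
print(len(image))                    # bytes the computer "sees": 640 * 480
print(min(enhanced), max(enhanced))  # intensity range after enhancement
print(fraction_bright(enhanced))     # one number describing the scene
```

Real systems would normally use an array library and far richer descriptions of the scene; the point here is only that both kinds of algorithm start from the same block of bytes.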

            The field of computer vision has evolved rapidly over the last thirty years.  With increasing processor speeds, hard disk capacities and memory sizes, implementing some of these systems in real time is becoming more and more realistic.  One of the biggest bottlenecks in most vision problems is their "unstructured" nature.  Even the simplest vision problem can be complicated by artefacts such as the high level of noise introduced by CCD cameras, shadows cast by objects and changes in lighting conditions, to name only the most common.  Solving vision problems despite such artefacts, especially in a robust and generic fashion, is a difficult task.  The difficulty is often underestimated, mainly because human beings are endowed with a very sophisticated vision system, only a tiny fraction of which can currently be replicated by a computer system.  Numerous technical questions therefore remain unanswered, and this is reflected in the limited success of vision systems in industry today.  We hardly find any vision system (other than those working in a structured environment, such as a machine vision setup) that has sold more than a few hundred units.

            The Department of Electrical and Computer Engineering has numerous researchers working closely on various aspects of computer vision, mainly to address some of the technical challenges highlighted above.  The hardware and software systems are mostly housed in the Vision and Image Processing Laboratory.  The hardware consists of numerous PCs, SUN and SGI workstations, CCD video cameras, digital video cameras, web cameras, frame grabbers, computer-controllable pan/tilt/zoom cameras, an optical table for precision set-up of equipment, a microscope with a computer-controlled stage and a high-resolution 3D laser scanner.  The software includes video sequencing software, image analysis and processing tools (Adobe Photoshop, Ulead Tools, etc.) and signal analysis and processing tools (such as Matlab).

            The research work of the group can be categorised into two main classes: (i) 2D image analysis problems and (ii) 3D image analysis problems.  2D image analysis involves extracting the scene semantics from two-dimensional still pictures.  For example, trying to identify a person from a given photograph would fall in this category.  3D image analysis, on the other hand, involves extracting the scene semantics from three-dimensional (range) images, that is, images which come with spatial information.  For example, splitting a range image into different surfaces and recognising the objects in it falls in the category of 3D image analysis.  Other challenges in the 3D vision research area include inferring, from a sequence of 2D images (e.g. a stereo pair of images of the same scene), the depth of each point in the image and the motion parameters of the camera and of the objects in the scene.  Figures 1 and 2 illustrate a couple of examples of tasks that can be accomplished using 3D vision algorithms.
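
To give a feel for how depth can be inferred from a stereo pair, the sketch below uses the standard textbook relation for two identical, rectified pinhole cameras: depth = focal length x baseline / disparity, where the disparity is the horizontal shift of a point between the two images. The rig parameters (an 800-pixel focal length and a 0.1 m baseline) and the function name `depth_from_disparity` are hypothetical values chosen for illustration; this is not a description of the group's own algorithms, and the hard part in practice (finding which pixels correspond) is not shown.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen in both images of a rectified stereo pair.

    Assumes ideal, identical pinhole cameras whose image rows are aligned.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 800-pixel focal length, cameras 0.1 m apart.
FOCAL_LENGTH_PX = 800.0
BASELINE_M = 0.1

# Nearby points shift a lot between the two views; distant points shift little.
for disparity in (40.0, 20.0, 10.0):
    depth = depth_from_disparity(disparity, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Because depth varies inversely with disparity, an error of one pixel matters little for nearby points but a great deal for distant ones, which is one reason image noise makes the problem hard.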

            In this Special Focus, we present a brief overview of four research projects being carried out in the department.  The first article deals with face recognition from photographs and video sequences, and takes a unique approach by using soft computing tools.  The second deals with inferring dental problems from the 3D range image of a dental cast.  The third deals with the theoretical aspects of inferring three-dimensional structure from a sequence of images, guided in particular by cues from the human vision system.  The fourth describes how image-based rendering techniques can be used to design interactive browsers for viewing products.

 

Figure 1: 3D models of a human face (b) can be estimated from an image sequence (a).

Figure 2: Images (a) can be stitched using 3D vision theory to generate a panorama (b).

   

Contact Person: Dr K Sengupta
Tel: 874 6770, Fax: 779 1103
Email: eleks@nus.edu.sg