Vision and Image Processing Research
Computer vision, as the name suggests, is the technology by which computers interpret images. Images, whether frames of a video sequence or scans of still photographs, are essentially arrays of numbers. In a greyscale image, for example, every location (pixel) is represented by a single intensity value. So, for a 640x480 greyscale image with one byte per pixel, all the computer "sees" is a set of 640x480 = 307,200 bytes. The task of a vision researcher is to design algorithms, or procedures, by which certain scene semantics can be extracted from these numbers. Image processing, on the other hand, implies an image-to-image transformation. Examples include enhancement of image quality, compression, and restoration. Although closely related to computer vision, image processing is a vast research area in its own right, and it overlaps considerably with the early processing stages of a vision system.
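To make the "array of numbers" view concrete, here is a minimal NumPy sketch. It assumes 8 bits per pixel, which is what makes a 640x480 greyscale image occupy 640x480 bytes. It also applies contrast stretching, one simple example of the image-to-image transformations (here, enhancement) mentioned above. The random low-contrast test image and the `contrast_stretch` helper are illustrative only and do not come from any particular package or from the projects described here.

```python
import numpy as np

# A greyscale image is a 2D array of intensities: 480 rows x 640 columns,
# one unsigned byte (0-255) per pixel.
HEIGHT, WIDTH = 480, 640
rng = np.random.default_rng(0)
# Hypothetical low-contrast test image: intensities confined to 100..155.
image = rng.integers(100, 156, size=(HEIGHT, WIDTH), dtype=np.uint8)

print(image.shape)   # (480, 640)
print(image.nbytes)  # 307200, i.e. 640 * 480 bytes

def contrast_stretch(img: np.ndarray) -> np.ndarray:
    """Image-to-image transformation: linearly map [min, max] onto [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                      # flat image: nothing to stretch
        return img.copy()
    stretched = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.round(stretched).astype(np.uint8)

enhanced = contrast_stretch(image)
print(image.min(), image.max())        # 100 155
print(enhanced.min(), enhanced.max())  # 0 255
```

The output is still a 480x640 array of bytes. Only the numbers have changed, which is what distinguishes image processing from the extraction of scene semantics.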
The field of computer vision has evolved rapidly over the last thirty years. With increasing processor speeds, hard disk capacities, and memory sizes, implementing some vision systems in real time is becoming more and more realistic. One of the biggest bottlenecks in most vision problems is their "unstructured" nature. Even the simplest vision problem can be complicated by artefacts such as the high level of noise introduced by CCD cameras, shadows cast by objects, and changes in lighting conditions. These are among the most common difficulties.
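The article does not say how its projects cope with sensor noise. As one standard illustration, the sketch below models CCD noise as additive zero-mean Gaussian noise on a hypothetical synthetic scene and suppresses it with a 3x3 median filter. Real sensor noise is more complex and also includes shot noise and fixed-pattern components. The helper names `median_filter3` and `rmse` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)

# Hypothetical clean scene: dark background with a bright square.
clean = np.full((64, 64), 50.0)
clean[16:48, 16:48] = 200.0

# Simplified sensor model: additive Gaussian noise, sigma = 20 grey levels,
# clipped to the valid 8-bit range.
noisy = np.clip(clean + rng.normal(0.0, 20.0, clean.shape), 0, 255)

def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter; borders handled by replicating edge pixels."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1))

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square error between two images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

denoised = median_filter3(noisy)
print(f"RMSE noisy:    {rmse(noisy, clean):.1f}")
print(f"RMSE denoised: {rmse(denoised, clean):.1f}")
```

The median filter keeps the square's straight edges sharp, whereas a plain mean filter would blur them. It does round off the square's corners. Shadows and lighting changes are a different kind of problem that such local filtering does not address.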
Solving vision problems in spite of such difficulties, especially in a robust and generic fashion, is hard. It is quite common to underestimate the difficulty, mainly because human beings are endowed with a very sophisticated vision system, only a tiny fraction of which can currently be replicated by a computer. Thus, numerous technical questions remain unanswered, and this is reflected in the limited success of vision systems in industry today. Hardly any vision system, other than those working in a structured environment such as a machine vision setup, has sold more than a few hundred units.
The Department of Electrical and Computer Engineering has numerous researchers working on various aspects of computer vision, mainly to address some of the technical challenges highlighted above. The hardware and software systems are mostly housed in the Vision and Image Processing Laboratory. The hardware consists of numerous PCs, SUN and SGI workstations, CCD video cameras, digital video cameras, web cameras, frame grabbers, computer-controllable pan/tilt/zoom cameras, an optical table for precise set-up of equipment, a microscope with a computer-controlled stage, and a high-resolution 3D laser scanner. The software includes video sequencing software, image analysis and processing tools (Adobe Photoshop, Ulead Tools, etc.), and signal analysis and processing tools (such as Matlab).
The research work of the group can be categorised into two main classes: (i) 2D image analysis problems and (ii) 3D image analysis problems. 2D image analysis involves extracting scene semantics from two-dimensional still pictures. For example, trying to identify a person from a given photograph falls into this category. 3D image analysis problems, on the other hand, involve extracting scene semantics from three-dimensional (range) images, that is, images that carry spatial information. For example, given a range image, segmenting it into different surfaces and recognising the objects it contains falls into the category of 3D image analysis. Other challenges in 3D vision research include inferring the depth of each point in the image, and the motion parameters of the camera and of the objects in the scene, from a sequence of 2D images (e.g., a stereo pair of two images of the same scene). Figures 1 and 2 illustrate two examples of tasks that can be accomplished using 3D vision algorithms.
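The article does not specify which stereo methods the group uses. The following is a minimal textbook sketch of depth from a stereo pair, assuming a rectified pair and a pinhole camera model. It uses winner-takes-all block matching with the sum of absolute differences (SAD) to find each pixel's disparity d, then recovers depth as Z = f·B/d, where f is the focal length in pixels and B is the baseline between the two cameras. The synthetic images, focal length (700 px), baseline (0.12 m), and helper names are all hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_sum(img, r):
    """Sum over a (2r+1)x(2r+1) window around every pixel (edge-replicated borders)."""
    padded = np.pad(img, r, mode="edge")
    return sliding_window_view(padded, (2 * r + 1, 2 * r + 1)).sum(axis=(-2, -1))

def block_match(left, right, max_disp, r=2):
    """SAD block matching for a rectified pair: a point at column x in the left
    image is searched for at column x - d in the right image, d = 0..max_disp."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    costs = []
    for d in range(max_disp + 1):
        # np.roll wraps around; acceptable here only because the synthetic
        # right image below is itself built with a wrap-around shift.
        shifted = np.roll(right, d, axis=1)        # shifted[:, x] == right[:, x - d]
        costs.append(box_sum(np.abs(left - shifted), r))
    return np.argmin(np.stack(costs), axis=0)     # winner-takes-all disparity

def depth_from_disparity(disp, focal_px, baseline_m):
    """Pinhole stereo: Z = f * B / d; zero disparity (point at infinity) gives inf."""
    disp = np.asarray(disp, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return focal_px * baseline_m / disp

# Synthetic rectified pair: random texture, right view shifted by 8 pixels.
rng = np.random.default_rng(2)
left = rng.integers(0, 256, size=(60, 80)).astype(np.float64)
TRUE_DISP = 8
right = np.roll(left, -TRUE_DISP, axis=1)         # right[:, x] == left[:, x + 8]

disp = block_match(left, right, max_disp=16)
# Hypothetical camera: focal length 700 px, baseline 0.12 m.
depth = depth_from_disparity(disp, focal_px=700.0, baseline_m=0.12)
print(np.bincount(disp.ravel()).argmax())  # expected: the true disparity, 8
print(depth[30, 40])                       # 700 * 0.12 / 8 metres
```

Winner-takes-all SAD is the simplest possible matcher. It fails in textureless regions and at occlusions, which is one concrete form of the "unstructured" difficulties described earlier. Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for nearby ones.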
In this Special Focus, we present a brief overview of four research projects being carried out in the department. The first deals with face recognition from photographs and video sequences, and takes a unique approach based on soft computing tools. The second deals with inferring dental problems, starting from the 3D range image of a dental cast. The third deals with the theoretical aspects of inferring three-dimensional structure from a sequence of images, especially by following cues from the human vision system. The fourth describes how image-based rendering techniques can be used to design interactive browsers for viewing products.

Figure 1: 3D models of a human face (b) can be estimated from an image sequence (a).
Figure 2: Images (a) can be stitched using 3D vision theory to generate a panorama (b).