Rama Chellappa
Rama Chellappa's research interests span a number of areas including
context-based image exploitation, electronic image stabilization, face
recognition, video compression and automatic target recognition. A brief
overview of research accomplishments in these areas is given below.
Detecting Objects in Impulsive Environments
Detecting targets occluded by foliage using foliage-penetrating (FOPEN)
ultra-wideband synthetic aperture radar (UWB SAR) images is an important
and challenging problem. Given the different nature of target returns in
foliage and non-foliage regions and the very low signal-to-clutter ratio
in UWB imagery, conventional detection algorithms fail to yield robust
target detection results.
A new target detection algorithm has been developed that (a) incorporates
Symmetric Alpha-Stable (SaS) distributions for accurate clutter modeling,
(b) constructs a 2D site model for deriving local context, and (c) exploits
the site model for region-adaptive target detection.
The main contribution of this work lies in the design of an adaptive
detection scheme which relies on modeling of the clutter statistics by
SaS distributions for image segmentation and CFAR detection. We use the
SaS pdfs to accurately describe the highly impulsive, bursty behavior of
FOPEN clutter. Theoretical and empirical evidence is provided to justify
the use of the SaS model. Although SaS density functions generally lack
closed-form expressions, which complicates their use, simple SaS-based techniques
can be developed for image segmentation and target detection. In particular,
we have derived a closed-form expression for the SaS CFAR test that adjusts
the detection threshold according to the local clutter type. Furthermore,
a mixture SaS CFAR test has been introduced to reduce the number of false
alarms along clutter boundaries. This approach has been applied to detection
of tumors in mammograms (Figure 1).
Figure 1. Detection of tumors in mammograms.
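As a minimal sketch of how a heavy-tailed clutter model can yield a closed-form CFAR threshold, the Python below uses the Cauchy distribution, which is the alpha = 1 member of the SaS family and one of the few members with an explicit density. It is an illustration under that assumption, not the SaS CFAR test derived in this work; the function names and the median-based dispersion estimate are choices made for the example.

```python
import math
from statistics import median


def cauchy_cfar_threshold(clutter, pfa):
    """Threshold T such that P(|X| > T) = pfa for zero-centred Cauchy clutter.

    Assumption: the local clutter samples follow Cauchy(0, gamma).
    """
    # For Cauchy(0, gamma), the median of |X| equals the dispersion gamma.
    gamma = median(abs(c) for c in clutter)
    # Tail probability: P(|X| > T) = 1 - (2/pi) * atan(T / gamma); solve for T.
    return gamma * math.tan(0.5 * math.pi * (1.0 - pfa))


def detect(cell, clutter, pfa=1e-3):
    """Declare a detection when the test cell exceeds the local threshold."""
    return abs(cell) > cauchy_cfar_threshold(clutter, pfa)
```

Because the threshold is proportional to the locally estimated dispersion, it rises in bursty clutter and falls in quiet regions, which is the region-adaptive behaviour described above.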
Statistical Detection of Two-Dimensional Shapes
A new method of detecting two-dimensional shapes in
cluttered images has been developed and its performance bounds have
been calculated. Modeling a shape as a flat plateau on a uniform background,
we have defined an operator for detecting the shape by extending Canny's
optimal step edge detector to detect a step edge around the shape's boundary
contour. We have extended the notion of step edge detection to two-dimensional
"global" edge detection, derived some of its statistical
properties, and used them to predict its detection and localization performance.
Experiments on vehicle detection in aerial images and on human facial feature
detection have been conducted to verify the effectiveness and statistical
properties of this approach (Figure 2).
Figure 2. (a) Vehicle detection. (b) Facial feature detection.
Adaptive LADAR ATR
For applications using mobile LADAR sensors, there are physical limitations
on the size of the image array. To optimally utilize limited sensor capability,
an adaptive LADAR ATR algorithm has been developed. The approach utilizes
the reconfigurable nature of the LADAR sensor to dynamically select the
best angular resolution and scanning pattern for LADAR ATR. Issues such
as angular resolution selection, aiming area selection, and target recognition
using the adaptive imaging scheme are addressed. A physics-based LADAR
simulator is used to generate simulated LADAR chips for targets at various
azimuth angles. An m-of-n pixel matching scheme is used for internal shape
matching and boundary verification. Surface variance and boundary variance
based similarity measures are used to select the most informative sub-region
for a second look. Preliminary experimental results on simulated LADAR
images have shown performance improvement using the multi-look LADAR ATR
scheme. An interactive graphical user interface using Microsoft Visual
C++ is shown in Figure 3.
Figure 3. Graphical user interface for multi-look LADAR ATR.
ATR Methods for SAR Imagery
We have developed matching schemes that use selected features in SAR target
imagery to classify candidate images as specific target types. This work
expands upon our previous work on feature extraction using the Topographical
Primal Sketch and uses those features (as well as others in the literature)
to classify targets in SAR imagery. We formulate the matching problem as
a non-linear objective function to maximize the number of matched features
and minimize the distance between features. The minimum of this function
is found using a deterministic annealing process. The objective function
may include an affine transformation which is solved for iteratively using
an analytically computed minimum at each temperature of the annealing process.
Thus the images do not need to be perfectly registered, as any error between
them is solved for in the optimization process. We have also extended the
initial objective function to incorporate multiple feature classes which
must each be matched within its own feature class. Non-uniform annealing
schedules are also employed to speed up the convergence process. This matching
method is robust to spurious, missing and migrating features. We have implemented
this algorithm on simulated and real SAR images.
Figure 4 shows two SAR images of the same vehicle at slightly different
aspects. The feature maps shown are similar, but when overlaid, spurious,
missing and shifted feature locations are readily apparent. The associated
match matrix contains a 1 in position (i,j) when feature i is found to
match feature j. The nonlinear optimization of matching feature points
indicates a clear match between the two images.
Figure 4. SAR feature matching. Panels: originals, feature maps, and match matrix/overlay.
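The matching idea can be sketched in a few lines of Python. This is a simplified illustration in the spirit of softassign-style deterministic annealing, not the algorithm used in this work: it assumes a pure translation instead of an affine transformation, a single feature class, and a fixed geometric annealing schedule. The function name `anneal_match` and all parameter values are hypothetical.

```python
import numpy as np


def anneal_match(X, Y, alpha=1.0, beta0=0.5, beta_max=50.0, rate=1.5, n_norm=30):
    """Match 2-D feature points X (n, 2) to Y (m, 2) under an unknown translation.

    Returns a binary match matrix M (M[i, j] = 1 when feature i matches
    feature j) and the estimated translation t. alpha rewards making a match,
    so points farther apart than sqrt(alpha) tend to go to the slack entries.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, m = len(X), len(Y)
    offsets = Y[None, :, :] - X[:, None, :]          # (n, m, 2)
    t = np.zeros(2)
    beta = beta0                                      # inverse temperature
    while beta <= beta_max:
        D = ((offsets - t) ** 2).sum(axis=-1)         # squared distances
        # Extra slack row/column absorbs spurious and missing features.
        S = np.ones((n + 1, m + 1))
        S[:n, :m] = np.exp(-beta * (D - alpha))
        for _ in range(n_norm):                       # alternating normalization
            S[:n, :] /= S[:n, :].sum(axis=1, keepdims=True)
            S[:, :m] /= S[:, :m].sum(axis=0, keepdims=True)
        M = S[:n, :m]
        # Analytic minimizer of the weighted distance for a translation.
        t = (M[:, :, None] * offsets).sum(axis=(0, 1)) / max(M.sum(), 1e-12)
        beta *= rate
    return (M > 0.5).astype(int), t
```

At low values of `beta` the match matrix is soft, so the translation estimate is an average over many tentative matches; as `beta` grows the matrix hardens toward 0/1 entries, which is what lets the images start out imperfectly registered.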
Probabilistic Optical Flow Formulation
A fundamental problem in computer vision is estimating the relative motion
between the camera and the scene given a set of images. Knowledge of the
relative motion enables us to solve for the geometry of camera motion and
distinguish independently moving objects, and has diverse applications
in scene reconstruction and interpretation, scene change detection, navigation,
etc. Most of the recent advances in motion estimation have been in understanding
the inherent geometric framework that governs the imaging process, which
in turn has led to algebraic solutions being proposed. However, a significant
drawback to such an approach is the fact that the observed data (drawn
from images) is inherently noisy, due to pixel and intensity quantization,
sensor noise, etc. This noise has a significant impact on our ability to
solve for the motion and should be accounted for in the estimation process.
Our work has concentrated on accounting for the noise by the use of a probabilistic
framework for motion estimation.
Our recent work has concentrated on the use of the image intensity information
to solve motion problems related to differential (small) motion. The well-known
image brightness constraint gives a single constraint for a two-dimensional
flow field. Thus almost all work on flow estimation has concentrated on
overcoming this missing information by using meaningful priors on the flow
field (typically smoothness). However, such methods fail to address the
impact of noise on the estimation of image derivatives which in turn determine
the image brightness constraint. Our probabilistic framework naturally
incorporates such uncertainty and generates a meaningful description of
the underlying random processes that govern our observations. This probabilistic
model can be applied to many classical problems such as flow estimation,
estimation of the Focus of Expansion, motion estimation, and motion field
segmentation.
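To make the role of noise concrete, the following Python sketch estimates a single flow vector for an image patch by least squares on the brightness constraint Ix*u + Iy*v + It = 0 and reports a covariance for the estimate. It is a generic illustration, assuming independent Gaussian noise on the temporal derivative only and one constant flow over the patch; it is not the probabilistic formulation developed in this work, and the function name is hypothetical.

```python
import numpy as np


def flow_with_covariance(I1, I2):
    """Least-squares flow (u, v) for a patch, plus a 2x2 covariance estimate."""
    I1, I2 = np.asarray(I1, float), np.asarray(I2, float)
    # Spatial derivatives of the frame average; axis 0 is y, axis 1 is x.
    Iy, Ix = np.gradient(0.5 * (I1 + I2))
    It = I2 - I1
    inner = (slice(1, -1), slice(1, -1))      # drop one-sided border differences
    A = np.column_stack([Ix[inner].ravel(), Iy[inner].ravel()])
    b = -It[inner].ravel()
    AtA = A.T @ A
    flow = np.linalg.solve(AtA, A.T @ b)
    resid = b - A @ flow
    sigma2 = resid @ resid / (len(b) - 2)     # residual variance
    cov = sigma2 * np.linalg.inv(AtA)         # uncertainty of (u, v)
    return flow, cov
```

The covariance is large along directions in which the patch has little gradient energy, which is the familiar aperture problem expressed as uncertainty.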
Structure from Optical Flow using a Fast Error Search Technique
3D reconstruction from a sequence of images is known as the "3D
structure from motion problem". While in theory it can be shown to
have a solution, it remains very difficult to solve in practice. This is
due to the bilinear formulation of the equations relating motion and 3D
structure. We have addressed this problem using a computational approach
that we term "fast partial search". Here, we hypothesize the
location of the epipole over an area and measure the goodness of fit of
the data to this hypothesis. We compute the least squares error of the
system without actually solving the equations. The theoretical analysis
we have developed allows Fourier techniques to be applied to this
determination, drastically reducing the computational effort.
Our preliminary results on synthetic data and in applications to 3D stabilization,
moving object detection, range finding and obstacle detection have been
very encouraging. We foresee development of a mechanism for generating
3D virtual reality models from image sequences using this technology.
Results on obstacle detection from a monocular image sequence are shown
in Figure 5.
Figure 5. Obstacle detection: (a, b) Two consecutive frames of the
sequence. (c) Inverse depth map. (d) Obstacles.
Egomotion Estimation Using Video and Inertial Data
A basic problem in computer vision is accurate estimation of the motion
trajectory of a moving platform carrying a video camera; this is very important
in many applications involving robotic and unmanned vehicles. However,
conventional algorithms that use only video information have difficulty
robustly estimating irregular platform motion, and also struggle when image
quality is poor, making accurate feature point correspondences
hard to establish. Inspired by inertial navigation techniques and based
on recent developments in Micro-Electro-Mechanical Systems (MEMS), inertial
data obtained by a set of on-board MEMS-based microsensors are applied
to camera egomotion estimation to handle irregular motion and combat feature
tracking noise. In our approach, the measurements from the video camera
(a set of feature points in image frames tracked by some feature tracking
algorithm) and microsensors (inertial data) are fed into an Iterated Extended
Kalman Filter (IEKF) to recursively estimate platform egomotion. The algorithm
framework is shown in Figure 6. Tests using both synthetic and real image
sequences show that our approach using both video and inertial data has
much better performance than methods using only video information. We have
computed the egomotion estimate bias obtained both with and without inertial
data when platform motion is irregular or feature tracking noise is large.
The bias obtained by our approach using inertial data is much less than
the bias in methods that do not use inertial data. Computer vision techniques
augmented by inertial data are also expected to have much better
performance in 3-D video segmentation, structure from motion computation,
moving object detection, and many other basic problems in computer vision.
Figure 6. Egomotion estimation using video and inertial data.
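The fusion idea can be illustrated with a much simpler filter than the IEKF used here. The Python sketch below is a linear Kalman filter for one-dimensional motion, in which an accelerometer reading drives the prediction step and a video-derived position measurement drives the update step. The state layout, noise levels, and the function name `fuse` are assumptions made for the example.

```python
import numpy as np


def fuse(accels, positions, dt=0.1, q=1e-3, r=0.25):
    """Estimate [position, velocity] from inertial and video measurements.

    accels:    accelerometer reading for each time step (prediction input)
    positions: video-derived position after each time step (measurement)
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])     # constant-velocity transition
    B = np.array([0.5 * dt ** 2, dt])         # how acceleration enters the state
    H = np.array([[1.0, 0.0]])                # video observes position only
    Q = q * np.eye(2)
    x = np.zeros(2)                           # initial guess
    P = 10.0 * np.eye(2)                      # large initial uncertainty
    track = []
    for a, z in zip(accels, positions):
        # Predict with inertial data.
        x = F @ x + B * a
        P = F @ P @ F.T + Q
        # Update with the video measurement.
        S = H @ P @ H.T + r
        K = P @ H.T / S
        x = x + (K * (z - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        track.append(x.copy())
    return np.array(track)
```

Because the inertial input carries the irregular part of the motion, the video measurements only need to correct slow drift, which is one intuition for why adding inertial data helps when motion is irregular.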
Inverse Inhomogeneous Diffusion in Image Processing
A new model has been proposed for inverse inhomogeneous diffusion based
on a general inverse parabolic partial differential equation. It has been
shown that the general model can be reduced to a pseudo-homogeneous form
by change of variables. The solution has been found using the semigroup
theory of bounded linear operators, leading to a non-linear integral operator
which can be implemented by discretization. The homogeneous case, namely
the backward heat equation, has been shown to be a special case leading
to a linear filter, and can be implemented in the Fourier domain. The proposed
model provides more flexibility compared to other existing approaches such
as total variation based methods and shock filters, since the inhomogeneity
of the diffusion process is modeled by one parameter with pointwise dependence
on the spatial and time axes. The method has also been incorporated in
a multi-frame super-resolution algorithm and applied to video and infrared
data. Figure 7 shows a frame from a sequence to which the filter was applied.
The diffusion parameter in this application was chosen inversely proportional
to the Laplacian of the image in order to obtain edge-based enhancement.


Figure 7. Inverse diffusion of visual data. Left: original image, middle:
blurred image, right: inverse diffused image.
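For the homogeneous special case mentioned above, the backward heat equation reduces to a linear filter in the Fourier domain. The Python sketch below shows that case only, assuming periodic boundary conditions and unit pixel spacing; the gain cap `max_gain` is a simple regularization added for the example, since the inverse filter amplifies high frequencies exponentially. It does not implement the inhomogeneous model.

```python
import numpy as np


def _freq_sq(shape):
    """Squared angular frequency |w|^2 on the FFT grid."""
    wy = 2 * np.pi * np.fft.fftfreq(shape[0])
    wx = 2 * np.pi * np.fft.fftfreq(shape[1])
    return wy[:, None] ** 2 + wx[None, :] ** 2


def heat_blur(img, t):
    """Forward heat equation for time t: multiply the spectrum by exp(-t |w|^2)."""
    return np.fft.ifft2(np.fft.fft2(img) * np.exp(-t * _freq_sq(img.shape))).real


def inverse_heat(img, t, max_gain=1e6):
    """Backward heat equation for time t, with the gain capped at max_gain."""
    gain = np.minimum(np.exp(t * _freq_sq(img.shape)), max_gain)
    return np.fft.ifft2(np.fft.fft2(img) * gain).real
```

On noise-free data the two filters are exact inverses as long as the cap is not reached; with noisy data a much lower cap is needed, because the same gain is applied to the noise.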
Object Detection and Tracking in Images
We are developing algorithms for the detection and tracking of stationary
as well as moving objects in an image sequence captured from a moving platform.
Detection and tracking of objects in images represents an important step
towards achieving automated activity monitoring capabilities. The challenge
lies in being able to reliably and quickly detect and track multiple objects
of interest against a cluttered background. The work involves developing
the following components: a stabilization technique
to compensate for camera motion, a detection module to discriminate target
objects from background, a tracking module to compute the correspondence
between the detected foreground regions in the current frame and the detected
objects in the previous frame, and a recognition module.
We have developed a new method based on the principle of higher-order
statistical learning for the detection of target objects in images. The
proposed method approximately models the unknown distribution of the images
of the target objects by learning higher-order statistical (HOS) information
about the "object class" from sample images. Training data
samples are first clustered and the statistical parameters corresponding
to each cluster are estimated. Clustering is based on an HOS-based decision
measure which is obtained by deriving a series expansion for the multivariate
probability density function in terms of Gaussians and Hermite polynomials.
The HOS-based measure has very good discriminating capability.
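A one-dimensional analogue of such an expansion is the Gram-Charlier A series, which corrects a Gaussian density with Hermite polynomial terms weighted by skewness and excess kurtosis. The Python sketch below shows that univariate form as an illustration only; the multivariate expansion and the decision measure derived from it in this work are not reproduced here, and the function name is hypothetical.

```python
import math


def gram_charlier_pdf(x, mean, std, skew=0.0, excess_kurt=0.0):
    """Gaussian density corrected by third- and fourth-order Hermite terms."""
    z = (x - mean) / std
    gauss = math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    he3 = z ** 3 - 3.0 * z                  # probabilists' Hermite polynomial He3
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0       # probabilists' Hermite polynomial He4
    return gauss * (1.0 + skew / 6.0 * he3 + excess_kurt / 24.0 * he4)
```

With zero skewness and zero excess kurtosis the expression reduces to the ordinary Gaussian density, so the higher-order terms can be read as the extra information that second-order statistics miss. Note that the truncated series can become negative for large correction terms, so it is an approximation and not a true density.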
Given a test image, statistical information about the background is
learnt "on the fly" using a dynamic background learning algorithm.
Detection is then performed by searching the test image for object patterns
at all points in the image and across different scales. A vector of difference
measurements of the test pattern with respect to each of the clusters is
computed using the HOS-based closeness measure. A simple thresholding scheme
then determines whether the test pattern belongs to the "object
class". Given the complexity of the problem, we observe that the method
achieves a good detection rate with very few false alarms. Examples of
application to detection of vehicles and people in static
imagery and of people in video are shown in Figures 8 and 9.


Figure 8. Vehicle and people detection in static imagery.
Figure 9. Detection and tracking of people in video (preliminary results).
Object Verification using Video
An approach to model-based dynamic object verification using video has
been developed. From image sequences containing the moving object, we compute
its motion trajectory. Then we estimate its 3-D pose at each time step.
Pose estimation is formulated as a search problem, with the search space
strictly constrained by the motion trajectory information of the moving
object and assumptions about the scene structure. A generalized Hausdorff
metric, which is more robust to noise and allows a confidence interpretation,
is employed in the matching procedure used for pose estimation as well
as verification. The pose evolution curves are used to assist in the acceptance
or rejection of an object hypothesis at a later stage. The models are acquired
from real image sequences of the objects. Edge maps are extracted and used
for matching. Experiments have been done with both infrared and optical
sequences containing moving objects involved in complex motions.
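One common robust generalization of the Hausdorff distance is the rank-based (partial) form, which replaces the maximum over nearest-neighbour distances with a chosen quantile so that a few outlying edge points do not dominate. The Python sketch below shows that variant for small point sets; whether it matches the generalized metric used in this work is an assumption, and the function names and the `frac` parameter are hypothetical.

```python
import math


def directed_partial_hausdorff(A, B, frac=0.8):
    """frac-quantile of the distances from each point in A to its nearest point in B."""
    nearest = sorted(min(math.dist(a, b) for b in B) for a in A)
    k = max(1, math.ceil(frac * len(nearest)))
    return nearest[k - 1]


def partial_hausdorff(A, B, frac=0.8):
    """Symmetric version; frac=1.0 gives the classical Hausdorff distance."""
    return max(directed_partial_hausdorff(A, B, frac),
               directed_partial_hausdorff(B, A, frac))
```

The fraction of points that fall within a given distance can also be read as a rough confidence that the model edge map is present in the image.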
Q-Warping: Direct Computation of the Quadratic Reference Surface
We are working on new ways to model 3D from images. Recent work has focused
on the use of direct methods for finding correspondence and parametric
structure. The importance of this approach is in its robustness and wide
applicability. An example of the use of our approach to generate new images
from given images is shown in Figure 10.
Figure 10. Q-warping. Panels: original images and new image.
Image Compression: Multiple Description Coding and Iterative Decoding
Multiple Description Coding (MDC) is a source coding technique for (on-off)
diversity channels that has recently witnessed a rapid increase in research
interest. MDC is used to generate multiple (correlated) descriptions of
a source. These descriptions are transmitted over independent channels
to the receiver. When all descriptions are received, a high-quality reconstruction
is possible. In the event of failure of one or more of the channels, it
is desired that the reconstruction still be of acceptable quality.
We have developed an efficient MDC image coding scheme (Figure 11) in
the context of classification-based subband coding. The image coder uses
multiple description scalar quantizers and a Lagrangian technique to allocate
rates and redundancies among the classified subbands in an optimal manner.
We are currently adapting this coder to encode video sequences using three-dimensional
subband decompositions. This technique has potential for video transmission
over lossy packet networks.

Figure 11. Multiple-description image coding.
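The basic trade-off in multiple description coding can be seen with two staggered uniform scalar quantizers, one of the simplest ways to produce two descriptions of a sample. The Python sketch below is an illustration under that assumption: it is not the multiple description scalar quantizer design or the Lagrangian rate and redundancy allocation used in the coder above, and the function names are hypothetical.

```python
import math


def encode(x, step):
    """Two descriptions of x: uniform quantizers offset by half a step."""
    u = x / step
    q1 = math.floor(u + 0.5)    # cells centred on integers: [q1 - 0.5, q1 + 0.5)
    q2 = math.floor(u)          # cells centred on half-integers: [q2, q2 + 1)
    return q1, q2


def decode(step, q1=None, q2=None):
    """Reconstruct from whichever descriptions arrived."""
    if q1 is not None and q2 is not None:
        # Both received: the two cells overlap in an interval of width step / 2.
        lo = max(q1 - 0.5, q2)
        hi = min(q1 + 0.5, q2 + 1.0)
        return 0.5 * (lo + hi) * step
    if q1 is not None:
        return q1 * step
    if q2 is not None:
        return (q2 + 0.5) * step
    raise ValueError("no description received")
```

Either description alone reconstructs the sample to within half a step, and both together to within a quarter step. The second description therefore halves the worst-case error instead of adding fully independent information, which is the redundancy that protects against a channel failure.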
We have also examined transmission of multiple descriptions over noisy
(AWGN) channels, rather than the on-off channels that are traditionally
considered. We have introduced the use of iterative decoding techniques,
similar to those used in "turbo" decoding, to decode correlated
binary descriptions transmitted over a noisy channel.
For a given transmission rate per channel and a given channel state,
the efficacy of iterative decoding depends on the correlation between the
two descriptions produced by the multiple description encoder. We have
demonstrated that there is an ideal amount of redundancy or correlation
for a given channel state. Hence multiple description codes can also be
viewed as joint source-channel codes.
Motivated by recent work on analysis of error-correcting codes, we have
modeled the transmission of multiple descriptions by graphical models such as
Bayesian networks and factor graphs. Decoding then amounts to a probabilistic
inference problem on the graph, which can be accomplished in an approximate
manner by probability propagation.
Robust Face Recognition and Multimedia Applications
Our research on face recognition has been focused on improving the robustness
of a subspace Linear Discriminant Analysis (LDA) system and exploring new
applications of this technology. We have developed a subspace face recognition
system based on LDA; an example of its performance is shown in Figure 12.
In this system we employ a unique criterion to choose the subspace dimension
and a weighted distance metric guided by the LDA eigenvalues. The system
not only classifies the training images very well but also
generalizes well to images outside the training set. However,
it has some drawbacks, such as being sensitive to small image scale distortions
and/or illumination changes. In order to improve the robustness of our
system to such small image distortions, we use multiple subspaces or a
parametric subspace directly computed from the original subspace. We also
address the illumination problem using a heuristic method which works quite
well in practice. Finally, a proposal for describing face images in multimedia
content using our approach has been made to the MPEG-7 committee. The initial
test results for our method are very good.


Figure 12. Face recognition using subspace LDA.
Please mail questions/comments to
webmaster@cfar.umd.edu
November 1999