Rama Chellappa


Rama Chellappa's research interests span a number of areas including context-based image exploitation, electronic image stabilization, face recognition, video compression and automatic target recognition. A brief overview of research accomplishments in these areas is given below.
 

Detecting Objects in Impulsive Environments

Detecting targets occluded by foliage using foliage-penetrating (FOPEN) ultra-wideband synthetic aperture radar (UWB SAR) images is an important and challenging problem. Given the different nature of target returns in foliage and non-foliage regions and the very low signal-to-clutter ratio in UWB imagery, conventional detection algorithms fail to yield robust target detection results.

A new target detection algorithm has been developed that (a) incorporates Symmetric Alpha-Stable (SaS) distributions for accurate clutter modeling, (b) constructs a 2D site model for deriving local context, and (c) exploits the site model for region-adaptive target detection.

The main contribution of this work lies in the design of an adaptive detection scheme which relies on modeling of the clutter statistics by SaS distributions for image segmentation and CFAR detection. We use the SaS pdfs to accurately describe the highly impulsive, bursty behavior of FOPEN clutter. Theoretical and empirical evidence is provided to justify the use of the SaS model. Although the lack of explicit forms for their density functions inhibits their use, simple techniques using SaS distributions can be developed for image segmentation and target detection. In particular, we have derived a closed-form expression for the SaS CFAR test that adjusts the detection threshold according to the local clutter type. Furthermore, a mixture SaS CFAR test has been introduced to reduce the number of false alarms along clutter boundaries. This approach has been applied to detection of tumors in mammograms (Figure 1).

Figure 1. Detection of tumors in mammograms.
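
As a minimal illustration of how a CFAR threshold can track the local clutter type, the sketch below uses the Cauchy distribution, which is the SaS member with characteristic exponent 1 and one of the few with a closed-form density. It is not the closed-form SaS CFAR test referred to above, whose expression is not reproduced here. The function names, the median-based dispersion estimate, and the two-sided test on real-valued samples are all assumptions made for the example.

```python
import numpy as np

def cauchy_cfar_threshold(reference, pfa):
    """Threshold T such that P(|X| > T) = pfa for zero-centred Cauchy clutter."""
    # For zero-centred Cauchy data the median of |X| equals the dispersion gamma.
    gamma = np.median(np.abs(reference))
    # P(|X| > T) = 1 - (2 / pi) * arctan(T / gamma), solved for T.
    return gamma * np.tan(0.5 * np.pi * (1.0 - pfa))

def detect(cell, reference, pfa=1e-3):
    """Declare a detection when the cell under test exceeds the local threshold."""
    return abs(cell) > cauchy_cfar_threshold(reference, pfa)

# Two synthetic clutter regions with different dispersions (illustrative values).
rng = np.random.default_rng(0)
calm = 1.0 * rng.standard_cauchy(20001)
bursty = 5.0 * rng.standard_cauchy(20001)
t_calm = cauchy_cfar_threshold(calm, 1e-2)
t_bursty = cauchy_cfar_threshold(bursty, 1e-2)
```

The threshold scales with the estimated dispersion, so the same false alarm probability is targeted in both regions even though the bursty region needs a threshold roughly five times higher.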
 

Statistical Detection of Two-Dimensional Shapes

A new method of detecting two-dimensional shapes in cluttered images has been developed and its performance bounds have been calculated. Modeling a shape as a flat plateau on a uniform background, we have defined an operator for detecting the shape by extending Canny's optimal step edge detector to detect a step edge around the shape's boundary contour. We have extended the notion of step edge detection to two-dimensional "global" edge detection, derived some of its statistical properties, and used them to predict its detection and localization performance. Experiments on vehicle detection in aerial images and on human facial feature detection have been conducted to verify the effectiveness and statistical properties of this approach (Figure 2).

Figure 2. (a) Vehicle detection. (b) Facial feature detection.

Adaptive LADAR ATR

For applications using mobile LADAR sensors, there are physical limitations on the size of the image array. To optimally utilize limited sensor capability, an adaptive LADAR ATR algorithm has been developed. The approach utilizes the reconfigurable nature of the LADAR sensor to dynamically select the best angular resolution and scanning pattern for LADAR ATR. Issues such as angular resolution selection, aiming area selection, and target recognition using the adaptive imaging scheme are addressed. A physics-based LADAR simulator is used to generate simulated LADAR chips for targets at various azimuth angles. An m-of-n pixel matching scheme is used for internal shape matching and boundary verification. Surface variance and boundary variance based similarity measures are used to select the most informative sub-region for a second look. Preliminary experimental results on simulated LADAR images have shown performance improvement using the multi-look LADAR ATR scheme. An interactive graphical user interface using Microsoft Visual C++ is shown in Figure 3.

Figure 3. Graphical user interface for multi-look LADAR ATR.
 

ATR Methods for SAR Imagery

We have developed matching schemes that use selected features in SAR target imagery to classify candidate images as specific target types. This work expands upon our previous work on feature extraction using the Topographical Primal Sketch and uses those features (as well as others in the literature) to classify targets in SAR imagery. We formulate the matching problem as a nonlinear objective function whose minimization maximizes the number of matched features while minimizing the distance between matched features. The minimum of this function is found using a deterministic annealing process. The objective function may include an affine transformation, which is solved for iteratively using an analytically computed minimum at each temperature of the annealing process. Thus the images do not need to be perfectly registered, as any registration error between them is solved for in the optimization process. We have also extended the initial objective function to incorporate multiple feature classes, each of which must be matched within its own class. Non-uniform annealing schedules are also employed to speed up convergence. This matching method is robust to spurious, missing and migrating features. We have implemented this algorithm on simulated and real SAR images.

Figure 4 shows two SAR images of the same vehicle at slightly different aspects. The feature maps shown are similar, but when overlaid, spurious, missing and shifted feature locations are readily apparent. The associated match matrix contains a 1 in position (i,j) when feature i is found to match feature j. The nonlinear optimization of matching feature points indicates a clear match between the two images.


Figure 4. SAR feature matching. Left to right: originals, feature maps, match matrix/overlay.
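
The following is a rough sketch of annealed feature matching of the kind described above, written in the style of softassign-type algorithms. It is an assumption about the general shape of such a method, not the objective function or schedule used in this work. The name `anneal_match`, the parameters `alpha`, `beta0`, `beta_max` and `rate`, and the toy point sets are all hypothetical. An extra row and column in the match matrix absorb missing and spurious features, and the affine map is re-solved in closed form at each temperature.

```python
import numpy as np

def anneal_match(p, q, alpha=1.0, beta0=0.5, beta_max=50.0, rate=1.5, sinkhorn_iters=30):
    """Soft feature matching with an affine map re-estimated at every temperature.

    alpha is the squared distance below which matching is preferred to leaving
    a feature unmatched; 1 / beta plays the role of the temperature.
    """
    n, m = len(p), len(q)
    p_h = np.hstack([p, np.ones((n, 1))])             # homogeneous coordinates
    T = np.vstack([np.eye(2), np.zeros((1, 2))])      # start from the identity map
    beta = beta0
    while beta <= beta_max:
        d2 = (((p_h @ T)[:, None, :] - q[None, :, :]) ** 2).sum(axis=2)
        M = np.ones((n + 1, m + 1))                   # last row/column absorb outliers
        M[:n, :m] = np.exp(-beta * (d2 - alpha))
        for _ in range(sinkhorn_iters):               # alternate row/column normalisation
            M[:n, :] /= M[:n, :].sum(axis=1, keepdims=True)
            M[:, :m] /= M[:, :m].sum(axis=0, keepdims=True)
        W = M[:n, :m]
        # Weighted least squares for the affine map, solved analytically.
        lhs = p_h.T @ (W.sum(axis=1)[:, None] * p_h)
        rhs = p_h.T @ (W @ q)
        T = np.linalg.solve(lhs, rhs)
        beta *= rate                                   # lower the temperature
    return W, T

# Toy data: q is a slightly rotated and shifted copy of p with one feature
# missing (index 4) and one spurious feature appended.
theta = 0.03
A_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t_true = np.array([0.5, -0.3])
p = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 3], [2, 7]], dtype=float)
q = np.vstack([np.delete(p @ A_true.T + t_true, 4, axis=0), [[20.0, -5.0]]])
W, T = anneal_match(p, q)
match_matrix = (W > 0.5).astype(int)                  # 1 at (i, j) when i matches j
```

In this toy case the row for the missing feature and the column for the spurious feature stay empty, while the recovered map `T` absorbs the small misregistration between the two point sets.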
 

Probabilistic Optical Flow Formulation

A fundamental problem in computer vision is estimating the relative motion between the camera and the scene given a set of images. Knowledge of the relative motion enables us to solve for the geometry of camera motion and distinguish independently moving objects, and has diverse applications in scene reconstruction and interpretation, scene change detection, navigation, etc. Most of the recent advances in motion estimation have been in understanding the inherent geometric framework that governs the imaging process, which in turn has led to algebraic solutions being proposed. However, a significant drawback to such an approach is the fact that the observed data (drawn from images) is inherently noisy, due to pixel and intensity quantization, sensor noise, etc. This noise has a significant impact on our ability to solve for the motion and should be accounted for in the estimation process. Our work has concentrated on accounting for the noise by the use of a probabilistic framework for motion estimation.

Our recent work has concentrated on the use of the image intensity information to solve motion problems related to differential (small) motion. The well-known image brightness constraint gives a single constraint for a two-dimensional flow field. Thus almost all work on flow estimation has concentrated on overcoming this missing information by using meaningful priors on the flow field (typically smoothness). However, such methods fail to address the impact of noise on the estimation of image derivatives which in turn determine the image brightness constraint. Our probabilistic framework naturally incorporates such uncertainty and generates a meaningful description of the underlying random processes that govern our observations. This probabilistic model can be applied to many classical problems such as flow estimation, estimation of the Focus of Expansion, motion estimation, and motion field segmentation.
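
For reference, the image brightness constraint mentioned above can be written out explicitly. The windowed least-squares estimate and covariance that follow it are a standard textbook simplification that places noise only on the temporal derivative. They are included as an assumption-laden baseline, not as the probabilistic model described in this section.

```latex
% Brightness constancy, I(x + u\,dt, y + v\,dt, t + dt) = I(x, y, t),
% expanded to first order: one equation per pixel, two unknowns (u, v).
\[ I_x u + I_y v + I_t = 0 \]

% Assuming (u, v) is constant over a window of N pixels:
\[
A = \begin{pmatrix} I_x^{(1)} & I_y^{(1)} \\ \vdots & \vdots \\ I_x^{(N)} & I_y^{(N)} \end{pmatrix},
\qquad
b = \begin{pmatrix} I_t^{(1)} \\ \vdots \\ I_t^{(N)} \end{pmatrix},
\qquad
\begin{pmatrix} \hat{u} \\ \hat{v} \end{pmatrix} = -(A^\top A)^{-1} A^\top b
\]

% If only b carries i.i.d. zero-mean noise of variance \sigma^2:
\[ \operatorname{Cov}(\hat{u}, \hat{v}) = \sigma^2 (A^\top A)^{-1} \]
```

Because the spatial derivatives in A are themselves estimated from noisy images, this covariance understates the true uncertainty and the least-squares estimate is in general biased, which is the gap a probabilistic treatment of the derivatives is meant to close.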
 

Structure from Optical Flow using a Fast Error Search Technique

3D reconstruction from a sequence of images is known as the "3D structure from motion problem". While in theory it can be shown to have a solution, it remains very difficult to solve in practice. This is due to the bilinear formulation of the equations relating motion and 3D structure. We have addressed this problem using a computational approach that we term "fast partial search". Here, we hypothesize the location of the epipole over an area and measure the goodness of fit of the data to this hypothesis. We compute the least squares error of the system without actually solving the equations. The theoretical analysis we have developed allows Fourier techniques to be applied in making this determination, resulting in drastically reduced computational effort. Our preliminary results on synthetic data and in applications to 3D stabilization, moving object detection, range finding and obstacle detection have been very encouraging. We foresee development of a mechanism for generating 3D virtual reality models from image sequences using this technology.

Results on obstacle detection from a monocular image sequence are shown in Figure 5.

Figure 5. Obstacle detection: (a-b) Two consecutive frames of the sequence. (c) Inverse depth map. (d) Obstacles.
 

Egomotion Estimation Using Video and Inertial Data

A basic problem in computer vision is accurate estimation of the motion trajectory of a moving platform carrying a video camera; this is very important in many applications involving robotic and unmanned vehicles. However, conventional algorithms that use only video information have difficulty estimating irregular platform motion robustly, and also struggle when image quality is poor and accurate feature point correspondences are hard to establish. Inspired by inertial navigation techniques and by recent developments in Micro-Electro-Mechanical Systems (MEMS), we apply inertial data obtained from a set of on-board MEMS-based microsensors to camera egomotion estimation in order to handle irregular motion and combat feature tracking noise. In our approach, the measurements from the video camera (a set of feature points in image frames tracked by a feature tracking algorithm) and from the microsensors (inertial data) are fed into an Iterated Extended Kalman Filter (IEKF) to recursively estimate platform egomotion. The algorithm framework is shown in Figure 6. Tests on both synthetic and real image sequences show that our approach using both video and inertial data performs much better than methods using only video information. We have computed the bias of the egomotion estimate obtained with and without inertial data when platform motion is irregular or feature tracking noise is large; the bias obtained with inertial data is much smaller. Computer vision techniques augmented by inertial data are also expected to perform much better in 3-D video segmentation, structure from motion computation, moving object detection, and many other basic problems in computer vision.

Figure 6. Egomotion estimation using video and inertial data.
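
To make the role of the IEKF concrete, here is a generic iterated measurement update applied to a deliberately small toy problem. The state, the landmark positions, the unit-focal-length projection model, and the idea that the prediction comes from integrated inertial data are all hypothetical choices for illustration. The actual state vector and measurement equations of the system above are not given in this overview.

```python
import numpy as np

def iekf_update(x_pred, P_pred, z, h, jac, R, iters=5):
    """Iterated EKF measurement update: relinearise h around the latest estimate."""
    x = x_pred.copy()
    for _ in range(iters):
        H = jac(x)
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        x = x_pred + K @ (z - h(x) - H @ (x_pred - x))
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P

# Toy model: a camera translating in the (x, z) plane observes three landmarks
# at known positions; each measurement is an image coordinate.
LANDMARKS = np.array([[-2.0, 5.0], [0.5, 6.0], [2.0, 4.0]])

def project(s):
    return (LANDMARKS[:, 0] - s[0]) / (LANDMARKS[:, 1] - s[1])

def project_jac(s):
    depth = LANDMARKS[:, 1] - s[1]
    return np.column_stack([-1.0 / depth, (LANDMARKS[:, 0] - s[0]) / depth ** 2])

true_state = np.array([0.3, 0.5])
x_pred = np.zeros(2)               # prediction, e.g. from integrating inertial data
P_pred = np.eye(2)
R = 1e-4 * np.eye(3)
z = project(true_state)            # noise-free measurements for clarity
x_upd, P_upd = iekf_update(x_pred, P_pred, z, project, project_jac, R)
```

For a linear measurement function the iteration settles after one pass and reduces to the ordinary Kalman update; the repeated relinearisation only matters when the projection is nonlinear, as it is here.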
 

Inverse Inhomogeneous Diffusion in Image Processing

A new model has been proposed for inverse inhomogeneous diffusion based on a general inverse parabolic partial differential equation. It has been shown that the general model can be reduced to a pseudo-homogeneous form by change of variables. The solution has been found using the semigroup theory of bounded linear operators, leading to a non-linear integral operator which can be implemented by discretization. The homogeneous case, namely the backward heat equation, has been shown to be a special case leading to a linear filter, and can be implemented in the Fourier domain. The proposed model provides more flexibility compared to other existing approaches such as total variation based methods and shock filters, since the inhomogeneity of the diffusion process is modeled by one parameter with pointwise dependence on the spatial and time axes. The method has also been incorporated in a multi-frame super-resolution algorithm and applied to video and infrared data. Figure 7 shows a frame from a sequence to which the filter was applied. The diffusion parameter in this application was chosen inversely proportional to the Laplacian of the image in order to obtain edge-based enhancement.

Figure 7. Inverse diffusion of visual data. Left: original image, middle: blurred image, right: inverse diffused image.
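
The homogeneous special case can be sketched directly, since the heat equation multiplies each spatial frequency w by exp(-t |w|^2) and its inverse multiplies by exp(+t |w|^2). The code below covers only this linear case with periodic boundaries; the inhomogeneous nonlinear operator described above is not attempted. The cap `max_gain` on the amplification is an assumption added here to keep the unstable inverse from blowing up high frequencies, and the function names are illustrative.

```python
import numpy as np

def _squared_frequency(shape):
    """|w|^2 on the discrete Fourier grid, in radians per pixel."""
    wy = 2.0 * np.pi * np.fft.fftfreq(shape[0])
    wx = 2.0 * np.pi * np.fft.fftfreq(shape[1])
    return wy[:, None] ** 2 + wx[None, :] ** 2

def diffuse(image, t):
    """Forward heat equation for time t: attenuate each frequency by exp(-t |w|^2)."""
    gain = np.exp(-t * _squared_frequency(image.shape))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

def inverse_diffuse(image, t, max_gain=1e3):
    """Backward heat equation for time t, with the amplification capped at max_gain."""
    gain = np.minimum(np.exp(t * _squared_frequency(image.shape)), max_gain)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

rng = np.random.default_rng(0)
original = rng.random((64, 64))
blurred = diffuse(original, 2.0)
restored = inverse_diffuse(blurred, 2.0)
```

Frequencies whose required gain stays below the cap are restored exactly, up to rounding; the rest are only partially restored, which is the usual price for stabilising an ill-posed inverse.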
 

Object Detection and Tracking in Images

We are developing algorithms for the detection and tracking of stationary as well as moving objects in an image sequence captured from a moving platform. Detection and tracking of objects in images represents an important step towards achieving automated activity monitoring capabilities. The challenge lies in being able to reliably and quickly detect and track multiple objects of interest against a cluttered background. The work involves the development of four components: a stabilization technique to compensate for camera motion, a detection module to discriminate target objects from background, a tracking module to compute the correspondence between the detected foreground regions in the current frame and the detected objects in the previous frame, and a recognition module.

We have developed a new method based on the principle of higher-order statistical learning for the detection of target objects in images. The proposed method approximately models the unknown distribution of the images of the target objects by learning higher-order statistical (HOS) information about the "object class" from sample images. Training data samples are first clustered and the statistical parameters corresponding to each cluster are estimated. Clustering is based on an HOS-based decision measure which is obtained by deriving a series expansion for the multivariate probability density function in terms of Gaussians and Hermite polynomials. The HOS-based measure has very good discriminating capability.
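
A one-dimensional analogue shows what a density expansion in a Gaussian and Hermite polynomials looks like. The sketch below uses the univariate Gram-Charlier type A series truncated after the fourth-order term. Treating this as representative of the multivariate expansion used in the method above is an assumption, and the function name and sample data are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def gram_charlier_pdf(x, samples):
    """Gaussian density corrected by skewness and excess-kurtosis Hermite terms."""
    mu = samples.mean()
    sigma = samples.std()
    z_s = (samples - mu) / sigma
    skew = np.mean(z_s ** 3)                 # third standardised moment
    ex_kurt = np.mean(z_s ** 4) - 3.0        # fourth standardised moment minus 3
    z = (np.asarray(x, dtype=float) - mu) / sigma
    gauss = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # 1 + (skew / 6) He_3(z) + (ex_kurt / 24) He_4(z), probabilists' Hermite polynomials
    correction = hermeval(z, [1.0, 0.0, 0.0, skew / 6.0, ex_kurt / 24.0])
    return gauss * correction

rng = np.random.default_rng(0)
samples = rng.gamma(shape=8.0, scale=1.0, size=5000)   # a right-skewed toy sample
grid = np.linspace(-20.0, 40.0, 6001)
density = gram_charlier_pdf(grid, samples)
```

The higher-order terms let the model capture asymmetry and tail weight that a Gaussian fit with the same mean and variance would miss. A truncated series of this kind can become slightly negative in the tails, so it is better suited to comparing how well patterns fit a cluster than to use as a strict probability density.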

Given a test image, statistical information about the background is learnt "on the fly" using a dynamic background learning algorithm. Detection is then performed by searching the test image for object patterns at all points in the image and across different scales. A vector of difference measurements of the test pattern with respect to each of the clusters is computed using the HOS-based closeness measure. A simple thresholding scheme then determines whether the test pattern belongs to the "object class". Considering the complexity of the problem, we observe that the detection rate of the method is quite good and false alarms are very few. Examples of application to detection of vehicles and people in static imagery and of people in video are shown in Figures 8 and 9.

Figure 8. Vehicle and people detection in static imagery.

Figure 9. Detection and tracking of people in video (preliminary results).
 

Object Verification using Video

An approach to model-based dynamic object verification using video has been developed. From image sequences containing the moving object, we compute its motion trajectory. Then we estimate its 3-D pose at each time step. Pose estimation is formulated as a search problem, with the search space strictly constrained by the motion trajectory information of the moving object and assumptions about the scene structure. A generalized Hausdorff metric, which is more robust to noise and allows a confidence interpretation, is employed in the matching procedure used for pose estimation as well as verification. The pose evolution curves are used to assist in the acceptance or rejection of an object hypothesis at a later stage. The models are acquired from real image sequences of the objects. Edge maps are extracted and used for matching. Experiments have been done with both infrared and optical sequences containing moving objects involved in complex motions.
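
The exact generalization of the Hausdorff metric used here is not spelled out in this overview. As one common noise-tolerant variant, the sketch below computes a rank-based (partial) directed Hausdorff distance between two edge-point sets; the function name and the `fraction` parameter are assumptions for illustration.

```python
import numpy as np

def partial_directed_hausdorff(a, b, fraction=0.8):
    """Ranked nearest-neighbour distance from the points of a to the set b.

    fraction=1.0 gives the classical directed Hausdorff distance; smaller
    values ignore the worst-matching points of a.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest = np.sort(d.min(axis=1))          # each point of a to its closest point of b
    k = max(int(np.ceil(fraction * len(a))) - 1, 0)
    return nearest[k]

# A square model and an observed edge map containing one far-away clutter point.
model = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
observed = np.vstack([model, [[50.0, 50.0]]])
classical = partial_directed_hausdorff(observed, model, fraction=1.0)
robust = partial_directed_hausdorff(observed, model, fraction=0.75)
```

A single clutter point dominates the classical distance, while the ranked version ignores it. The fraction of points that must match can also be read as a rough confidence level for the hypothesis.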

Q-Warping: Direct Computation of the Quadratic Reference Surface

We are working on new ways to model 3D from images. Recent work has focused on the use of direct methods for finding correspondence and parametric structure. The importance of this approach is in its robustness and wide applicability. An example of the use of our approach to generate new images from given images is shown in Figure 10.

Figure 10. Q-warping. Left: original images. Right: new image.
 

Image Compression: Multiple Description Coding and Iterative Decoding

Multiple Description Coding (MDC) is a source coding technique for (on-off) diversity channels that has recently witnessed a rapid increase in research interest. MDC is used to generate multiple (correlated) descriptions of a source. These descriptions are transmitted over independent channels to the receiver. When all descriptions are received, a high-quality reconstruction is possible. In the event of failure of one or more of the channels, it is desired that the reconstruction still be of acceptable quality.

We have developed an efficient MDC image coding scheme (Figure 11) in the context of classification-based subband coding. The image coder uses multiple description scalar quantizers and a Lagrangian technique to allocate rates and redundancies among the classified subbands in an optimal manner. We are currently adapting this coder to encode video sequences using three-dimensional subband decompositions. This technique has potential for video transmission over lossy packet networks.

Figure 11. Multiple-description image coding.
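
The basic idea behind multiple description scalar quantization can be shown with the simplest possible construction: two uniform quantizers whose grids are offset by half a step. This is an illustrative stand-in and not the quantizer design or rate allocation used in the coder above; the function names and the step size are assumptions.

```python
import numpy as np

STEP = 1.0

def encode(x, step=STEP):
    """Produce two descriptions of x from two staggered uniform quantizers."""
    i1 = np.floor(x / step).astype(int)          # cells [i1*step, (i1+1)*step)
    i2 = np.floor(x / step + 0.5).astype(int)    # cells shifted by step/2
    return i1, i2

def decode(i1=None, i2=None, step=STEP):
    """Reconstruct from whichever descriptions arrived."""
    if i1 is not None and i2 is not None:
        # Both received: the two cells intersect in an interval of width step/2.
        lo = np.maximum(i1 * step, (i2 - 0.5) * step)
        hi = np.minimum((i1 + 1) * step, (i2 + 0.5) * step)
        return 0.5 * (lo + hi)
    if i1 is not None:
        return (i1 + 0.5) * step                 # centre of the first quantizer's cell
    if i2 is not None:
        return i2 * step                         # centre of the second quantizer's cell
    raise ValueError("at least one description is required")

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, 1000)
i1, i2 = encode(x)
err_both = np.abs(decode(i1, i2) - x)
err_first = np.abs(decode(i1=i1) - x)
err_second = np.abs(decode(i2=i2) - x)
```

Either description alone gives a usable reconstruction with error at most half a step, and receiving both halves the worst-case error. The redundancy lies in the overlap between the two descriptions.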

We have also examined transmission of multiple descriptions over noisy (AWGN) channels, rather than the on-off channels that are traditionally considered. We have introduced the use of iterative decoding techniques, similar to those used in "turbo" decoding, to decode correlated binary descriptions transmitted over a noisy channel.

For a given transmission rate per channel and a given channel state, the efficacy of iterative decoding depends on the degree of correlation between the two descriptions produced by the multiple description encoder. We have demonstrated that there is an ideal amount of redundancy or correlation for a given channel state. Hence multiple description codes can also be viewed as joint source-channel codes.

Motivated by recent work on analysis of error-correcting codes, we have modeled the transmission of multiple descriptions by graphical models like Bayesian networks and factor graphs. Decoding then amounts to a probability inference problem on the graph, which can be accomplished in an approximate manner by probability propagation.
 

Robust Face Recognition and Multimedia Applications

Our research on face recognition has been focused on improving the robustness of a subspace Linear Discriminant Analysis (LDA) system and exploring new applications of this technology. We have developed a subspace face recognition system based on LDA; an example of its performance is shown in Figure 12. In this system we employ a unique criterion to choose the subspace dimension and a weighted distance metric guided by the LDA eigenvalues. The system not only gives excellent classification of the training images but also enjoys good predictive/generalized classification performance. However, it has some drawbacks, such as being sensitive to small image scale distortions and/or illumination changes. In order to improve the robustness of our system to such small image distortions, we use multiple subspaces or a parametric subspace directly computed from the original subspace. We also address the illumination problem using a heuristic method which works quite well in practice. Finally, a proposal for describing face images in multimedia content using our approach has been made to the MPEG-7 committee. The initial test results for our method are very good.


Figure 12. Face recognition using subspace LDA.




November 1999