
Research Statement

WenYi Zhao

My research has focused on several aspects of statistical image/signal processing and computer vision, and on their applications, for example better video coding schemes. Statistical pattern recognition techniques have been successfully applied to recognition tasks based on still images or video. In many cases, however, computer vision techniques are needed for difficult tasks such as 3D object recognition. One example is face recognition, in which the system is presented with a picture of a face and must determine the person's identity. The key challenge is that the class label must be determined from a 2D image of a 3D object. By incorporating computer vision and/or other techniques, we can solve such difficult problems.

I. Statistical Image/Signal Processing
In the statistical pattern recognition literature, one major approach is training-based methods, which usually consist of the following steps: choosing an appropriate type of classifier, and then constructing the specific classifier, which includes estimating its parameters in the case of parametric classifiers. Many classifiers are available, including the Bayesian classifier, the nearest-neighbor rule, neural networks, and linear discriminants, to name a few. The Bayesian classifier is optimal according to traditional statistical pattern recognition theory. For applications involving high-dimensional signals, however, the large number of training samples needed to construct a good Bayesian classifier is difficult to obtain. Researchers therefore continue to search for classifiers that perform close to the Bayesian classifier but require fewer training samples.
A. Subspace Linear/Nonlinear Discriminant Analysis
We proposed a statistical framework, subspace discriminant analysis, with which a practically good classifier (linear or nonlinear) can be constructed from a limited number of training samples. Discriminant analysis has been studied in both pattern recognition and statistics for many years. For example, linear classifiers such as LDA (Linear Discriminant Analysis) have been successfully applied to face recognition. Subspace methods, especially PCA-based ones, have been used for effective dimension reduction. We proposed instead using a universal subspace to overcome the generalization/over-fitting problem in applications such as face recognition [4]. Combining subspace and discriminant analysis, we proposed a general framework for solving practical classification problems. For example, a successful face recognition system has been built on subspace LDA and evaluated in a competitive test of face recognition algorithms, the FERET test [4]. Linear classifiers, however, have a fundamental restriction: they cannot handle linearly non-separable cases, which can occur even in face recognition [8]. For such cases, multiple subspaces or a parametric subspace can be constructed from the original subspace to accommodate inputs distorted by scaling, rotation, translation, etc. [6]. More generally, the concept of subspace [14] can also be used to derive the so-called kernel PCA method [2], which replaces the dot product with an appropriate kernel function [1]. With this method, the original signal is transformed through a nonlinear mapping into a subspace of a much higher-dimensional space, so that linear classification in that subspace is close to nonlinear classification in the original space.
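The combination of a PCA subspace with LDA described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system evaluated in [4]: the function names `subspace_lda` and `nearest_class_mean`, the parameters `n_pca` and `n_lda`, and the small ridge term added to the within-class scatter are all hypothetical choices made for the example.

```python
import numpy as np

def subspace_lda(X, y, n_pca, n_lda):
    """PCA projection followed by LDA inside the reduced subspace.

    X: (n_samples, n_features) data; y: integer class labels.
    Returns the data mean and an (n_features, n_lda) projection matrix.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA basis from the SVD of the centered data (rows of Vt are principal axes).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T                      # (n_features, n_pca)
    Z = Xc @ P                            # samples expressed in the PCA subspace

    # Within-class (Sw) and between-class (Sb) scatter in the subspace.
    Sw = np.zeros((n_pca, n_pca))
    Sb = np.zeros((n_pca, n_pca))
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc, mc)  # overall mean of Z is zero
    # Assumed regularization: a small ridge keeps Sw invertible.
    Sw += 1e-6 * np.eye(n_pca)
    # Discriminant directions are the leading eigenvectors of Sw^{-1} Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_lda]
    W = evecs[:, order].real
    return mean, P @ W

def nearest_class_mean(F, y, f):
    """Label of the class whose mean feature vector is closest to f."""
    classes = np.unique(y)
    means = np.stack([F[y == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(means - f, axis=1))]
```

A new sample `x` would be projected as `(x - mean) @ proj` and then classified in the low-dimensional feature space, for instance with `nearest_class_mean`. Reducing to `n_pca` dimensions first keeps the scatter matrices small relative to the number of training samples, which is the situation the paragraph above is concerned with.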
B. Distance Metrics
Distance metrics have been an active research topic in the pattern recognition community for many years. Various distance metrics are available, including the Euclidean distance, the Mahalanobis distance, the Kullback-Leibler distance, etc. Distance metrics designed for better classification have also been studied; for example, metrics based on discriminant analysis have been studied extensively. We proposed two new distance metrics, a DCA-based distance metric [9] and a minimax-based distance metric [7], and demonstrated their efficiency. In our subspace LDA face recognition system, we also employed an eigenvalue-guided weighted Euclidean distance for better performance [10].
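As a rough illustration of an eigenvalue-weighted Euclidean distance, the sketch below weights each coordinate of the difference vector by a power of the corresponding eigenvalue. The exact weighting used in [10] is not given here, so the form `eigvals**alpha`, the function name `weighted_euclidean`, and the parameter `alpha` are assumptions made for the example.

```python
import numpy as np

def weighted_euclidean(a, b, eigvals, alpha=1.0):
    """Euclidean distance with per-axis weights eigvals**alpha (assumed form)."""
    w = np.asarray(eigvals, dtype=float) ** alpha
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    # Each squared coordinate difference is scaled by its weight before summing.
    return float(np.sqrt(np.sum(w * d * d)))
```

With `alpha=0` this reduces to the ordinary Euclidean distance. With `alpha=-1` and `eigvals` set to the per-axis variances, it is the Mahalanobis distance for a diagonal covariance matrix.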
C. Performance Evaluation
Performance evaluation measures how well a designed system performs. As more and more approaches are developed for the same or similar tasks, performance evaluation becomes increasingly important for choosing the best approach in a given situation. Evaluation has also been facilitated by the creation of large databases and by increasing computing power. However, a purely empirical approach depends heavily on the size of the database and on the feature vector space. Alternatively, performance evaluation can be based on an analysis that decomposes the empirical performance into the theoretical performance of the given system and a performance perturbation due to signal noise, small sample size, etc. Such analysis not only gives insight into how sensitive system performance is to the system implementation, but also helps in choosing practically good classifiers for a given situation. For example, the performance of the LDA classifier was analyzed using matrix perturbation theory [5]. Performance analysis based on Shape-from-Shading (SFS) also suggests that significant illumination changes can seriously degrade system performance. Hence it is necessary to seek methods that compensate for these changes [8].
D. Improving the Efficiency
As an alternative to LDA, we proposed a new scheme, DCA (Discriminant Component Analysis) [9], which is analogous to PCA (Principal Component Analysis) but fundamentally different. The new scheme decomposes a signal onto an orthonormal basis such that each basis vector has an associated eigenvalue representing the discriminatory power of the projection in that direction. Because DCA iteratively seeks a full orthonormal basis, it has the following advantages over LDA: first, it encodes the full discriminatory information; second, it generates eigenvalues that are more suitable as weights in a weighted distance metric; and third, it is more suitable for non-Gaussian distributions.
E. Statistical Learning: Clustering and Factor Analysis
Clustering is a standard unsupervised learning technique that yields a more compact representation of the data. In [6], a simple k-means algorithm was successfully used to cluster more than 1,000 face images into 7 clusters. With this clustering, the efficiency of solving a parameter estimation problem was improved dramatically [6]. Factor analysis generally refers to statistical methods that recover the underlying factors used to model the observations (Independent Component Analysis is the newest addition to this branch). Traditionally, factor models are additive. More recently, a multiplicative (bilinear) model was proposed to address a wide range of problems [3]. The bilinear model not only provides a sufficiently expressive representation of factor interactions but can also be fit using efficient algorithms based on the Singular Value Decomposition (SVD) and the Expectation-Maximization (EM) algorithm. One application of this approach is to separate face identity (one factor) from pose (another factor), and then to perform pose estimation.
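For reference, a basic k-means procedure (Lloyd's algorithm) of the kind mentioned above can be sketched as follows. The initialization from randomly chosen samples, the stopping rule, and the names `kmeans`, `n_iter`, and `seed` are assumptions for this example; the details of the clustering used in [6] are not specified here.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):              # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                         # assignments have stabilized
        centers = new_centers
    return labels, centers
```

For face images, each row of `X` would be a feature vector (for example, subspace coefficients) and `k` would be the desired number of clusters.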

II. Combining Statistics with Computer Vision etc.
In many cases, even with robust pattern recognition techniques, complex object recognition cannot be accomplished without incorporating other techniques. For example, the subspace LDA face recognition system mentioned above [6] is robust against any 2D transformation of the face image, but not against 3D transformations. This is because the 2D image does not explicitly encode any 3D information, and hence pure pattern recognition techniques will not work in such cases. Inferring 3D information from 2D images, meanwhile, has been a major research topic in the computer vision community for many years. It is therefore natural to combine pattern recognition and computer vision approaches to solve such difficult problems. One difficulty in applying computer vision techniques to practical problems, however, is that traditional computer vision seeks perfect solutions to ill-posed problems under unrealistic assumptions. These techniques are fragile because real tasks usually do not satisfy the assumptions. One possible remedy is to construct better computer vision techniques or to integrate them with other techniques. For example, we can combine computer graphics and computer vision to solve an ill-posed problem with the help of prior knowledge about the objects.
A. Symmetric Shape-from-Shading
We have proposed a new SFS algorithm that can handle symmetric objects such as faces [8]. Symmetry is very useful information that can be exploited in SFS algorithms for symmetric objects. However, bringing this information into existing SFS algorithms implicitly does not seem to help much, so we describe a direct method for incorporating this important cue. Compared with existing SFS algorithms, the new symmetric SFS algorithm has the following advantages: a) It has a point-wise unique solution not only for the partial derivatives (p,q) but also for the albedo (the albedo can be either constant or piece-wise constant across the whole image plane). b) By using the self-ratio image, problems due to variations in albedo are avoided, so a model-based light-source estimation approach becomes more accurate. c) By combining symmetric SFS with regular SFS, a unique solution can be obtained even when shadow points are present.
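The role of the self-ratio image can be illustrated with a small numerical sketch. It assumes a Lambertian surface lit by a single distant source with direction parameters (Ps, Qs), a surface symmetric about the plane x = 0 (so the mirrored point has gradient (-p, q) and the same albedo), and a self-ratio defined as the difference of the two mirrored intensities divided by their sum. These definitions and the function names are assumptions for the example and may differ in detail from the formulation in [8].

```python
import math

def lambertian(p, q, albedo, Ps, Qs):
    """Intensity of a Lambertian patch with surface gradient (p, q)."""
    num = 1.0 + p * Ps + q * Qs
    den = math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + Ps * Ps + Qs * Qs)
    return albedo * max(num, 0.0) / den   # clamp to zero in attached shadow

def self_ratio(p, q, albedo, Ps, Qs):
    """(I(x,y) - I(-x,y)) / (I(x,y) + I(-x,y)) for a symmetric surface."""
    left = lambertian(p, q, albedo, Ps, Qs)     # intensity at (x, y)
    right = lambertian(-p, q, albedo, Ps, Qs)   # intensity at the mirrored point
    return (left - right) / (left + right)
```

Under these assumptions, and away from shadow points, the albedo and the normalization terms cancel in the ratio, leaving p*Ps / (1 + q*Qs). The ratio therefore depends only on the surface gradient and the light source, which is why albedo variations do not disturb it.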
B. Illumination-Insensitive Face Recognition using Symmetric SFS
Even though symmetric SFS provides a better solution for symmetric objects than regular SFS, using it for applications such as face recognition is still difficult. This is partly due to possible violations of assumptions such as the Lambertian model and a single light source. Instead, we propose a direct image-to-image computation based on symmetric SFS and a generic 3D head model [8]. This method has the following features for handling the illumination problem in face recognition: a) There is no training, so only one image is needed. b) A new illumination-invariant matching measure is proposed. c) Since no full symmetric SFS is actually carried out and the computation is image to image, it is fast. d) The problem of solving for complex/arbitrary albedo information is avoided. To demonstrate the efficacy of our method, we have applied it to several publicly available face databases. For images acquired under variable lighting conditions, we demonstrate significant performance improvement over existing face recognition systems using PCA and/or LDA.
C. Model-Based Image Synthesis
Image synthesis is an active research area with numerous applications, for example in computer graphics. Multi-view image synthesis is a technique for generating images under different views (or even different lighting) from multiple images of the same object or scene. There are two major approaches: 1) the viewpoint is static [11]; 2) the lighting is static [12]. Using just one image, we propose a model-based image synthesis method that can synthesize good-quality images with arbitrary albedo under different poses and illumination [8].
D. Physics-based 3D Object Recognition
Recognizing a 3D object under different views when only one view of the object is available has long been a difficult problem. Various learning-based methods have been proposed, but their success relies on large numbers of training samples. For poses without enough training samples, it is difficult to recognize new images, since these are essentially extrapolation problems. A better alternative (at least in theory) is to infer 3D information from a single 2D image. Once the 3D information is obtained, recognizing new images of the same object under different poses and illuminations is simple. It is not easy, however, to infer accurate 3D information from a single image. Currently we use a generic 3D model to recover the frontal-view image from a given image, which is a special case of image synthesis [8]. After generating the frontal-view image, we can simply apply the already-trained subspace LDA system.

III. Applications
Both statistical signal processing and computer vision have numerous applications: video compression, speech recognition, content-based information retrieval, etc. One particularly important class of applications concerns human beings: how people act and respond, access control based on personal identity, etc. Face recognition, for example, can be used in many applications. As one such effort, we have built a prototype viewer-identification system for television based on [10]. We have also proposed a face descriptor to MPEG-7 based on our successful face recognition system [13]. The initial test results on the MPEG-7 test content are very satisfactory. On the subject of multimedia applications, we have demonstrated that the performance of a simple color-based shot-detection algorithm can be improved using the minimax distance metric [7].

IV. Future Directions
My future research would focus on the following directions:





Wenyi Zhao
Wed Aug 18 16:25:37 EDT 1999