After our work with M. Spetsakis on the geometry of multiple views from point and line correspondences and the introduction of the trilinear constraints, we realized that an implementation of multiple-view geometry on actual and robotic systems cannot rely on feature correspondences. At that time, our research was influenced by the philosophy of direct perception advocated by Gibson. Since perception happens immediately, properties of the scene in view should be directly encoded in patterns or aggregate structures of various image measurements. By applying computational principles, we have been searching for these structures and their associated representations. In this research program, the technical problems we have investigated concern the basic processes underlying the perception of 3D motion and shape, and the relationship between them.
With C. Fermüller, we have performed a series of geometric studies of motion fields obtained from image sequences, which revealed that 3D motion is encoded in the form of global patterns in the image. By measuring robust qualitative properties of image motion along sets of appropriately grouped directions, we transform the structure from motion problem into a pattern recognition problem. In addition, the class of patterns found has been used in the implementation of algorithms supporting fixation and visuomotor control in robotic systems, and in the analysis of several illusions in human vision. Furthermore, projections of motion fields have several interesting geometric features that give rise to robust algorithms, because they allow the direct development of visual motion competences using only the value of the image flow along the gradient direction (the normal flow). This work shows how epipolar geometry can be estimated directly for a moving camera from the spatiotemporal derivatives of the image function.
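As an illustration of the measurement involved, the component of image flow along the gradient direction can be obtained from spatiotemporal derivatives alone, without correspondences. The following is a minimal sketch, assuming brightness constancy between two grayscale frames and simple finite-difference derivatives; the function name `normal_flow` and the threshold `eps` are hypothetical choices for this example and are not taken from the work described here.

```python
import numpy as np

def normal_flow(frame0, frame1, eps=1e-3):
    """Estimate the flow component along the image gradient (illustrative sketch).

    Assumes brightness constancy: Ix*u + Iy*v + It = 0, so the projection of
    the flow (u, v) onto the unit gradient direction n is -It / |grad I|.
    Returns (un, nx, ny, valid), where `valid` marks pixels whose gradient
    magnitude exceeds `eps` (a hypothetical threshold).
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)

    # Spatial derivatives on the temporal average; np.gradient returns the
    # derivative along axis 0 (rows, y) first, then axis 1 (columns, x).
    Iy, Ix = np.gradient(0.5 * (f0 + f1))
    It = f1 - f0  # temporal derivative, one frame apart

    grad_mag = np.hypot(Ix, Iy)
    valid = grad_mag > eps
    safe = np.where(valid, grad_mag, 1.0)  # avoid division by zero

    un = np.where(valid, -It / safe, 0.0)  # signed normal-flow magnitude
    nx = np.where(valid, Ix / safe, 0.0)   # unit gradient direction, x
    ny = np.where(valid, Iy / safe, 0.0)   # unit gradient direction, y
    return un, nx, ny, valid

# Usage: a horizontal intensity ramp that shifts one pixel to the right.
ramp = np.tile(np.arange(8.0), (6, 1))
un, nx, ny, valid = normal_flow(ramp, ramp - 1.0)
```

In this synthetic case the gradient points along x everywhere, so the normal flow coincides with the full one-pixel shift. On real images only this single component is available at each pixel, which is why properties are measured along grouped directions rather than from the full flow.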
Regarding the estimation of visual shape, psychophysical experiments and computational studies show that actual systems cannot estimate exact metric distances and shapes, but instead derive a distorted version of space. One of the reasons for this is geometric. In the case of a moving system or a stereo system, the 3D viewing geometry (the rigid transformation between views) is estimated with some error, and consequently a distorted version of space is computed. We studied the transformation between perceptual space (i.e., computed space) and actual physical space which, in general, amounts to a Cremona transformation when minimal information about image motion is assumed. This transformation revealed a large number of interesting properties of the relationship between 3D motion and shape, and it explained a large amount of data from psychophysical experiments and illusions on the perception of depth. In addition, since visible surfaces have positive depth, we developed with C. Fermüller an algorithm-independent stability analysis for structure from motion by analyzing the geometry of the regions where space is distorted by a negative factor and the conditions under which these regions are minimal. This study also compares the performance of spherical and planar eyes with regard to 3D motion and shape estimation.
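To make the idea of a multiplicative distortion concrete, consider a deliberately simplified case. The sketch below assumes a purely translating camera with unit focal length and the flow convention u = (x t_z - t_x)/Z, v = (y t_z - t_y)/Z; it ignores rotation and is not the general Cremona transformation discussed above. Under these assumptions, depth computed from the flow component along a direction n with an erroneous translation estimate equals the true depth times a factor D. The function name `distortion_factor` and its parameters are hypothetical.

```python
import numpy as np

def distortion_factor(x, y, n, t_true, t_est):
    """Factor D with Z_est = D * Z_true (pure translation, unit focal length).

    The flow component along the unit direction n = (nx, ny) is
        un = ((x*tz - tx)*nx + (y*tz - ty)*ny) / Z.
    Solving the same expression for Z with the estimated translation instead
    of the true one gives Z_est = D * Z_true, where D is the ratio of the two
    numerators. This is an illustrative assumption, not the general model.
    """
    nx, ny = n

    def projected_translation(t):
        tx, ty, tz = t
        return (x * tz - tx) * nx + (y * tz - ty) * ny

    return projected_translation(t_est) / projected_translation(t_true)

# Usage: forward motion, with the translation direction estimated with a
# small error along x; flow is measured along the x direction.
x = np.array([-0.4, 0.1, 0.4])
y = np.zeros_like(x)
D = distortion_factor(x, y, n=(1.0, 0.0),
                      t_true=(0.0, 0.0, 1.0), t_est=(0.2, 0.0, 1.0))
negative_region = D < 0  # points whose computed depth would be negative
```

Here the point at x = 0.1 lies between the true and the estimated focus of expansion, so its distortion factor is negative and its computed depth violates the positive-depth constraint. Regions of this kind are the ones whose extent the stability analysis seeks to characterize.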
Since metric shape and environmental layout cannot be computed in practice, vision systems need to compute a number of alternative shape representations for the many visual tasks they face. These tasks require different amounts of memory and computation, and range from local ones such as obstacle detection to more intricate ones such as homing and recognition. Our research activities for the immediate future are concentrated on understanding these "purposive" representations. The methodology followed is geometric. In particular, since space distorts, the representations of interest should amount to invariants of the distortion function. Extraction of these invariants, although difficult in the general case, becomes feasible in the context of a set of tasks such as those involved in navigation or in the construction of memories for recognition, thus making the underlying representations purposive or teleological. We are also currently examining the problems of direct self-calibration (calibrating a camera from motion sequences without point correspondences) and motion segmentation.