The Applications

There are two major classes of application for this research, related to technology and biology.

With regard to technology, we currently concentrate on the following problems:

New camera technologies: Our studies here grew out of the motion problem we have been discussing, and particularly out of our desire to build very accurate models of the world by visual means. We discovered that there are inherent ambiguities in recovering 3D motion from image sequences taken by a camera with a restricted field of view. These ambiguities disappear for a 360-degree field of view. Such geometric studies led us to the realization that there exists a wide variety of eye designs in the biological world. By changing the surface on which the photoreceptors are distributed, as well as the distribution itself, we obtain different eyes or cameras. Different cameras acquire images differently, and they may make subsequent processing for a given task harder or easier. By reducing visual tasks to geometric parameters of the world that need to be recovered, we can study the power of different eyes or cameras in a mathematical and computational manner. For example, an eye like the one in this figure, consisting of many video cameras arranged on a sphere and capable of simultaneous recording, is optimal for recovering models of the world. Other constructions are good for acquiring 3D video.

Computational video: Given video data, by applying algorithms based on the principles described before, we can insert new objects into the video, or delete or change the video's content. This is an area at the intersection of vision and graphics, or, rather, an area dealing with the analysis and synthesis of visual data.

Distributed sensor networks: Consider a large network of sensors distributed across some area. With the appropriate software, these sensors serve as sensor agents, integrating information over space and time. Dispatcher agents then send surveillance requests to the sensor agents and monitor the network's activity and the interpretation of the perceptual analysis. Interesting issues in distributed computing emerge.

Web technologies (video indexing): Our work here concentrates on video grammars. The elementary part of a video is a shot. Shots are combined to tell a story or display a scene. Each shot is composed of several frames or images. Video representations form a natural hierarchy that parallels the video's structure. It extends from low-level, frame-based representations such as color histograms or texture patterns, to middle-level ones such as the 3D motion and structure contained in shots, to high-level ones such as actions or scene representations (mosaics containing three-dimensional information). Once we have the technology for generating space-time and action descriptions from a video segment, we can write grammars describing different high-level aspects of a video, and use these grammars with standard parsing technology to search for higher-level information. Appropriate grammars for such tasks are attributed stochastic grammars, which are very closely related to hidden Markov models. Thus, analysis of the video amounts to obtaining the most likely parsings or the most likely values of the hidden Markov nodes.
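To make "the most likely values of the hidden Markov nodes" concrete, here is a minimal sketch of the standard Viterbi algorithm applied to a toy model. It is an illustration only, not the grammar or model used in this work: the shot labels ("dialogue", "action"), the per-shot observations ("low_motion", "high_motion"), and all probabilities are hypothetical values chosen for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (path, log_prob) of the most likely hidden state sequence.

    Works in log space to avoid underflow on long sequences.
    Assumes every probability used is strictly positive.
    """
    # Scores for the first observation.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t

    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            # Best previous state leading into s.
            prev, best = max(
                ((p, scores[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Hypothetical toy model: hidden shot types, observed motion level per shot.
states = ["dialogue", "action"]
start_p = {"dialogue": 0.6, "action": 0.4}
trans_p = {
    "dialogue": {"dialogue": 0.7, "action": 0.3},
    "action": {"dialogue": 0.4, "action": 0.6},
}
emit_p = {
    "dialogue": {"low_motion": 0.8, "high_motion": 0.2},
    "action": {"low_motion": 0.1, "high_motion": 0.9},
}
observed = ["low_motion", "low_motion", "high_motion", "high_motion"]

path, log_prob = viterbi(observed, states, start_p, trans_p, emit_p)
print(path)  # ['dialogue', 'dialogue', 'action', 'action']
```

In a real indexing system the observations would presumably be derived from the middle-level descriptions (3D motion and structure per shot) rather than a two-valued motion label, and an attributed stochastic grammar would replace the flat state set used here.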

With regard to biology, we currently concentrate on the following problems:

Eye design: This area of research is closely related to new camera technologies. The success of an eye design should not be judged in an anthropocentric sense, but rather in a purposive manner. Our effort amounts to relating, in a mathematical manner, eye design to the tasks that systems perform. We have focused on the compound eyes of insects and crustaceans, which many consider an evolutionary accident. Our analysis demonstrates that they are optimal for many tasks.

The motion pathway: Current thinking about the motion pathway's structure and function is guided by the hypothesis that this neural substrate implements an algorithm responsible for estimating flow or correspondence and subsequently interpreting this estimate. Our geometric studies indicate that this is not possible. A number of illusions that we have discovered, and other illusions that we have explained, point to the existence of another mechanism that processes image sequences. Furthermore, specific hypotheses arise that can be tested experimentally.

Language, thought and consciousness: Our basic thesis is that the content of the mind is organized in the form of a model of the external world, i.e., in the form of representations of space and space-time. This viewpoint provides new insights for language and thought.


Revised 1999/04/15