With regard to technology, we currently concentrate on the following problems:
New camera technologies: Our studies here grew out of the motion problem discussed earlier, and particularly out of our desire to build very accurate models of the world by visual means. We discovered that recovering 3D motion from image sequences taken by a camera with a restricted field of view involves inherent ambiguities, and that these ambiguities disappear for a 360-degree field of view. Such geometric studies drew our attention to the wide variety of eye designs found in the biological world. By changing the surface on which the photoreceptors are distributed, as well as their distribution on that surface, we obtain different eyes or cameras. Different cameras acquire images differently, and this can make subsequent processing for a given task harder or easier. By reducing visual tasks to the geometric parameters of the world that must be recovered, we can study the power of different eyes or cameras mathematically and computationally. For example, an eye like the one in this figure, consisting of many video cameras arranged on a sphere and capable of simultaneous recording, is optimal for recovering models of the world. Other constructions are good for acquiring 3D video.
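The translation/rotation ambiguity behind this claim can be illustrated numerically. The sketch below is not our published analysis; it assumes a spherical eye viewing scene points at a constant distance, a translation parallel to the image plane, and the standard instantaneous motion-field model, in which a translation t moves a viewing direction r by -(t - (t · r) r) / depth and a rotation contributes a field that is linear in the rotation vector. It fits the best purely rotational field to the translational one over a cap of viewing directions and reports the relative residual. The function names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n):
    """Roughly uniform unit viewing directions on the sphere, as rows of an (n, 3) array."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    radius = np.sqrt(1.0 - z**2)
    phi = i * np.pi * (3.0 - np.sqrt(5.0))  # golden-angle spiral
    return np.stack([radius * np.cos(phi), radius * np.sin(phi), z], axis=1)

def skew(r):
    """Matrix [r]_x such that skew(r) @ w == np.cross(r, w)."""
    x, y, z = r
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def rotation_residual(half_angle_deg, n=20000, depth=1.0):
    """Relative error of the best purely rotational explanation of a translational
    motion field, seen by an eye restricted to a cap of the given half-angle
    around the +z (viewing) axis."""
    dirs = fibonacci_sphere(n)
    dirs = dirs[dirs[:, 2] >= np.cos(np.radians(half_angle_deg))]
    t = np.array([1.0, 0.0, 0.0])  # translation parallel to the image plane
    # Translational flow for scene points at constant depth: -(t - (t . r) r) / depth
    b = -(t - (dirs @ t)[:, None] * dirs) / depth
    # Rotational flow (r x w, up to sign convention) is linear in w, so stacking
    # skew(r) for every direction gives a linear least-squares problem.
    A = np.concatenate([skew(r) for r in dirs])
    w, *_ = np.linalg.lstsq(A, b.ravel(), rcond=None)
    return np.linalg.norm(A @ w - b.ravel()) / np.linalg.norm(b)

for angle in (10, 45, 90, 180):
    print(angle, rotation_residual(angle))
```

With a narrow cap a rotation explains the translational field almost perfectly, so the two motions are hard to tell apart; over the full sphere the translational field is orthogonal to every rotational field, and the relative residual approaches 1.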
Computational video: Given video data, algorithms based on the principles described above allow us to insert new objects into the video or to delete or change its content. This area lies at the intersection of vision and graphics; more precisely, it deals with the analysis and synthesis of visual data.
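As a rough illustration of object insertion, the sketch below assumes that a 3x4 camera matrix has already been recovered for every frame (for example, by the motion-estimation methods mentioned above) and pastes an alpha-matted sprite so that it stays attached to a fixed 3D point. The function names and inputs are hypothetical, and the sketch ignores occlusion, lighting, and scaling of the sprite with depth.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P; returns (col, row)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def insert_object(frames, cameras, anchor, sprite, alpha):
    """Composite `sprite` (h, w, 3) with per-pixel `alpha` (h, w) into each frame so
    that its centre tracks the 3D point `anchor` under that frame's camera."""
    h, w = alpha.shape
    out = []
    for frame, P in zip(frames, cameras):
        frame = frame.astype(float).copy()
        col, row = np.rint(project(P, anchor)).astype(int)
        top, left = row - h // 2, col - w // 2
        # Clip the sprite rectangle to the frame boundaries.
        r0, c0 = max(top, 0), max(left, 0)
        r1, c1 = min(top + h, frame.shape[0]), min(left + w, frame.shape[1])
        if r0 < r1 and c0 < c1:
            a = alpha[r0 - top:r1 - top, c0 - left:c1 - left, None]
            s = sprite[r0 - top:r1 - top, c0 - left:c1 - left]
            # Standard "over" compositing: a * sprite + (1 - a) * background.
            frame[r0:r1, c0:c1] = a * s + (1 - a) * frame[r0:r1, c0:c1]
        out.append(frame)
    return out
```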
Distributed sensor networks: Consider a (large) network of sensors distributed across some area. With the appropriate software, these sensors serve as sensor agents that integrate information over space and time. Dispatcher agents send surveillance requests to the sensor agents and monitor both the network's activity and the interpretation produced by the perceptual analysis. This setting raises interesting issues in distributed computing.
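A toy version of this dispatch pattern might look as follows. All class and method names are hypothetical, a circular coverage region stands in for each sensor's field of view, and the perceptual analysis itself is only a placeholder: the dispatcher routes each request to the agents that can see the requested location and logs which agents answered.

```python
import math
from dataclasses import dataclass, field

@dataclass
class SensorAgent:
    """Hypothetical sensor agent with a circular field of view that keeps a
    time-stamped log of the requests it has served."""
    name: str
    x: float
    y: float
    radius: float
    log: list = field(default_factory=list)

    def covers(self, px, py):
        return math.hypot(px - self.x, py - self.y) <= self.radius

    def handle(self, req_id, px, py, t):
        # Placeholder for real perceptual analysis of the requested location.
        report = {"sensor": self.name, "request": req_id, "target": (px, py), "time": t}
        self.log.append(report)
        return report

@dataclass
class DispatcherAgent:
    """Hypothetical dispatcher that routes surveillance requests and monitors activity."""
    sensors: list
    activity: list = field(default_factory=list)

    def request(self, req_id, px, py, t):
        reports = [s.handle(req_id, px, py, t) for s in self.sensors if s.covers(px, py)]
        # Monitor the network: record which agents answered each request.
        self.activity.append((req_id, [r["sensor"] for r in reports]))
        return reports
```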
Web technologies (video indexing): Our work here concentrates on video
grammars. The elementary unit of a video is the shot. Shots are combined
to tell a story or depict a scene, and each shot is composed of many
frames, or images. Video representations form a natural hierarchy
paralleling the video's structure, extending from low-level
frame-based ones, such as color histograms or texture patterns, to
mid-level ones, such as the 3D motion and structure contained in shots,
to high-level ones, such as scene representations (mosaics containing
three-dimensional information) or actions. Once we have the technology
for generating space-time and action descriptions from a video
segment, we can write grammars describing different high-level aspects
of a video and use these grammars with standard parsing technology to
search for higher-level information. Appropriate grammars for such
tasks are attributed stochastic grammars, which are closely related
to hidden Markov models (HMMs). Thus, analyzing a video amounts to
obtaining the most likely parses or the most likely values of the
hidden Markov nodes.
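For the hidden Markov case, the most likely sequence of hidden-node values is found with the Viterbi algorithm. The sketch below is a generic log-space implementation, not our system; reading the states as shot or action labels and the observations as discretized shot features is a hypothetical example. Parsing with attributed stochastic grammars requires a probabilistic parser instead and is not shown.

```python
import numpy as np

def viterbi(start, trans, emit, obs):
    """Most likely hidden-state sequence of an HMM and its log-probability.
    start[i]: P(first state = i); trans[i, j]: P(next = j | current = i);
    emit[i, k]: P(observation k | state i); obs: sequence of observation indices."""
    log_start, log_trans, log_emit = (np.log(np.asarray(a, dtype=float)) for a in (start, trans, emit))
    T, n = len(obs), len(log_start)
    score = np.empty((T, n))
    back = np.zeros((T, n), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        # cand[i, j]: best log-probability of being in state j at time t via state i at t - 1
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    # Trace the back-pointers from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score[-1].max())
```

For example, with states 0 and 1 standing for two hypothetical shot categories and observations 0 and 1 for low and high measured motion, `viterbi([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]], [0, 0, 1, 1])` returns the path `[0, 0, 1, 1]` with a log-probability of log(0.03919104).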
With regard to biology, we currently concentrate on the following
problems:
Eye design: This area of research is closely related to new camera
technologies. The success of an eye design should be judged not in an
anthropocentric sense but in a purposive one. Our effort amounts to
relating eye design mathematically to the tasks that systems perform.
We have focused on the compound eyes of insects and crustaceans,
which many consider an evolutionary accident. Our analysis
demonstrates that they are optimal for many tasks.
The motion pathway: Current thinking about the structure and function
of the motion pathway is guided by the hypothesis that this neural
substrate implements an algorithm that estimates flow or
correspondence and subsequently interprets this estimate. Our
geometric studies indicate that this cannot be the case. A number of
illusions that we have discovered, and others that we have explained,
point to the existence of another mechanism for processing image
sequences. Furthermore, specific, experimentally testable hypotheses
arise.
Language, thought and consciousness: Our basic thesis is that the
content of the mind is organized in the form of a model of the
external world, i.e., in the form of representations of space and
space-time. This viewpoint provides new insights into language and thought.
Revised 1999/04/15