Larry S. Davis


Larry Davis' research on understanding human activity is supported by DARPA, ONR, Microsoft Corporation, Philips Research Laboratory, ATR's Media Integration Laboratory, and the Keck Foundation. The research focuses on new computer vision algorithms and systems that can detect, track, and analyze human movement. Much of this research is carried out in the newly formed Keck Laboratory for the Analysis of Visual Movement. The Keck Lab (see Figure 1) contains 64 digital, progressive-scan monochromatic and color cameras connected to a network of PCs. The cameras can acquire images at rates of up to 85 frames per second. The PCs can collect up to ten seconds of uncompressed video from the 64 cameras for off-line analysis, or can be used for real-time systems development.

Figure 1. Keck Laboratory architecture.

W4: W4 is a real-time PC-based visual surveillance system that operates on both single images and stereo image pairs, and on visible as well as infrared imagery. It includes real-time vision algorithms for


W4 has been successfully applied to hours of monochromatic video, and can detect and track people against complex backgrounds at speeds of up to 30 frames per second. Recent extensions to W4 include the ability to analyze people in a variety of natural postures (the original version was restricted to upright people), to track individuals within a moving group, and to recognize that people are carrying or exchanging objects. Figure 2 shows one frame from a visible image sequence processed by the W4 system. Here, W4 is tracking three people who are passing through its field of regard. It has classified them as people based on their dynamic shapes and motions, and has built models of their appearance that allow it to track them through occlusions and to recognize when a person leaves the field of view and later returns. Based on its models of human form and motion, it has identified the locations of the principal body parts, and can track these parts through the sequence. Finally, it has approximately placed these people onto the ground plane via a simple and automatic calibration procedure.

Figure 2. Tracking of three people by W4.

We have also developed a version of W4 that employs stereo cameras, and combines its single camera intensity-based analysis with depth analysis from stereo. This version of W4 integrates stereo and intensity during its detection phase to eliminate shadows and to accurately segment the person from the background. More recently, we have extended W4's segmentation algorithm to color imagery.
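
The idea of combining intensity and depth during detection can be pictured with the minimal sketch below. It is not W4's actual algorithm: the function name, the thresholds, and the rule "foreground only if both cues changed" are assumptions chosen for illustration. The intuition is that a shadow changes a pixel's brightness but not its distance from the camera, so requiring a depth change suppresses it.

```python
def foreground_mask(intensity, depth, bg_intensity, bg_depth,
                    intensity_thresh=25, depth_thresh=0.3):
    """Mark a pixel as foreground only if BOTH its intensity and its depth
    differ from the background model (hypothetical rule and thresholds)."""
    mask = []
    for row_i, row_d, row_bi, row_bd in zip(intensity, depth,
                                            bg_intensity, bg_depth):
        mask_row = []
        for i, d, bi, bd in zip(row_i, row_d, row_bi, row_bd):
            intensity_changed = abs(i - bi) > intensity_thresh
            depth_changed = abs(d - bd) > depth_thresh
            mask_row.append(intensity_changed and depth_changed)
        mask.append(mask_row)
    return mask


# Toy 1x3 scene (made-up values): pixel 0 is unchanged background,
# pixel 1 is a shadow (darker, same depth), pixel 2 is a person (closer).
bg_intensity = [[120, 120, 120]]
bg_depth = [[5.0, 5.0, 5.0]]
intensity = [[121, 70, 200]]
depth = [[5.0, 5.0, 2.5]]

mask = foreground_mask(intensity, depth, bg_intensity, bg_depth)
print(mask)
```

In this toy scene only the third pixel is kept; the shadow pixel fails the depth test even though its intensity changed substantially.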

W4 has also been extended to allow it to detect and track small groups of people, and to count the number of people in a group. This extension, called Hydra, is illustrated in Figure 3, where we show several frames from visible image sequences containing groups of 2-4 people walking together. The color-coding is a probability map showing to which person each foreground pixel belongs.

Figure 3. Segmenting a group of people into individuals using Hydra.

W4 can control an active pan/tilt/zoom camera to conduct surveillance over a wide field of regard, and to zoom in on moving objects so they can be classified and tracked. Figure 4 illustrates this, where we display several of the fields of view employed by the system as it first detects a person leaving a building, and then tracks that person over a very long distance through control of the pan/tilt/zoom of the active camera.

Figure 4. Active tracking of a person over a long distance.

Measuring Periodicity of Human Movement:

Periodicity analysis is useful both for detecting people and for characterizing their movements. We have developed a new and robust algorithm for measuring the periodicity of the appearance of an object from video, and have applied it to recognizing moving objects such as people in video sequences taken from both stationary and moving cameras. The approach is based on spectral analysis of the cross-correlations of images of the object taken during at least one cycle of its periodic motion. Figure 5 shows one frame in a video sequence of a person walking across a parking lot. A spectral analysis of the correlogram computed from this sequence reveals a structure highly indicative of human walking or running.

Figure 5. Periodicity analysis of human movement. (a) Image. (b) Spectral analysis.
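
The idea of finding periodicity through spectral analysis of image cross-correlations can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm described above: it assumes the object has already been tracked and cropped into aligned patches of equal size, it averages normalized correlations by time lag, and it picks the strongest component of a plain discrete Fourier transform. All function names and the synthetic data are hypothetical.

```python
import cmath
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation between two flattened image patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def correlation_by_lag(frames, max_lag):
    """Average correlation between frames that are `lag` steps apart."""
    n = len(frames)
    curve = []
    for lag in range(max_lag):
        values = [normalized_correlation(frames[t], frames[t + lag])
                  for t in range(n - lag)]
        curve.append(sum(values) / len(values))
    return curve


def dominant_period(curve):
    """Period (in frames) of the strongest non-constant DFT component."""
    n = len(curve)
    mean = sum(curve) / n
    centered = [c - mean for c in curve]
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None


# Synthetic stand-in for tracked patches: 64 frames of 16 "pixels" whose
# brightness oscillates with a period of 8 frames (made-up data).
PERIOD = 8
frames = [[math.sin(2 * math.pi * (t / PERIOD + p / 16)) for p in range(16)]
          for t in range(64)]

curve = correlation_by_lag(frames, max_lag=32)
period = dominant_period(curve)
print(period)
```

On the synthetic sequence the recovered period is 8 frames. Real footage would additionally need tracking, alignment, and some test of whether the spectral peak is significant, none of which is attempted here.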

Shall We Dance: Shall We Dance was a real-time motion capture demo presented at SIGGRAPH 98 in collaboration with ATR's Media Integration Laboratory and M.I.T.'s Media Laboratory. Shall We Dance uses several of the component algorithms of W4 and incorporates them into a 3D body part tracking system. The operation of Shall We Dance is illustrated in Figure 6 (the graphical characters are reproduced with the permission of ATR's Media Integration Laboratory). Six cameras view a person moving freely. Each camera runs a version of the W4 system, and tracks the positions of the person's head, hands, torso and feet with the assistance of predictions of their positions provided by the controller. The 2D positions are integrated through stereo analysis and models of human movement (developed at M.I.T.) to estimate their locations and motions in 3D. The 3D motion estimates are then used to predict the locations of the body parts in the next set of frames acquired by the cameras. Graphics algorithms from ATR are used to animate graphical characters designed by ATR to illustrate the accuracy of the motion capture. The system operated at speeds of up to 25 frames per second; hundreds of people entered the demonstration area and controlled the movements of the graphical characters through their own motions.

Figure 6a. Shall We Dance: Architecture.

Figure 6b. Shall We Dance: Example.
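
The predict-then-project step in this loop can be pictured with the small sketch below. It is only an illustration: the constant-velocity motion model, the ideal pinhole camera at the origin looking along the z axis, the focal length, and all names and numbers are assumptions, and the system described above relied on richer models of human movement.

```python
def predict_position(position, velocity, dt):
    """Constant-velocity prediction of a 3D body-part position (assumed model)."""
    return tuple(p + v * dt for p, v in zip(position, velocity))


def project(point, focal_length):
    """Project a 3D point into an ideal pinhole camera at the origin
    looking along +z (hypothetical camera, no lens distortion)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


# Made-up hand position (metres) and velocity (metres per second).
hand = (0.2, 1.0, 4.0)
hand_velocity = (0.5, 0.0, -1.0)
frame_interval = 1 / 25  # seconds between frames at 25 frames per second

predicted = predict_position(hand, hand_velocity, frame_interval)
search_center = project(predicted, focal_length=800)  # in pixels
print(predicted, search_center)
```

A per-camera tracker could then look for the hand in a small window around `search_center` in the next frame instead of searching the whole image.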

Detecting Pedestrians: In conjunction with Daimler-Benz Research in Ulm, Germany, we have developed a template-based approach to detecting pedestrians in images. The system constructs hierarchical template models from thousands of instances of people in different poses. It compares these hierarchical template structures to structured edge maps of images using distance transform techniques such as chamfer matching and Hausdorff matching. It operates at speeds of about two frames per second on a PC and can be applied to either visible or infrared imagery (Figure 7).

Figure 7. Pedestrian detection.
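
Chamfer matching against a distance transform can be sketched in a few lines. The example below is a minimal illustration, not the system described above: it uses a single template instead of a hierarchy, a city-block distance transform, an exhaustive search over translations, and a made-up square outline standing in for a pedestrian template. All names are hypothetical.

```python
def distance_transform(edges):
    """City-block distance from each pixel to the nearest edge pixel,
    computed with the classic forward and backward passes."""
    h, w = len(edges), len(edges[0])
    inf = h + w  # larger than any possible city-block distance in the grid
    dist = [[0 if edges[y][x] else inf for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def best_match(edges, template_points):
    """Return (score, dx, dy) for the translation with the lowest mean
    distance from the template's edge points to the image's edges."""
    dist = distance_transform(edges)
    h, w = len(edges), len(edges[0])
    t_w = max(x for x, _ in template_points) + 1
    t_h = max(y for _, y in template_points) + 1
    best = None
    for dy in range(h - t_h + 1):
        for dx in range(w - t_w + 1):
            score = sum(dist[y + dy][x + dx]
                        for x, y in template_points) / len(template_points)
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best


# Made-up template: outline of a 3x3 square, given as (x, y) edge points.
template = [(0, 0), (1, 0), (2, 0), (0, 1), (2, 1), (0, 2), (1, 2), (2, 2)]

# Toy 8x8 edge map containing that outline shifted by (3, 2).
edge_map = [[0] * 8 for _ in range(8)]
for x, y in template:
    edge_map[y + 2][x + 3] = 1

score, dx, dy = best_match(edge_map, template)
print(score, dx, dy)
```

Because the distance transform is computed once per image, each candidate placement costs only one lookup per template point, which is what makes comparing many templates against the same edge map affordable.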

Appearance Models for Human Action: We have been studying how objects with complex time-varying geometries, such as moving people, can be accurately tracked based on learned models of their typical motions. We have been developing an appearance-based approach, in which compact models of evolving parametric flow fields observed from generic viewpoints are learned from experience. These models are then used to track subsequent instances of these movements, or to recognize which of a variety of movements is being observed (based on a goodness-of-tracking criterion). In our most recent work we show how these learned models of movement can be used for tracking even when the camera observing the movement is itself moving. This involves decoupling the motion due to the rigid movement of the camera from the learned movement of the object.


November 1999