Some Results

3D motion: The best way to test algorithms for 3D motion estimation is with a hand-held camera. Because the motion changes continually, smoothing and other regularization procedures cannot be used; one can only use the information in the successive frames. Video 1 shows a sequence captured in the lab. The sequence has a large number of discontinuities, near and far objects, and a rich variety of surface structure. Video 2 shows the solution for the epipole or Focus of Expansion (the point where the translation vector pierces the image plane). The green dot is the solution from our algorithm and the hollow yellow dot is the solution obtained by correspondence-based epipolar minimization. Video 3 shows the inverse depth map generated by our solution, with a color code (mid-gray marks places where no information is available). An important aspect of our approach is the concept of distortion. If an incorrect 3D motion is recovered and used to compute the depth of the scene, the recovered depth will also be incorrect, or, as we say, a distorted depth will emerge. This distortion has interesting properties. Notice in Video 4 the inverse depth map generated by an incorrect 3D motion, and note the high variability of depth in many places. Even negative depth values are produced (black). Our solution exploits this property, as sketched below. Video 5 shows how well the rotation is estimated: the estimated rotation is subtracted from the original sequence, so that the remaining video contains only translation.
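
To make the distortion cue concrete, here is a minimal numerical sketch. It assumes a calibrated pinhole camera and a dense (full) flow field, whereas the system in the videos works from normal flow, so the function names and the scoring combination are illustrative assumptions rather than the actual algorithm. A candidate FOE implies an inverse depth at every pixel; wrong candidates yield highly variable, partly negative maps.

    import numpy as np

    def inverse_depth_from_flow(flow, foe, tz=1.0):
        # For a purely translating pinhole camera the flow at pixel p is
        # (p - foe) * tz / Z, so projecting the flow onto (p - foe) and
        # dividing recovers 1/Z at every pixel.
        h, w = flow.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w].astype(float)
        d = np.stack([xs - foe[0], ys - foe[1]], axis=-1)
        r2 = (d ** 2).sum(axis=-1) + 1e-9
        return (flow * d).sum(axis=-1) / (tz * r2)

    def distortion_score(flow, foe):
        # A wrong FOE candidate produces a "distorted" map: highly
        # variable and partly negative. This particular score is an
        # illustrative combination, not the measure used in our system.
        inv_z = inverse_depth_from_flow(flow, foe)
        return inv_z.var() + (inv_z < 0).mean()

Scanning distortion_score over candidate FOEs and keeping the minimum is one simple way to use the cue.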

3D shape: Perhaps the most telling test of how well 3D motion is recovered is the estimation of shape. The reason is that many tasks can be achieved with slightly incorrect 3D motion, but an error of even a few pixels (for example, in the location of the translation direction) is enough to create significant errors in the estimated shape. How well shape models can be estimated depends on a number of factors besides accurate 3D motion estimates, such as the number of frames used (the amount of data) and the representation chosen for the model. For synthetic data the results are close to perfect. For example, for the well-known Yosemite sequence, the extracted shape is shown here in the form of a mesh, and here with painted texture. Only one normal flow field was used. Video 6 shows a model of the scene described above, recovered from a few frames and without any elaborate data structures; the scene is simply a set of 3D points. A bit more sophistication in representing the scene (triangles) results in much better models in our approach; a sketch of both representations follows this paragraph. See, for example, this reconstruction obtained from one flow field. This sequence shows the parts of the recovered model that do not contain discontinuities. More frames also result in better models. Video 7 shows an original sequence and Video 8 the obtained reconstruction. Videos 9 and 10 show another example. No post-processing was performed here, but graphics post-processing would clearly improve the results further. Finally, consider a reconstruction from multiple videos (Video 11, Video 12, Video 13). Video 14 shows the recovered model, which is almost perfect. Again, no post-processing was performed.
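
The following sketch illustrates, under assumptions, the two scene representations mentioned above: a depth map is back-projected into a set of 3D points using an assumed pinhole model, and neighboring pixels are then connected into a regular triangle grid. The intrinsics (f, cx, cy) and the meshing scheme are hypothetical choices for illustration; the actual reconstructions also deal with discontinuities, which this sketch ignores.

    import numpy as np

    def depth_map_to_mesh(Z, f=500.0, cx=0.0, cy=0.0):
        # Back-project each pixel to a 3D point under a pinhole model
        # with assumed focal length f and principal point (cx, cy).
        h, w = Z.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(float)
        points = np.stack([(xs - cx) * Z / f, (ys - cy) * Z / f, Z],
                          axis=-1).reshape(-1, 3)
        # Connect each pixel quad with two triangles (a regular grid
        # mesh; no handling of depth discontinuities here).
        idx = np.arange(h * w).reshape(h, w)
        a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
        c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
        tris = np.concatenate([np.stack([a, b, c], axis=1),
                               np.stack([b, d, c], axis=1)])
        return points, tris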

Motion segmentation: This is the hardest problem in dynamic scene analysis; our approach was conceived with this problem in mind. Video 15 shows an original, well-known sequence. An elaborate optimization scheme with feedback starts from the normal flow values and builds representations of the camera motion and localizations of the boundaries between independent motion and background. The principle of depth variability plays a central role. Video 16 shows the recovered inverse depth for a part of the sequence, with the gray-level value encoding inverse depth (white denotes large positive values, i.e., points close to the camera; black denotes negative values). Notice the high variability of depth at the locations of independent movement. Also notice that, at times, the train's motion is consistent with the camera motion (making independent motion detection difficult); then no high variability of depth is obtained, but the depth comes out negative, marking the independent motion. Notice in Video 17 the depth variability measurements for the same part of the sequence (white denoting large values). The procedure searches for the camera motion and the motion boundaries; depth variability, sketched below, is the basis for the solution.
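
As a rough sketch of the depth-variability cue, assuming an inverse-depth map has already been computed for a candidate camera motion: rigid background should come out locally smooth and positive, while independently moving regions show high local variance or negative values. The window size and the variance statistic below are illustrative choices, not the measure used in the actual system.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def depth_variability(inv_depth, win=7):
        # Local variance of the inverse-depth map: high over
        # independently moving regions, low over rigid background
        # when the candidate camera motion is correct.
        mu = uniform_filter(inv_depth, win)
        var = np.maximum(uniform_filter(inv_depth ** 2, win) - mu ** 2, 0.0)
        # Negative inverse depth is a second, independent cue for
        # independent motion (cf. the train in Video 16).
        return var, inv_depth < 0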



Revised 1999/06/15
Send questions about these Web pages to larson@cfar.umd.edu