Activity Analysis and Action Recognition

 

From videos to verbs: Mining Videos for Events using a cascade of dynamical systems

Clustering video sequences in order to infer and extract events from a single video stream is an important problem with significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into events requires one to simultaneously recognize event boundaries (event-consistent subsequences) and cluster these event subsequences. To do this, we build a generative model for events in video using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of events. We then derive algorithms to learn the model parameters from a video stream and show how a single video sequence may be partitioned into clusters, where each cluster represents an event. We also propose a novel technique to build affine, view, and rate invariance of the activity into the distance metric used for clustering. Experiments are shown for both far-field and near-field activity videos. The clusters found by the algorithm are shown to correspond to semantically meaningful events in both scenarios.

Pavan Turaga, Ashok Veeraraghavan and Rama Chellappa. Mining Videos for Events using a cascade of dynamical systems: From videos to verbs, Accepted for Presentation at IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2007. [pdf] [Video] [ppt] [Project Page]

 


Learning the Space of Time-Warping Functions for Activity Recognition: The Function Space of an Activity

An activity consists of an actor performing a series of actions in a pre-defined temporal order. An action is an individual atomic unit of an activity. Different instances of the same activity may differ in the relative speeds at which the various actions are executed, in addition to other intra- and inter-person variabilities. Most existing algorithms for activity recognition are not very robust to intra- and inter-personal changes of the same activity, and are extremely sensitive to warping of the temporal axis due to variations in speed profile. In this paper, we provide a systematic approach to learn the nature of such time warps while simultaneously allowing for variations in the descriptors for actions. For each activity we learn an ‘average’ sequence that we denote as the nominal activity trajectory. We also learn a function space of time warpings for each activity separately. The model can be used to learn individual-specific warping patterns, so it may also be used for activity-based person identification. The proposed model leads to algorithms for learning a model for each activity, clustering activity sequences, and activity recognition. We provide experimental results using two datasets.
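As a rough illustration of what a time warp does to a nominal activity trajectory, the sketch below resamples a one-dimensional trajectory at warped time instants. It is a minimal sketch and not the paper's learning algorithm: the function name `warp_trajectory`, the scalar feature, and the quadratic warp are assumptions chosen only for illustration.

```python
def warp_trajectory(nominal, warp, n_out):
    """Resample `nominal` at times warp(t) for n_out evenly spaced t in [0, 1].

    `nominal` is a list of at least two floats (a hypothetical 1-D feature);
    `warp` maps [0, 1] to [0, 1] and should be monotonically increasing;
    `n_out` must be at least 2.
    """
    last = len(nominal) - 1
    out = []
    for k in range(n_out):
        t = k / (n_out - 1)
        # Position on the nominal trajectory's own time axis, clamped to range.
        s = min(max(warp(t), 0.0), 1.0) * last
        i = min(int(s), last - 1)
        frac = s - i
        # Linear interpolation between neighbouring nominal samples.
        out.append((1 - frac) * nominal[i] + frac * nominal[i + 1])
    return out


nominal = [0.0, 1.0, 2.0, 3.0, 4.0]
same_speed = warp_trajectory(nominal, lambda t: t, 5)
slow_start = warp_trajectory(nominal, lambda t: t * t, 5)
```

Here `same_speed` reproduces the nominal trajectory, while `slow_start` visits the same values but lingers near the beginning, which is the kind of speed-profile variation the abstract describes.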

Ashok Veeraraghavan, Rama Chellappa and Amit K. Roy-Chowdhury. The Function Space of an Activity, Oral Presentation at IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2006. [pdf] [ppt]

 

Matching Shape Sequences in Video

A sequence of deforming shapes has been used as an important cue for deformable object recognition and classification. We present a framework to compare two sequences of deforming shapes using both parametric models and non-parametric methods. In our approach, Kendall's definition of shape is used as the shape feature. Since the shape feature lives on a non-Euclidean manifold, we propose parametric models such as the autoregressive model and the autoregressive moving average model on the tangent space, and demonstrate the ability of these models to capture the nature of shape deformations by performing experiments on gait recognition. We also provide results for synthesis of deforming shapes using the learnt parametric model. The non-parametric model is based on dynamic time warping. We suggest a modification of the dynamic time warping algorithm to account for the non-Euclidean space in which the shape deformations take place. We also show the efficacy of this algorithm by applying it to gait recognition. We consider the shape deformations of a person's silhouette as a discriminating feature and provide recognition results using the non-parametric model. Our analysis leads to some interesting observations on the role of shape and kinematics in automated gait recognition.
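The idea of running dynamic time warping with a non-Euclidean local cost can be sketched as follows. This is the textbook DTW recursion with a pluggable distance, not the specific modification proposed in the paper; the great-circle distance on unit vectors is an assumed stand-in for a manifold distance, and the names `arc_distance`, `dtw` and `on_circle` are hypothetical.

```python
import math


def arc_distance(u, v):
    """Great-circle distance between two unit vectors (a simple manifold cost)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def dtw(seq_a, seq_b, dist):
    """Classic dynamic time warping cost between two sequences under `dist`."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def on_circle(angle):
    """Unit vector at the given angle; a toy stand-in for a shape feature."""
    return (math.cos(angle), math.sin(angle))


fast = [on_circle(a) for a in (0.0, 0.5, 1.0)]
slow = [on_circle(a) for a in (0.0, 0.0, 0.5, 0.5, 1.0, 1.0)]
shifted = [on_circle(a + 0.2) for a in (0.0, 0.5, 1.0)]

same_motion_cost = dtw(fast, slow, arc_distance)
different_motion_cost = dtw(fast, shifted, arc_distance)
```

The `fast` and `slow` sequences trace the same path at different speeds, so their warped cost is essentially zero, whereas `shifted` follows a different path and accumulates a cost of about 0.6 radians.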

Ashok Veeraraghavan, Amit Roy Chowdhury and Rama Chellappa. Matching Shape Sequences in Video with an application to Human Movement Analysis. Accepted For Publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). [pdf] [ppt]

 

Shape and Behavior Encoded Tracking of Bee Dances, or Simultaneous Tracking and Behavior Recognition

Behavior analysis of social insects has garnered impetus in recent years and has led to some advances in fields such as control systems and flight navigation. Manually labeling the insect motions needed for behavior analysis requires a significant investment of time and effort. In this paper, we propose certain general principles that help in simultaneous automatic tracking and behavior analysis, with applications in tracking bees and recognizing specific behaviors exhibited by them. The state space for tracking is defined using the position, orientation and current behavior of the insect being tracked. The position and orientation are parametrized using a shape model, while the behavior is explicitly modeled using a three-tier hierarchical motion model. The first tier (dynamics) models the local motions exhibited, and the models built in this tier act as a vocabulary for behavior modeling. The second tier is a Markov motion model built on top of the local motion vocabulary, which serves as the behavior model. The third tier of the hierarchy models the switching between behaviors, and this is also modeled as a Markov model. We address issues in learning the three-tier behavioral model, in discriminating between models, and in detecting and modeling abnormal behaviors. Another important aspect of this work is that it leads to joint tracking and behavior analysis instead of the traditional track-then-recognize approach. We apply these principles to tracking bees in a hive while they are executing the waggle dance and the round dance.
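The upper two tiers of such a hierarchy can be sketched as nested Markov chains: one chain switches between behaviors, and each behavior has its own chain over a vocabulary of local motions. This is only an illustrative sampler under stated assumptions. The motion labels (`straight`, `turn`, `waggle`), all transition probabilities, and the function names are hypothetical, and the first tier (the dynamics of each local motion) is reduced to a label.

```python
import random

# Tier 3: Markov switching between behaviors (illustrative probabilities).
BEHAVIOR_SWITCH = {
    "waggle_dance": {"waggle_dance": 0.95, "round_dance": 0.05},
    "round_dance": {"waggle_dance": 0.10, "round_dance": 0.90},
}

# Tier 2: one Markov chain over local motions per behavior (illustrative).
MOTION_MODELS = {
    "waggle_dance": {
        "straight": {"straight": 0.2, "turn": 0.3, "waggle": 0.5},
        "turn": {"straight": 0.2, "turn": 0.5, "waggle": 0.3},
        "waggle": {"turn": 0.3, "waggle": 0.7},
    },
    "round_dance": {
        "straight": {"straight": 0.4, "turn": 0.6},
        "turn": {"straight": 0.1, "turn": 0.9},
        "waggle": {"turn": 1.0},  # entered only right after a behavior switch
    },
}


def draw(distribution, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(distribution)
    weights = [distribution[k] for k in keys]
    return rng.choices(keys, weights=weights)[0]


def sample_labels(n_frames, seed=0):
    """Sample a (behavior, motion) label pair for each frame."""
    rng = random.Random(seed)
    behavior, motion = "waggle_dance", "straight"
    labels = []
    for _ in range(n_frames):
        behavior = draw(BEHAVIOR_SWITCH[behavior], rng)
        motion = draw(MOTION_MODELS[behavior][motion], rng)
        labels.append((behavior, motion))
    return labels


labels = sample_labels(200, seed=1)
```

In a joint tracker, labels like these would form part of the state alongside position and orientation, so that the behavior hypothesis constrains the predicted motion at each frame.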


Ashok Veeraraghavan, Rama Chellappa and Mandyam Srinivasan. "Shape and Behavior Encoded Tracking of Bee Dances", Accepted for Publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). [pdf]

 

Role of Shape and Kinematics in Human Movement Analysis

Human gait and activity analysis from video is presently attracting a lot of attention in the computer vision community. In this paper, we analyze the role of two of the most important cues in human motion: shape and kinematics. We present an experimental framework with which it is possible to evaluate the relative importance of these two cues in computer-vision-based recognition algorithms. In the process, we propose a new gait recognition algorithm that computes the distance between two sequences of shapes that lie on a spherical manifold. In our experiments, shape is represented using Kendall's definition of shape. Kinematics is represented using a linear dynamical system. We place particular emphasis on human gait. Our conclusions show that shape plays a more significant role than kinematics in human identification using gait. As a natural extension, we study the role of shape and kinematics in activity recognition. Our experiments indicate that models containing both shape and kinematics are required for accurate activity classification. These conclusions also allow us to explain the relative performance of many existing methods in computer-based human activity modeling.
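Kendall's notion of shape, and why shapes end up on a sphere, can be illustrated with planar landmarks stored as complex numbers. The sketch below centers and scales a landmark set to obtain a pre-shape (a unit-norm vector) and then computes the full Procrustes distance, which also removes rotation. It is a minimal sketch of the standard construction, assuming 2-D landmarks; the function names are hypothetical and the distance used in the paper may differ.

```python
import cmath
import math


def preshape(landmarks):
    """Remove translation and scale: center the landmarks, scale to unit norm."""
    centroid = sum(landmarks) / len(landmarks)
    centered = [p - centroid for p in landmarks]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centered))
    return [p / norm for p in centered]


def procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two planar landmark configurations."""
    z, w = preshape(shape_a), preshape(shape_b)
    # The modulus of the complex inner product is unchanged by rotation.
    inner = sum(a.conjugate() * b for a, b in zip(z, w))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


triangle = [0 + 0j, 1 + 0j, 0 + 1j]
# Same triangle after scaling by 3, rotating by 0.7 rad and translating.
moved = [3 * cmath.exp(0.7j) * p + (2 - 5j) for p in triangle]
stretched = [0 + 0j, 2 + 0j, 0 + 1j]

same_shape = procrustes_distance(triangle, moved)
other_shape = procrustes_distance(triangle, stretched)
```

Here `same_shape` is zero up to rounding, because the two configurations differ only by translation, scale and rotation, while `other_shape` is clearly positive because stretching changes the shape itself.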

Ashok Veeraraghavan, Amit Roy Chowdhury and Rama Chellappa. Role of Shape and Kinematics in Human Movement Analysis, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2004. [pdf] [ppt]