
Action Recognition

3D Shape Context and Distance Transform for Action Recognition
Matthias Grundmann, Franziska Meier and Irfan Essa
Presented at ICPR 2008


Our goal is to classify actions from video sequences. We use the Weizmann and KTH datasets. Both contain videos of people performing different actions such as running, waving, or jumping. Some examples are shown below:

Samples from KTH dataset
Samples from Weizmann dataset


We use real-time foreground-background subtraction to extract the silhouettes of the actors. By stacking the silhouettes over time, we obtain 3D Space-Time shapes that represent the action.

Space-Time shapes for actions bend and skip from Weizmann dataset
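
The stacking step can be sketched in a few lines of NumPy. This is only an illustration: the helper names `stack_silhouettes` and `toy_masks` are hypothetical, and the toy masks stand in for the output of a background subtraction stage that is not shown here.

```python
import numpy as np

def stack_silhouettes(masks):
    """Stack per-frame binary silhouettes (H x W) into a (T, H, W) space-time volume."""
    return np.stack([np.asarray(m, dtype=bool) for m in masks], axis=0)

def toy_masks(num_frames=5, size=8):
    """Hypothetical stand-in for background subtraction output:
    a 2x2 blob that moves one pixel to the right per frame."""
    masks = []
    for t in range(num_frames):
        m = np.zeros((size, size), dtype=bool)
        m[3:5, t:t + 2] = True
        masks.append(m)
    return masks

# Axis 0 is time, axes 1 and 2 are image rows and columns.
volume = stack_silhouettes(toy_masks())
```

Putting time on the first axis is an arbitrary choice made for this sketch; any consistent axis order would work.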

We represent an action by a 3D point cloud obtained by sampling the 3D Space-Time shape. We propose a new non-uniform sampling that gives preference to fast-moving body parts by using a smaller sample window for those parts.
Fast-moving body parts have the property that their distance to the boundary changes rapidly over time. We propose to use the temporal derivative of a 3D Distance Transform of the Space-Time shape as a measure to identify fast-moving parts.

Uniform vs. motion adaptive sampling for action running.
Response functions for actions bend and skip. Red values indicate fast-moving, blue values slow-moving body parts.
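
A minimal sketch of such a response function, assuming SciPy's Euclidean distance transform and a finite-difference temporal derivative. The function names are hypothetical, and the sampling step is a simplification: it draws voxels with probability proportional to the response instead of shrinking a sample window, so it illustrates the preference for fast-moving parts but is not the scheme from the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def motion_response(volume):
    """Absolute temporal derivative of the 3D distance transform.

    volume: boolean array of shape (T, H, W), True inside the space-time shape.
    Assumption: time and space share the same unit; pass `sampling=` to
    distance_transform_edt to weight the time axis differently.
    """
    dist = distance_transform_edt(volume)   # distance of each inside voxel to the boundary
    d_dt = np.gradient(dist, axis=0)        # finite difference along the time axis
    return np.abs(d_dt) * volume            # keep the response inside the shape only

def sample_points(volume, response, n, rng, eps=1e-3):
    """Draw n distinct (t, y, x) voxels, favouring high response (simplified stand-in)."""
    coords = np.argwhere(volume)            # same C order as response[volume]
    weights = response[volume] + eps        # eps keeps static parts sampleable
    idx = rng.choice(len(coords), size=n, replace=False, p=weights / weights.sum())
    return coords[idx]

# Toy volume: one static blob and one blob moving a pixel per frame.
T, H, W = 6, 12, 12
volume = np.zeros((T, H, W), dtype=bool)
volume[:, 1:4, 1:4] = True                  # static 3x3 blob
for t in range(T):
    volume[t, 7:10, t:t + 3] = True         # moving 3x3 blob

response = motion_response(volume)
samples = sample_points(volume, response, n=10, rng=np.random.default_rng(0))
```

In this toy volume the static blob has zero response everywhere, because its distance to the boundary never changes, while the moving blob has a non-zero response.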

To match actions based on their motion-adaptive sampled point clouds, we extend the 2D Shape Context to 3D. Care has to be taken to ensure that every bin of the shape context covers the same surface area. We propose a non-uniform discretization of the latitude angle to achieve this. The difference is shown below:


Uniform discretization leads to smaller bins around the poles.
Equal surface discretization of the latitude angle.
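
One way to obtain equal-area latitude bins follows from the area of a spherical band: on the unit sphere, the band between latitudes φ1 and φ2 has area 2π(sin φ2 − sin φ1). Spacing sin φ uniformly therefore gives bands of equal area. The sketch below assumes latitude is measured from the equator in [−π/2, π/2]; the function names are hypothetical and the paper's exact discretization may differ.

```python
import numpy as np

def uniform_latitude_edges(n_bins):
    """Bin edges equally spaced in the latitude angle itself."""
    return np.linspace(-np.pi / 2, np.pi / 2, n_bins + 1)

def equal_area_latitude_edges(n_bins):
    """Bin edges equally spaced in sin(latitude), so every band has the same area."""
    return np.arcsin(np.linspace(-1.0, 1.0, n_bins + 1))

def band_areas(edges):
    """Surface area of each latitude band on the unit sphere."""
    return 2.0 * np.pi * np.diff(np.sin(edges))

n_bins = 6
areas_uniform = band_areas(uniform_latitude_edges(n_bins))
areas_equal = band_areas(equal_area_latitude_edges(n_bins))
```

With uniform angular spacing the polar bands come out smaller than the equatorial ones, whereas the equal-area edges give every band an area of 4π / `n_bins`. Combined with a uniform discretization of the azimuth angle, every angular bin then covers the same surface area.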

Our motion-adaptive sampling can successfully discriminate actions that exhibit a strong spatial similarity (frame-wise) but differ along the temporal dimension.

Confusion matrix for extended Weizmann dataset using motion adaptive sampling.
Confusion matrix for extended Weizmann dataset without motion adaptive sampling.