My Research
|
|||||
Research Topics:
|
|||||
Manifold Models for Human Motion Analysis | |||||
Modeling View and Posture Manifolds | |||||
We
consider modeling data lying on multiple continuous manifolds. In
particular, we model the shape manifold of a person performing a motion observed
from different view points along a view circle at a fixed camera height. We
introduce a model that ties together the body configuration (kinematics)
manifold and the visual manifold (observations) in a way that facilitates
tracking the 3D configuration with continuous relative view variability. The
model exploits the low-dimensional nature of both the body configuration
manifold and the view manifold, each of which is represented
separately. |
|||||
Tracking People on a Torus | |||||
Suppose
we want to model the visual patterns of a periodic articulated motion (such
as walking) observed from any view point. Such visual patterns lie on a
product space (body configurations × views). We showed
that a topology-preserving setting is suitable for modeling certain human
motions that lie intrinsically on one-dimensional manifolds, whether closed
and periodic (such as walking, jogging, running, etc.) or open (such as a golf
swing, kicking, a tennis serve, etc.). We showed that we can represent the
visual manifold of such motions (in terms of shape) as observed from
different view points by mapping such data to a torus manifold (for
the case of a single view circle) or a family of tori (for the whole
view sphere). The approach we introduced is based on learning the visual
observation manifold in a supervised manner. Traditional manifold learning
approaches are unsupervised, where the goal is to find a low-dimensional
embedding of the data. However, if the manifold topology is known, manifold
learning can be formulated as learning a mapping, in either direction, between
a topological structure and the data, where that topological structure is homeomorphic
to the manifold of the data. |
|||||
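As a rough illustration of the torus idea for a single view circle, the sketch below maps a body-configuration phase and a view angle to a point on a torus in 3D. The function name `torus_embed` and the radii `R` and `r` are illustrative assumptions, not values taken from this work.

```python
import numpy as np

def torus_embed(phase, view, R=2.0, r=1.0):
    """Map a body-configuration phase and a view angle (radians) to a torus in R^3.

    R (major radius) and r (minor radius) are arbitrary illustrative choices.
    """
    phase, view = np.broadcast_arrays(np.asarray(phase, dtype=float),
                                      np.asarray(view, dtype=float))
    x = (R + r * np.cos(phase)) * np.cos(view)
    y = (R + r * np.cos(phase)) * np.sin(view)
    z = r * np.sin(phase)
    return np.stack([x, y, z], axis=-1)

# One periodic motion cycle (24 phases) seen from 12 views on a view circle.
phases = np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False)
views = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
grid = torus_embed(phases[:, None], views[None, :])  # shape (24, 12, 3)
```

Both coordinates are periodic, so a full motion cycle or a full turn around the view circle returns to the same torus point. This is the topological property the supervised embedding relies on.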
Multi-factor Models for Style Separation | |||||
We show
how to model several "style" factors using multilinear analysis in the
space of nonlinear basis functions. This way we can separate different
sources of style variation in the same motion. For example, the model can
be used to render data for different people's walking figures from different
views; or different faces performing different facial expressions. On the
other hand, given an input pattern, we introduced an optimization process
that can solve for the different factors which produced that pattern. For
example, from a single shape instance we can recover the body configuration,
the view point, and the person's shape identity. Similarly, from a single
face image, we can recover the facial expression, the face identity, and the
motion phase. |
|||||
Separating Style and Content on a Nonlinear Manifold | |||||
Bilinear and multilinear models have been successful in decomposing static image ensembles into perceptually orthogonal sources of variation, e.g., separation of style and content. If we consider the appearance of human motion such as gait, facial expression and gesturing, most such activities result in nonlinear manifolds in the image space. The question that we address in this research is how to separate style and content on manifolds representing dynamic objects. We learn a decomposable generative model that explicitly separates the intrinsic body configuration (content), as a function of time, from the appearance (style) of the person performing the action, as a time-invariant parameter. The framework is based on decomposing the style parameters in the space of nonlinear functions that map between a learned unified nonlinear embedding of multiple content manifolds and the visual input space. | |||||
Inferring 3D Body Pose from Silhouettes using Activity Manifold Learning | |||||
We aim to infer 3D body pose directly from human silhouettes. Given a visual input (silhouette), the objective is to recover the intrinsic body configuration, recover the view point, reconstruct the input, and detect any spatial or temporal outliers. To recover the intrinsic body configuration (pose) from the visual input (silhouette), we explicitly learn view-based representations of activity manifolds and learn mapping functions between such central representations and both the visual input space and the 3D body pose space. The body pose can be recovered in closed form in two steps: projecting the visual input onto the learned representations of the activity manifold, i.e., finding the point on the learned manifold representation corresponding to the visual input, and then interpolating the 3D pose. | |||||
Nonlinear Generative Models for Dynamic Shape and Dynamic Appearance | |||||
Our objective is to learn representations for the shape and the appearance of moving (dynamic) objects that support tasks such as synthesis, pose recovery, reconstruction and tracking. We introduce a framework that aims to learn landmark-free, correspondence-free global representations of dynamic appearance manifolds. We use nonlinear dimensionality reduction to achieve an embedding of the global deformation manifold that preserves the geometric structure of the manifold. Given such an embedding, a nonlinear mapping is learned from the embedding space into the visual input space. Therefore, any visual input is represented by a linear combination of nonlinear basis functions centered along the manifold in the embedding space. We also show how an approximate solution for the inverse mapping can be obtained in closed form, which facilitates recovery of the intrinsic body configuration. We use the framework to learn the gait manifold as an example of a dynamic shape manifold, as well as to learn the manifolds for some simple gestures and facial expressions as examples of dynamic appearance manifolds. | |||||
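The mapping from the embedding space to the input space can be sketched as a least-squares fit of Gaussian radial basis functions centered along the embedded manifold. Everything below is an assumption made for illustration: the unit-circle embedding, the synthetic 5-dimensional "visual input", the kernel width, and all function names. The inverse is approximated here by a brute-force search over the circle, not by the closed-form solution mentioned above.

```python
import numpy as np

def circle(t):
    """Embedding coordinates on the unit circle for phases t (radians)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.stack([np.cos(t), np.sin(t)], axis=-1)

def rbf_design(points, centers, sigma=0.5):
    """Gaussian RBF activations of embedding points with respect to the centers."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Synthetic stand-in for visual inputs: a smooth periodic curve in R^5.
rng = np.random.default_rng(0)
t_train = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
harmonics = np.stack([np.cos(t_train), np.sin(t_train),
                      np.cos(2 * t_train), np.sin(2 * t_train)], axis=-1)
Y = harmonics @ rng.normal(size=(4, 5))

# Basis functions centered along the manifold in the embedding space.
centers = circle(np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False))
Phi = rbf_design(circle(t_train), centers)
B, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # linear coefficients, shape (12, 5)

def synthesize(t):
    """Generate inputs for phases t as a linear combination of the RBFs."""
    return rbf_design(circle(t), centers) @ B

def recover_phase(y, n_candidates=720):
    """Approximate inverse: the candidate phase whose synthesis is closest to y."""
    candidates = np.linspace(0.0, 2.0 * np.pi, n_candidates, endpoint=False)
    errors = ((synthesize(candidates) - y) ** 2).sum(axis=-1)
    return candidates[np.argmin(errors)]

fit_error = np.abs(Phi @ B - Y).max()
phase_hat = recover_phase(synthesize(1.0)[0])
```

Because the model is linear in the coefficients `B`, fitting reduces to one least-squares solve. Only the inverse direction needs a search or an approximation.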
Bilinear and Multilinear Models for Gait Recognition | |||||
Human identification using gait is a challenging computer vision task due to the dynamic motion of gait and the existence of various sources of variation such as viewpoint, walking surface, clothing, etc. In this research we investigate gait recognition algorithms based on bilinear and multilinear decomposition of gait data into time-invariant gait-style and time-dependent gait-content factors. We developed a generative model by embedding gait sequences onto a unit circle and learning a nonlinear mapping, which facilitates synthesis of temporally aligned gait sequences. Given such synthesized gait data, a bilinear model is used to separate the invariant gait style, which is used for recognition. We also show that the recognition can be generalized to new situations by adapting the gait-content factor to the new condition, thereby obtaining corrected gait styles for recognition. | |||||
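A minimal sketch of style/content separation with an asymmetric bilinear model fitted by SVD is given below. It assumes the sequences are already temporally aligned, so that columns correspond to the same gait phases. The data are synthetic, and the names `fit_asymmetric_bilinear` and `classify_style` are hypothetical; the actual pipeline described above may differ in its details.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim, rank):
    """Factor stacked observations Y (n_styles*dim, n_contents) via truncated SVD.

    Returns style matrices A (n_styles, dim, rank) and content vectors B (rank, n_contents).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :rank] * s[:rank]).reshape(n_styles, dim, rank)
    B = Vt[:rank]
    return A, B

def classify_style(y_new, A, B):
    """Pick the style whose matrix best explains y_new given the known content B."""
    errors = [np.linalg.norm(y_new - A_s @ B) for A_s in A]
    return int(np.argmin(errors))

# Synthetic data: 3 styles (people), 6-dim observations, 10 aligned phases.
rng = np.random.default_rng(0)
n_styles, dim, rank, n_contents = 3, 6, 2, 10
A_true = rng.normal(size=(n_styles, dim, rank))
B_true = rng.normal(size=(rank, n_contents))
Y = A_true.reshape(n_styles * dim, rank) @ B_true

A, B = fit_asymmetric_bilinear(Y, n_styles, dim, rank)
reconstruction = A.reshape(n_styles * dim, rank) @ B

# A noisy new sequence generated by style 1 should be attributed to style 1.
y_new = A_true[1] @ B_true + 0.01 * rng.normal(size=(dim, n_contents))
predicted = classify_style(y_new, A, B)
```

The content factor `B` is shared across people while each style matrix is specific to one person, which is why the style factor can serve as a recognition feature.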
Exemplar-based Tracking and Gesture Recognition - nonparametric HMMs | |||||
In this research we address the problem of capturing the dynamics for exemplar-based recognition systems. A traditional HMM provides a probabilistic tool to capture system dynamics, and in the exemplar paradigm HMM states are typically coupled with the exemplars. Alternatively, we propose a nonparametric HMM approach that uses a discrete HMM with arbitrary states (decoupled from exemplars) to capture the dynamics over a large exemplar space, where a nonparametric estimation approach is used to model the exemplar distribution. This reduces the need for lengthy and non-optimal training of the HMM observation model. We used the proposed approach for view-based recognition of gestures. The approach is based on representing each gesture as a sequence of learned body poses (exemplars). The gestures are recognized through a probabilistic framework for matching these body poses and for imposing temporal constraints between different poses using the proposed nonparametric HMM. | |||||
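One way to picture decoupling states from exemplars is to compute the observation likelihood of a state as a mixture over exemplars, with kernel similarities standing in for the nonparametric estimate. The sketch below is an assumption-laden illustration, not the formulation used in this work: the Gaussian kernel, the state-to-exemplar matrix `C`, and the toy 1-D "poses" are all hypothetical.

```python
import numpy as np

def exemplar_likelihoods(obs, exemplars, sigma=0.5):
    """Gaussian-kernel similarity of every observation to every exemplar, shape (T, K)."""
    d2 = ((obs[:, None, :] - exemplars[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def log_likelihood(obs, exemplars, pi, A, C, sigma=0.5):
    """Scaled forward algorithm for an HMM whose states are decoupled from exemplars.

    pi: (S,) initial state probabilities; A: (S, S) transition matrix;
    C: (S, K) with C[q, k] = P(exemplar k | state q).
    """
    b = exemplar_likelihoods(obs, exemplars, sigma) @ C.T  # (T, S)
    alpha = np.asarray(pi, dtype=float) * b[0]
    log_p = 0.0
    for t in range(len(b)):
        if t > 0:
            alpha = (alpha @ A) * b[t]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p

# Toy example: four 1-D "pose" exemplars and two left-to-right gesture models.
exemplars = np.array([[0.0], [1.0], [2.0], [3.0]])
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.0, 1.0]])
C_up = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])    # low poses, then high
C_down = np.array([[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 0.0, 0.0]])  # high poses, then low

obs = np.array([[0.1], [0.9], [2.1], [3.0]])
ll_up = log_likelihood(obs, exemplars, pi, A, C_up)
ll_down = log_likelihood(obs, exemplars, pi, A, C_down)
```

Classification then amounts to choosing the gesture model with the highest log-likelihood for the observed pose sequence.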
Tracking: | |||||
Learning to Track | |||||
Tracking
is typically posed as a search problem in a geometric transformation
parameter space as well as in the object's configuration parameter space.
Generally, tracking is based on learning an invariant representation of the
tracked object, then searching the parameter space for the best fit. The goal of this research is to achieve trackers that can directly "infer" such parameters from the object appearance through learned models of the visual manifolds of such parameter spaces. We show how to learn a representation of the appearance manifold of an object, given a class of geometric transformations. We learn a generative model for object appearance where the appearance of the object at each new frame is an invertible function that maps from a representation of the geometric transformation space into the visual space. By learning such a generative model we can infer the geometric transformation (track) directly from the tracked object's appearance. As a result, tracking can be achieved in closed form and can therefore be done very efficiently. The novelty of this work is that it shows how learning the appearance manifold of an object can enable efficient tracking. |
|||||
Appearance-Based Generalized Kernel Tracking | |||||
We exploit the feature-spatial distribution of a region representing an object as a probabilistic constraint to track that region over time. Tracking is achieved by maximizing a similarity-based objective function over the transformation space given a nonparametric representation of the joint feature-spatial distribution. Such a representation imposes a probabilistic constraint on the region feature distribution coupled with the region structure, which yields an appearance tracker that is robust to small local deformations and partial occlusion. We present the approach for the general form of joint feature-spatial distributions and apply it to tracking with different types of image features, including raw intensity, color and image gradient. | |||||
Tracking Multiple People | |||||
In this research we address the problem of segmenting foreground regions corresponding to a group of people, given models of their appearance that were initialized before occlusion. We present a general framework that uses maximum likelihood estimation to estimate the best arrangement of people, in terms of 2D translation, that yields a segmentation of the foreground region. Given the segmentation result, we conduct occlusion reasoning to recover relative depth information, and we show how to utilize this depth information in the same segmentation framework. We also present a more practical, online solution to the segmentation problem that avoids searching an exponential space of hypotheses. The person model is based on segmenting the body into regions in order to spatially localize the color features corresponding to the way people are dressed. Modeling these regions involves modeling their appearance (color distributions) as well as their spatial distribution with respect to the body. We use a nonparametric approach based on kernel density estimation to represent the color distribution of each region, and therefore we do not restrict the clothing to be of uniform color. Instead, it can be any mixture of colors and/or patterns. We also present a method to automatically initialize these models and learn them before the occlusion. | |||||
Scene Modeling and Background Subtraction | |||||
Feature Selection for Background Subtraction - Boosted Background Model | |||||
Various statistical approaches have been proposed
for modeling a given scene background. However, there is no theoretical
framework for choosing which features to use to model different regions of
the scene background. In this research we introduce a novel framework for
feature selection for background modeling and subtraction. A boosting
algorithm, namely RealBoost, is used to choose the best combination of
features at each pixel. Given the probability estimates from a pool of
features calculated by kernel density estimation (KDE) over a certain time
period, the algorithm selects the most useful ones to discriminate
foreground objects from the scene background. The results show that the
proposed framework successfully selects appropriate features for different
parts of the image.
|
|||||
Nonparametric Model for Background Subtraction | |||||
In video
surveillance systems, stationary cameras are typically used to monitor activities at
outdoor or indoor sites. Since the cameras are stationary, the detection of moving objects
can be achieved by comparing each new frame with a representation of the scene background.
This process is called background subtraction and the scene representation is called the
background model. Typically, background subtraction forms the first stage in automated
visual surveillance systems. Results from background subtraction are used for further
processing, such as tracking targets and understanding events. We introduced a novel background model and a background subtraction technique based on statistical nonparametric modeling of the pixel process. The model keeps a sample of intensity values for each pixel in the image and uses this sample to estimate the probability density function of the pixel intensity. The density function is estimated using kernel density estimation. Since this approach is quite general, the model can approximate any distribution for the pixel intensity without any assumptions about the underlying distribution shape. The model can handle situations where the background of the scene is cluttered and not completely static but contains small motions, such as those due to moving tree branches and bushes. The model is updated continuously and therefore adapts to changes in the scene background. The approach runs in real time. Code is available upon request.
|
|||||
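The per-pixel density estimate described above can be sketched as follows. The fixed Gaussian bandwidth `sigma`, the `threshold`, the first-in-first-out update, and the function names are simplifying assumptions made for illustration; they are not the code referred to above.

```python
import numpy as np

def background_probability(frame, samples, sigma=5.0):
    """Kernel density estimate of each pixel's intensity under the background model.

    frame: (H, W) intensities; samples: (N, H, W) recent background intensities per pixel.
    """
    diff = frame[None, :, :] - samples
    kernels = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=0)

def foreground_mask(frame, samples, sigma=5.0, threshold=1e-3):
    """A pixel is foreground when its intensity is unlikely under the estimated density."""
    return background_probability(frame, samples, sigma) < threshold

def update_samples(samples, frame):
    """First-in-first-out update: drop the oldest sample, append the new frame."""
    return np.concatenate([samples[1:], frame[None, :, :]], axis=0)

# Bimodal background, e.g. a pixel alternating between branch (50) and sky (120).
samples = np.empty((20, 2, 2))
samples[::2] = 50.0
samples[1::2] = 120.0

frame = np.array([[120.0, 50.0],
                  [200.0, 85.0]])
mask = foreground_mask(frame, samples)
samples = update_samples(samples, frame)
```

In this toy case both background modes (50 and 120) are accepted, while 200 and the in-between value 85 are flagged as foreground. A single Gaussian per pixel would not separate the last case as cleanly.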
Efficient Kernel Density Estimation using Fast Gauss Transform | |||||
Many vision algorithms depend on the estimation of a probability density function from observations. Kernel density estimation techniques are quite general and powerful methods for this problem, but have a significant disadvantage in that they are computationally intensive. In this research we explore the use of kernel density estimation with the fast Gauss transform (FGT) for problems in vision. The FGT allows the summation of a mixture of M Gaussians at N evaluation points in O(M+N) time as opposed to O(MN) time for a naive evaluation, and can be used to considerably speed up kernel density estimation. We present applications of the technique to problems from image segmentation and tracking, and show that the algorithm allows application of advanced statistical techniques to solve practical vision problems in real time with today's computers. |
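For reference, the naive O(MN) evaluation that the FGT accelerates can be written directly. The sketch below is only this baseline (the FGT itself is not implemented here), and the function name and the bandwidth convention `exp(-d^2 / h^2)` are assumptions.

```python
import numpy as np

def direct_gauss_transform(sources, weights, targets, h):
    """Evaluate G(y_j) = sum_i q_i * exp(-||y_j - x_i||^2 / h^2) by direct summation.

    sources: (M, d) Gaussian centers x_i; weights: (M,) q_i; targets: (N, d) points y_j.
    Cost is O(M * N) kernel evaluations.
    """
    d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / h ** 2) @ weights

sources = np.array([[0.0, 0.0], [1.0, 0.0]])
weights = np.array([2.0, 1.0])
targets = np.array([[0.0, 0.0]])
G = direct_gauss_transform(sources, weights, targets, h=1.0)
```

With equal weights `1/M` and the appropriate normalizing constant, the same sum is a Gaussian kernel density estimate. This pairwise cost is what becomes a bottleneck in the real-time applications mentioned above.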