Automatic gesture recognition consists, at this point, of finding the model that best fits a given image sequence. This implies estimating the following parameter set: the transition matrix A, the collection of posture models C, the variance matrix, and the state dimension N. It is also necessary to estimate temporal information such as the number of self-transitions of each state and the order of the transitions between canonical postures.
The identification procedure is based on the Expectation-Maximization (EM) algorithm [1]. It updates the model parameters and estimates some auxiliary quantities, such as the number of jumps from state r to state s up to time k and the occupation time of state r up to time k.
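The two auxiliary quantities can be illustrated with a minimal sketch. It assumes a discrete-state hidden Markov model and uses the standard scaled forward-backward recursions, so the counts are smoothed expectations conditioned on the whole sequence; this is not necessarily the recursive estimator of [1]. The function name `expected_jumps_and_occupation`, the initial distribution `pi`, and the likelihood array `B` are hypothetical names introduced for illustration.

```python
import numpy as np

def expected_jumps_and_occupation(A, pi, B):
    """Expected jump counts and occupation times up to each time k.

    A  : (N, N) transition matrix, A[r, s] = P(state s at t+1 | state r at t)
    pi : (N,)   initial state distribution (assumed, not from the text)
    B  : (T, N) observation likelihoods, B[t, r] = p(y_t | state r)
    """
    T, N = B.shape
    alpha = np.zeros((T, N))   # scaled forward variables
    beta = np.ones((T, N))     # scaled backward variables
    scale = np.zeros(T)        # per-step normalisation constants

    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]

    # gamma[t, r]: posterior probability of being in state r at time t
    gamma = alpha * beta
    # xi[t, r, s]: posterior probability of a jump r -> s between t and t+1
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[1:] * beta[1:])[:, None, :]) / scale[1:, None, None]

    jumps = np.cumsum(xi, axis=0)          # jumps[k, r, s]: expected r -> s jumps over steps 0..k
    occupation = np.cumsum(gamma, axis=0)  # occupation[k, r]: expected time in r over times 0..k
    log_likelihood = np.log(scale).sum()
    return jumps, occupation, log_likelihood

# Usage on a hypothetical two-state model that tends to alternate states.
A = np.array([[0.1, 0.9], [0.9, 0.1]])
pi = np.array([0.5, 0.5])
B = np.array([[0.9, 0.1], [0.1, 0.9]] * 3)
jumps, occupation, log_likelihood = expected_jumps_and_occupation(A, pi, B)
```

In this sketch the diagonal entries `jumps[k, r, r]` play the role of the self-transition counts, and summing `jumps[k]` over the destination state gives back `occupation[k]`.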
The convergence of the EM algorithm is guaranteed by Jensen's inequality [1]. The sequence of parameter estimates it generates corresponds to nondecreasing values of an appropriate likelihood function. The learning process can therefore be terminated when the likelihood either reaches a given threshold level or no longer increases.
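The two termination rules can be sketched on a deliberately simple stand-in model. The example below does not use the gesture model of this paper: it runs EM on an equal-weight mixture of unit-variance one-dimensional Gaussians, chosen only because its EM step fits in a few lines. The names `run_em`, `target`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np

def log_likelihood(x, mu):
    # Log-likelihood of data x under an equal-weight, unit-variance Gaussian mixture.
    log_comp = -0.5 * (x[:, None] - mu[None, :]) ** 2 - 0.5 * np.log(2 * np.pi)
    return np.sum(np.logaddexp.reduce(log_comp, axis=1) - np.log(len(mu)))

def em_step(x, mu):
    # E-step: responsibility of each component for each sample.
    log_comp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    resp = np.exp(log_comp - np.logaddexp.reduce(log_comp, axis=1, keepdims=True))
    # M-step: responsibility-weighted means.
    return (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

def run_em(x, mu0, target=None, tol=1e-8, max_iter=500):
    mu = np.asarray(mu0, dtype=float)
    history = [log_likelihood(x, mu)]
    for _ in range(max_iter):
        mu = em_step(x, mu)
        history.append(log_likelihood(x, mu))
        # Rule 1: the likelihood has reached the requested threshold level.
        if target is not None and history[-1] >= target:
            break
        # Rule 2: the likelihood no longer increases (up to a small tolerance).
        if history[-1] - history[-2] <= tol:
            break
    return mu, history

# Usage on synthetic data drawn around two hypothetical centres.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
mu, history = run_em(x, [-1.0, 1.0])
```

The `history` list records the likelihood after every iteration, so the nondecreasing behaviour can be checked directly. The tolerance `tol` guards against stopping on floating-point noise and is a design choice, not something prescribed by the text.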
Consider the simple hand gesture shown in Figure (6), which consists of repeated openings and closings of the hand.
Adrian F Clark