The object pose is inferred by aligning the geometric model of the object to the silhouette estimated in the images (a 1D set of B-spline control points), given an estimated pose. The initial object pose in the first frame is assumed to be known, enabling the computation of the silhouette and the initialization of the contour tracker. Afterwards, the silhouette is tracked through the image sequence, and repeated pose estimation is used to update the state of the 3D pose tracker.
Before we formulate the problem, we have to define what we henceforth understand under the term ``object''. Objects are connected, bounded 3-manifolds embedded in $\mathbb{R}^3$. Let $\mathcal{O}$ be the set of all objects; then $\mathcal{O}$ together with the Hausdorff metric $d_H$ is a metric space. The group $G$ of 3D Euclidean transformations has six degrees of freedom and maps $\mathcal{O}$ onto $\mathcal{O}$. The orbit of a single object $o \in \mathcal{O}$ under $G$,
$$G(o) = \{\, g(o) \mid g \in G \,\},$$
is a 6-dimensional manifold in $\mathcal{O}$. It is parameterized by $g = (x, y, z, \alpha, \beta, \gamma)$, where $(x, y, z)$ are the translation parameters and $(\alpha, \beta, \gamma)$ are the rotation angles around the x, y, and z-axis. Consider an object model $o_m \in \mathcal{O}$ and a realization $o_r \in G(o_m)$; then the pose estimation problem can be stated as: determine $\hat{g}$ such that
$$\hat{g} = \arg\min_{g \in G} d_H\bigl(g(o_m),\, o_r\bigr). \qquad (1)$$
The problem does not have a unique solution for all objects; e.g., for a sphere there always exists a 3D manifold of solutions. Let $S(o)$ denote the weak perspective image (the silhouette) of the object $o$. Then the optimization problem (1) can be reformulated as
$$\hat{g} = \arg\min_{g \in G} d\bigl(S(g(o_m)),\, S(o_r)\bigr), \qquad (2)$$
where $d$ is defined on the set of silhouettes $S(\mathcal{O})$, obeying the condition
$$d\bigl(S(o_1), S(o_2)\bigr) = 0 \;\Rightarrow\; o_1 = o_2,$$
stating that if the silhouettes are equal, the objects have to be equal, too. For tracking tasks the condition can be restricted to an (open) neighborhood of an object $o$, since an estimate $\tilde{g}$ of the object pose is available. The condition becomes: there exists an open neighborhood $U(o)$ such that
$$\forall\, o' \in U(o):\; d\bigl(S(o), S(o')\bigr) = 0 \;\Rightarrow\; o = o'.$$
Under this condition the function $g \mapsto d\bigl(S(g(o_m)), S(o_r)\bigr)$ exhibits a global minimum at the point $\hat{g}$, where $\hat{g}$ is the solution to problem (2). Hence $\hat{g}$ can be determined by searching for the global minimum.
It remains to define the metric $d$. Instead of choosing the Hausdorff metric on the silhouettes, $d$ is defined as the symmetric area difference between the graphs of the silhouettes $s_1$ and $s_2$:
$$d(s_1, s_2) = \operatorname{area}\bigl((A_1 \cup A_2) \setminus (A_1 \cap A_2)\bigr),$$
where $A_i$ denotes the region enclosed by the graph of $s_i$.
Since both the measured silhouettes and the estimated silhouettes computed from the 3D object model are represented as B-splines, the area difference of the graphs can be computed efficiently by intersecting the splines and symbolically integrating along the curves (using the Gauss theorem for integration over curves in $\mathbb{R}^2$).
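For illustration, the symmetric area difference can be approximated by sampling the two closed B-splines densely into polygons. The sketch below assumes NumPy and Shapely, and all function names are our own; the original implementation instead intersects the splines exactly and integrates symbolically along them.

```python
import numpy as np
from shapely.geometry import Polygon

def shoelace_area(pts):
    """Enclosed area of a closed polyline via Green's theorem, i.e. the
    discrete form of integrating (x dy - y dx) / 2 along the curve."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def symmetric_area_difference(curve1, curve2):
    """d(s1, s2): area of the symmetric difference of the regions A1, A2
    enclosed by two closed curves, each given as an (n, 2) array of points
    sampled densely from a B-spline."""
    a1, a2 = Polygon(curve1), Polygon(curve2)
    return a1.symmetric_difference(a2).area
```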
As already pointed out, the silhouette of an object cannot be expressed as an analytical function of the 3D object model. Hence, there is no closed-form solution to the optimization problem (2). The optimization is therefore executed in subgroups of the transformation group $G$, where the silhouettes have to be represented invariantly with respect to the remaining unknown parameters. The B-splines can be represented translation- and $\gamma$-rotation-invariant by computing the first and second order moments. To achieve invariance, the B-splines are translated to align their center of gravity with the coordinate origin, scaled to unit area, and rotated such that the mixed second moment vanishes.
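This normalization can be sketched as follows for a curve given as densely sampled points, which is an assumption of ours (the original computes the moments from the B-spline representation); the polygon moment formulas assume counter-clockwise orientation.

```python
import numpy as np

def normalize_curve(pts):
    """Translate a closed sampled curve so the centroid of its enclosed
    region lies at the origin, scale it to unit area, and rotate it so the
    mixed second moment mu11 vanishes. Returns the normalized points and
    the normalization parameters (centroid, scale, angle)."""
    x, y = pts[:, 0], pts[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    area = 0.5 * cross.sum()
    cx = ((x + xn) * cross).sum() / (6.0 * area)
    cy = ((y + yn) * cross).sum() / (6.0 * area)
    p = (pts - [cx, cy]) / np.sqrt(abs(area))  # centroid to origin, unit area
    # second order central moments of the enclosed region (polygon formulas)
    x, y = p[:, 0], p[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    mu20 = ((x**2 + x*xn + xn**2) * cross).sum() / 12.0
    mu02 = ((y**2 + y*yn + yn**2) * cross).sum() / 12.0
    mu11 = ((x*yn + 2*x*y + 2*xn*yn + xn*y) * cross).sum() / 24.0
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    c, s = np.cos(-theta), np.sin(-theta)   # rotate by -theta: mu11 -> 0
    R = np.array([[c, -s], [s, c]])
    return p @ R.T, (cx, cy, np.sqrt(abs(area)), theta)
```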
At first the parameters $\alpha$ and $\beta$ are computed by evaluating the function $d(\tilde{s}, s(\alpha, \beta))$ at discrete points in an open neighborhood of the predicted pose $\tilde{g}$, where $\tilde{s}$ is the graph of the normalized measured silhouette and $s(\alpha, \beta)$ denotes the graph of the normalized silhouette of the object model at the rotation angles $(\alpha, \beta)$. The function $d$ is evaluated on a regular grid with center $(\tilde{\alpha}, \tilde{\beta})$.
The global minimum of $d$ can then be computed by fitting a quadric to the sampled values $d_{ij}$. The coefficients of the quadric are obtained by minimizing the squared error, i.e. by setting up the normal equations. The inverse of the normal-equation matrix can be computed off-line if the grid is first translated to the origin. The position of the minimum $(\hat{\alpha}, \hat{\beta})$ is extracted by partially differentiating the quadric and setting the gradient to zero.
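A minimal sketch of this step, assuming the grid has already been translated to the origin; the function names and the use of `np.linalg.pinv` in place of an explicitly inverted normal-equation matrix are our choices.

```python
import numpy as np

def quadric_fit_grid(grid_coords):
    """Precompute the least-squares pseudoinverse for fitting
    d(a, b) ~ c0 + c1*a + c2*b + c3*a^2 + c4*a*b + c5*b^2
    on a fixed grid centered at the origin; reusable for every frame."""
    a, b = grid_coords[:, 0], grid_coords[:, 1]
    A = np.column_stack([np.ones_like(a), a, b, a**2, a*b, b**2])
    return np.linalg.pinv(A)  # solves the normal equations

def quadric_minimum(pinv, d_values, center):
    """Fit the quadric to the sampled area differences and return the
    position of its stationary point (the estimated minimum)."""
    c = pinv @ d_values
    H = np.array([[2*c[3], c[4]], [c[4], 2*c[5]]])  # Hessian of the quadric
    g = np.array([c[1], c[2]])                       # gradient at the origin
    return center - np.linalg.solve(H, g)            # where the gradient is 0
```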
The third angle $\gamma$ can be computed in the same way, by again fitting a quadratic function to the symmetric area differences measured at discrete points in the neighborhood of the predicted value, or simply by taking the difference of the normalization angles computed from the second order moments; the latter leads to ambiguities, which can be resolved by comparison with the predicted $\tilde{\gamma}$. The translation parameters $(x, y, z)$ can likewise be recovered from the normalization parameters.
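Since the normalization angle obtained from the second-order moments is only defined up to a multiple of $\pi$, the ambiguity can be resolved against the prediction, e.g. as in this hypothetical helper:

```python
import numpy as np

def resolve_gamma(theta_measured, theta_model, gamma_predicted):
    """The difference of the two normalization angles determines gamma only
    up to multiples of pi; pick the candidate closest to the prediction."""
    base = theta_measured - theta_model
    candidates = base + np.pi * np.arange(-2, 3)
    return candidates[np.argmin(np.abs(candidates - gamma_predicted))]
```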
The object pose derived from the silhouette is likely to be erroneous. For pose smoothing and prediction, a Kalman filter is derived with the state vector $(g, \dot{g})$, where $\dot{g}$ represents the linear and angular velocities of the parameters $g$.
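As an illustration, a discrete constant-velocity Kalman filter over the state $(g, \dot{g})$ might look as follows; the time step and the noise covariances are placeholder assumptions, not values from the paper.

```python
import numpy as np

def make_cv_kalman(dt, q=1e-3, r=1e-2, n=6):
    """Constant-velocity model for the 6-dof pose g: the state (g, g_dot)
    lives in R^12; the silhouette-based pose estimate is the measurement.
    q and r are illustrative process/measurement noise levels."""
    F = np.eye(2 * n)
    F[:n, n:] = dt * np.eye(n)                     # g' = g + dt * g_dot
    H = np.hstack([np.eye(n), np.zeros((n, n))])   # only g is observed
    Q = q * np.eye(2 * n)
    R = r * np.eye(n)
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle: smooths the measured pose z and yields a
    prediction of pose and velocity for the next frame."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```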
After an aspect change has occurred, new control points may have to be introduced into the contour tracker to model corners or curved parts of the silhouette that were invisible in the previous aspect. As a consequence, a contour state estimate has to be derived, including an estimate of the control point velocities. These image velocities can be estimated from the 3D object pose velocity.
First, aspect changes that influence the shape of the silhouette are predicted in order to reinitialize the contour tracker. An aspect change is indicated by certain visual events (see Koenderink et al. [8] for a complete list of visual events). For smooth objects, aspect changes influencing the silhouette are restricted to T-junction events, which can be discovered by comparing the topological structure of the T-junctions of different silhouettes. Furthermore, if the difference between the template currently used in the contour tracker and the predicted silhouette exceeds a certain threshold, the template has to be replaced by a new (set of) silhouettes.
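The template replacement test can be sketched as a simple threshold check reusing the symmetric area difference from above; the threshold value and function names are illustrative assumptions.

```python
def template_needs_update(template_curve, predicted_curve, threshold=0.05):
    """Signal that the contour tracker's template should be replaced when the
    predicted silhouette has drifted too far from it (e.g. after an aspect
    change). Reuses symmetric_area_difference from the earlier sketch."""
    return symmetric_area_difference(template_curve, predicted_curve) > threshold
```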