
Visual Traffic Understanding

We have previously demonstrated a system for recognising and tracking vehicles in complex traffic scenes using model-based methods [11]. The system relies on the ability to evaluate a ``pose hypothesis'' according to context-dependent evidence, which is aggregated to form a scalar fitness score.

We make a pose hypothesis and project a model of the vehicle into the image as a wire-frame drawing with hidden lines removed. Evidence in support of the hypothesis is obtained from image derivatives perpendicular to the ``wires'': the normal to a line (in the image) defines the direction in which high values of the image derivative provide evidence for that line. Points of local derivative maxima are found along normals spaced at intervals along the line. The strengths of the derivative maxima along the projected model line are pooled to obtain an evaluation score for the line, and the probability of obtaining that score is estimated from tables previously constructed for the scene using Monte Carlo techniques.
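The pooling of derivative maxima along a projected model line can be sketched as follows (a minimal illustration, assuming a grey-level image held as a NumPy array; the helper name line_evidence, the search half-width, and the use of the mean as the pooling rule are assumptions, not the exact scheme of [11]):

import numpy as np

def line_evidence(image, p0, p1, n_samples=20, search_half_width=3):
    """Score a projected model line by pooling directional-derivative
    maxima found along normals to the line (a sketch of the method
    described in the text; pooling rule and search width are assumed)."""
    gy, gx = np.gradient(image.astype(float))          # image derivatives
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    d = p1 - p0
    length = np.hypot(*d)
    t = d / length                                     # unit tangent
    n = np.array([-t[1], t[0]])                        # unit normal to the line
    maxima = []
    for s in np.linspace(0.0, 1.0, n_samples):
        base = p0 + s * d
        best = 0.0
        for k in range(-search_half_width, search_half_width + 1):
            x, y = base + k * n
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < image.shape[0] and 0 <= xi < image.shape[1]:
                # derivative component perpendicular to the line
                g = gx[yi, xi] * n[0] + gy[yi, xi] * n[1]
                best = max(best, abs(g))
        maxima.append(best)
    return float(np.mean(maxima))                      # pooled evidence score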

Experimental Results

Initial experiments have been performed on 12 seconds of video taken from a camera mounted in a car which is moving from the outside lane to a central lane of a four-lane motorway. At the same time, another car ahead moves into the same central lane (see Figure 5). The camera was calibrated [15] using knowledge of the width of the lanes and the separation between lane markings, and thereby the ``ground-plane constraint'' (GPC) can be imposed. Vehicle models are only allowed to move with three degrees of freedom, namely translations (X, Y) and rotation (θ) with respect to a world coordinate system in which the XY plane coincides with the ground. The minimisation of the evaluation function over X, Y and θ provides an effective and efficient way to obtain the optimum pose for a given vehicle model [11].
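As an illustration of the ground-plane constraint, the sketch below parametrises a vehicle pose by (X, Y, θ) on the ground plane and projects model vertices through a calibrated 3x4 projection matrix P; the function name and matrix layout are assumptions, and in practice P would come from the calibration of [15]:

import numpy as np

def project_model(points_model, X, Y, theta, P):
    """Place a 3-D model on the ground plane with pose (X, Y, theta)
    and project its vertices through a 3x4 camera matrix P.
    points_model: (N, 3) vertices in the vehicle's own frame."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])        # rotation about the ground-plane normal
    t = np.array([X, Y, 0.0])              # translation within the ground plane (Z = 0)
    pts_world = points_model @ R.T + t
    pts_h = np.hstack([pts_world, np.ones((len(pts_world), 1))])
    proj = pts_h @ P.T                     # homogeneous image coordinates
    return proj[:, :2] / proj[:, 2:3]      # pixel coordinates of the model vertices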

Hypothesis Generation

The optical flow vectors obtained using the approach described in Section: Independent Motion were clustered into regions based on proximity. An approximation for the pose of each detected vehicle was obtained by projecting the centroid of the bounding box of the detected moving region from the image plane onto the ground plane. This image-based detection and analysis of movement provides an initial estimated pose (assuming the same travelling direction as the car-mounted camera) for each detected vehicle.
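A minimal sketch of this seeding step, assuming the calibration of [15] is available as an image-to-ground homography H (the function and parameter names here are hypothetical):

import numpy as np

def seed_pose_from_region(bbox, H_image_to_ground, default_theta=0.0):
    """Turn a moving-region bounding box (xmin, ymin, xmax, ymax) into an
    initial (X, Y, theta) pose by mapping its centroid to the ground plane
    through an image-to-ground homography H (an assumed calibration product)."""
    xmin, ymin, xmax, ymax = bbox
    centroid = np.array([(xmin + xmax) / 2.0, (ymin + ymax) / 2.0, 1.0])
    ground = H_image_to_ground @ centroid
    X, Y = ground[:2] / ground[2]
    return X, Y, default_theta    # heading assumed to match the camera vehicle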

The model hypothesis for each region was obtained using the three 1-D template technique described in [13], although other techniques such as [3] would also be appropriate. In practice, this requires discrimination between a number of different classes (e.g. saloon, small van, large van, lorry). Experiments indicate that reliable class determination is only possible for vehicles close to the camera. This is not surprising given the low resolution of the images.

Hypothesis Verification

The pose hypothesis is refined by carrying out a search for a local minimum of the evaluation function. Thus, starting from a ``seed pose'' cued by the movement analysis (i.e. optical flow), we obtain the best local pose. Once the type and pose of a vehicle have been identified, it is tracked through the sequence of video images by means of simple dynamic filtering techniques.
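This refinement step can be sketched with a Nelder-Mead simplex search (the simplex scheme is the one mentioned below in connection with Figure 5); here `evaluation' stands for a scalar cost derived from the pooled line evidence, with lower values taken to indicate a better fit, which is an assumption about the sign convention:

import numpy as np
from scipy.optimize import minimize

def refine_pose(evaluation, seed_pose):
    """Refine a seed pose (X, Y, theta) by local minimisation of the
    evaluation function with the Nelder-Mead simplex method.
    'evaluation' maps (X, Y, theta) to a scalar cost (lower = better)."""
    result = minimize(lambda p: evaluation(*p),
                      x0=np.asarray(seed_pose, float),
                      method="Nelder-Mead",
                      options={"xatol": 1e-3, "fatol": 1e-3})
    return result.x          # best local pose found near the seed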


Figure 5: 3-D vehicle model instantiations and corresponding interpreted scenes for frames 10, 109 and 275 of the motorway sequence. The separation between distance markers in the virtual ``eye in the sky'' view is 6m. The vehicle at the bottom of the figure contains the camera. The position of the lanes is not quite correct because no allowance has been made for the change in the sideways translation of the vehicle containing the camera.

In the example shown in Figure 5, appropriate 3-D models for the identified vehicles were projected into the first frame of the image sequence. These initial hypotheses make model-based pose recovery possible using the simplex optimisation scheme. Figure 5 illustrates the results of tracking three vehicles through the image sequence using a standard Kalman filter. The camera is assumed to move forward with a constant speed of 25 m/s. Figure 5 (top) depicts the vehicle instantiations in frames 10, 109 and 275 of the motorway sequence, and Figure 5 (bottom) shows the view from a virtual ``eye in the sky'' camera moving with the vehicles.
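A constant-velocity Kalman filter over the ground-plane pose, as a minimal sketch of this kind of ``standard'' filter (the state layout and the noise covariances q and r are illustrative assumptions, not the values used in the experiments):

import numpy as np

class PoseKalman:
    """Constant-velocity Kalman filter on state [X, Y, theta, dX, dY, dtheta]."""
    def __init__(self, dt, q=0.1, r=0.5):
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)                     # constant-velocity transition
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])   # only the pose is measured
        self.Q = q * np.eye(6)                              # process noise (assumed)
        self.R = r * np.eye(3)                              # measurement noise (assumed)
        self.x = np.zeros(6)      # in practice, seed with the first measured pose
        self.P = np.eye(6)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, measured_pose):
        # angle wrap-around of theta is ignored for brevity
        y = np.asarray(measured_pose) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]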

This analysis permits spatio-temporal reasoning based on a 3-D understanding of the scene; it is a simple task to answer 3-D questions relevant to collision alert, e.g. the proximity of the camera to vehicles in front. The model-based techniques explicitly recover the relative poses of the other vehicles. If necessary, each vehicle pose can be estimated in a fixed (static) coordinate frame because the motion of the camera is estimated in parallel with the motions of the target vehicles. Note that the relative poses of the vehicles in Figure 5 are stable even though the vehicles subtend only a few pixels. Figure 6 illustrates (from bottom to top) the recovered pose (X, Y and θ relative to the moving camera) for the saloon vehicle traversing lanes in Figure 5.
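For example, a proximity and closing-speed check on the recovered relative poses might look like the sketch below (the threshold values are arbitrary illustrative choices):

def collision_alert(relative_Y, closing_speed, min_gap=10.0, min_ttc=2.0):
    """Raise an alert if the vehicle ahead is too close or the time to
    contact (gap / closing speed) falls below a threshold.
    relative_Y: distance ahead in metres; closing_speed: m/s (> 0 means closing)."""
    if relative_Y < min_gap:
        return True
    if closing_speed > 0 and relative_Y / closing_speed < min_ttc:
        return True
    return False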


Figure 6: The recovered pose (from bottom to top: X, Y and θ relative to the moving camera) for the saloon car traversing from the inside to the centre lanes of the motorway sequence example.

Figure 7 illustrates the recovered pose (Y) for the same saloon car with separate Kalman filtering of the pose parameters and egomotion parameters in a closed-loop implementation. The top graph shows the separation between the two cars as given by the Kalman filters and the lower graph shows the difference between this separation and the current measurement. The use of two separate filters is far from ideal and future work aims to use a single filter based on path estimation.

A potentially major source of noise, as mentioned earlier, is camera vibration. An experiment was performed in which the white parallel lines on the road surface in the original images were tracked in 3-D and used to recover the camera pan and tilt on a frame-to-frame basis. These measurements were then used to correct for the camera motion over the same image sequence. However, this correction did not confer any real benefit on the model-based tracking results.


Figure 7: Closed-loop Kalman filter tracking of egomotion, and motion (Y) of the independent vehicle along the motorway. Top graph: separation of the two cars as given by the Kalman filters; bottom graph: difference between this separation and the current measurement.


