
Head Pose Estimation Based on Head Tracking and the Kalman Filter


Physics Procedia 22 (2011) 420 – 427

1875-3892 © 2011 Published by Elsevier B.V. Selection and/or peer-review under responsibility of Garry Lee.

doi:10.1016/j.phpro.2011.11.066

2011 International Conference on Physics Science and Technology (ICPST 2011)

Wang Yu a,b, Liu Gang c,*

a College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China

b Wuchang Shipbuilding Industry Co., Ltd, Wuhan, China

c College of Computer Science & Technology, Wuhan University of Technology, Wuhan, China

Abstract

In this paper, we propose a head pose estimation method that combines a texture-based head tracking method and the Kalman filter. The texture-based tracking method first estimates the head pose in the current frame by recovering the relative head motion between consecutive frames. The Kalman filter predicts the head pose in the next frame, which helps the tracking method recover the motion starting from the predicted pose. Our method has been tested on a real video sequence. The experimental results show that it successfully tracks a head and improves the efficiency of the tracking method.

Keywords: Head Pose Estimation; Tracking Method; Kalman Filter

* Wang Yu. Tel.:+86-027-********; fax:+86-027-********.

E-mail address: yuwangpost@…


Open access under CC BY-NC-ND license.

1. Introduction

In many computer vision applications, the six degrees of freedom (DOF) [1] of a head are the key factors for analyzing a person's motion and intentions, evaluating his focus of attention, and reconstructing a 3-D face. But estimating these 3D parameters automatically and robustly remains a challenging problem. An obvious difficulty lies in calculating the rotation angles from a pixel-based representation of a head. Moreover, factors such as varying illumination conditions, partial occlusions, and complex backgrounds also affect the estimation result.

In the last few years, a number of approaches have been proposed to solve the problem of head pose estimation. Some of them make use of stereoscopic cameras, which can provide the depth information of an image. But stereoscopic cameras are too expensive for some applications, so monocular methods remain the focus of much research.

So far, many monocular methods have been applied to head pose estimation. Among them, tracking methods achieve high accuracy. These tracking methods estimate head pose by recovering the full motion of the head between consecutive frames of a video sequence. But most tracking methods only consider how to compute the 3D parameters of head motion from the gray or color information of two frames. In our opinion, it is helpful to take the previous state into account when recovering head motion. For example, when we detect that a head is turning right in the current frame, it is likely that the head will keep turning right in the next frame. Using this prior knowledge may increase the efficiency of the estimation.

Thus, we propose a head pose estimation method combining a texture-based head tracking method and the Kalman filter [2]. The texture-based tracking method is used to estimate the head pose in the current frame. The Kalman filter predicts the head pose in the next frame by maximizing the a posteriori probability of the head pose given the previous estimates. The result of the Kalman filter prediction is then used to improve the performance of the tracking method.

The remainder of the paper is organized as follows: related work is briefly reviewed in Section 2. In Section 3, the texture-based tracking method and the Kalman filter are discussed in detail. In Section 4, experimental results are provided to demonstrate the performance of the proposed method. In Section 5, conclusions and future work are given.

2. Related Work

Murphy-Chutorian and Trivedi have classified current methods of head pose estimation into eight categories [1]. Among them, manifold embedding methods, flexible model methods, and tracking methods have been widely investigated.

The manifold embedding methods [3] consider high-dimensional head images as a set of geometrically related points lying on a smooth low-dimensional manifold. For head pose estimation, a manifold is first computed from images with known head poses, and a new head image is then projected onto this manifold by an embedding technique to obtain the head pose. These methods can obtain satisfactory results; a difficulty is obtaining a regular sampling of poses from many people when training the manifold.

The flexible model methods use non-rigid models to fit the face and can operate successfully in many scenarios. Models such as the ASM (Active Shape Model) [4] and the AAM (Active Appearance Model) [5] have been utilized for tracking faces and facial features. To estimate 3D head pose, people often train 3D face shape models. Murphy-Chutorian [6] used a 3D facial model to track the motion of a driver's head. Although these models can work effectively, they are usually very complex and computationally expensive.


The tracking methods recover head motion by using a 3D head model. The model is projected onto a 2D face image, and the head motion is obtained by minimizing the squared difference between the face image and the input frame. Texture-based tracking methods were proposed by La Cascia [7] and Xiao [8], and have shown promising performance by modeling the head as a texture-mapped cylinder. Compared with other methods, tracking methods need to know a precise initial position of the head.

3. The 3D Head Pose Estimation Method

The 3D head pose estimation method proposed in this paper uses a texture-based tracking technique. Let $\mathbf{P}$ represent the motion parameter vector, $\mathbf{P} = [t_x, t_y, t_z, \omega_x, \omega_y, \omega_z]^T$, where $t_x, t_y, t_z$ represent the 3D translations relative to the three axes, and $\omega_x, \omega_y, \omega_z$ represent the rotations.

We use AAM to detect a frontal face and obtain an initial head pose $\mathbf{P}_0$. When the $k$th frame arrives, the relative movement $\Delta\mathbf{P}_k$ between the $k$th frame and the $(k-1)$th frame can be recovered by the tracking method. The head pose of the $k$th frame is then the composition $\mathbf{P}_k = \left(\prod_{n=1}^{k} \Delta\mathbf{P}_n\right)\mathbf{P}_0$, carried out on the corresponding transformation matrices.

The Kalman filter algorithm contains two phases, the prediction phase and the correction phase. The prediction phase predicts the head pose in a new image. This prediction helps the tracking method to estimate the head pose in that image. Finally, the correction phase updates the estimate and the error covariance for the next round of prediction. The flow chart of our method is shown in Fig. 1.

Fig. 1. Flow chart of head pose estimation
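As a concrete illustration of this control flow, the following is a minimal sketch that interleaves the two Kalman phases with the tracker. It assumes poses are stored as 4x4 homogeneous transforms; `tracker` and `kf` stand for the texture-based tracker of Section 3.2 and the Kalman filter of Section 3.3, and their method names are illustrative, not from the paper.

```python
def estimate_poses(frames, T0, tracker, kf):
    """Interleave texture-based tracking with Kalman prediction/correction.

    Poses are 4x4 homogeneous transforms, so relative motions compose by
    matrix multiplication, matching P_k = (prod of dP_n) * P_0.
    """
    T = T0                            # initial pose from AAM initialization
    for frame in frames:
        predicted = kf.predict()      # prediction phase: pose for the new frame
        # start the registration search from the predicted pose rather than
        # the previous one, which reduces the number of iterations
        dT = tracker.recover_motion(frame, start=predicted)
        T = dT @ T                    # accumulate relative motion
        kf.correct(T)                 # correction phase: update state/covariance
        yield T
```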


3.1. AAM-based Initialization

A difficulty of tracking methods is the requirement of an accurate initialization of the head pose. Generally, the frontal view of the head is adopted as the initial pose. As a fast and accurate algorithm, the AdaBoost algorithm has been widely applied in frontal face detection. But when the head rotates by a small angle, the AdaBoost algorithm will still consider the face frontal. To avoid this problem, we use AAM to detect facial features and determine the frontal face.

Active Appearance Models (AAMs) [5] are generative face models, which combine a statistical model of the shape and the grey-level appearance of the face. The facial feature points located by AAM are shown in Fig. 2. Once certain facial features, such as the eyes and mouth, are correctly found, a good determination of the frontal face can be obtained by exploiting facial symmetry properties. In a face region, the eyes and mouth are assumed to be approximately coplanar, and the tip of the nose lies on the symmetry axis of the face. The eyes and mouth are assumed to be symmetrically positioned on either side of the symmetry axis.

Fig. 2. Result of AAM

Here we select seven feature points: the nose tip, the two outer eye corners, the two inner eye corners, and the two mouth corners. The properties used to determine a frontal face are as follows:

(1) The four eye corners lie on the same horizontal line;

(2) The vertical distance between the left outer eye corner and the nose tip is approximately equal to that between the right outer eye corner and the nose tip;

(3) The length of the left eye is approximately equal to that of the right eye;

(4) The ratio of the two lengths $D_1$ and $D_2$ is larger than 1.8, where $D_1$ is the distance between the two outer eye corners and $D_2$ is the vertical distance between the nose tip and the bottom lip. A code sketch of these checks is given after this list.
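The following is a minimal sketch of the four checks, assuming the AAM landmarks are given as (x, y) pixel coordinates. The dictionary keys, the bottom-lip point used for $D_2$, and the relative tolerance `tol` are illustrative choices not fixed by the paper.

```python
import numpy as np

def is_frontal(pts, tol=0.1):
    """Check the four frontal-face properties of Section 3.1."""
    lo, li = np.array(pts['left_outer_eye']), np.array(pts['left_inner_eye'])
    ri, ro = np.array(pts['right_inner_eye']), np.array(pts['right_outer_eye'])
    nose = np.array(pts['nose_tip'])
    lip = np.array(pts['bottom_lip'])       # D2 uses the bottom lip

    eye_span = np.linalg.norm(ro - lo)      # D1: outer-corner distance

    # (1) the four eye corners lie on (approximately) one horizontal line
    ys = np.array([lo[1], li[1], ri[1], ro[1]])
    if np.ptp(ys) > tol * eye_span:
        return False
    # (2) nose tip vertically equidistant from the two outer eye corners
    if abs(abs(nose[1] - lo[1]) - abs(nose[1] - ro[1])) > tol * eye_span:
        return False
    # (3) both eyes have approximately the same length
    if abs(np.linalg.norm(li - lo) - np.linalg.norm(ro - ri)) > tol * eye_span:
        return False
    # (4) D1 / D2 > 1.8, with D2 the vertical nose-tip-to-bottom-lip distance
    d2 = abs(lip[1] - nose[1])
    return d2 > 0 and eye_span / d2 > 1.8
```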

3.2. Texture-based Tracking Method

Once a frontal face is detected, the head pose is assigned its initial value. The tracking method can then estimate the head pose by recovering the relative movement of the head between frames. We choose a texture-based tracking method because it can discover small shifts of the head.

To track the 6 DOF of head motion, a 3D cylindrical model is used to approximate a human head. In texture-based techniques, the head tracking problem is solved in terms of image registration. After initialization, the head region in each frame is extracted to create a texture. The texture is mapped onto a cylindrical surface consisting of 40 triangles, and the textured head model is then rendered to an image according to the 3D position and orientation of the cylinder. In effect, image registration is executed between the rendered image and a new frame.
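As a rough sketch of how such a cylinder relates the pose parameters to image coordinates, the following projects sample points on the cylinder surface through a pinhole camera. The radius, height, sampling density, and camera matrix `K` are illustrative assumptions; the paper does not specify them.

```python
import numpy as np

def project_cylinder(pose_R, pose_t, K, radius=0.08, height=0.22, n=8):
    """Project sample points on the head cylinder into the image.

    pose_R (3x3 rotation) and pose_t (3-vector translation) encode the
    6-DOF head pose; K is the camera intrinsic matrix.
    """
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n)   # visible half of cylinder
    ys = np.linspace(-height / 2, height / 2, n)
    pts = np.array([[radius * np.sin(t), y, radius * np.cos(t)]
                    for t in thetas for y in ys])    # points on the surface
    cam = (pose_R @ pts.T).T + pose_t                # rigid transform to camera frame
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                    # pinhole projection
```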

To implement texture-based registration, an iterative image registration algorithm is performed. The algorithm first assumes that the position and rotation of the head in the new frame are the same as in the rendered image. Under this assumption, the mean error between the new frame and the rendered image may be very small. But if the error is larger than a threshold, the head pose has changed, and the algorithm searches for the motion parameters. This process consists of multiple iterations. In every iteration, the algorithm uses spatial intensity gradient information and calculates the motion parameter update $\Delta\mathbf{P}$ by equation (1):

$$\Delta\mathbf{P}_k = -\Big[\sum \big(I_u F_{\mathbf{P}}\big)^T \big(I_u F_{\mathbf{P}}\big)\Big]^{-1} \sum \big(I_u F_{\mathbf{P}}\big)^T I_t \qquad (1)$$

where $k$ represents the $k$th iteration, $I_t$ and $I_u$ are the temporal and spatial image gradients, and $F_{\mathbf{P}}$ is the partial derivative of the warping function $F$ with respect to $\mathbf{P}$. The spatial gradient $I_u$ of the new frame is calculated with the Sobel operator. The temporal gradient $I_t$ is obtained as the difference between the rendered image and the new frame. The motion transformation matrix is updated by multiplying it by $\Delta\mathbf{P}$.

The iteration converges when the mean error between the rendered image and the new frame reaches a minimum. At that point, the accumulated transformation is taken as the head motion, and the detected head region is extracted as the texture for the next frame. Otherwise, the iterative algorithm continues.
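A minimal sketch of one such update step follows, assuming the per-pixel gradients and the warp Jacobian have already been evaluated; the array layout is an illustrative choice, not prescribed by the paper.

```python
import numpy as np

def registration_step(I_t, I_u, F_P):
    """One Gauss-Newton update of the motion parameters, as in equation (1).

    I_t : (N,)      temporal gradient per pixel (rendered image minus new frame)
    I_u : (N, 2)    spatial gradient per pixel (from the Sobel operator)
    F_P : (N, 2, 6) Jacobian of the warp F with respect to the 6 parameters
    """
    # steepest-descent images: J[i] = I_u[i] @ F_P[i], a 6-vector per pixel
    J = np.einsum('nc,ncp->np', I_u, F_P)
    H = J.T @ J                     # 6x6 Gauss-Newton approximation of the Hessian
    b = J.T @ I_t                   # gradient of the squared-error objective
    return -np.linalg.solve(H, b)   # the update Delta P of equation (1)
```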

As shown in equation (1), all pixels in the head region are involved in the computation. But not every pixel has the same confidence under the cylindrical model, so we use a confidence map to indicate the confidence of each pixel. The confidence map is generated during the rendering of the cylindrical model and is initially set to zero. If a pixel lies in the projected region of a triangle, its confidence is computed by equation (2):

$$\mathrm{confidence} = 255 \times \cos^2\theta \qquad (2)$$

where $\theta$ is the angle between the normal of the triangle and the direction to the camera center.
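In code, the per-triangle confidence of equation (2) might look as follows; the argument names are illustrative.

```python
import numpy as np

def triangle_confidence(normal, to_camera):
    """Confidence of one triangle, equation (2); inputs are 3-vectors."""
    n = normal / np.linalg.norm(normal)
    d = to_camera / np.linalg.norm(to_camera)
    cos_theta = float(np.dot(n, d))
    return 255.0 * cos_theta ** 2
```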

When implementing the tracking method, we utilize an image pyramid, which represents an image at different resolutions. Parameters are first computed at the top layer of the pyramid and then propagated to the lower layers.
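A coarse-to-fine driver under these assumptions might look like the sketch below; `refine` stands for a per-level solver (for example, `registration_step` iterated to convergence) supplied by the caller, and three levels is an illustrative choice.

```python
import cv2

def coarse_to_fine(rendered, frame, refine, levels=3):
    """Run the registration from the coarsest to the finest pyramid level."""
    pyr_r, pyr_f = [rendered], [frame]
    for _ in range(levels - 1):
        pyr_r.append(cv2.pyrDown(pyr_r[-1]))   # halve resolution each level
        pyr_f.append(cv2.pyrDown(pyr_f[-1]))
    delta_P = None
    # iterate from the top (coarsest) layer down, propagating the estimate
    for r, f in zip(reversed(pyr_r), reversed(pyr_f)):
        delta_P = refine(r, f, init=delta_P)
    return delta_P
```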

3.3. The Kalman Filter

A major problem of texture-based tracking methods is that the search for the best transformation starts from the position and rotation of the head in the last frame. When the current pose differs greatly from the previous one, the algorithm may spend much computation time to converge. However, if the parameters are precisely predicted, the tracking method can start searching from a pose near the actual values. In this way, the number of search iterations can be decreased effectively.

For pose prediction, we choose the Kalman filter. The Kalman filter is computationally efficient, and it can model noise explicitly. It requires a process model of head motion and a measurement model. The process model governs the dynamic relationship between the states of two successive time steps and is defined by equation (3):



$$X_k = A_k X_{k-1} + W_k \qquad (3)$$

$X_k$ is the state vector in the current frame, which contains six variables for the translation and rotation of the head, six for the velocities, and six for the accelerations:

$$X_k = [t_x, t_y, t_z, \omega_x, \omega_y, \omega_z, \dot{t}_x, \dot{t}_y, \dot{t}_z, \dot{\omega}_x, \dot{\omega}_y, \dot{\omega}_z, \ddot{t}_x, \ddot{t}_y, \ddot{t}_z, \ddot{\omega}_x, \ddot{\omega}_y, \ddot{\omega}_z]^T \qquad (4)$$

The matrix $A_k$ is the state transition matrix:

$$A_k = \begin{pmatrix} 1 & \Delta t & \Delta t^2/2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{pmatrix} \otimes I_6 \qquad (5)$$

$\Delta t$ represents the sampling interval between two consecutive measurements, and it can be calculated from the frame rate of the actual video. $\otimes$ represents the Kronecker product. $W_k$ is the system noise, assumed to be white noise; the components of $W_k$ have a Gaussian distribution $N(0, Q_k)$ with covariance matrix $Q_k$:

$$Q_k = q \begin{pmatrix} \Delta t^5/20 & \Delta t^4/8 & \Delta t^3/6 \\ \Delta t^4/8 & \Delta t^3/3 & \Delta t^2/2 \\ \Delta t^3/6 & \Delta t^2/2 & \Delta t \end{pmatrix} \otimes I_6 \qquad (6)$$

Here $q$ represents the variance of the acceleration of head motion.

The measurement model relates the state vector $X_k$ to the measurement vector $P_k$, where $P_k$ is the head pose estimated by the tracking method:

$$P_k = H_k X_k + V_k \qquad (7)$$

where $H_k$ is the observation matrix and $V_k$ is the measurement noise:

$$H_k = \begin{pmatrix} I_6 & 0_{6\times 6} & 0_{6\times 6} \end{pmatrix} \qquad (8)$$

After processing the $k$th frame, the Kalman filter provides an optimal estimate of the current state $X_k$ using the current measurement $P_k$. It then predicts the future state $X_{k+1}$ using the underlying state model. Because the process model takes the previous head pose, velocities, and accelerations into account, $X_{k+1}$ provides a good prediction of the head pose in the next frame.
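The matrices of equations (3)-(8) can be assembled directly with Kronecker products. The following is a minimal sketch of that construction together with the standard predict/correct recursions; the measurement-noise variance `r` is an assumption, since the paper gives no numeric value for it.

```python
import numpy as np

def build_kalman(dt, q, r):
    """Assemble the 18-state constant-acceleration model of equations (3)-(8)."""
    I6 = np.eye(6)
    F3 = np.array([[1.0, dt, dt**2 / 2],
                   [0.0, 1.0, dt],
                   [0.0, 0.0, 1.0]])
    Q3 = np.array([[dt**5 / 20, dt**4 / 8, dt**3 / 6],
                   [dt**4 / 8,  dt**3 / 3, dt**2 / 2],
                   [dt**3 / 6,  dt**2 / 2, dt]])
    A = np.kron(F3, I6)                      # equation (5): A_k, 18x18
    Q = q * np.kron(Q3, I6)                  # equation (6): Q_k
    H = np.hstack([I6, np.zeros((6, 12))])   # equation (8): H_k, 6x18
    R = r * I6                               # assumed isotropic measurement noise
    return A, Q, H, R

def predict(x, P, A, Q):
    """Prediction phase: propagate the state and the error covariance."""
    return A @ x, A @ P @ A.T + Q

def correct(x, P, z, H, R):
    """Correction phase: fuse the tracker's 6-DOF pose measurement z."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```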

4. Experimental Results

The experiments for the methods described above were carried out in a laboratory environment. We implemented our methods with the OpenGL and OpenCV libraries. To evaluate the results of the image registration algorithm, we use the video sequences from Boston University [9]. We first choose a frame with a frontal face, and then test the algorithm on the other frames.


Fig. 3. Results of the image registration algorithm

Fig. 3 shows that the image registration algorithm performs quite well when the angle of rotation is not very large. By updating the texture for every frame, the texture-based tracking method can recover head motion effectively.

The numbers of iterations that the texture-based tracking algorithm spent on the frames in Fig. 3 are recorded in Table 1. The table also gives the numbers of iterations when the tracking algorithm uses the pose prediction provided by the Kalman filter. Table 1 demonstrates that using the Kalman filter can reduce the number of search iterations and save computation time.

Table 1. Comparison of the number of iterations

Frame                 a    b    c    d
No pose prediction    23   27   41   24
Pose prediction       31   16   16   17

5. Conclusion

In this paper, a tracking method combined with the Kalman filter has been proposed for estimating head pose. The method uses a texture-based tracking technique to estimate the head pose and the Kalman filter to predict head motion. The experiments show that our method can decrease the computation time of the tracking method. In the future, we plan to apply it in more complex environments.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (2010-IV-064).

References

[1] E. Murphy-Chutorian, M. M. Trivedi, Head pose estimation in computer vision: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, pp. 607-626, IEEE Press, Washington (2009)

[2] M. Perse, J. Pers, Physics-Based Modelling of Human Motion using Kalman Filter and Collision Avoidance Algorithm. In: the 4th International Symposium on Image and Signal Processing and Analysis, pp. 328-333, IEEE Press, Zagreb (2005)

[3] C. Shan, W. Chan, Head Pose Estimation Using Spectral Regression Discriminant Analysis. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-8, IEEE Press, Miami (2009)

[4] P. Xiong, L. Huang, C. Liu, Initialization and pose alignment in active shape model. In: 2010 International Conference on Pattern Recognition, pp. 3971-3974, IEEE Press, Istanbul (2010)

[5] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active Appearance Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, pp. 681-685, Washington (2001)

[6] E. Murphy-Chutorian, M. M. Trivedi, Head Pose Estimation and Augmented Reality Tracking: An Integrated System and Evaluation for Monitoring Driver Awareness. IEEE Transactions on Intelligent Transportation Systems, Vol. 11, pp. 300-311, IEEE Press, Washington (2010)

[7] M. La Cascia, S. Sclaroff, V. Athitsos, Fast, reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3D models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 322-336, IEEE Press, Washington (2000)

[8] J. Xiao, T. Moriyama, T. Kanade, J. F. Cohn, Robust full-motion recovery of head by dynamic templates and re-registration techniques. In: 5th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 85-94, IEEE Press, Washington (2003)

[9] Boston University head tracking video sequences, …/headtracking/
