
Proceedings of 2010 IEEE 17th International Conference on Image Processing
September 26-29, 2010, Hong Kong

VIDEO STABILIZATION AND ROLLING SHUTTER DISTORTION REDUCTION

Wei Hong1, Dennis Wei2, Aziz Umit Batur1

1 Systems and Applications R&D Center, Texas Instruments, Inc.
2 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
ABSTRACT

This paper presents an algorithm that stabilizes video and reduces rolling shutter distortions using a six-parameter affine model that explicitly contains parameters for translation, rotation, scaling, and skew to describe transformations between frames. Rolling shutter distortions, including wobble, skew, and vertical scaling distortions, together with both translational and rotational jitter, are corrected by estimating the parameters of the model and performing compensating transformations based on those estimates. The results show the benefits of the proposed algorithm quantified by the Interframe Transformation Fidelity (ITF) metric.

Index Terms— Rolling shutter distortion, video stabilization, affine transformation

1. INTRODUCTION

In recent years, an increasing number of video cameras have been equipped with CMOS sensors instead of CCD sensors to reduce cost. Most CMOS sensors use rolling shutters that expose the scan lines of a frame sequentially, in either top-to-bottom or bottom-to-top order. If the camera or objects in the scene move, distortions due to differences in exposure times can be observed. While rolling shutter distortions caused by local object motion can be very complicated, in general, and especially for hand-held cameras, distortions due to global camera motion are dominant. Accordingly, in previous work [1-5] and in this paper, only rolling shutter distortions caused by global camera motion are considered.

Global camera motion can be a combination of translational and rotational motion. It is often assumed that the global motion is constant between adjacent frames since the intervening time interval is normally small. For hand-held cameras, horizontal camera panning, vertical camera panning, and camera shaking are typical global translational motions. As analyzed in [1-5], horizontal panning results in skew distortion, i.e., the frames are bent to one side as shown in Fig. 1(a). We assume a top-to-bottom scan order throughout this paper. Vertical panning results in vertical scaling distortion, i.e., the frames are stretched out or shrunk vertically as shown in Fig. 1(b). Rotational motion also causes non-linear rolling shutter distortion [1]. But since the distortion caused by rotational motion is relatively small compared to that caused by translational motion, we focus only on the latter. Although we do not address rolling shutter distortions due to rotation, we do compensate for rotational jitter, as will be discussed.

Previous work [1-5] has considered the compensation of skew and vertical scaling artifacts based on the assumption that the translational motion intended by the user can be accurately estimated. But even if the camera is held still and the intentional translational motion is zero, camera shake still results in a type of rolling shutter distortion known as wobble distortion, a combination of oscillations in skew and vertical scaling. More specifically, horizontal shake causes sudden and alternating skews to the left and right as shown in Fig. 1(c). Vertical shake causes alternating stretching and shrinking in the vertical direction as shown in Fig. 1(d). No previous work has addressed wobble distortion.

Fig. 1: Rolling shutter distortions introduced by global camera motion. The arrows indicate the direction of the global camera motion. (a) horizontal motion; (b) vertical motion; (c) horizontal jitter; (d) vertical jitter.

To accurately estimate and correct for wobble distortion, we propose a novel six-parameter affine model to describe the transformation between two consecutive frames. Unlike the conventional affine model used in [2], our model contains parameters explicitly associated with skew and scaling transformations. Closed-form estimates of these parameters can be calculated from the motion vectors for a set of features, which are selected based on certain criteria to avoid ambiguities. Wobble distortions, which we regard as jitter in the skew and scaling parameters, can be corrected in the same way as we stabilize translational and rotational jitter, with little additional computation. Thus, our model provides a unified framework for the reduction of rolling shutter distortions together with translational and rotational jitter. We also present an adaptive IIR filter that is used to estimate the component of the transformations that is intentional. The strength of the IIR filter is controlled by the magnitudes of the intentional transformations in such a way that the estimates can follow the true intentional transformations more closely.
2. PROPOSED METHOD

2.1. Six-Parameter Affine Model of Frame-to-Frame Motion and Distortion
We begin by developing a model for the transformation between consecutive frames. When the camera is subject to horizontal motion, the skew between the current frame and the reference frame (the previous frame) can be modeled as

$$\mathbf{x}' = A_k \mathbf{x} = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \mathbf{x} \qquad (1)$$

where $\mathbf{x}' = [x' \; y']^T$ are the coordinates of a pixel in the current frame and $\mathbf{x} = [x \; y]^T$ are the coordinates in the reference frame. The origin of the coordinate system is in the center of the image, and $k$ is a parameter describing the degree of skew. When the camera is subject to vertical motion, vertical scaling distortion occurs. Zooming can also be modeled as scaling in both the horizontal and vertical directions. So the overall scaling transform is modeled as

$$\mathbf{x}' = A_s \mathbf{x} = \begin{bmatrix} S_x & 0 \\ 0 & S_y \end{bmatrix} \mathbf{x} \qquad (2)$$

where $S_x$ and $S_y$ are scaling factors in the horizontal and vertical directions respectively. The translational and rotational motion between frames is modeled by a traditional three-parameter transformation:

$$\mathbf{x}' = A_r \mathbf{x} + \mathbf{d} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \mathbf{x} + \begin{bmatrix} d_x \\ d_y \end{bmatrix} \qquad (3)$$

where $\theta$ is the angle of rotation and $\mathbf{d} = [d_x \; d_y]^T$ is the translation vector.

The combination of the above transformations between the current frame and the reference frame can be modeled as an affine transformation $\mathbf{x}' = A\mathbf{x} + \mathbf{d}$. Since all of the transformations are relatively small, the order of the matrices $A_r$, $A_s$, $A_k$ does not matter. Also, because the rotation angle $\theta$ is very small, $\sin\theta$ can be approximated as $\theta$ and $\cos\theta$ as 1. $S_x$ and $S_y$ have values close to 1 and can be expressed as $1 + s_x$ and $1 + s_y$, where $s_x$ and $s_y$ are small. The skew factor $k$ is also small. Based on these approximations, the transformation matrix $A$ can be simplified to

$$A = A_r A_s A_k \approx \begin{bmatrix} 1 + s_x & -\theta + k \\ \theta & 1 + s_y \end{bmatrix} \qquad (4)$$

In summary, the transformation between the current frame and the reference frame is described by the following six-parameter affine model:

$$\mathbf{x}' = A\mathbf{x} + \mathbf{d} = \begin{bmatrix} 1 + s_x & -\theta + k \\ \theta & 1 + s_y \end{bmatrix} \mathbf{x} + \begin{bmatrix} d_x \\ d_y \end{bmatrix} \qquad (5)$$

where $d_x$, $d_y$, $\theta$ are motion parameters and $s_x$, $s_y$, $k$ are distortion parameters. The advantage of this model over the conventional affine model [2] is that all six parameters have clear physical meanings, so that each motion or distortion can be accurately estimated and manipulated without interfering with the others.
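As a concrete illustration (not from the paper), the following NumPy sketch assembles $A$ and $\mathbf{d}$ of Eq. (5) from the parameter vector and numerically checks the first-order approximation of Eq. (4); the function name and test values are our own.

```python
import numpy as np

def affine_from_params(p):
    """Build A and d of Eq. (5) from p = [dx, dy, theta, sx, sy, k]."""
    dx, dy, theta, sx, sy, k = p
    A = np.array([[1 + sx, -theta + k],
                  [theta,  1 + sy]])
    d = np.array([dx, dy])
    return A, d

# Small-parameter composition check: A should match Ar @ As @ Ak to first order.
theta, sx, sy, k = 0.01, 0.02, -0.015, 0.005   # illustrative magnitudes
Ar = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
As = np.diag([1 + sx, 1 + sy])
Ak = np.array([[1, k], [0, 1]])
A, _ = affine_from_params([0, 0, theta, sx, sy, k])
assert np.allclose(Ar @ As @ Ak, A, atol=1e-3)  # first-order approximation of Eq. (4)
```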
2.2. Estimation of Frame-to-Frame Motion and Distortion

To estimate the parameter vector $\mathbf{p} = [d_x \; d_y \; \theta \; s_x \; s_y \; k]^T$, we rewrite Eq. (5) as

$$\underbrace{\begin{bmatrix} 1 & 0 & -y & x & 0 & y \\ 0 & 1 & x & 0 & y & 0 \end{bmatrix}}_{M} \underbrace{\begin{bmatrix} d_x \\ d_y \\ \theta \\ s_x \\ s_y \\ k \end{bmatrix}}_{\mathbf{p}} = \underbrace{\begin{bmatrix} x' - x \\ y' - y \end{bmatrix}}_{\mathbf{v}} \qquad (6)$$

A set of features at positions $\mathbf{x}_i$ and their corresponding motion vectors $\mathbf{v}_i$ are needed to solve Eq. (6). The following three-stage procedure is employed to select features and estimate their motion.

In the first stage, the global translational motion is estimated first, since its amplitude is typically much larger than the amplitudes of the rotational motion and other distortions. We use a method similar to the one described in [6]. The current frame is divided into a 3 x 3 grid of rectangular regions as shown in Fig. 2(a). Hierarchical motion estimation is performed for each region using SAD (sum of absolute differences) based motion estimation. The motion vectors for the 9 regions are then grouped into clusters based on similarity. The cluster that has the smallest temporally low-pass filtered motion vector is selected. The average motion vector of this cluster is chosen as the translational motion estimate for the frame. For more details, please see [6].

Fig. 2: (a) The current frame is divided into 9 rectangular regions. Translation vectors are estimated for all regions and grouped into clusters. (b) Each selected region is subdivided into 25 features. Translation vectors are estimated for selected features.
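A minimal sketch of the first stage under simplifying assumptions: grayscale frames as 2-D NumPy arrays and exhaustive SAD search in place of the paper's hierarchical estimation. The clustering and temporal low-pass filtering of the nine region vectors are omitted; all names are illustrative.

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences between two equal-sized blocks.
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def region_motion(cur, ref, top, left, h, w, search=8):
    # Full-search SAD matching for one region (the paper uses a
    # hierarchical search; exhaustive search is shown for simplicity).
    block = cur[top:top + h, left:left + w]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                s = sad(block, ref[y:y + h, x:x + w])
                if best_sad is None or s < best_sad:
                    best_sad, best_mv = s, (dx, dy)
    return best_mv

def grid_motions(cur, ref):
    # One motion vector per cell of the 3 x 3 grid of Fig. 2(a).
    H, W = cur.shape
    h, w = H // 3, W // 3
    return [region_motion(cur, ref, r * h, c * w, h, w)
            for r in range(3) for c in range(3)]
```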
The second stage selects a small set of features (shown as dark blocks in Fig. 2(b)) from the cluster that was selected in the first stage, in order to do more accurate and computationally intensive motion estimation. Each region is subdivided into a 5 x 5 array of smaller rectangular blocks (features). We wish to choose features that are dissimilar to their surroundings in order to obtain more accurate motion vectors. Toward this end, horizontal and vertical projection vectors are computed for each feature and for the surrounding region in the current frame. SADs between the projection vectors corresponding to the feature and to its surrounding region are computed for a range of displacements, resulting in two SAD profiles (horizontal and vertical) that characterize how similar a feature is to its neighborhood. Each feature is assigned a score based on three criteria: the depths of the primary troughs centered on zero displacement in its SAD profiles (by definition, the SAD at zero displacement is zero); the depths of any secondary troughs, which indicate nearby features that may be confused with the feature being considered; and the distance to the center of the frame, given that features farther from the center allow for more reliable estimation of rotation, scaling, and skew. The three best features according to these criteria are selected from each region in the cluster.
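The projection-based scoring can be sketched as follows. This is our reading of the first two criteria (the distance-to-center term is omitted); the paper does not spell out its exact formula, so the scoring below is an illustrative assumption.

```python
import numpy as np

def projection_sad_profile(feature, strip):
    """SAD profile of a feature's horizontal projection against its
    surrounding region. `strip` is assumed to cover the same rows as the
    feature but to be wider, with the feature centered in it, so the
    window at zero displacement is the feature itself and the SAD there
    is zero. The vertical profile is computed analogously."""
    fp = feature.sum(axis=0).astype(float)   # column sums of the feature
    rp = strip.sum(axis=0).astype(float)     # column sums of the strip
    n = len(fp)
    c = (len(rp) - n) // 2                   # window offset at displacement 0
    return np.array([np.abs(fp - rp[c + d:c + d + n]).sum()
                     for d in range(-c, c + 1)])

def distinctiveness(profile):
    """One plausible score: reward a steep primary trough around zero
    displacement and penalize deep secondary troughs, i.e. low SAD far
    from zero (a look-alike neighbor that could cause a false match)."""
    c = len(profile) // 2                    # profile[c] == 0 by definition
    primary = min(profile[c - 1], profile[c + 1])
    secondary = np.min(np.delete(profile, [c - 1, c, c + 1]))
    return primary + secondary               # larger = more distinctive
```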
In the third stage, the translational motions of all selected features are estimated using a conventional block-matching method. Each feature in the current frame is matched against features in the reference frame located within a range of displacements centered on the global translational motion vector obtained in stage 1. The displacement resulting in the smallest SAD is taken to be the motion vector $\mathbf{v}_i$ of the feature at $\mathbf{x}'_i$ in the current frame. The position of each matched feature in the reference frame is therefore $\mathbf{x}_i = \mathbf{x}'_i - \mathbf{v}_i$.

At least 3 pairs of $\mathbf{x}_i$ and $\mathbf{v}_i$, corresponding to 3 features, are required to solve Eq. (6). If there are more than 3 features, Eq. (6) is over-determined. Since the matrix $M$ is sparse, a least-squares estimate of the transformation parameter vector $\mathbf{p}$ can be determined efficiently in closed form.
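A minimal sketch of the closed-form estimate, assuming feature positions are given in image-centered coordinates; np.linalg.lstsq stands in for an explicit normal-equations solution that would exploit the sparsity of $M$.

```python
import numpy as np

def estimate_params(xs, vs):
    """Stack two rows of Eq. (6) per feature and solve for
    p = [dx, dy, theta, sx, sy, k] in the least-squares sense.
    xs: feature positions (x, y) in the reference frame, centered on
    the image; vs: motion vectors (vx, vy) = (x' - x, y' - y)."""
    rows, rhs = [], []
    for (x, y), (vx, vy) in zip(xs, vs):
        rows.append([1, 0, -y, x, 0, y]); rhs.append(vx)
        rows.append([0, 1,  x, 0, y, 0]); rhs.append(vy)
    M = np.asarray(rows, dtype=float)
    v = np.asarray(rhs, dtype=float)
    p, *_ = np.linalg.lstsq(M, v, rcond=None)
    return p
```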
2.3. Compensation for Wobble Distortion and Motion Jitter

Compensation for unwanted motion and distortion is done by carrying out an affine transformation similar to that in Eq. (5) on each frame of the input video sequence. The value of a pixel located at $(x, y)$ in the output frame is taken from a location $(x', y')$ in the current frame specified by

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 + s_x^c & -\theta^c + k^c \\ \theta^c & 1 + s_y^c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} d_x^c \\ d_y^c \end{bmatrix} \qquad (7)$$

The output frame is cropped to avoid having invalid regions. To produce an output frame with the same size as the input frame, a mosaicing technique (e.g. [7]) can be applied to fill in any invalid regions using information from previous frames.

In this section and in Section 2.4, we describe how the compensation parameter vector $\mathbf{p}_c = [d_x^c \; d_y^c \; \theta^c \; s_x^c \; s_y^c \; k^c]^T$ is determined from the estimated transformation vector $\mathbf{p}$. We focus in this section on video stabilization and reduction of wobble distortions. Stabilization involves removing jitter from the translational and rotational motion while preserving motion intended by the user, such as camera panning. Removing wobble distortions involves removing jitter from the skew and scaling parameters. Both tasks can therefore be accomplished by removing jitter from all of the transformation parameters.

In our approach, we regard the low-pass component of the transformations as intentional and the high-pass component as jitter. A simple way to reduce jitter is to apply a first-order IIR low-pass filter. The low-pass frame-to-frame transformation vector $\mathbf{p}_L$ is computed as

$$\mathbf{p}_L[n] = \boldsymbol{\alpha} \cdot \mathbf{p}_L[n-1] + (\mathbf{1} - \boldsymbol{\alpha}) \cdot \mathbf{p}[n] \qquad (8)$$

where $\boldsymbol{\alpha}$ is a vector consisting of damping factors between 0 and 1 for each component in $\mathbf{p}$, $\mathbf{1}$ is a vector of ones, and the multiplications are done component-wise. It is difficult to find a set of damping factors that are good for all cases. If the damping factors are too small, the jitter cannot be sufficiently reduced. On the other hand, if there are significant intentional transformations and the damping factors are too close to 1, there will be a long delay between the estimated and the true values. To overcome this problem, the damping factors are adapted to the magnitudes of the intentional transformation parameters. The adaptive damping vector $\boldsymbol{\beta}[n]$ is computed as

$$\boldsymbol{\beta}[n] = \max(\boldsymbol{\beta}_0 - \boldsymbol{\rho} \cdot |\mathbf{p}_L[n]|, \mathbf{0}) \qquad (9)$$

where $\boldsymbol{\beta}_0$ is a vector representing the maximum damping factor values and $\boldsymbol{\rho}$ is a vector of user-specified parameters that control the strength of the adaptation. The idea behind the adaptation is to reduce the damping factors when the intentional transformations are large so that the estimates can follow the true transformations more quickly. The larger the values in the strength vector $\boldsymbol{\rho}$, the more sensitive the adaptation. Estimates of the intentional transformation vector $\mathbf{p}_I$ are given by

$$\mathbf{p}_I[n] = \boldsymbol{\beta}[n] \cdot \mathbf{p}_I[n-1] + (\mathbf{1} - \boldsymbol{\beta}[n]) \cdot \mathbf{p}[n] \qquad (10)$$

with $\boldsymbol{\beta}[n]$ determined according to Eq. (9). The compensation parameter vector $\mathbf{p}_w$ for wobble distortion and motion jitter is then

$$\mathbf{p}_w[n] = \mathbf{p}_w[n-1] + \mathbf{p}[n] - \mathbf{p}_I[n] \qquad (11)$$
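The filtering chain of Eqs. (8)-(11) is straightforward to express per frame. The sketch below keeps the three state vectors and returns the wobble/jitter compensation $\mathbf{p}_w$; the class name and the parameter values in the usage lines are illustrative, not from the paper.

```python
import numpy as np

class AdaptiveIIR:
    """Adaptive first-order IIR smoothing of the six transform
    parameters (Eqs. 8-11), applied once per frame."""
    def __init__(self, beta0, rho, alpha):
        self.beta0, self.rho, self.alpha = map(np.asarray, (beta0, rho, alpha))
        self.pL = np.zeros(6)   # low-pass transform, Eq. (8)
        self.pI = np.zeros(6)   # intentional component, Eq. (10)
        self.pw = np.zeros(6)   # accumulated compensation, Eq. (11)

    def update(self, p):
        p = np.asarray(p, dtype=float)
        self.pL = self.alpha * self.pL + (1 - self.alpha) * p          # Eq. (8)
        beta = np.maximum(self.beta0 - self.rho * np.abs(self.pL), 0)  # Eq. (9)
        self.pI = beta * self.pI + (1 - beta) * p                      # Eq. (10)
        self.pw = self.pw + p - self.pI                                # Eq. (11)
        return self.pw

# filt = AdaptiveIIR(beta0=[0.9]*6, rho=[5.0]*6, alpha=[0.9]*6)  # illustrative values
# pw = filt.update(p)  # p estimated as in Section 2.2, once per frame
```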
2.4. Compensation for Skew and Vertical Scaling Distortion
In this section, we discuss compensation for the skew and vertical scaling distortions caused by intentional camera motion (panning). The skew distortion is proportional to the horizontal intentional motion. Hence the addition to the skew compensation parameter can be calculated as $\lambda_x d_x^I$, where $\lambda_x$ is a parameter describing the strength of the skew compensation. Similarly, the vertical scaling distortion is proportional to the vertical intentional motion, and the addition to the vertical scaling compensation can be expressed as $\lambda_y d_y^I$, where $\lambda_y$ is another strength parameter. $\lambda_x$ and $\lambda_y$ can be determined empirically or derived from the scanning speed of the rolling shutter. Thus the compensation vector for skew and vertical scaling distortions is

$$\mathbf{p}_d = [0 \;\; 0 \;\; 0 \;\; 0 \;\; \lambda_y d_y^I \;\; \lambda_x d_x^I]^T \qquad (12)$$

The overall compensation vector $\mathbf{p}_c$ is given by

$$\mathbf{p}_c[n] = \mathbf{p}_w[n] + \mathbf{p}_d[n] \qquad (13)$$
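Putting Sections 2.3 and 2.4 together, a per-frame sketch (our own composition, under the same centered-coordinate convention as the model) might look like the following. Nearest-neighbour sampling is used for brevity; a real implementation would use an optimized warp with proper cropping or mosaicing.

```python
import numpy as np

def compensation_vector(pw, pI, lam_x, lam_y):
    """Eqs. (12)-(13): add the panning-proportional skew and vertical
    scaling terms to the wobble/jitter compensation. Parameter order is
    [dx, dy, theta, sx, sy, k]; lam_x, lam_y are the strength parameters."""
    pd = np.array([0.0, 0.0, 0.0, 0.0, lam_y * pI[1], lam_x * pI[0]])
    return pw + pd

def compensate_frame(frame, pc):
    """Resample one frame with the affine map of Eq. (7), origin at the
    image center: each output pixel (x, y) reads from source (x', y')."""
    dx, dy, theta, sx, sy, k = pc
    h, w = frame.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = xs - cx, ys - cy                        # centered output coordinates
    xp = (1 + sx) * x + (-theta + k) * y + dx      # source x' of Eq. (7)
    yp = theta * x + (1 + sy) * y + dy             # source y' of Eq. (7)
    xi = np.clip(np.rint(xp + cx), 0, w - 1).astype(int)
    yi = np.clip(np.rint(yp + cy), 0, h - 1).astype(int)
    return frame[yi, xi]
```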
3. EXPERIMENT RESULTS

The performance of our method is evaluated on two representative test videos. The resolution is QVGA (320x240) and the frame rate is 15 fps. Video1 (128 frames) is taken while the cameraman is walking forward. Video2 (101 frames) is taken while the cameraman is walking forward and panning the camera to the left and right. The Interframe Transformation Fidelity (ITF) metric [8], which measures the average frame difference between every pair of consecutive frames, is used to objectively evaluate the quality of our algorithm. We disable certain parts of our algorithm to create two other versions: (1) translational-only video stabilization (TVS); (2) translational plus rotational video stabilization (RVS). Tab. 1 summarizes the ITF values for the original videos (cropped to the same size as the other results) and the output videos after TVS, RVS, and the proposed rolling shutter distortion reduction.

ITF (dB)    video1   video2
Original     15.43    13.84
TVS          18.26    14.37
RVS          18.60    14.53
Proposed     19.33    14.64
Tab. 1: ITF values for the original video and output videos.
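For reference, ITF is commonly computed as the mean PSNR over all consecutive frame pairs; a minimal sketch under that assumption, for 8-bit frames:

```python
import numpy as np

def itf_db(frames):
    """Interframe Transformation Fidelity: average PSNR between every
    pair of consecutive frames (higher = more stable video)."""
    psnrs = []
    for a, b in zip(frames[:-1], frames[1:]):
        mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
        psnrs.append(10.0 * np.log10(255.0 ** 2 / mse) if mse > 0 else np.inf)
    return float(np.mean(psnrs))
```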
Fig. 3 shows a visual comparison between the results of video stabilization (translational plus rotational) and the proposed rolling shutter distortion reduction on a few frames of the test videos. In the top rows, even after translational and rotational jitter is removed, obvious wobble distortions remain in the frames in (a), and a large skew distortion remains in the frame in (b). In the bottom rows, the proposed algorithm removes the wobble and skew distortions.
(a) (b) Fig. 3: (a) Top row: frame 39 to frame 44 of video1 after translational plus rotational video stabilization. Bottom row: the same frames after rolling shutter distortion reduction. The wobble distortion introduced by camera shake is eliminated. (b) Top: frame 53 of video2 after translational and rotational video stabilization. Bottom: the same frame after rolling shutter distortion reduction. The skew distortion introduced by camera panning is eliminated.
4. CONCLUSION

In this paper, we introduced a unified framework for removing rolling shutter distortions, including wobble, skew, and vertical scaling distortions, as well as translational and rotational jitter. An adaptive IIR filter is applied to the six parameters estimated from frame-to-frame transformations to effectively remove distortion and jitter. Experimental results demonstrate that this algorithm achieves higher ITF values compared to translational and rotational video stabilization.

5. REFERENCES

[1] C. Geyer, M. Meingast, and S. Sastry, "Geometric models of rolling-shutter cameras," IEEE Workshop on Omnidirectional Vision, 2005.
[2] W.-H. Cho and K.-S. Hong, "Affine motion based CMOS distortion analysis and CMOS digital image stabilization," IEEE Transactions on Consumer Electronics, vol. 53, no. 3, pp. 833-841, Aug. 2007.
[3] C.-K. Liang, L.-W. Chang, and H. H. Chen, "Analysis and compensation of rolling shutter effect," IEEE Transactions on Image Processing, pp. 1323-1330, Aug. 2008.
[4] J.-B. Chun, H. Jung, and C.-M. Kyung, "Suppressing rolling-shutter distortion of CMOS image sensors by motion vector detection," IEEE Transactions on Consumer Electronics, pp. 1479-1487, Nov. 2008.
[5] D. Bradley, B. Atcheson, I. Ihrke, and W. Heidrich, "Synchronization and rolling shutter compensation for consumer video camera arrays," International Workshop on Projector-Camera Systems, 2009.
[6] A. U. Batur and B. Flinchbaugh, "Video stabilization with optimized motion estimation resolution," IEEE International Conference on Image Processing, pp. 465-468, Oct. 2006.
[7] A. Litvin, J. Konrad, and W. C. Karl, "Probabilistic video stabilization using Kalman filtering and mosaicking," in Proc. IS&T/SPIE Symposium on Electronic Imaging, Image and Video Communications, 2003, pp. 663-674.
[8] L. Marcenaro, G. Vernazza, and C. S. Regazzoni, "Image stabilization algorithms for video-surveillance applications," IEEE International Conference on Image Processing, pp. 349-352, 2001.