Combining Features for Shape and Motion Trajectory of Video Objects for Efficient Content Based Video Retrieval

A. Dyana, M. P. Subramanian and Sukhendu Das

Visualization and Perception Lab

Department of Computer Science and Engineering

Indian Institute of Technology Madras

Chennai,India

Email: dyana@cse.iitm.ernet.in, mp.subramanian@…, sdas@iitm.ac.in

Abstract

This paper proposes a system for content based video retrieval based on shape and motion features of the video object. We have used Curvature Scale Space for shape representation and polynomial curve fitting for trajectory representation and retrieval. The shape representation is invariant to translation, rotation and scaling, and robust with respect to noise. Trajectory matching incorporates visual distance, velocity dissimilarity and size dissimilarity for retrieval. The cost of matching two video objects is based on shape and motion features, to retrieve similar video shots. We have tested our system on standard synthetic databases as well as on real world databases. Experimental results have shown good performance.

1. Introduction

Due to the large amount of video data now publicly available, accessing these data without an appropriate search technique is difficult. Very few video search engines have been developed till now. Motion, shape, color and texture were used as features for video retrieval in VideoQ [1]. A region-based content description scheme (NeTra-V) using low-level visual descriptors was proposed in [2].

Shape and motion are important features in content based video retrieval. Two leading approaches to shape matching emerged: one by Latecki et al. [3] based on the best possible correspondence of visual parts, and a second developed by Mokhtarian et al. based on the curvature scale space (CSS) representation [4]. One of the earliest approaches to trajectory representation for video retrieval was outlined by Dimitrova and Golshani in [5], where four spatiotemporal trajectory representations were used: raw points, B-spline parameters, a chain code, and a differential chain code. Polynomial representation is standardized in MPEG-7 [6]. A hybrid method, which combines a sketch-based scheme and a string-based one to analyze and index a trajectory with more syntactic meaning, was proposed by J. W. Hsieh [7].

In our system, Curvature Scale Space has been used for representing shapes and polynomial curve fitting for representing motion trajectory. The Curvature Scale Space image, standardized in MPEG-7, emulates well the human perception of shapes. Polynomial curve fitting, also standardized in MPEG-7, is efficient in representing motion trajectory. Our system integrates both these features to retrieve similar video objects.

2. Proposed Overall System

Our system integrates two important features, the shape and the motion of the video object, to retrieve video shots containing video objects with similar shape and motion properties. Figure 1 shows the overall flowchart. The preprocessing step needed to extract features from the video shot is explained below. In our system, background subtraction has been used to segment foreground moving objects from the video. A Gaussian mixture model is used to model the background, as described in [8]. The code available at [9] is used for our implementation of video object segmentation. After extracting the foreground objects, a representative frame is chosen, from which the shape of the video object is extracted and represented. The centroids of the foreground objects are used to extract the motion trajectory of the moving video object. Figure 2 shows an example of a frame from a video shot and its corresponding extracted shape contour.
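As an illustration of this preprocessing stage, the sketch below segments a moving object from a background and collects the centroid trajectory. It is a minimal stand-in: the actual system uses the adaptive Gaussian mixture background model of [8] (via the OpenCV code of [9]) rather than the single-background threshold used here, and the frames are synthetic.

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Label pixels far from the background model as foreground.
    A per-pixel Gaussian mixture model (as in [8]) would replace this
    single-background threshold in the full system; this is a stand-in."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

def centroid(mask):
    """Centroid of the foreground pixels -> one trajectory point."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())

# Toy shot: a bright 4x4 object moving right across a dark background.
background = np.zeros((32, 32), dtype=np.uint8)
trajectory = []
for t in range(5):
    frame = background.copy()
    frame[14:18, 2 + 4 * t: 6 + 4 * t] = 255
    trajectory.append(centroid(foreground_mask(frame, background)))
```

The list of centroids, one per frame, is the raw motion trajectory passed to the representation stage of Section 4.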

After feature extraction, the shape and motion trajectory of the video object are represented using two different techniques standardized in MPEG-7: Curvature Scale Space for shapes and polynomial curve fitting for motion trajectory. The shape and motion features are stored in the video database. For a query video, shape and motion features are extracted from the query. The represented query shape is matched with the shape feature of a model from the database using CSS based matching. The motion trajectory of the query is also matched with the model using the trajectory matching described in Section 4. The matchcosts from the shape and trajectory matching are combined to give the final matchcost. The features of the object in the query are matched against

2009 Seventh International Conference on Advances in Pattern Recognition, 978-0-7695-3520-3/09 $25.00 © 2009 IEEE. DOI 10.1109/ICAPR.2009.37

Figure 1. Overall system for content-based video retrieval.

Figure 2. (a) Sample frame from a video shot. (b) Extracted shape contour from frame (a).

the features of the stored objects in the database. The video shots corresponding to the video objects are rank ordered and retrieved according to their computed matchcosts.

3. Shape Representation and Matching

We have used the CSS shape representation standardized in MPEG-7. A CSS image is a multi-scale representation of the inflection points (or curvature zero-crossing points) of the contour as it evolves. Contour evolution is achieved by first parameterizing the contour by arc length (u). This involves sampling the contour at equal intervals and recording the 2-D coordinates of each sampled point. The result is a pair of coordinate functions (of arc length), which are then convolved with a Gaussian filter of increasing width, or standard deviation. The evolution process yields a fine-to-coarse description of the shape of the planar contour. The next step is to compute the curvature on each smoothed contour. As a result, curvature zero-crossing points can be recovered and mapped to the CSS image, in which the horizontal axis represents the arc length parameter (u) on the original contour, and the vertical axis represents the standard deviation (σ) of the Gaussian filter.
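The evolution step can be sketched as follows: smooth the closed contour's coordinate functions with circular Gaussian filters of increasing σ and count the curvature zero crossings at each scale. The five-lobed test contour and the FFT-based circular convolution below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def smooth_closed(f, sigma):
    """Circular convolution of a closed-contour coordinate function
    with a Gaussian of standard deviation sigma (in samples)."""
    n = len(f)
    k = np.exp(-0.5 * ((np.arange(n) - n // 2) / sigma) ** 2)
    k /= k.sum()
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(np.fft.ifftshift(k))))

def zero_crossings(x, y, sigma):
    """Count curvature zero crossings on the contour evolved to scale sigma."""
    xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
    # Central differences with periodic indexing (the contour is closed).
    dx  = (np.roll(xs, -1) - np.roll(xs, 1)) / 2
    dy  = (np.roll(ys, -1) - np.roll(ys, 1)) / 2
    ddx = np.roll(xs, -1) - 2 * xs + np.roll(xs, 1)
    ddy = np.roll(ys, -1) - 2 * ys + np.roll(ys, 1)
    kappa = dx * ddy - dy * ddx      # sign of the curvature is enough
    s = np.sign(kappa)
    return int(np.sum(s != np.roll(s, 1)))

# Five-lobed star: five concave arcs -> ten zero crossings at fine scales,
# none once the contour has evolved into a convex shape.
u = np.linspace(0, 2 * np.pi, 400, endpoint=False)
r = 1 + 0.3 * np.cos(5 * u)
x, y = r * np.cos(u), r * np.sin(u)
```

Recording, for each σ, the arc-length positions of these crossings is exactly what builds the arcs of the CSS image.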

Figures 3(b) and (c) show the evolved contours for the shape 'bird' in Fig. 3a, with the zero crossings marked on them. The contour finally evolves into a convex shape with no zero crossings. As seen in Figs. 3(b) and (c), the zero crossings move towards each other under larger smoothing. Finally they merge at a point, forming a maximum in the CSS image. Figure 3d shows the CSS image for the shape in Fig. 3a. Every arc in the CSS image corresponds to a pair of zero crossings on the evolved contour which separates the convex and concave sections. The maxima of the CSS image are used as the feature to represent shape. CSS matching is performed between the two sets of maxima as described in [7].

Figure 3. (a) Input object "bird" from MPEG7-B [10]. (b, c) Evolved contours (with zero crossings marked) when convolved with Gaussians of σ = 5 and 15 respectively. (d) CSS image for the object in Fig. 3a.
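A much-simplified version of maxima-set matching is sketched below: the query maxima are circularly shifted to align the tallest peak with each candidate model peak, matched greedily to nearby model maxima, and unmatched peak heights are penalized. The window size and greedy pairing are illustrative simplifications, not the full CSS matching algorithm.

```python
def css_match_cost(q, m, win=0.1):
    """q, m: lists of (u, sigma) CSS maxima, u normalized to [0, 1).
    Returns the best cost over shifts that align q's tallest maximum
    with each model maximum. Simplified sketch of CSS maxima matching."""
    def circ(a, b):
        d = abs(a - b) % 1.0
        return min(d, 1.0 - d)        # circular arc-length distance

    def cost(shift):
        used, c = set(), 0.0
        for u, s in q:
            best, best_d = None, win
            for j, (um, sm) in enumerate(m):
                d = circ((u + shift) % 1.0, um)
                if j not in used and d < best_d:
                    best, best_d = j, d
            if best is None:
                c += s                # unmatched query maximum
            else:
                used.add(best)
                c += abs(s - m[best][1])
        # Unmatched model maxima also contribute their heights.
        c += sum(sm for j, (um, sm) in enumerate(m) if j not in used)
        return c

    uq, _ = max(q, key=lambda p: p[1])
    return min(cost(um - uq) for um, sm in m)
```

Because the shift is searched over, a rotated version of the same shape (whose maxima are circularly displaced in u) still matches with zero cost.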

4. Trajectory Representation

The motion trajectory of a video object is extracted from the video shot by finding the centroid of the moving object in each frame. Polynomial curve fitting is used for trajectory representation, as standardized in MPEG-7 [6]. The trajectory is divided into sub trajectories based on key points detected as described in [7]. A polynomial curve is fitted for every subtrajectory. The sequence of steps involved in trajectory representation and matching is briefly explained below.

A sampling method is first used to detect all key points with high curvature on the trajectory, capturing its geometrical features. High curvature points are points on the trajectory whose angle (α) is less than a certain threshold (T_α). The angle (α) at a key point is computed by selecting two points which lie on either side of it, within a certain range of distance (d_min to d_max) from the key point. The high curvature points are further pruned for better curve fitting, such that any two key points lie far from each other (> d_min). The direction changes at the key points are detected to divide the trajectory into sub trajectories. A clockwise or anticlockwise direction change is determined using the two adjacent key points which lie on both sides of the key point. Starting from the initial key point of the trajectory, whenever a clockwise direction change is followed by an anticlockwise one or vice versa, the trajectory is divided into sub trajectories at that key point. This segmentation is done for accurate representation and easier partial trajectory matching. Figure 4 shows an example trajectory. High curvature points are marked on the input trajectory in Fig. 4a. The segmented sub trajectories with their key points are shown in Fig. 4b.
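The key-point detection step can be sketched as follows; here a fixed sample offset k stands in for the arc-length window [d_min, d_max], and the 100° threshold is an illustrative value, not the paper's T_α.

```python
import numpy as np

def key_points(points, k=2, angle_thresh=100.0):
    """Return indices of high-curvature points: points where the angle
    between the vectors to the k-th neighbour on each side falls below
    the threshold (straight segments give angles near 180 degrees)."""
    pts = np.asarray(points, dtype=float)
    keys = []
    for i in range(k, len(pts) - k):
        a, b = pts[i - k] - pts[i], pts[i + k] - pts[i]
        cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle < angle_thresh:
            keys.append(i)
    return keys

# L-shaped trajectory: one sharp 90-degree corner at index 5.
traj = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 6)]
```

Points adjacent to the corner bend only mildly (about 135° with k = 2), so the threshold isolates the single true corner.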

Figure 4. (a) Input trajectory with key (high curvature) points marked. (b) Trajectory segmented into two sub trajectories. (c) Key points regenerated. (d) Polynomial curve fitted on the input trajectory.

These key points are used as control points for curve fitting. To improve the accuracy of the fit, if the distance between any two control points in a sub trajectory is greater than a threshold (T_d), their midpoint is selected as a new control point. A polynomial of second order is fitted through the control points of every subtrajectory, as shown by the dotted lines in Fig. 4d. The coefficients of the polynomials, along with their temporal occurrences, are stored for every subtrajectory.
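A per-subtrajectory fit of this kind can be sketched as a least-squares second-order polynomial per coordinate; the dictionary layout and helper name below are illustrative, not the MPEG-7 descriptor syntax.

```python
import numpy as np

def fit_subtrajectory(t, x, y, order=2):
    """Fit one second-order polynomial per coordinate over a
    subtrajectory, and keep the temporal interval it covers."""
    return {"t_span": (t[0], t[-1]),
            "cx": np.polyfit(t, x, order),   # highest degree first
            "cy": np.polyfit(t, y, order)}

t = np.arange(0.0, 5.0)
x = 2.0 * t          # uniform horizontal motion
y = 0.5 * t ** 2     # constant vertical acceleration
rep = fit_subtrajectory(t, x, y)
```

Storing only the six coefficients plus the time span per subtrajectory is what makes the representation compact compared to raw point lists.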

Trajectory matching between the query (q) and a model (m) in the database is done based on their position, velocity and size differences. For similarity comparison, the data points sampled from the fitted polynomial curves are normalized to maintain scale and shift invariance. The position difference (C_p) is calculated from the Euclidean distance between the positions of the query and model trajectories as:

C_p = \frac{1}{N} \sum_{k=0}^{N-1} \sqrt{(x_q(k) - x_m(k))^2 + (y_q(k) - y_m(k))^2}    (1)

Similarly, the velocity difference (C_v) is given by

C_v = \frac{1}{N} \sum_{k=0}^{N-1} \sqrt{(v_{x_q}(k) - v_{x_m}(k))^2 + (v_{y_q}(k) - v_{y_m}(k))^2}    (2)

where v_x and v_y are the x and y components of the velocity vector and N is the number of sampled points. The size difference (C_sz) is given by

C_{sz} = \frac{2\,|size(q) - size(m)|}{size(q) + size(m)}    (3)

where size(q) = max(max(x_q) - min(x_q), max(y_q) - min(y_q)). The three distances are integrated to give the dissimilarity between the trajectories as follows:

C_t = wt_1 \cdot C_p + wt_2 \cdot C_v + wt_3 \cdot C_{sz}    (4)

where the three weights are assigned to the position, velocity and size dissimilarities respectively.
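Equations (1)-(4) translate almost directly into code. The snippet below computes the three costs for two sampled trajectories; the finite-difference velocities and the omission of the scale/shift normalization are simplifications for illustration.

```python
import numpy as np

def trajectory_cost(q, m, wt=(0.9, 0.9, 0.2)):
    """q, m: (N, 2) arrays of sampled trajectory points.
    Sketch of Eqs. (1)-(4); normalization is omitted here."""
    N = len(q)
    # Eq. (1): mean Euclidean position difference.
    cp = np.sqrt(((q - m) ** 2).sum(axis=1)).sum() / N
    # Eq. (2): velocities approximated by frame-to-frame differences.
    vq, vm = np.diff(q, axis=0), np.diff(m, axis=0)
    cv = np.sqrt(((vq - vm) ** 2).sum(axis=1)).sum() / N
    # Eq. (3): relative difference of bounding-box extents.
    size = lambda p: max(np.ptp(p[:, 0]), np.ptp(p[:, 1]))
    csz = 2 * abs(size(q) - size(m)) / (size(q) + size(m))
    # Eq. (4): weighted combination.
    return wt[0] * cp + wt[1] * cv + wt[2] * csz
```

With the paper's weights (0.9, 0.9, 0.2), a model displaced by one unit but otherwise identical differs only in the position term, so its cost is 0.9 times the mean displacement.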

5. Experimental Results for Video Retrieval

Shape and motion features are integrated to retrieve video shots with similar shape and motion. The matchcosts obtained from CSS matching for shape (C_s) and from trajectory matching (C_t) are normalized and summed with appropriate weights as follows:

C = w_s \cdot C_s + w_t \cdot C_t    (5)

where w_s and w_t are the weights assigned to shape and trajectory respectively.

The synthetic database used for our experiments consists of 1000 shapes from MPEG7-B [10] and 1000 trajectories available from [11]. The shape and trajectory pair database consists of 100 classes with 20 shape-trajectory pairs per class. The thresholds and ranges of values used in the experiments are shown in Table 1. The weights assigned during trajectory matching for position, velocity and size in Eqn. 4 are 0.9, 0.9 and 0.2 respectively. Equal weightage is given to shape and trajectory (Eqn. 5). The weights can also be chosen to bias retrieval more towards shape or towards trajectory.

Table 1. Thresholds and ranges of values used. |T| and |T_s| are the lengths of the trajectory and sub trajectory respectively.

Symbols:  T_α    T_d        d_min    d_max
Values:   150°   |T_s|/3    |T|/20   |T|/15

We have evaluated the performance of our system using precision vs. recall. We have compared our system with (i) only shape (CSS) [4] and (ii) only motion (polynomial curve fitting) [7][6] used as the feature for video retrieval, on the same database. The precision-recall graph averaged over 1000 queries for the three methods is shown in Fig. 5. Our method outperforms the other two methods based on only shape [4] or only trajectory [7][6].
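A per-query precision-recall curve of the kind averaged in Fig. 5 can be computed as below; the relevance labels in the example are a toy illustration, not the paper's data.

```python
def precision_recall(ranked_relevant, total_relevant):
    """ranked_relevant: booleans, True where the k-th retrieved shot
    belongs to the query's class. Returns (precision, recall) after
    each rank; averaging these lists over all queries gives the curve."""
    prec, rec, hits = [], [], 0
    for k, rel in enumerate(ranked_relevant, start=1):
        hits += rel
        prec.append(hits / k)
        rec.append(hits / total_relevant)
    return prec, rec
```

For example, if the first and third of three retrieved shots are relevant out of two relevant shots in the database, precision falls to 1/2 at rank 2 and recovers to 2/3 at rank 3 while recall reaches 1.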

Figure 5. Performance with the precision-recall metric averaged over 1000 queries, using the simulated dataset.

We have also tested our system on a real world database [12] consisting of 250 real video shots. Figure 6 shows two examples of retrieved video shots for two different queries from the real world database. The first eight retrieved video shots are displayed in Figs. 6(a) and (b). Retrieved video shots are shown by displaying the representative (median) frame from each video shot, with the corresponding motion trajectories shown in Figs. 6(c) and (d). It is observed that the most similar video shots are retrieved. The system performs well for real world databases. Retrieved results will depend on the database chosen.

6. Conclusion

We have proposed a system for content based video retrieval based on shape and motion. Curvature Scale Space has been used for shape representation and polynomial curve fitting for trajectory representation. The two features are integrated by combining their matchcosts. We have tested our system on synthetic and real world databases. Experimental results have shown good performance. Our future work involves including features such as color and texture to retrieve similar video objects more robustly.

References

[1] S. F. Chang et al., "A fully automated content-based video search engine supporting spatiotemporal queries," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 602–615, September 1998.

[2] Y. Deng and B. S. Manjunath, "NeTra-V: Toward an object-based video representation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 616–627, September 1998.

[3] L. J. Latecki, R. Lakamper, and U. Eckhardt, "Shape descriptors for non-rigid shapes with a single closed contour," in IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 424–429.

[4] F. Mokhtarian and M. Bober, Curvature Scale Space Representation: Theory, Applications and MPEG-7 Standardization. The Netherlands: Kluwer Academic Publishers, 2003.

[5] N. Dimitrova and F. Golshani, "Motion recovery for video content classification," ACM Transactions on Information Systems, vol. 13, no. 4, pp. 408–439, 1995.

[6] S. Jeannin and A. Divakaran, "MPEG-7 visual motion descriptors," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 720–724, June 2001.

[7] J. W. Hsieh, S. L. Yu, and Y. S. Chen, "Motion-based video retrieval by trajectory matching," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 3, pp. 396–409, March 2006.

[8] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in 2nd European Workshop on Advanced Video Based Surveillance Systems (AVBS01), Kingston upon Thames, Sept. 2001.

[9] OpenCV library, http://sourceforge.net/projects/opencvlibrary/.

[10] MPEG7-B database, …/asl/data/experiments/cikm2005/database2/.

[11] Trajectory database, ….tw/trajectory/trajectory.rar.

[12] VPlab videos, http://www.cs.iitm.ernet.in/~sdas/vplab/downloads.html.

Figure 6. (a), (b) Two examples of retrieved video shots from a real world video database, for two query video shots. Only a representative (median) frame from each video shot is shown; the shots are presented in decreasing order of similarity (marked by index). (c), (d) Motion trajectories for all the corresponding video shots in (a) and (b) respectively.
