Generalized B Pictures and the Draft H.264/AVC Video-Compression Standard

Markus Flierl, Student Member, IEEE, and Bernd Girod, Fellow, IEEE

Abstract—This paper reviews recent advances in using B pictures in the context of the draft H.264/AVC video-compression standard. We focus on reference picture selection and linearly combined motion-compensated prediction signals. We show that bidirectional prediction exploits the efficiency of combined prediction signals only partially, whereas multihypothesis prediction allows a more general form of B pictures. The general concept of linearly combined prediction signals chosen from an arbitrary set of reference pictures improves the H.264/AVC test model TML-9, which is used in the following. We outline H.264/AVC macroblock prediction modes for B pictures, classify them into four groups, and compare their efficiency in terms of rate-distortion performance. When investigating multihypothesis prediction, we show that bidirectional prediction is a special case of this concept. Multihypothesis prediction also allows two combined forward prediction signals. Experimental results show that this case is also advantageous in terms of compression efficiency. The draft H.264/AVC video-compression standard offers improved entropy coding by context-based adaptive binary arithmetic coding. Simulations show that the gains from multihypothesis prediction and arithmetic coding are additive. B pictures establish an enhancement layer and are predicted from reference pictures that are provided by the base layer. The quality of the base layer influences the rate-distortion trade-off for B pictures. We demonstrate how the quality of the B pictures should be reduced to improve the overall rate-distortion performance of the scalable representation.

Index Terms—B pictures, motion-compensated prediction, multiframe prediction, multihypothesis motion-compensated prediction, temporal scalability, video coding.

I. INTRODUCTION

B PICTURES are pictures in a motion video sequence that are encoded using both past and future pictures as references. The prediction is obtained by a linear combination of forward and backward prediction signals, usually obtained with motion compensation. However, such a superposition is not necessarily limited to forward and backward prediction signals [1], [2]. For example, a linear combination of two forward-prediction signals can also be efficient in terms of compression efficiency. The prediction method which linearly combines motion-compensated signals regardless of the reference picture selection will be referred to as multihypothesis motion-compensated prediction [3]. The concept of reference picture selection [4], also called multiple-reference picture prediction, is utilized

Manuscript received January 24, 2002; revised May 9, 2003.

M. Flierl was with the Information Systems Laboratory, Stanford University, Stanford, CA 94305 USA, on leave from the Telecommunications Laboratory, University of Erlangen-Nuremberg, Erlangen, Germany.

B. Girod is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305 USA.

Digital Object Identifier 10.1109/TCSVT.2003.814963

to allow prediction from both temporal directions. In this particular case, a bidirectional picture reference parameter addresses both past and future reference pictures [5]. This generalization in terms of picture reference selection and linearly combined prediction signals is reflected in the term generalized B pictures and is realized in the emerging H.264/AVC video-compression standard [6]. It is desirable that an arbitrary pair of reference pictures can be signaled to the decoder [7], [8]. This includes the classical combination of forward and backward prediction signals, but also allows forward/forward as well as backward/backward pairs. When combining the two most recent pictures, a functionality similar to the dual-prime mode in MPEG-2 [9], [10] is achieved, where the top and bottom fields are averaged to form the final prediction.
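The linear combination described above can be sketched in a few lines. This is an illustrative sketch, not code from the standard: the block contents, equal weights, and function names are our assumptions.

```python
# Minimal sketch of multihypothesis motion-compensated prediction:
# linearly combine two prediction blocks, which may come from ANY
# pair of reference pictures (forward/backward, forward/forward,
# or backward/backward). Weights and data are illustrative.

def combine_predictions(pred_a, pred_b, w_a=0.5, w_b=0.5):
    """Linearly combine two prediction blocks (lists of pixel values)."""
    assert len(pred_a) == len(pred_b)
    # Round to the nearest integer, since pixel values are integral.
    return [int(round(w_a * a + w_b * b)) for a, b in zip(pred_a, pred_b)]

# Example: averaging a forward and a backward prediction signal.
forward_pred = [100, 102, 98, 101]
backward_pred = [104, 100, 100, 103]
combined = combine_predictions(forward_pred, backward_pred)
```

With equal weights this reduces to the familiar bidirectional average; unequal weights would cover more general superpositions.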

The efficiency of forward and backward prediction was already raised as a topic in 1985 by Musmann et al. [11]. In the same year, Ericsson [12] published investigations on adaptive predictors for hybrid coding that use up to four previous fields. A rate-distortion efficient technique for block-based reference picture selection was introduced by Wiegand et al. [4]. The now well-known concept of B pictures was proposed to MPEG by Puri et al. [13]. The motivation was to interpolate any skipped frame taking into account the movement between the two “end” frames. The technique, called conditional motion-compensated interpolation, coupled the motion-compensated interpolation strategy with transmission of the significant interpolation errors.

A theoretical analysis of multihypothesis motion-compensated prediction in [3] discusses performance bounds for hybrid video coding: In the noiseless case, increasing the accuracy of motion compensation from, e.g., half-pel to quarter-pel reduces the bit rate of the residual encoder by at most 1 bit/sample. In the case of uncorrelated displacement errors, doubling the number of linearly combined prediction signals gains at most 0.5 bits/sample. The overall performance of motion-compensated prediction is limited by the residual noise, which is also lowered by linearly combined prediction signals. Optimal multihypothesis motion estimation is investigated in [14]. It is demonstrated that joint estimation of several motion-compensated signals implies maximally negatively correlated displacement errors. In the noiseless case, increasing the accuracy of multihypothesis motion-compensated prediction from, e.g., half-pel to quarter-pel reduces the bit rate of the residual encoder by at most 2 bits/sample. This improvement is already observed for two hypotheses and also applies to predictors of higher order. With respect to multiple-reference picture prediction, doubling the number of reference pictures for motion-compensated prediction reduces the bit rate of the residual encoder by at most 0.5 bits/sample, whereas doubling the number of reference pictures for multihypothesis motion-compensated prediction reduces the bit rate of the residual encoder by at most 1 bit/sample [15].

1051-8215/03$17.00 © 2003 IEEE
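The bounds quoted above can be restated compactly. The symbol ΔR for the maximum bit-rate reduction of the residual encoder is our notation, not the paper's:

```latex
% Noiseless case, single hypothesis: doubling the accuracy of motion
% compensation (e.g., half-pel -> quarter-pel) saves at most
\Delta R_{\mathrm{accuracy}} \le 1 \ \text{bit/sample}.
% With two or more jointly estimated hypotheses, the same accuracy
% doubling saves at most
\Delta R_{\mathrm{accuracy}} \le 2 \ \text{bits/sample}.
% Doubling the number of linearly combined hypotheses (uncorrelated
% displacement errors), or of reference pictures for single-hypothesis
% prediction, saves at most
\Delta R_{\mathrm{hypotheses}} \le 0.5 \ \text{bits/sample},
% while doubling the number of reference pictures for multihypothesis
% prediction saves at most
\Delta R_{\mathrm{ref}} \le 1 \ \text{bit/sample}.
```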

B pictures in H.264/AVC have been improved in several ways compared to B pictures in MPEG-2 [9] and H.263 [16]. The block size for motion compensation can range from 16 × 16 down to 4 × 4. In addition, the use of the reference picture set available for predicting the current B picture is suited to its temporally noncausal character.

In contrast to the previously mentioned inter-mode macroblocks, which signal motion vector data according to their block size as side information, the direct-mode macroblock does not require such side information but derives reference frame, block size, and motion vector data from the subsequent inter picture. This mode superimposes two prediction signals. One prediction signal is derived from the subsequent inter picture, the other from a previous picture.

A linear combination of two motion-compensated prediction signals with explicit side information is accomplished by the multihypothesis-mode macroblock. Existing standards with B pictures utilize the bidirectional mode, which only allows the combination of a previous and a subsequent prediction signal. The multihypothesis mode generalizes this concept and supports not only the already mentioned forward/backward prediction pair, but also forward/forward and backward/backward pairs.
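A direction-agnostic hypothesis pair, as the multihypothesis mode permits, can be sketched as an exhaustive search over candidate blocks from any reference pictures. The candidate data and the SAD matching criterion below are illustrative assumptions, not the encoder's actual search:

```python
# Hedged sketch: pick the pair of prediction signals whose average best
# matches the current block, regardless of temporal direction. The
# candidates may all come from past pictures (forward/forward), all
# from future pictures, or one of each.
from itertools import combinations

def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_pair(current, candidates):
    """Return the candidate pair whose integer average minimizes SAD."""
    def cost(pair):
        avg = [(x + y + 1) // 2 for x, y in zip(*pair)]  # rounded average
        return sad(current, avg)
    return min(combinations(candidates, 2), key=cost)

current = [10, 12, 11, 9]
candidates = [
    [8, 12, 10, 9],   # e.g., a block from one past reference picture
    [12, 12, 12, 9],  # e.g., a block from another past reference picture
    [30, 30, 30, 30], # a poor match
]
pair = best_pair(current, candidates)
```

Here the two forward candidates average to an exact match, illustrating why a forward/forward pair can outperform a single prediction signal.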

B. Direct Mode

The direct mode uses bidirectional prediction and allows residual coding of the prediction error. The forward and backward motion vectors are derived from the motion vector of the co-located block in the subsequent inter picture, scaled according to the temporal distances between the current B picture and the previous inter picture.

Fig. 2. A direct-mode block has two derived motion vectors MV pointing to two reference pictures RL.

When multiple-reference picture prediction is in use, the current draft [6] uses modified definitions for the temporal distances. In that case, the actual reference picture of the co-located block is taken into account when scaling the motion vectors for direct-mode coded macroblocks.
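The temporal scaling of the direct-mode motion vectors can be sketched as follows. The function and variable names are ours, and the scaling convention follows the usual temporal-direct derivation with single-reference temporal distances; as noted above, the draft modifies these definitions when multiple reference pictures are in use.

```python
def direct_mode_mvs(mv, td_b, td_d):
    """Derive forward and backward motion vectors for a direct-mode block.

    mv   -- motion vector (x, y) of the co-located block in the
            subsequent inter picture
    td_b -- temporal distance from the reference picture to the
            current B picture
    td_d -- temporal distance from the reference picture to the
            subsequent inter picture (the distance spanned by mv)

    Usual temporal scaling:
        MV_F = (TD_B / TD_D) * MV
        MV_B = ((TD_B - TD_D) / TD_D) * MV
    """
    mv_f = tuple(round(td_b * c / td_d) for c in mv)
    mv_b = tuple(round((td_b - td_d) * c / td_d) for c in mv)
    return mv_f, mv_b

# With two B pictures between inter pictures, the first B picture has
# TD_B = 1 and TD_D = 3 (in picture intervals).
mv_f, mv_b = direct_mode_mvs((6, -3), td_b=1, td_d=3)
```

The backward vector points against the direction of `mv`, so the two derived vectors straddle the B picture in time.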

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003

Fig. 4. Relative occurrence of the macroblock modes in B pictures versus quantization parameter for the QCIF sequence Mobile & Calendar at 30 fps. Two B pictures are inserted after each inter picture. Five past and three subsequent reference pictures are used. The relative frequencies of the B picture macroblock modes direct, inter, and MH are compared.

With the direct mode for the B pictures, the rate-distortion performance at high bit rates is dominated by the efficiency of the residual encoding. The inter modes improve the compression efficiency by approximately 1 dB in PSNR at moderate and high bit rates. At very low bit rates, the rate penalty in effect disables the modes in the inter group due to extra side information. Similar behavior can be observed for the multihypothesis (MH) mode. Transmitting two prediction signals increases the side information further. Consequently, the multihypothesis mode improves compression efficiency by approximately 1 dB in PSNR at high bit rates.

Corresponding to the rate-distortion performance of the three groups, Fig. 4 depicts the relative occurrence of the macroblock modes in B pictures versus the quantization parameter.


Fig. 7. The multihypothesis mode also allows a linear combination of two past macroblock prediction signals. The inter pictures are denoted by P.

Fig. 8. PSNR of the B picture luminance signal versus B picture bit rate for the CIF sequence Mobile & Calendar at 30 fps. Two B pictures are inserted after each inter picture.


Fig. 11. Average bit rate at 35-dB PSNR versus number of reference pictures for the CIF sequence Mobile & Calendar at 30 fps. Generalized B pictures with forward-only prediction are compared to inter pictures.

… assumes that all subblocks can be found on that specified reference picture. But the current draft H.264/AVC is similar to the H.263 standard, where multiple-reference picture prediction utilizes picture reference parameters for 16 × 16 macroblocks.

Fig. 11 depicts the average bit rate at 35 dB PSNR of the luminance signal over the number of reference pictures. Increasing the number of reference pictures reduces the bit rate from 2019 to 1750 kbit/s when coding the sequence Mobile & Calendar. This corresponds to a 13% bit-rate savings. The gain by the generalized B pictures with forward-only prediction and just one reference picture is limited to 6%. The gain by the generalized B pictures over the inter pictures improves for an increasing number of reference pictures [22]. This observation is independent of the implemented multihypothesis prediction scheme [15].
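The quoted savings follow directly from the two rate figures:

```python
# Check of the bit-rate savings quoted for Mobile & Calendar:
# relative savings = (old rate - new rate) / old rate.
rate_few_refs = 2019    # kbit/s, with few reference pictures
rate_many_refs = 1750   # kbit/s, with an increased number of references
savings = (rate_few_refs - rate_many_refs) / rate_few_refs
savings_percent = round(100 * savings)  # ~13%
```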

Fig. 12 depicts the average luminance PSNR of reconstructed pictures over the overall bit rate produced by TML-9 inter pictures (IPPP…) and the generalized B pictures with forward prediction only (IBBB…) for the sequence Mobile & Calendar. The number of reference pictures is chosen to be …

Fig. 12. PSNR of the luminance signal versus overall bit rate for the CIF sequence Mobile & Calendar at 30 fps. Generalized B pictures with forward-only prediction are compared to inter pictures.


Fig. 13. PSNR of the B picture luminance signal versus B picture bit rate for the CIF sequence Mobile & Calendar at 30 fps. Two B pictures are inserted after each inter picture. Five past and three future inter pictures are used for predicting each B picture.

… (8)

Detailed discussions of this relationship can be found in [28] and [29]. Experimental results in Section IV-D verify that this relation should be adapted for B pictures as specified in the test model TML-9.

with the motion vector and the Lagrange multiplier for the distortion measure. The Lagrange multiplier for this distortion measure is related to the Lagrangian multiplier for the measure in (8) by …
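Rate-constrained decisions of the kind discussed here minimize a Lagrangian cost J = D + λR. The sketch below uses made-up mode names, distortions, and rates; the square-root relation between the SAD-based motion-search multiplier and the SSD-based mode-decision multiplier follows the rate-constrained coder control described in [28] and [30]:

```python
import math

def lagrangian_mode_decision(modes, lambda_mode):
    """Pick the mode minimizing J = D + lambda * R.

    `modes` maps a mode name to (distortion, rate_in_bits).
    """
    return min(modes, key=lambda m: modes[m][0] + lambda_mode * modes[m][1])

# Illustrative numbers only: distortion (SSD) and rate (bits) per mode.
modes = {
    "direct": (900, 2),             # no motion side information
    "inter": (400, 60),             # one prediction signal + motion data
    "multihypothesis": (250, 110),  # two signals, most side information
}

# A small multiplier (high-rate operating point) lets the extra side
# information of the multihypothesis mode pay off ...
best_high_rate = lagrangian_mode_decision(modes, lambda_mode=1.0)
# ... while a large multiplier (low-rate point) favors the cheap
# direct mode, matching the mode statistics discussed above.
best_low_rate = lagrangian_mode_decision(modes, lambda_mode=20.0)

# Multiplier for SAD-based motion search, related to the SSD-based
# mode-decision multiplier by a square root (cf. [28], [30]).
lambda_motion = math.sqrt(16.0)
```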

Fig. 16. Relative occurrence of the multihypothesis mode in B pictures versus quantization parameter for the CIF sequence Mobile & Calendar at 30 fps. Five past and three future reference pictures are used.

The influence of the B picture quantization parameter on the overall compression efficiency is investigated. Coarser quantization of the B pictures, that is, lowering their relative PSNR, improves the overall compression efficiency of the sequence.

Fig. 19 shows the PSNR of the luminance signal for individual pictures of the sequence Mobile & Calendar encoded with …

Fig. 19. PSNR of the luminance signal for individual pictures. Two B pictures are inserted.

Fig. 20. PSNR of the luminance signal versus overall bit rate for the QCIF sequence Foreman at 30 fps. When replacing two inter pictures by B pictures, the values QP …


[17] H.26L Test Model Long Term Number 9, TML-9 (2001). [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/h26L/tml9.doc
[18] T. Yang, K. Liang, C. Huang, and K. Huber. (2000) Temporal Scalability in H.26L. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0005_Osa/ql5j45.doc
[19] K. Lillevold. (1999) B Pictures in H.26L. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/9910.Red/q15i08.doc
[20] S. Kondo, S. Kadono, and N. Schlockermann. (2001) New Prediction Method to Improve B-Picture Coding Efficiency. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0112_Pat/VCEG-O26.doc
[21] K. Lillevold. (2000) Improved Direct Mode for B Pictures in TML. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0008_Por/q15k44.doc
[22] M. Flierl, T. Wiegand, and B. Girod, “Multihypothesis pictures for H.26L,” in Proc. IEEE Int. Conf. Image Processing, vol. 3, Thessaloniki, Greece, Oct. 2001, pp. 526–529.
[23] D. Marpe, G. Blättermann, and T. Wiegand. (2001) Adaptive Codes for H.26L. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0101_Eib/VCEG-L13.doc
[24] D. Marpe, G. Blättermann, G. Heising, and T. Wiegand. (2001) Further Results for CABAC Entropy Coding Scheme. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0104_Aus/VCEG-M59.doc
[25] T. Stockhammer and T. Oelbaum. (2001) Coding Results for CABAC Entropy Coding Scheme. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0104_Aus/VCEG-M54.doc
[26] D. Marpe, “Context-adaptive binary coding for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 620–636, July 2003.
[27] B. Jeon and Y. Park. (2001) Mode Decision for B Pictures in TML-5. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0101_Eib/VCEG-L10.doc
[28] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Mag., vol. 15, pp. 74–90, Nov. 1998.
[29] T. Wiegand and B. Girod, “Lagrange multiplier selection in hybrid video coder control,” in Proc. IEEE Int. Conf. Image Processing, vol. 3, Thessaloniki, Greece, Oct. 2001, pp. 542–545.
[30] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 688–703, July 2003.
[31] S.-W. Wu and A. Gersho, “Joint estimation of forward and backward motion vectors for interpolative prediction of video,” IEEE Trans. Image Processing, vol. 3, pp. 684–687, Sept. 1994.
[32] H. Schwarz and T. Wiegand. (2001) An Improved H.26L Coder Using Lagrangian Coder Control. ITU-T Video Coding Experts Group. [Online]. Available: https://www.sodocs.net/doc/9217854453.html,/ftp/video-site/0105_Por/HHI-RDOpt.doc

Markus Flierl (S’01) received the Dipl.-Ing. degree in electrical engineering from the University of Erlangen-Nuremberg, Germany, in 1997.

From 1999 to 2001, he was a scholar with the Graduate Research Center, University of Erlangen-Nuremberg. Until December 2002, he was a visitor with the Information Systems Laboratory, Stanford University, Stanford, CA. He contributed to the ITU-T Video Coding Experts Group standardization efforts. His current research interests are data compression, signal processing, and motion in image sequences.

Bernd Girod (M’80–SM’97–F’98) received the M.S. degree in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1980 and the Doctoral degree (with highest honors) from the University of Hannover, Hannover, Germany, in 1987.

Until 1987, he was a member of the research staff at the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, working on moving-image coding, human visual perception, and information theory. In 1988, he joined the Massachusetts Institute of Technology, Cambridge, first as a Visiting Scientist with the Research Laboratory of Electronics, then as an Assistant Professor of Media Technology at the Media Laboratory. From 1990 to 1993, he was Professor of Computer Graphics and Technical Director of the Academy of Media Arts, Cologne, Germany, jointly appointed with the Computer Science Section of Cologne University. He was a Visiting Adjunct Professor with the Digital Signal Processing Group at the Georgia Institute of Technology, Atlanta, in 1993. From 1993 until 1999, he was Chaired Professor of Electrical Engineering/Telecommunications at the University of Erlangen-Nuremberg, Germany, and the Head of the Telecommunications Institute I, co-directing the Telecommunications Laboratory. He served as the Chairman of the Electrical Engineering Department from 1995 to 1997, and as Director of the Center of Excellence “3-D Image Analysis and Synthesis” from 1995 to 1999. He was a Visiting Professor with the Information Systems Laboratory of Stanford University, Stanford, CA, during the 1997/1998 academic year. Currently, he is a Professor of Electrical Engineering in the Information Systems Laboratory, Stanford University, Stanford, CA. He also holds a courtesy appointment with the Stanford Department of Computer Science. His research interests include networked multimedia systems, video signal compression, and 3-D image analysis and synthesis. As an entrepreneur, he has worked successfully with several start-up ventures as founder, investor, director, or advisor. Most notably, he has been a founder and Chief Scientist of Vivo Software, Inc., Waltham, MA (1993–1998); after Vivo’s acquisition, since 1998, Chief Scientist of RealNetworks, Inc. (Nasdaq: RNWK); and, since 1996, an outside Director of 8x8, Inc. (Nasdaq: EGHT). He has authored or co-authored one major textbook and over 200 book chapters, journal articles, and conference papers in his field, and he holds about 20 international patents.

Dr. Girod has served on the Editorial Boards or as Associate Editor for several journals in his field, and is currently Area Editor for the IEEE Transactions on Communications, as well as a member of the Editorial Boards of the journals EURASIP Signal Processing, the IEEE Signal Processing Magazine, and the ACM Mobile Computing and Communication Review. He has chaired the 1990 SPIE conference on “Sensing and Reconstruction of Three-Dimensional Objects and Scenes” in Santa Clara, CA, and the German Multimedia Conferences in Munich, Germany, in 1993 and 1994, and has served as Tutorial Chair of ICASSP-97 in Munich and as General Chair of the 1998 IEEE Image and Multidimensional Signal Processing Workshop in Alpbach, Austria. He has been the Tutorial Chair of ICIP-2000 in Vancouver and the General Chair of the Visual Communication and Image Processing Conference (VCIP) in San Jose, CA, in 2001. He has been a member of the IEEE Image and Multidimensional Signal Processing Committee from 1989 to 1997 and was elected Fellow of the IEEE in 1998 “for his contributions to the theory and practice of video communications.” He was named “Distinguished Lecturer” for the year 2002 by the IEEE Signal Processing Society.
