
H.264/AVC DATA PARTITIONING FOR MOBILE VIDEO COMMUNICATION

Thomas Stockhammer*

Institute for Communications Engineering (LNT), Munich University of Technology (TUM)

80290 Munich, Germany

Maja Bystrom†

ECE Department

Boston University, Boston, MA 02215, USA

ABSTRACT

In this work we compare non-scalable video coding with data partitioning using H.264/AVC under similar application and channel constraints for conversational applications over mobile channels. For both systems, optimized rate allocation and network feedback have been applied. From the experimental results it is observed that, based on the average PSNR, the non-scalable system outperforms the data partitioning system. However, with the data partitioning system the percentage of entirely lost frames can be lowered, and the probability of poor decoded video quality can be reduced.

1. INTRODUCTION

Applications such as video telephony, video conferencing, multimedia streaming, or multimedia messaging to wireless clients will be important features in emerging 2.5G, 3G, and future mobile systems and may be a key factor in their success. The challenges involved in wireless video are manifold, e.g., to specify proper video coding techniques, to design networks appropriately, to apply suitable error protection and transmission schemes, as well as to limit encoding and decoding complexity. Conventional video coding and transmission systems based on H.264 [1] usually encode and transmit frames sequentially, each in one transmission packet. However, this single-layer system exhibits the same drawback as any other non-scalable system, namely, that either the entire frame is decoded or it is lost. Several methods to combat this problem have been proposed for H.264/AVC [2, 3], e.g., slice structured coding or even more advanced concepts such as flexible macroblock ordering. These methods might be useful in systems where packets are lost with equal probability, such as best-effort Internet or wireless systems without prioritization. However, the limited intra-frame prediction, increased packet overhead, and possibly annoying subjective results due to block artifacts limit the applicability of these methods [3]. In contrast, if the underlying network can support different priorities, quality-scalable source coding can provide performance gains by assigning higher priority to more important layers. Although scalable video coding methods usually provide high flexibility, they also suffer from reduced coding efficiency, at least to date. For example, in [4] it was shown that a system applying progressively coded sources and advanced unequal error protection (UEP) cannot provide any gains when compared to the single-layer performance with equal error protection (EEP), mainly because the performance of the applied fine granular scalable video codec is inferior to the single-layer H.264 performance.

* e-mail: stockhammer@ei.tum.de, Tel.: +49 89 289 23474

† e-mail: bystrom@…, Tel.: +1 617 353 6521

Therefore, in this work we are interested in a scalable coder for mobile conversational video applications which performs as well as the H.264 single-layer codec in terms of rate-distortion performance. Data partitioning in H.264 provides both properties: almost identical rate-distortion performance as well as at least some degree of scalability. Therefore, we will present the transmission system under consideration, discuss an optimized rate-allocation process, and finally present experimental results comparing a single-layer system with EEP to data partitioning with UEP.

2. SYSTEM OVERVIEW

2.1. Channel Coding for Mobile Channels

Mobile channels are usually constrained in transmit power and available transmission bandwidth. In addition, highly time-varying behavior in terms of receive power is experienced due to short-term fading effects and interference. In [5] a block-fading additive white Gaussian noise (BF-AWGN) channel with perfect channel state information at the receiver is introduced as an appropriate model for many mobile channels. In the remainder of this work we assume that the transmitter has knowledge of the channel statistics, i.e., the distribution of the channel gain, and the average signal-to-noise ratio (SNR). The propagation channel is assumed to be slowly time-varying and frequency-flat for each time slot. In particular, the channel gain is assumed to be constant over the entire radio slot and i.i.d. Rayleigh from slot to slot. We assume that we can access f_s = 40 radio slots per second, each with N_s binary channel symbols resulting from a binary modulation scheme. We assume that the resulting total bit-rate r_t = f_s · N_s is variable from 64 kbps to 160 kbps by changing N_s; for example, r_t = 128 kbps corresponds to N_s = 3200 channel symbols per radio slot. This can be motivated by common multi-slot extensions in 2.5G systems such as GPRS [6].

Typically, interleaving is performed after channel coding; this spreads the channel encoded word over S radio slots, reducing the variability of the channel but introducing additional delay. For our application we propose a flexible channel code providing different channel coding rates. For example, codes such as rate-compatible punctured convolutional or turbo codes could be used (see [7] or [8]). However, we apply high-memory punctured convolutional codes (memory 96, mother code rate 1/7, puncturing period 32) with the Far End Error Decoder (FEED), as presented in [9] for scalable image transmission and in [10] for progressive texture video coding, due to their flexibility and excellent performance. Then, channel coding rates r can be selected from the discrete set R containing the rates 32/(32+j) with j = 0, 1, 2, ..., 214. This constraint results from the mother code and the puncturing period of our convolutional code.
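As a small illustration (ours, not part of the paper; the names are made up), the admissible rate set R can be enumerated directly from the puncturing rule stated above:

```python
# Admissible channel coding rates of the punctured convolutional code:
# r = 32 / (32 + j), j = 0, 1, ..., 214 (puncturing period 32, mother code rate 1/7).
PUNCTURING_PERIOD = 32

def rate_set(j_max=214):
    """Return the discrete set R of channel coding rates, from weakest (rate 1.0) downwards."""
    return [PUNCTURING_PERIOD / (PUNCTURING_PERIOD + j) for j in range(j_max + 1)]

R = rate_set()
print(len(R), max(R), round(min(R), 3))   # 215 rates, from 1.0 down to 32/246 ≈ 0.13
```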

Fig. 1. Outage probability p_out(r) versus channel coding rate r for different interleaving depths S.

Sequential channel decoding with inherent error detection [9] maps the wireless channel into a perfect packet erasure channel. The packet loss probability depends on the applied channel coding rate r according to the outage probability p_out(r) as well as the interleaver depth S. In [10] it was shown that the performance of high-memory convolutional codes in combination with sequential decoding can be well modelled by bounding techniques based on the cutoff rate. For reasons of conciseness we do not address any channel coding simulation in this work. The effects of channel coding and interleaving based on the cutoff rate bounds over different numbers of radio slots are shown in Figure 1: for an average SNR of 4 dB, only one radio slot, S = 1, and channel coding rate 0.5, the outage probability p_out is 0.3, whereas for an increased number of radio slots, S = 4 and S = 16, the variance of the channel can be reduced and the outage probability decreases at the expense of additional delay. Current systems typically apply S = 4 as a compromise between variability and delay, resulting in a still significantly varying outage probability over the applied channel coding rate r. Since the channel coding rate r is directly proportional to the maximum supported application bitrate, a trade-off between residual losses and this bitrate is necessary.
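To give a feel for how such outage curves behave, the following Monte Carlo sketch (ours, not from the paper) estimates p_out(r) for the block-fading Rayleigh channel by declaring an outage whenever the BPSK/AWGN cutoff rate, averaged over the S slots spanned by the interleaver, falls below the attempted code rate r. The cutoff-rate expression and all names are our assumptions; the exact bound used in [10] may differ.

```python
import numpy as np

def cutoff_rate(snr_linear):
    """Cutoff rate R_0 of BPSK over AWGN, in bit per channel use, at the given symbol SNR."""
    return 1.0 - np.log2(1.0 + np.exp(-snr_linear))

def p_out(r, avg_snr_db=4.0, S=4, trials=100_000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of the outage probability of code rate r over S Rayleigh slots.
    Illustrative model only; the bounding technique of [10] may differ in detail."""
    avg_snr = 10.0 ** (avg_snr_db / 10.0)
    # Squared Rayleigh channel gains are exponential with unit mean, one i.i.d. draw per slot.
    gains = rng.exponential(1.0, size=(trials, S))
    r0 = cutoff_rate(avg_snr * gains)
    # Outage: rate r exceeds the cutoff rate averaged over the S slots of the interleaver.
    return float(np.mean(r0.mean(axis=1) < r))

for S in (1, 4, 16):
    print(S, round(p_out(0.5, S=S), 3))   # deeper interleaving reduces the outage probability
```

With these assumptions the sketch reproduces the trend described above: at 4 dB average SNR and r = 0.5 the estimate is roughly 0.3 for S = 1, consistent with the value quoted above, and it decreases for S = 4 and S = 16.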

2.2. System Description

The system presented and investigated below relies on the fact that for poor channels not all data is lost; instead, the most important data can still be decoded and displayed. This requires that the most important data be protected more strongly against severe channel impairments than the less important data, resulting in the typical unequal error protection schemes.

H.264 supports the partitioning of I-slices into two partitions, P-slices into three partitions, and B-slices into a single partition. We focus on P-slices, as they can be viewed as a superset of the other slice types. The partitioning is organized such that different syntax elements are assigned to different partitions. Without going into details, partition A contains all control and header information as well as any data related to the motion compensation process. Whereas data partitioning in previous standards such as MPEG-4 and H.263 version 2 only distinguishes two partitions, in H.264 the second partition is further split into intra-related information assigned to partition B and inter-related information assigned to partition C. The main reason for this is that it was recognized that in error-prone environments more frequent intra information is necessary to limit error propagation and that this information is in general more important, especially for the subjective quality, than the inter information. For more details on data partitioning we refer the reader to [2].

The system is presented in detail in Figure 2. The video encoding process is commonly based on a sequential encoding of frames n = 1, ..., N with syntax elements being distributed among the partitions. Each partition is separately entropy coded, resulting in general in three partitions with partition sizes b(q) ≜ {b_A(q), b_B(q), b_C(q)}, where the partition size depends on the applied quantization parameter q. In H.264 each partition is basically transmitted in a separate Network Abstraction Layer (NAL) unit, but the partitions can be concatenated using compound packets as specified in the draft RTP payload specification for H.264 video, resulting in a partly "embedded" bit-stream for each video frame. After appropriate specification of the channel coding rates r ≜ {r_A, r_B, r_C} ∈ R^3, to be discussed in detail later, the encoded video frame is unequally protected, interleaved, and transmitted over the wireless channel. The receiver decodes the received channel code word using the FEED algorithm [9], which allows inherent error detection and shortening of code words. Then, the bit-stream is depacketized such that only correct partitions are forwarded to the decoder.
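The dependency implied by this depacketization rule, namely that partitions B and C are useless without partition A of the same frame, can be stated compactly; the following sketch is our illustration, with invented names:

```python
def usable_partitions(received):
    """Partitions of one frame that the decoder can actually exploit (illustrative sketch).

    `received` is the set of correctly decoded partitions, e.g. {"A", "C"}.
    Partition A carries headers and motion data, so B and C are useless without it.
    """
    if "A" not in received:
        return set()                 # entire frame lost; previous-frame concealment applies
    return set(received) & {"A", "B", "C"}

assert usable_partitions({"B", "C"}) == set()
assert usable_partitions({"A", "C"}) == {"A", "C"}
```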


As we aim to compare different systems, we select local coding options such as macroblock modes and motion vectors using rate-distortion optimized methods [13]. For each video frame to be encoded and transmitted we now have to decide which QP q is to be used and which channel coding rates r = {r_A, r_B, r_C} are to be applied to the different partitions, constrained by the total bitrate as

    N_c(q, r) ≜ b_A(q)/r_A + b_B(q)/r_B + b_C(q)/r_C ≤ N_t,   (1)

where N_t denotes the total number of bits available for a certain video frame. Due to the stringent delay constraints, and assuming a constant frame rate f_r, we assume that for each frame a constant total number of N_t = r_t / f_r bits is accessible to be used for source and channel coding; for example, at r_t = 128 kbit/s and f_r = 10 Hz this amounts to N_t = 12800 bits per frame. Note that the size of the partitions b(q) is directly controlled by the applied QP, q. Due to the discrete coding rates specified by the puncturing patterns, signalling overhead, as well as termination overhead, the true N_c is slightly different from that presented. However, although this is handled by the implementation, we use this simple model in the discussion for the sake of clarity.
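Constraint (1) is then a simple budget check; the following sketch (ours, with made-up partition sizes and rates purely for illustration) shows the bookkeeping:

```python
F_R = 10             # frame rate in Hz
R_T = 128_000        # total transmitted bit-rate in bit/s
N_T = R_T // F_R     # per-frame budget for source plus channel coding: 12800 bits

def channel_bits(b, r):
    """N_c of (1): channel-coded bits for partition sizes b = (b_A, b_B, b_C) in bits
    and channel coding rates r = (r_A, r_B, r_C)."""
    return sum(bits / rate for bits, rate in zip(b, r))

b = (2400, 1200, 4800)   # hypothetical partition sizes for one P-frame at some QP q
r = (0.5, 0.6, 0.8)      # hypothetical channel coding rates
print(channel_bits(b, r), channel_bits(b, r) <= N_T)   # 12800.0 True
```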

The measure of interest defined in the optimization process is the expected distortion D̄_q(r), which can be estimated using well-known equations for scalable systems as

    D̄_q(r) = p_0(r)·D_0 + p_A(r)·D_A(q) + p_{A,B}(r)·D_{A,B}(q) + p_{A,C}(r)·D_{A,C}(q) + p_{A,B,C}(r)·D_{A,B,C}(q),   (2)

where the index i for both the event probability p_i and the distortion D_i denotes the correctly decoded partitions. Note that some unreasonable terms are excluded; for example, decoding partition B or C without partition A cannot decrease the distortion. The encoder is able to estimate the distortions D_i by applying the appropriate error concealment as discussed previously. Note also that these distortion terms depend on q, except for D_0, the distortion in the case of loss of the entire frame.

The probability that a certain event occurs clearly depends on the applied channel coding rates r_i for the different partitions and the resulting outage probabilities p_out(r_i). Due to the dependency of partitions B and C on A, it is obvious that the only reasonable channel code rate vectors r must fulfill r_A ≤ r_B and r_A ≤ r_C. Although we have supposed that in general partition B is more important than partition C, it is not obvious that this is always the case, since partitions B and C can be decoded independently. Therefore, we consider two cases, namely r_B ≤ r_C and r_B > r_C. Due to the interleaving and the access to the same channel realization for all channel encoded partitions, it is obvious that if a partition with channel coding rate r_i cannot be decoded, any partition with r_j ≥ r_i cannot be decoded either. With these preliminaries, the event probabilities result in

                   r_B ≤ r_C                    r_B > r_C
    p_0(r)         p_out(r_A)                   p_out(r_A)
    p_A(r)         p_out(r_B) − p_out(r_A)      p_out(r_C) − p_out(r_A)
    p_{A,B}(r)     p_out(r_C) − p_out(r_B)      0
    p_{A,C}(r)     0                            p_out(r_B) − p_out(r_C)
    p_{A,B,C}(r)   1 − p_out(r_C)               1 − p_out(r_B)
                                                                      (3)
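For concreteness, the event probabilities of (3) and the expected distortion of (2) can be evaluated as in the following sketch (our code and naming, assuming p_out(·) is non-decreasing in the coding rate):

```python
def event_probs(p_out, r):
    """Event probabilities of (3) for r = (r_A, r_B, r_C) with r_A <= r_B and r_A <= r_C.

    p_out maps a channel coding rate to its outage probability and is assumed
    non-decreasing in the rate, so all probabilities below are non-negative.
    """
    rA, rB, rC = r
    return {
        "0":     p_out(rA),                            # partition A (and hence everything) lost
        "A":     p_out(min(rB, rC)) - p_out(rA),       # only A decodable
        "A,B":   max(p_out(rC) - p_out(rB), 0.0),      # nonzero only if r_B <= r_C
        "A,C":   max(p_out(rB) - p_out(rC), 0.0),      # nonzero only if r_B >  r_C
        "A,B,C": 1.0 - p_out(max(rB, rC)),             # everything decodable
    }

def expected_distortion(p_out, r, D):
    """Expected distortion (2); D maps each decodable-partition set to its distortion."""
    return sum(p * D[event] for event, p in event_probs(p_out, r).items())
```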

The encoder then selects the QP q_opt and the channel coding rates r_opt such that the expected distortion is minimized, i.e.,

    {q_opt, r_opt} = arg min_{q ∈ Q, r ∈ R^3} D̄_q(r),   (4)

subject to N_c(q_opt, r_opt) ≤ N_t.

Although the search may be reduced to a linear or quadratic complexity order, a brute-force search strategy is feasible by applying some properties to speed up the search. Basically, for each q ∈ Q, the optimal combination of channel coding rates r_opt(q) is sought, excluding impossible combinations. In addition, it is assumed that D̄_q(r_opt(q)) has only one global minimum over q and that, for given r_A and r_B, the smallest r_C ∈ R is used which fulfills the rate constraint in (1). Though the complexity of this optimization process is manageable, we are currently investigating less complex optimization schemes including rate control. A sketch of the basic search is given below.
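The basic (unpruned) search of (4) can be sketched as follows; this is our illustration with invented function and parameter names, not the authors' implementation, and it omits the pruning steps just described:

```python
import itertools

def allocate(q_set, rate_set, sizes, dists, p_out, n_t):
    """Brute-force solution of (4): choose the QP q and rates r = (r_A, r_B, r_C)
    minimizing the expected distortion (2) subject to the bit budget (1).

    Illustrative sketch: sizes[q] -> (b_A, b_B, b_C) in bits, dists[q] -> distortion per
    decodable event, p_out(r) -> outage probability of code rate r (non-decreasing in r).
    """
    best = (float("inf"), None, None)
    for q in q_set:
        b = sizes[q]
        for r in itertools.product(rate_set, repeat=3):
            rA, rB, rC = r
            if rA > rB or rA > rC:
                continue                                   # A must be protected most strongly
            if sum(bits / rate for bits, rate in zip(b, r)) > n_t:
                continue                                   # violates the budget (1)
            probs = {                                      # event probabilities (3)
                "0": p_out(rA),
                "A": p_out(min(rB, rC)) - p_out(rA),
                "A,B": max(p_out(rC) - p_out(rB), 0.0),
                "A,C": max(p_out(rB) - p_out(rC), 0.0),
                "A,B,C": 1.0 - p_out(max(rB, rC)),
            }
            d_bar = sum(p * dists[q][e] for e, p in probs.items())   # expected distortion (2)
            if d_bar < best[0]:
                best = (d_bar, q, r)
    return best   # (minimal expected distortion, q_opt, r_opt)
```

With |R| = 215, the full product over R^3 is of course far too large to enumerate per frame, which is exactly why the pruning properties above and simpler rate-control schemes matter.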

3. EXPERIMENTAL RESULTS

3.1. Simulation Environment

In this work we have used a modified version of the test model coder JM1.7, which is provided by the Joint Video Team (JVT). Although JM1.7 is not the latest available test software, its compression efficiency is very close to that of the latest draft software [1]. JM1.7 was chosen because it supports data partitioning, error resilience, and feedback methods with multiple reference frames. Specifically, no slices have been used, UVLC was applied for entropy coding, and constrained intra prediction has been turned on. We compare the data partitioning system to a system with just a single packet per video frame, encoded as a single-slice packet. Obviously, only one channel coding rate is applied over such a packet, resulting in an equal error protection (EEP) approach. This system can be derived as a subset of the data partitioning system by assuming that the entire data is transmitted in partition A, whereas partitions B and C do not contain any data.

For both systems, the first 10 seconds of the QCIF sequences Foreman and Carphone have been encoded at a frame rate f_r = 10 Hz applying an IPPP... structure. To obtain sufficient statistics, for each experiment the sequence has been repeated at least 60 times, resulting in 6000 encoded, transmitted, and decoded video frames. The rate allocation process aimed to maximize the expected PSNR. The reported average PSNR is the arithmetic mean over the PSNR of each decoded or concealed frame. The bitrate reflects the overall bitrate r_t, including channel coding and source bitrate.
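As a reminder of what the reported statistics are, the following small sketch (ours) maps per-frame MSE values to per-frame PSNR, the arithmetic-mean PSNR reported below, and the empirical CDF shown later in Figure 4:

```python
import numpy as np

def psnr(mse, peak=255.0):
    """Per-frame PSNR in dB from per-frame MSE (8-bit video)."""
    return 10.0 * np.log10(peak ** 2 / np.maximum(mse, 1e-12))

def report(frame_mse):
    """Arithmetic-mean PSNR over all decoded/concealed frames and its empirical CDF."""
    p = psnr(np.asarray(frame_mse, dtype=float))
    xs = np.sort(p)
    cdf = np.arange(1, len(xs) + 1) / len(xs)   # P(PSNR <= xs[i])
    return p.mean(), xs, cdf
```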

3.2. Simulation Results

In Figure 3 the average PSNR versus total bit-rate is shown for the Carphone sequence with five reference frames, for the single-layer codec with EEP and for data partitioning with UEP, with different feedback delays. For all cases, performance increases with bitrate and decreases with feedback delay. For example, about 30% additional bit-rate is necessary to obtain the same performance with feedback delay 1 as with delay 0. However, most interesting is obviously the comparison of the single-layer system with the data partitioning system. It can be observed that data partitioning in this environment cannot provide any gains in terms of average PSNR when compared to the single-layer system. For low bitrates, the single-layer system is even slightly superior to data partitioning. Almost identical results have been obtained for different sequences such as Foreman.

Fig. 3. Average PSNR over total bit-rate for the Carphone sequence with five reference frames, single-layer codec with EEP, data partitioning with UEP, and different feedback delays.

To understand this behavior, in Figure 4 the cumulative distribution of the PSNR is plotted for five reference frames, feedback delay 1, and r_t = 128 kbit/s for both systems and two sequences. Both sequences exhibit the same characteristics, with Foreman emphasizing this more: while there is a higher probability of obtaining a high PSNR in the single-layer case, there is a lower probability of poor decoded quality in the data partitioning scheme. That is, on average we expect the data-partitioning scheme to have lower quality, but the single-layer case can result in very poor video quality more often.

Fig. 4. Cumulative distribution of PSNR for five reference frames, feedback delay 1, and r_t = 128 kbit/s.

The reasons for these two phenomena are investigated in Figure 5, in which the error rate and the average QP are plotted versus the total bit-rate. According to Figure 5(a), the error rate of data partition A is lower than that of a single-slice packet in the case of single-layer transmission. Therefore, we obviously avoid very poor images, i.e., images skipped due to previous-frame concealment. The channel coding rate assigned to partition C is high, resulting in high loss probabilities, whereas partition B's loss rate is in between those of A and C. The high loss rate of partition C obviously degrades the performance of data partitioning compared with the single-layer system. In addition, since the rate allocation algorithm assigns stronger protection to the B-partition than to the C-partition, the separation of intra and inter information in H.264 data partitioning seems to be reasonable.

The second phenomenon, that the single-layer case will typically have higher PSNR, becomes obvious from Figure 5(b): the average QP is slightly higher in the case of data partitioning than for the single-layer case, resulting in lower encoding PSNR. The higher QP in the case of data partitioning has mainly two reasons: first, a slightly higher overhead is necessary for partition header signalling, etc. In addition, in the case of data partitioning, using error-concealed frames in the encoder prediction results in poor reference signals, yielding a higher bitrate for the same QP. Similar effects have previously been recognized for slice structured coding in wireless environments [3].

Fig. 5. Average error rate and average QP over total bit-rate for five reference frames, feedback delay 1, Carphone.

4. CONCLUSIONS

We have compared non-scalable video coding with data partitioning using H.264/AVC under similar application and channel constraints for conversational applications over wireless channels. For both systems, optimized rate allocation and network feedback have been applied. From the experimental results it is observed that, based on the average PSNR, the non-scalable system outperforms the data partitioning system. However, with the data partitioning system the percentage of entirely lost frames can be lowered and the probability of poor decoded video quality is reduced. Further investigations are necessary for systems with data partitioning but without feedback. In these cases error propagation can generally not be avoided and more frequent intra updates are necessary, resulting in larger B-partitions with even higher importance.

5. REFERENCES

[1] A. Luthra, G. J. Sullivan, and T. Wiegand, Eds., Special Issue on the H.264/AVC Video Coding Standard, IEEE Trans. on Circuits Syst. Video Technol., vol. 13, no. 7, July 2003.
[2] S. Wenger, "H.264/AVC over IP," IEEE Trans. on Circuits Syst. Video Technol., vol. 13, no. 7, pp. 645–656, July 2003.
[3] T. Stockhammer, M. M. Hannuksela, and T. Wiegand, "H.264/AVC in wireless environments," IEEE Trans. on Circuits Syst. Video Technol., vol. 13, no. 7, pp. 657–673, July 2003.
[4] T. Stockhammer, "Is fine-granular scalable video coding beneficial for wireless video applications?," in Proc. IEEE ICME, Baltimore, MD, USA, July 2003.
[5] E. Biglieri, J. Proakis, and S. Shamai (Shitz), "Fading channels: Information-theoretic and communication aspects," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2619–2692, Oct. 1998.
[6] B. Walke and G. Brasche, "Concepts, services, and protocols of the new GSM phase 2+ general packet radio service," IEEE Communications Magazine, pp. 94–104, Aug. 1997.
[7] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Comm., vol. 36, no. 4, pp. 389–400, Apr. 1988.
[8] D. N. Rowitch and L. B. Milstein, "On the performance of hybrid FEC/ARQ systems using rate compatible punctured turbo (RCPT) codes," IEEE Trans. Comm., vol. 48, no. 6, pp. 948–959, June 2000.
[9] C. Weiß, T. Stockhammer, and J. Hagenauer, "The far end error decoder with application to image transmission," in Proc. IEEE Globecom, San Antonio, TX, USA, Nov. 2001.
[10] T. Stockhammer, H. Jenkač, and C. Weiß, "Feedback and error protection strategies for wireless video transmission," IEEE Trans. on Circuits Syst. Video Technol., vol. 12, no. 6, pp. 465–482, July 2002.
[11] R. Zhang, S. L. Regunathan, and K. Rose, "Video coding with optimal inter/intra-mode switching for packet loss resilience," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966–976, June 2000.
[12] B. Girod and N. Färber, "Feedback-based error control for mobile video transmission," Proceedings of the IEEE, vol. 87, pp. 1707–1723, Oct. 1999.
[13] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.
