搜档网
当前位置:搜档网 › CONFIDENCE INTERVAL ESTIMATION IN HEAVY-TAILED QUEUES USING CONTROL VARIATES AND BOOTSTRAP

CONFIDENCE INTERVAL ESTIMATION IN HEAVY-TAILED QUEUES USING CONTROL VARIATES AND BOOTSTRAP

CONFIDENCE INTERVAL ESTIMATION IN HEAVY-TAILED QUEUES USING CONTROL VARIATES AND BOOTSTRAP
CONFIDENCE INTERVAL ESTIMATION IN HEAVY-TAILED QUEUES USING CONTROL VARIATES AND BOOTSTRAP

Proceedings of the2004Winter Simulation Conference

R.G.Ingalls,M.D.Rossetti,J.S.Smith,and B.A.Peters,eds.

CONFIDENCE INTER V AL ESTIMATION IN HEA VY-TAILED QUEUES USING CONTROL

V ARIATES AND BOOTSTRAP

Pablo Jes′u s Argibay-Losada

Andr′e s Su′a rez-Gonz′a lez

C′a ndido L′o pez-Garc′?a

Ra′u l Fernando Rodr′?guez-Rubio

Jos′e Carlos L′o pez-Ardao

Departmento de Enxener′?a Telem′a tica

ETSE de Telecomunicaci′o n

Universidade de Vigo

36200Vigo,SPAIN

ABSTRACT

The heavy-tailed condition of a random variable can cause di?culties in the estimation of parameters and their con?dence intervals from simulations,specially if the variance of the random variable we are studying is in?nite.If we use a standard method to obtain con?dence intervals under such circumstances we shall typically get inaccurate results.To face up this problem, and trying to contribute to?nd accurate con?dence interval estimation methods for such cases,in this paper we propose the use of a control variate method combined with a bootstrap based con?dence interval computation. The control variate approach is doubly interesting to address the problem of in?nite variance.We tested this approach in a M/P/1queue system with in?nite variance in the queue waiting time and got quite accurate results. 1INTRODUCTION

Heavy-tailed distributions and distributions with in?-nite variance play an important role in the modeling of several variables in communication networks.In the lit-erature we can?nd good references relating these special characteristics[2]to several magnitudes like the size of the?les downloaded from HTTP or FTP servers[3][4], the duration of sessions[5],or even to certain charac-teristics exhibited by human-computer interactions[6] [7].In fact,in[8]Paxson shows that the presence of heavy-tailed distributions is an invariant in the internet.

Moreover,network engineering is not the only im-portant?eld where heavy-tailed distributions have a considerable practical relevance:many?nancial tasks also use them in models regarding?nancial and insur-ance risks[9].

So it should be quite clear how important is to consider that kind of random variables in simulation,as simulation is one of the most powerful tools at time to make performance studies within such engineering and economic areas.But the use of random variables with those characteristics leads to important problems when trying to analyze the results of simulations.

In an M/P/1queue system,Gross et al.[10]describe problems regarding the estimation of the mean queue waiting time.Fischer et al.[11],Chen[12]and Sees and Shortle[13]study the estimation of quantiles in the presence of the heavy-tail condition.

Argibay et al.[14]study the use of a control variate (CV)to help in the estimation of the mean queue waiting time of the M/P/1,improving both the estimated mean and its con?dence intervals(CIs)when the coe?cient of the CV method is calculated beforehand from the classical queueing theory.

Our objective is to?nd an accurate method to esti-mate the con?dence intervals for the mean queue waiting time when a?ected by the heavy-tailed behavior of the service time but thinking in its usefulness in a more generic scenario(G/P/1).In this paper we extend the work in[14]but now calculating the coe?cient of the CV method from the simulation data itself combined with some bootstrap-based con?dence interval estima-

tion techniques.Nevertheless we use the M/P/1queue system in order to validate our results and show that our method achieves accurate con?dence intervals for the mean queue waiting time.

In Section2,we describe mathematically the system queue under study,the M/P/1.In Section3we describe the problems related to the accuracy of estimators of CIs for the mean queue waiting time in that queue.In Section4we present the approach we propose to try to obtain better results by means of a control variate that help us with the problem of in?nite variance of the estimator.In Section5we describe the method we are going to use for the construction of con?dence intervals,based on bootstrap percentiles,and in Section 6we describe the results of applying that method to the M/P/1queue,achieving con?dence intervals with good coverage.Finally,in Section7we describe some conclusions and further work.

2THE M/P/1QUEUE

The M/P/1is the queue system we are going to work with.Customers arrive according to a Poisson process, and demand independent and identically distributed (iid)service times which follow a Pareto distribution. The queue discipline is“?rst come?rst served”and the queue capacity is in?nite.We will work with the stochastic process of the consecutive customers’waiting times,W={W j;j=1,2,...}.

Since the M/P/1is a special case of the M/G/1, we can use the Pollaczek-Khinchin formula,that gives us the mean queue waiting time:

S2

x a?x≥m>0

In[10]a Pareto distribution—with m=1and shifted to0—is used in an M/P/1to illustrate the problems of simulating such system when a is near2. Speci?cally,when a is in(2,3)the variance of W is in?nite.In this paper we also?x m to1to show the bene?ts of our proposed method in a similar scenario.We will also?xρto0.5to minimize the e?ects of the transient state in the simulations.

In not heavy-tailed distributions—like the expo-nential or the normal ones—the probability that the random variable takes a great value is so negligible that if we do not consider those values to calculate some moments of the distribution,we will still get a pretty good estimation of them.This can be the case of the mean,the second moment,and as a consequence,the variance.

The Pareto distribution is a particular example of heavy-tailed distributions.Nevertheless,in the case of heavy-tailed distributions,the probability of such large values,although still being small,is enough to make them have a great in?uence in some important parameters of the distribution.If those parameters are being estimated through simulation,the problem arises because such not negligible probability is paralelly not so signi?cant to be likely for such large values to appear even in a long simulation;and so the lack of those unlikely samples could?nally a?ect drastically the results.

This e?ect,and its implications in the simulation of queueing systems,will be discussed in the next section.

3PROBLEMS WHEN SIMULATING THE M/P/1

We want to estimate a con?dence interval for the mean queue waiting time of the M/P/1.

The classical theory of construction of CIs assumes independent and identically distributed samples from a distribution with?nite mean and variance.

But if in the M/P/1the shape parameter of the Pareto,a,is smaller than3,the variance of W will be in?nite.This will imply that we cannot use the central limit theorem to give a CI for

1?F W(x)+F W(?x)

→p

F W(?x)

α

x?αL(x)

where f(x)~g(x)means lim x→∞f(x)

1?ρ

Pr(S e>x)

with S e being the RV associated to the equilib-rium distribution(residual life)of the service time,which in our case is the residual life of the customers:

F S

e (x)=

1

S

x

(1?F S(y))d y

For the Pareto,

F S

e

(x)= a?1a1

1?ρm a?1

x a?1

and the last expression is of the form K·x?α·

L(x),withα=a?1

As both conditions are met,the average of iid steady state samples of size1of W,when appropriately normal-ized,tends to anα-stable distribution with mean zero. It might be possible to use the convergence to a stable distribution to make inferences for

W[m]=

m

j=1W j

W when m tends to in?nite.

If we are working with an M/G/1queue,it is possible to use another approach to obtain iid samples from the marginal probability density function(pdf)of W.This can be a valuable tool to compare methods that make inferences about W—as can be the case of a con?dence interval for

W. If we apply the normal approximation to the samples of an in?nite variance RV,the results are bad behaved.In Figure1we can see what happens if we apply the normal approximation to the averages obtained from several simulations of an M/P/1queue withρ=0.5and Pareto demanded service time with a=2.1and m=1.We have obtained1000CIs,each one constructed from n=50 averages individually calculated—after the respective simulation—over a m=10000length sample of W. For the sake of clarity,we have represented only100 randomly chosen intervals.We can see that,although we have run a considerable number of simulations,the results have poor accuracy.The empirical coverage is only0.717.Notice the logarithmic scale in the Y axis. Many intervals have enormous values and many others do not include the theoretical value.

The root of our problems is the fact that the sum of iid samples of

0.001

0.01 0.1 1 10 100 1000

10000 100000 1e+06 0 10 20 30 40 50 60 70 80 90 100

C o n f i d e n c e i n t e r v a l

Confidence intervals

theoretical E[W]

Figure 1:95%con?dence intervals using normal ap-proximation.

of this RV.Instead,it tends to a stable distribution.In this case,few methods to construct CIs are available.

The bootstrap is a method to calculate a functional of a distribution function,as is the case of a CI.It was proposed by Efron [21]and it is based on the general idea that the relationship between a sample space and a speci?c group of random samples from it,closely resembles the relationship between that group of samples and a resample of them.So,analyzing that subsample and the group of samples,we can extract information about the sample space.The bootstrap,as originally proposed,can fail when the underlying distribution has in?nite variance.But it was realized [15]that,when the parameter for which we want to estimate the CI is the mean,the bootstrap can be modi?ed so that it works even in that case.If the samples are in the domain of attraction of an α-stable law with 1<α<2,it su?ces to take a resample size,s n ,such that lim n →∞s n

n

=0[16].

4

CONTROL V ARIATE TO ADDRESS

INFINITE V ARIANCE

Since the in?nite variance of the queue waiting time appears to be an important cause of the problems related to the CI estimation,we are going to use a variance reduction technique to check its appropriateness in the M/P/1case.Speci?cally,we are going to use a control variate (CV)method.

The main idea for our control variate selection is an empirical hypothesis interpretation of Equation (1):a long enough simulation run will produce a value of S 2)by the empirical ones

(average arrival rate,frequency of the “resource busy at arrival time”event,and average square service time).

In [20]we have already evaluated positively an in-ternal RV of mean value 1/(1?ρ)for polling systems with constant service times.

In the case of the M/P/1queue with service time RV of in?nite variance,our working hypothesis is that the departures of W in long enough simulation runs will be mainly due to the de-partures of

m

from its mean value

W ,we will run n simulations

—each one of size m —and compute,for each one,their average queue waiting time,

S 2[m ].We will obtain

samples of a new RV,T ,from

T i =

S 2[m ]i ?

S 2is the theoretical mean value of the square service time RV,which is known.

According to the theory of control variates,when both W and S 2have ?nite variance,the optimum value for c is [26]

c opt =

Cov(W,S 2)

c[n]= n i=1(W[n])·(S2[n])

S2[m]i?

W[n]=

n i=1n

S2[m]i

W.This will be discussed in the next section.

5METHOD FOR THE ESTIMATION OF CONFIDENCE INTER V ALS

We have to choose some method to estimate a CI for

T?[n]?

σT?[n]

whereσ2T?[n]is the sample variance of the resample.

This method is going to construct a new CI over the transformed samples taking into account that the limit distribution of this estimator is unknown.We are going to suppose that this tends to a limit distribu-tion,and therefore we are going to estimate it through bootstrapping the pairs S2[m] .

Implicitly,we are using the hypothesis that the resulting controlled RV has?nite variance,so we can use the standard bootstrap.The method consists of:

1.We perform n independent simulations of the

M/P/1from empty state.In each one we aver-

age the waiting times and square service times of

the customers,obtaining n pairs(S2[m]).

2.We compute(3),the average and sample vari-

ance of that transformed sample,

W[m],

T?[n]i,and the sample variance of the resample,σT?[n]2i.We compute

the estimator R over the resample and store it

in an array,called M.

4.We sort the array M in growing order.

5.The95%CI will be T?[n]?

σT?[n]

|0.95·σT[n],where|T[n]

W[m]in the M/P/1. We have also applied our proposed method and the normal approximation over the latter group of samples.

The utilization factor is set to0.5.We have tested the coverage properties of1000computed95%CIs and obtained their average length and the coe?cient of vari-ation of their length.We have tested the methods for several values of a for the Pareto service time;each a was tested with two sample sizes,n=64and n=1024; and the relation between the resample/subsample size, s,and the sample size,n,is s=n2

Regarding the method we propose,we can see in

Table2that the coverages are good for every a,inde-

pendently of the possible in?nite variance of W—a≤3.

Even if the shape parameter is so close to2that

W.

We have shown that the joint use of control variates and

adequately chosen inference methods gives good results

in a specially problematic case,the M/P/1queue.It

seems promising in order to apply it to more complicated

queues,like G/G/1.

REFERENCES

[1]Willinger,W.,Vern Paxson and Murad S.Taqqu.1998.

Self-Similarity and Heavy Tails:Structural Modeling

of Network Tra?c.A Practical Guide to Heavy Tails:

Statistical Techniques and Applications.Springer

Verlag.

[2]Fowler,T.B.1999.A Short Tutorial on Fractals and

Internet Tra?c.The Telecommunications Review

10,1-14,Mitretek Systems,McLean,VA.

[3]Pitkow,James E.1999.Summary of WWW charac-

terizations.World Wide Web Journal,Vol.2,No.

1,pp.3-13,1999.

[4]Crovella,Mark E.and Azer Bestavros.1997Self-

Similarity in World Wide Web Tra?c:Evidence

and Possible Causes.IEEE/ACM Transactions on

Networking,vol.5,no.6,December1997.Pages

835-846.

[5]Crovella,Mark E.,Murad Taqqu and Azer Bestavros

Heavy-Tailed Probability Distributions in the World

Wide Web.A Practical Guide to Heavy Tails:Sta-

tistical Techniques and Applications,by Robert J.

Adler,Raisa E.Feldman and Murad S.Taqqu(Ed-

itors).Springer Verlag(December1998).

[6]Mah,Bruce A.1997An Empirical Model of HTTP

Network Tra?c.Proceedings of INFOCOM97,April

7-11,1997.

[7]Zipf,G.K.1949.Human Behavior and the Principle

of Least-E?ort.Addison-Wesley Press.

[8]Paxson,Vern.1999.Why Analyzing the Internet is

Painfully Hard.Scaling Phenomena in Communica-

tion Networks Workshop.Institute for Mathematics

and its Applications.October22-24,1999.

[9]Asmussen,Soren.2000.Ruin Probability.World Sci-

enti?c Pub Co.

[10]Gross,D.,Shortle,J.F.,Fischer,M.J.,Masi,D.M.B.,

2002.Di?culties in simulating queues with pareto

service.Proceedings of the2002Winter Simulation

Conference.

[11]Fischer,Martin J.,Donald Gross,Denise M.Bevilac-

qua Masi and John https://www.sodocs.net/doc/8f14808023.html,ing Quantile

Estimates in Simulating Internet Queues with Pareto

Service Times.Proceedings of the2001Winter Sim-

ulation Conference.

[12]Chen,E.Jack.2002.Two-Phase Quantile Estima-

tion.Proceedings of the2002Winter Simulation

Conference.

[13]Sees,John C.Jr.and John F.Shortle.2002.Sim-

ulating M/G/1Queues with Heavy-Tailed Service.

Proceedings of the2002Winter Simulation Confer-

ence.

[14]Argibay-Losada,Pablo Jes′u s,Andr′e s Su′a rez-

Gonz′a lez,C′a ndido L′o pez-Garc′?a,Ra′u l Fernando

Rodr′?guez-Rubio,Jos′e Carlos L′o pez-Ardao and

Diego Teijeiro-Ruiz.2003.On the Use of Control

Variates in the Simulation of Queues with Heavy

Tailed Service.Proceedings of the2003Industrial

Simulation Conference.

[15]Athreya,K.1985.Bootstrap of the mean in the in?nite

variance case,II Technical Report86-21.Depart-

ment of Statistics,Iowa State University.

[16]Politis,Dimitris N.,Joseph P.Romano and Michael

Wolf.1999.Subsampling Springer-Verlag New York.

[17]Benes,V.E.1956.On queues with Poisson arrivals.

Annals of Mathematical Statistics.28,670-677. [18]Gross,Donald and Carl M.Harris.Fundamentals of

Queueing Theory.

[19]Kleinrock,Leonard.1975.Queueing systems,Vol.1,

page201.Wiley&Sons.

[20]Suarez Gonzalez,Andres.2000.Surez-Gonzlez,An-

drs,Cndido Lpez-Garca,Jos C.Lpez-Ardao,and

Manuel Fernndez-Veiga.2000.On the Use of Con-

trol Variates in the Simulation of Medium Access

Control Protocols Proceedings of the2000Winter

Simulation Conference.

[21]Efron,B.1979.Bootstrap methods:Another look at

the jackknife.Annals of Statistics,vol.7.Pages1-26.

[22]Hall,Peter.1992.The Bootstrap and Edgeworth Ex-

pansion,pages306-311.Springer-Verlag.

[23]Feller,William.1971.An Introduction to Probability

Theory and Its Applications.John Wiley and Sons.

[24]Politis,Dimitris N.,Joseph P.Romano.1992.A gen-

eral theory for large sample con?dence regions based

on subsamples under minimal assumptions.Techni-

cal Report299,Department of Statistics,Stanford

University.

[25]Sigman,K.1999.A primer on heavy-tailed distribu-

tions.Queueing Systems.33:261-275.

[26]Bratley,Paul,Bennet L.Fox and Linus E.Schrage.

1987.A Guide to Simulation.Springer-Verlag,2nd

edition.

AUTHOR BIOGRAPHIES

PABLO JES′US ARGIBAY-LOSADA is an as-sistant professor in the Departamento de Enxe?n er′?a Telem′a tica at Universidade de Vigo.He received a telecommunication engineering degree from Universi-dade de Vigo in2001.Nowadays,he is working to-ward his Ph.D.in the Departamento de Enxe?n er′?a Telem′a tica at Universidade de Vigo.His e-mail ad-dress is Pablo.Argibay@det.uvigo.es

ANDR′ES SU′AREZ-GONZ′ALEZ is an associate professor in the Departamento de Enxe?n er′?a Telem′a tica at Universidade de Vigo.He received a Ph.D.degree in telecommunication engineering from Universidade de Vigo in2000.He is a member of ACM and SIGSIM.His current research interests include simulation method-ology and analysis of stochastic systems.His e-mail address is Andres.Suarez@det.uvigo.es

C′ANDIDO L′OPEZ-GARC′IA is an associate pro-fessor in the Departamento de Enxe?n er′?a Telem′a tica at Universidade de Vigo.He received a Ph.D.de-gree in telecommunication engineering from Universidad Polit′e cnica de Madrid in1995.His e-mail address is Candido.Lopez@det.uvigo.es

RA′UL FERNANDO RODR′IGUEZ-RUBIO is an associate professor in the Departamento de Enxe?n er′?a Telem′a tica at Universidade de Vigo.He received a Ph.D.degree in telecommunication engineering from Universidade de Vigo in2000.His current research interests include simulation methodology and anal-ysis of stochastic systems.His e-mail address is Raul.Rodriguez@det.uvigo.es

JOS′E CARLOS L′OPEZ-ARDAO is an associate professor in the Departamento de Enxe?n er′?a Telem′a tica at Universidade de Vigo.He received a Ph.D.degree in telecommunication engineering from Universidade de Vigo in1999.His e-mail address is J.C.Ardao@det. uvigo.es

ACKNOWLEDGEMENTS

This work was supported by the project TIC2000-1126 of the“Plan Nacional de Investigacin Cient?ca,De-sarrollo e Innovacin Tecnolgica”and by the“Secretara Xeral de I+D da Xunta de Galicia”through the grant PGIDT01PX132202PN

Table1:95%CIs behavior over iid samples from the marginal pdf of W for subsampling and bootstrap.The95% CI for a nominal coverage of0.95is(0.939,0.961).Coverages below this range are shown in italycs.

a 2.001 2.01 2.1 2.534

646464646464 Coverage0.0410.1530.5110.890.950.955 Subs L/ 1.9994117056 1.75 1.13 CV L1920.78 3.8 1.920.11 Coverage0.0410.1570.5180.9020.9610.962 Boot L/ 2.6744.9191 6.46 2.1 1.33 CV L1920.78 3.75 1.890.11 Table2:95%CIs behavior over

n102410241024102410241024

0.0020.0130.1320.5870.8410.948

W9e-30.090.400.1450.025e-3

5.4 3.27 5.04 1.97 1.260.15

0.0250.1190.3280.6870.8450.945

W0.467.218.4 6.750.06 5.3e-3

17.4117.8 6.6 6.350.27

0.0280.1210.3490.7290.8880.974

W0.477.418.7 6.90.067 5.3e-3

17.21117.7 6.6 6.120.2

0.930.9550.940.9520.9570.955

W 1.78 1.90.530.020.0064e-3

18.2 2.5 1.980.530.190.09

相关主题