
CHAPTER 3

Regression Models

3.1 INTRODUCTION: BAYESIAN REGRESSION

Methods for Bayesian estimation of the Normal linear regression model, whether with univariate or multivariate outcome, are well established. Assuming the predictors are exogenous and measured without error (i.e. not random), they may be conditioned on as fixed constants. For a univariate outcome the parameters are then the regression coefficients, linking the mean outcome for case i to predictors x_{i1}, x_{i2}, .., x_{ip} for that case, and the conditional or residual variance. With an inverse gamma prior on the variance, and conjugate Normal prior on the regression coefficients, conditional on the sampled variance, analytic formulae for the posterior densities of these coefficients and other relevant quantities (e.g. predictions for new explanatory variable values) are available. These permit direct estimation with no need for repeated sampling.

However, generalisations which include discrete outcomes, non-linear or varying coefficient relationships, non-conjugate priors (Carlin and Polson, 1991), or priors constraining coefficients to substantively expected ranges, may all be facilitated by a sampling-based approach to estimation. Similar advantages apply in assessing the density of structural quantities defined by functions of parameters and data. The Bayesian approach may also benefit regression model selection, in terms of priors adapted to screening out marginally important predictors, or in comparisons between non-nested models (see Example 3.3). Recent methods tackle some of the identification problems (such as label-switching) in discrete mixture regression (Frühwirth-Schnatter, 2001). Bayesian methods have also been proposed (see the book by Dey et al., 1999) for discrete outcome data which are over-dispersed in relation to standard densities such as the Poisson.

The development below is selective among the wide range of modelling issues which have been explored from a Bayes perspective, but is intended to illustrate some potential benefits of the Bayes approach. The first case studies of Bayesian regression techniques include questions of predictor and model choice, for instance comparing marginal likelihood approximations with methods based on external validation (Section 3.2). Outcomes are binary, counts or univariate continuous. The focus is then on models for multiple category outcomes, which includes ordinal data, and discrete mixture regressions (Sections 3.3 and 3.4). These models illustrate the central role of prior parameter specification to reflect the form of the data. As one possible methodology for nonlinear regression, Section 3.5 discusses random walk and state space priors. A final theme (Section 3.6) is possible approaches to robust estimation in the event of departures from standard densities, and the related problem of outlier or influential observations. First, though, we consider questions around the specification of priors on regression model parameters. Among the aspects to consider are that priors express accumulated knowledge or subject matter constraints, may be defined by the form of outcome (e.g. in ordinal regression), and may be central to identification (e.g. in discrete mixture regressions).

3.1.1 Specifying priors: constraints on parameters

Thus, prior constraints on parameters may be defined by subject matter as in certain econometric models. As a subject matter example, Griffiths et al. (1993) discuss the aggregate consumption function in economics (with C consumption, and Y disposable income)

C = β_1 + β_2 Y

where β_1 is autonomous consumption (consumption when there is no disposable income), and β_2 is the marginal propensity to consume. Economic principles suggest that of every dollar earned as disposable income, some will be consumed (spent) and some will be saved. Hence, β_2 should lie between 0 and 1. Similarly, baseline consumption is expected to be positive, so that β_1 > 0.

Constraints on parameters can usually be dealt with in the prior for those parameters. Formally, let Ω be the unconstrained parameter space of β and let R be a constraint, expressed as

R: β ∈ Ω_R

where Ω_R is a subspace of Ω. The posterior density of β given the constraint, outcomes y and regressors X, is then, assuming P(R|y) > 0,

P(β | R, y, X) = P(β | y, X) P(R | β, y)/P(R | y)

This is the posterior density that would have been obtained in the absence of a constraint, namely P(β | y, X), multiplied by a factor proportional to the conditional probability of the constraint given β. Such constraints on parameters imply that MCMC parameter samples β_j^{(t)} at iteration t may be generated from non-truncated densities, but if they violate the constraint they are rejected with probability 1.
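The rejection step just described can be mimicked directly on posterior draws. The following Python sketch is illustrative only (it is not one of the book's Programs); the unconstrained posterior mean and covariance for (β_1, β_2) of the consumption function are invented values, and draws violating β_1 > 0 or 0 < β_2 < 1 are simply discarded.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unconstrained Normal approximation to the posterior of (beta1, beta2)
post_mean = np.array([20.0, 0.9])
post_cov = np.array([[25.0, -0.3],
                     [-0.3, 0.01]])

draws = rng.multivariate_normal(post_mean, post_cov, size=20000)

# Constraint R: beta1 > 0 (positive baseline consumption) and 0 < beta2 < 1
keep = (draws[:, 0] > 0) & (draws[:, 1] > 0) & (draws[:, 1] < 1)
constrained = draws[keep]

print("acceptance rate:", keep.mean())
print("constrained posterior means:", constrained.mean(axis=0))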

This principle also finds expression in applications where the constraint reflects the form of the outcome. Thus for ordinal outcomes, an underlying continuous scale may be postulated on which cutpoints – which are constrained to be increasing – correspond to increasingly ranked categories. Specifically, the cumulative odds model (McCullagh, 1980) is framed in terms of the cumulative probability γ_ij of subject i being located in the first j categories of a ranked outcome. Then logit(γ_ij) – or maybe some other link function of γ_ij – is related to a series of cutpoint parameters on the underlying scale. The utility of Bayesian approaches to ordinal outcomes is illustrated below in terms of applications to diagnostic tests (Example 3.9).

3.1.2 Prior specification: adopting robust or informative priors

Specification of the prior may be less clearly defined in certain applications by the subject matter or outcome form than in Section 3.1.1, but is still often central to model identifiability, fidelity to the observed data, and robustness in inference. Often, the structure of residuals suggests alternatives to standard density assumptions, in terms of heavier tails than under the Normal, as well as outlier and influential observations. To model heavy tails, Dickey (1976), Chib et al. (1990) and Geweke (1993) are among those adopting univariate or multivariate Student t densities as sampling densities for the data, or as priors for linear regression parameters. A contaminated Normal or Student t model, in which outliers have shifted locations and/or variances, provides one Bayesian approach to outliers (e.g. Verdinelli and Wasserman, 1991). For instance, assuming a small probability ε, such as 0.05, that an outlier occurs, the density of a metric y_i may take the form

f(y_i | μ, σ², ε) = (1 − ε) f(y_i | μ, σ²) + ε f(y_i | μ + Δ_i, σ²)

where Δ_i is the shift in location for case i if it is selected as an outlier. One may adopt similar methods for shifts in regression coefficients.
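A minimal Python sketch of this contaminated density (not from the original text; ε, Δ, and the data-generating values are assumptions chosen for illustration): data are generated from the two-component mixture, and the mixture density is evaluated at a few points.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, eps, mu, sigma, delta = 2000, 0.05, 0.0, 1.0, 6.0

is_outlier = rng.random(n) < eps                      # outlier indicator per case
y = rng.normal(mu + np.where(is_outlier, delta, 0.0), sigma)

# mixture density f(y | mu, sigma^2, eps) at a few points
for v in [0.0, 3.0, 6.0]:
    dens = (1 - eps) * norm.pdf(v, mu, sigma) + eps * norm.pdf(v, mu + delta, sigma)
    print("f(%.1f) = %.4f" % (v, dens))

print("sample mean %.2f, sd %.2f (a pure Normal would have sd 1)" % (y.mean(), y.std()))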

Other options to protect against the influence of outliers or influential cases may be particular to the form of outcome. With a binary outcome, for instance, a prior may be set on a 'transposition', namely when y_i = 1 is the actual observation but the regression model provides much higher support for Pr(y_i = 0) than for Pr(y_i = 1). In terms of the contaminated data approach of Copas (1988, p. 243), this would be a measure of 'the evidence that each individual observation has been misrecorded' (see Example 3.14). Discrete mixture regressions also aim to improve robustness (to assumptions of a single homogenous density across all data points), as well as providing for substantively based differences in the regression coefficients between sub-populations (Wedel et al., 1993).

Model identifiability, and credibility in terms of existing knowledge, may be improved by adopting an informative prior for one or more regression coefficients. Such a prior might refer to results from similar studies. For instance, if the estimate of a regression parameter β from a previous study had mean β_0 and covariance Σ_0, we might inflate Σ_0 by a factor a > 1 (e.g. a = 5) to provide a prior covariance on β in the current study (Birkes and Dodge, 1993, p. 156; Dellaportas and Smith, 1993, p. 444). More formal procedures for reflecting historical data may actually include such data in the likelihood, though down-weighted relative to the current data (Ibrahim and Chen, 2000).
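A small sketch of constructing such an inflated prior (the earlier-study estimates and the inflation factor are invented for illustration):

import numpy as np

# Estimates from a hypothetical earlier study: mean and covariance of beta
beta0 = np.array([1.2, -0.4])
Sigma0 = np.array([[0.04, 0.01],
                   [0.01, 0.09]])

a = 5.0                      # inflation factor a > 1
prior_mean = beta0           # prior mean carried over from the earlier study
prior_cov = a * Sigma0       # inflated covariance acknowledges between-study differences

print("prior standard deviations for the coefficients:", np.sqrt(np.diag(prior_cov)))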

One might also consider some summary statistic from the observed data to inform the specification of the prior, though with suitable down-weighting, so that priors become 'gently data determined' (Browne and Draper, 2000). Note, however, that strictly this is departing from fully Bayes principles. Sometimes it is preferable to avoid completely flat priors, especially if flat priors lead to effectively improper posterior densities or poor identifiability. Thus, Fernandez and Steel (1999) show how choosing a proper Wishart prior for the conditional precision matrix Σ^{-1} improves identifiability of multivariate Student t errors regression. In mixture regressions (Section 3.5), some degree of prior information may be essential to identifiability.

3.1.3 Regression models for overdispersed discrete outcomes

Below we consider regression applications both for metric and discrete outcomes. General Linear Models (GLMs) have been proposed as a unified structure for both types of outcomes. As elaborated by McCullagh and Nelder (1989, p. 28), several discrete densities (together with the Normal and Student t densities) can be encompassed within the exponential family. The exponential family density (Gelfand and Ghosh, 2000) has the form

f(y_i | θ_i) = exp{w_i^{-1}[y_i θ_i − b(θ_i)] + c(y_i, w_i)}   (3.1)

where the w_i are scale parameters, and the means are obtained as

E(y_i) = μ_i = b′(θ_i)

and variances as

V(y_i) = w_i b″(θ_i) = w_i V(μ_i)

Poisson and binomial densities have fixed scale factors, w_i = 1. So for the Poisson, with μ_i = exp(θ_i), the mean and variance are both μ_i.

Often, though, as also noted in Chapter 2, residual variability under these densities exceeds that expected under the postulated variance–mean relationship. The variance function remains as above, but now w > 1 for all subjects, or possibly dispersion factors vary by subject with w_i > 1 and are themselves modelled as functions of covariates.

One approach to handling such heterogeneity in general linear models involves random effects modelling in the model for the mean response. Thus for a Poisson outcome, y_i ~ Poi(μ_i), we might stipulate a model for the mean μ_i containing both fixed and random effects:

log(μ_i) = βX_i + ε_i

where the ε_i are parametric (e.g. Normal) or possibly semi-parametric (e.g. where a DPP prior for the ε_i is used with a Normal baseline density). This effectively means adding a set of parameters which increase in number with the sample size, technically making the likelihood nonregular and raising the question of how many parameters are actually in the model (see Chapter 2 on assessing the number of effective parameters).

Alternative approaches to overdispersion include reparameterisations of the variance function, and generalised density and likelihood forms such as the double exponential that model both mean and variance functions (Carroll and Ruppert, 1988; Efron, 1986). Nelder and Pregibon (1987) propose a 'quasi-likelihood', which in logged form for one observation is

−0.5 D(y_i, μ_i)/w_i − 0.5 log[2π w_i V(μ_i)]
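To see the overdispersion induced by the random effects term, the following sketch (values and data are invented; this is not one of the book's programs) simulates counts from log(μ_i) = βX_i + ε_i with Normal ε_i, and compares the marginal mean and variance of y with those of a pure Poisson model.

import numpy as np

rng = np.random.default_rng(3)
n = 50000
x = rng.normal(size=n)
beta0, beta1, sd_eps = 0.5, 0.3, 0.7

eps = rng.normal(0.0, sd_eps, size=n)          # Normal random effects
mu = np.exp(beta0 + beta1 * x + eps)           # log-link mean with random effect
y = rng.poisson(mu)

y0 = rng.poisson(np.exp(beta0 + beta1 * x))    # pure Poisson comparison

print("with random effects: mean %.2f  variance %.2f" % (y.mean(), y.var()))
print("pure Poisson       : mean %.2f  variance %.2f" % (y0.mean(), y0.var()))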

where D(·,·) is the deviance function¹. These are quasi-likelihoods because they do not correspond to any complete likelihood² within the exponential family (Albert and Pepple, 1989). West (1985) has, however, proposed a Bayesian approach involving a scaled, and hence complete, likelihood

f(y_i | μ_i, φ_i) ∝ γ_i^{0.5} exp[−γ_i D(y_i, μ_i)]   (3.2)

where γ_i = λ/(λ + φ_i), and λ reflects the level of over-dispersion. As λ tends to infinity, γ_i approaches 1, and the scaled likelihood approaches the baseline exponential density. Various parameterisations of the variance function are possible: for instance, Engel (1992) has suggested

V(y_i) = φ_i V(μ_i) = φ_i μ_i^κ   (3.3)

where

log(φ_i) = η W_i

and W_i contains covariates relevant to explaining extra-variability.

Breslow (1984) and others have proposed a related approach where a pseudo-likelihood for each observation has the logged form

−0.5(y_i − μ_i)²/[φ_i V(μ_i)] − 0.5 log[2π φ_i V(μ_i)]   (3.4)

The pseudo-likelihood approach is thus equivalent to assuming a Normal outcome but with modified variance function. Ganio and Schafer (1992) and Dey et al. (1997) are among those adopting these forms or extensions of them, and relating the variance inflators φ_i to predictors; Nelder and Lee (1991) term such models joint GLMs.

Another methodology in instances of over-dispersion involves generalising the standard variance function for particular densities. Thus, for a count regression with mean μ_i = exp(βx_i), Winkelmann and Zimmermann (1991) propose a variance function

V(y_i) = μ_i + (σ² − 1) μ_i^{κ+1}   (3.5)

where σ² > 0 and κ ≥ −1. Then σ² = 1 corresponds to the Poisson, and σ² > 1 to over-dispersion. It can be seen that the particular value κ = 0 means the Poisson variance μ_i is inflated by the factor σ². This type of 'linear' adjustment has been proposed by McCullagh and Nelder (1989) as a correction procedure for over-dispersion.
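The variance function in (3.5) can be generated by mixing a gamma over the Poisson mean. The numerical check below is an illustrative Python sketch (not one of the book's programs), using the gamma parameterisation that appears later in Example 3.3 with invented values of μ, σ² and κ; the simulated counts should have variance close to μ + (σ² − 1)μ^{κ+1}.

import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, kappa = 4.0, 2.5, 0.5    # illustrative values
n = 1_000_000

# Gamma mixing (shape z1, rate z2) chosen so that E(lambda) = mu and
# Var(lambda) = (sigma2 - 1) * mu**(kappa + 1)
z1 = mu ** (1 - kappa) / (sigma2 - 1)
z2 = mu ** (-kappa) / (sigma2 - 1)
lam = rng.gamma(shape=z1, scale=1.0 / z2, size=n)
y = rng.poisson(lam)

print("empirical variance of y        :", y.var())
print("mu + (sigma2-1)*mu^(kappa+1)   :", mu + (sigma2 - 1) * mu ** (kappa + 1))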

At its simplest, a correction for linear over-dispersion involves dividing the fitted deviance or chi-square for a Poisson regression by the degrees of freedom and deriving a scaling factor (i.e. estimating σ² when κ = 0) to adjust the standard errors of the β_j for over-dispersion. The scaling factor can also be obtained from a pseudo-likelihood model assuming Normal errors, with means μ_i and variances σ²μ_i. Another variance function, with κ = 1, occurs in the event of a specific type of gamma-Poisson mixture (as considered in Chapter 2), when y_i is marginally a negative binomial (see McCullagh and Nelder, 1989, Chapter 11). In fact, both the linear and quadratic variance forms (with κ = 0 and κ = 1, respectively) can be derived from mixing a gamma with the Poisson, but differ in the way the gamma is parameterised.

¹ For example, in the Poisson the deviance for case i is

2{y_i log(y_i/μ_i) − (y_i − μ_i)}

while in the binomial, with y_i ~ Bin(π_i, T_i) and μ_i = π_i T_i, it is

2{y_i log(y_i/μ_i) + (T_i − y_i) log([T_i − y_i]/[T_i − μ_i])}

² It may be noted that in BUGS, introducing synthetic data Z to represent densities not otherwise included does not necessarily require a complete likelihood. For instance, a quasi-likelihood with variance as in (3.3) and covariate W[i] could be coded as

for (i in 1:N) {Z[i] <- 1; Z[i] ~ dbern(q[i])
   q[i] <- exp(Q[i])
   Q[i] <- -0.5*log(6.28*phi[i]*pow(mu[i], kappa)) - 0.5*D[i]/phi[i]
   log(phi[i]) <- eta[1] + eta[2]*W[i]}

with D[i] being the relevant deviance.

3.2 CHOICE BETWEEN REGRESSION MODELS AND SETS OF PREDICTORS IN REGRESSION

Many methods have been discussed for comparing the structural form and specification of regression models, including Bayesian methods, likelihood based methods, penalised likelihoods (Akaike, 1978), supermodel methods (Atkinson, 1969), bootstrap methods and cross-validation techniques (Browne, 2000). Very often, model selection schemes combine elements of different basic approaches.

Model uncertainty in regression analysis may involve different aspects of model specification:

(a) the error structure of the residuals (e.g. Normal vs. Student t);

(b) whether or not transformations are applied to predictors and outcome, which may be appropriate, for example, if metric data are skewed before transformation, or to induce approximate Normality in a count outcome;

(c) which link to adopt for discrete outcomes in general linear modelling, since the standard links are not necessarily always appropriate;

(d) whether a general additive or non-linear regression term be used, or just a simple linear expression;

(e) which is the best subset of predictors in the regression term, regardless of its form;

(f) for discrete mixture regressions, the issue of optimal choice of the number of components in the mixture.

Many of these questions can be addressed by the model choice techniques considered in Chapter 2, for example using approximations to the marginal likelihood, using joint model-parameter space methods, using penalised measures of fit such as the BIC or DIC, or using predictive (cross-validatory) criteria.

In addition to the marginal likelihood approximations discussed in Chapter 2 may be mentioned the Laplace approximation. This is useful for standard regressions, such as Poisson, logistic or Normal linear models, since it requires differentiation of the log-likelihood with regard to all model parameters, and so may be difficult for complex nonlinear or random effect models. The Laplace approximation is based on the Taylor series expansion for a q-dimensional function g(x_1, x_2, .., x_q), such that

∫ exp[g(x)] dx ≈ (2π)^{q/2} |Σ|^{0.5} exp[g(x̂)]   (3.6)

where x̂ is the value of x at which g is maximised, and Σ is minus the inverse Hessian of g(x) at the point x̂. The Hessian is the matrix of double differentials ∂²g/∂x_i∂x_j. For a model with parameters φ, the marginal likelihood has the form

m(y) = ∫ f(y | φ) p(φ) dφ = ∫ exp[log f(y | φ) + log p(φ)] dφ

and in this case, the Laplace approximation may be obtained at the posterior mode (though it may also be defined at the maximum likelihood point). For a univariate regression, with φ = (σ², β), and q = p + 2, where p is the number of regressors (excluding the constant), the approximation is therefore

m(y) ≈ (2π)^{q/2} |Σ|^{0.5} f(y | φ̂) p(φ̂)

In this approximation, p(φ̂) is the probability of the values in φ̂ evaluated in terms of the assumed prior densities p(φ), and Σ is minus the inverse Hessian of

g(φ) = log f(y | φ) + log p(φ)

evaluated at the posterior mean of the conditional variance and regression coefficients. Then the log of the marginal likelihood is approximated as

log[m(y)] ≈ 0.5 q log(2π) + 0.5 log|Σ| + log f(y | φ̂) + log p(φ̂)

In practice (see Raftery, 1996), Σ may be estimated by minus the inverse Hessian of the log-likelihood log f(y | φ̂) at the mode, rather than that of g(φ). An approximation might also be based on the variance-covariance matrix of φ from a long MCMC run, though this may be less valid if regression parameters have non-Normal posterior densities (Diciccio et al., 1997); other options include multivariate interpolation (Chib, 2000).
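As a concrete (non-BUGS) illustration of (3.6), the Python sketch below computes the Laplace approximation to log m(y) for a small simulated Poisson regression with independent N(0, 100) priors; the maximum of g(φ) = log f(y|φ) + log p(φ) is found numerically, and Σ is minus the inverse Hessian at that point. The data, sample size and coefficient values are invented for illustration.

import numpy as np
from scipy import optimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(0.3 + 0.6 * X[:, 1]))

prior_var = 100.0  # beta_j ~ N(0, 100)

def g(beta):
    # log f(y|beta) + log p(beta)
    eta = X @ beta
    loglik = np.sum(y * eta - np.exp(eta) - gammaln(y + 1))
    logprior = np.sum(-0.5 * np.log(2 * np.pi * prior_var) - 0.5 * beta**2 / prior_var)
    return loglik + logprior

res = optimize.minimize(lambda b: -g(b), x0=np.zeros(p), method="BFGS")
beta_hat = res.x

# minus the Hessian of g at the mode: X' diag(mu) X plus the prior precision
mu_hat = np.exp(X @ beta_hat)
H = X.T @ (X * mu_hat[:, None]) + np.eye(p) / prior_var
Sigma = np.linalg.inv(H)          # Sigma = minus inverse Hessian of g

log_marg = 0.5 * p * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma)) + g(beta_hat)
print("Laplace approximation to log m(y): %.2f" % log_marg)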

3.2.1 Predictor selection

While approaches based on the marginal likelihood have been successfully applied to regression model selection, the structure of regression models is such that distinctive methods have been proposed for some of the particular model choice questions occurring in regression. One of the major motivations for seeking optimal or parsimonious models occurs if there is near collinearity between predictors X_1, X_2, ..., X_p. Taken as a single predictor, the coefficient β_j for predictor X_j may have a clearly defined posterior density in line with subject matter knowledge (i.e. a 95% credible interval confined to positive values, assuming X_j was expected to have a positive effect on y). However, with several predictors operating together, coefficients on particular X_j may be reduced to 'insignificance' (in terms of a 95% interval neither clearly positive nor negative), or even take signs opposite to expectation.

Choice among p predictors to reduce such parameter instability can be seen as a question of including or excluding each predictor, giving rise to choice schemes specifying priors on binary indicators γ_j relating to the probabilities of inclusion Pr(γ_j = 1) or exclusion Pr(γ_j = 0) = 1 − Pr(γ_j = 1) of the j-th predictor. Note that exclusion may be defined as 'exclusion for all practical purposes'.

Suppose a Poisson density with mean μ_i is assumed for count outcome y_i, so that if all predictors are included in a log link

log(μ_i) = β_0 + β_1X_{i1} + β_2X_{i2} + ... + β_pX_{ip}

Then in a predictor selection model, we introduce between 1 and p binary variables γ_j relating to predictors about which inclusion is uncertain (the intercept is presumed always necessary). If uncertainty is applied to all predictors, the model becomes

log(μ_i) = β_0 + γ_1β_1X_{i1} + γ_2β_2X_{i2} + ... + γ_pβ_pX_{ip}   (3.7)

George and McCulloch (1993) propose a Stochastic Search Variable Selection (SSVS) scheme, whereby β_j has a vague prior centred at zero (or some other value) when γ_j = 1, but when γ_j = 0 is selected the prior is centred at zero with high precision (i.e. β_j is zero for all practical purposes). When γ_j = 0 one might make³ the variance τ_j² small, but multiply this by a large constant c_j² when γ_j = 1. Typically, Bernoulli priors with probability 0.5 are assigned to the probability of each selection index γ_j being 1. An MCMC analysis would then be carried out and the frequency of different combinations of retained predictors enumerated; there are 2^p possible combinations in Equation (3.7). Another scheme, which takes the priors for binary indicators γ_j and coefficients β_j as independent, is presented by Kuo and Mallick (1998). Thus in Equation (3.7) one option is that the β_j are assigned conventional priors, such as

β_j ~ N(0, V_j)

with V_j large, rather than mixture priors as in the SSVS approach. This independent priors approach allows the possibility of more particular model elements than just whether predictors should be included or not. Thus for a log-linear regression of counts y_ij on factors A_i, B_j and their interaction C_ij, one would not generally include C_ij unless both A_i and B_j were included. So coding the γ_j to reflect this, with y_ij ~ Poi(μ_ij), leads to

log(μ_ij) = α + max(γ_1, γ_3)β_1[A_i] + max(γ_2, γ_3)β_2[B_j] + γ_3β_3[C_ij]

so that γ_3 = 1 corresponds to including A_i, B_j and C_ij. Kuo and Mallick (1998, p. 72) also suggest a dependent priors scheme, whereby β_j ~ N(0, V_j) when γ_j = 1, but β_j ~ N(0, k_jV_j) when γ_j = 0, such that k_jV_j is close to zero.

³ George and McCulloch suggest c_j between 10 and 100, with τ_j ≈ δ_j/(2 log c_j)^{0.5}, where δ_j is the largest value of β_j that would be considered unimportant. One might take δ_j = Δy/ΔX_j, where Δy is say under 0.5 of s.d.(y) and ΔX_j is the range of X_j.
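The contrast between the SSVS 'spike' and 'slab' priors, and the threshold indicated in footnote 3, can be made concrete with a short Python sketch (illustrative values of τ and c only; this is not part of the book's programs). The spike N(0, τ²) dominates near zero, and for coefficient values much beyond τ(2 log c)^{0.5} the slab N(0, c²τ²) gives the higher prior support.

import numpy as np
from scipy.stats import norm

tau, c = 0.05, 10.0            # illustrative spike sd and inflation constant

spike_sd = tau                  # prior sd when gamma_j = 0 (coefficient effectively zero)
slab_sd = c * tau               # prior sd when gamma_j = 1

# approximate point beyond which the slab density exceeds the spike density
delta = tau * np.sqrt(2 * np.log(c))

for b in [0.0, 0.5 * delta, delta, 2 * delta]:
    print("beta = %.3f   spike density = %.3f   slab density = %.3f"
          % (b, norm.pdf(b, 0, spike_sd), norm.pdf(b, 0, slab_sd)))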

3.2.2 Cross-validation regression model assessment

In general, cross-validation methods involve predictions of a subset y_r of cases when only the complement of y_r, denoted y_[r] (i.e. the remaining observations), is used to update the prior on parameters φ. Here φ for a linear regression might consist of regression parameters and a variance term. Suppose an n × p matrix X of predictors is also partitioned into X_r and X_[r]. A discussion of cross-validation principles in Bayes regression model selection is provided by Geisser and Eddy (1979). These authors suggested a marginal likelihood approximation based on leaving one observation out at a time. The links (in terms of asymptotic equivalence) between this single-case omission strategy and penalised fit measures such as the AIC – where such measures are based on fitting to the complete sample – are discussed in earlier work by Stone (1974, 1977).

Consider a procedure where cases are omitted singly, and let y_(r) denote the data excluding a single case r. Then the cross-validatory predictions have the form

p(y_r | y_(r)) = ∫ f(y_r | φ, X_r) p(φ | y_(r)) dφ

with p(y_r | y_(r)) often known as the Conditional Predictive Ordinate (CPO). A proxy for the marginal likelihood m(y) is defined by the product of these terms

m̂(y) = ∏_{r=1}^{n} p(y_r | y_(r))

The ratio of m_1(y) and m_2(y) for two models M_1 and M_2 is then a surrogate for the Bayes factor B_12. Cross-validation methods also play a major role in regression model checking in identifying influential cases and other model discrepancies (see Example 3.15). Gelfand et al. (1992) propose several checking functions involving comparison of the actual observations with predictions from p(y_r | y_(r)).
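Given MCMC output, each CPO can be estimated without actually re-fitting n leave-one-out models, using the harmonic mean of the per-case likelihoods over iterations, p(y_r | y_(r)) ≈ [T^{-1} Σ_t 1/f(y_r | φ^{(t)})]^{-1}; this matches the description (posterior means of inverse likelihoods) given with Equation (2.13) in Example 3.1 below. A Python sketch, with an invented matrix of per-case likelihoods standing in for real sampler output:

import numpy as np

rng = np.random.default_rng(6)

# stand-in for sampler output: like[t, r] = f(y_r | phi^(t)) for T iterations and n cases
T, n = 5000, 30
like = np.exp(rng.normal(-1.0, 0.3, size=(T, n)))

cpo = 1.0 / np.mean(1.0 / like, axis=0)     # harmonic mean of f(y_r | phi^(t)) over iterations
log_pseudo_ml = np.sum(np.log(cpo))         # log of the product of CPOs (proxy for log m(y))

print("smallest CPO (least well predicted case): %.4f" % cpo.min())
print("log pseudo marginal likelihood: %.2f" % log_pseudo_ml)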

Rust and Schmittlein (1985) and Fornell and Rust (1989) propose a cross-validatory scheme for regression analysis including Bayesian model selection criteria, but drawing on split-half and other sample splitting methods also used in frequentist inference. Rust and Schmittlein emphasize that the choice of validation function may be based on subject matter choices (e.g. predicted vs. actual marketing share in the sales applications they consider). They consider cross-validation to compare models M_1, .., M_J via random splitting of datasets. Let D_1 and D_2 be such a split, with D_1 being the pre-sample on which parameter estimation is based and D_2 the post or validation sample, used to assess the model. Let φ_j denote the parameter estimate under model j and φ̂_{j1} the estimate of φ_j using just the data in D_1. Then Rust and Schmittlein propose that the posterior probabilities of each model be approximated as

P(M_j | D_2) = P(D_2 | M_j) p(M_j)/P(D_2)
            = p(M_j) ∫ p(φ_j) f(D_2 | φ_j) dφ_j / { Σ_{k=1}^{J} p(M_k) ∫ p(φ_k) f(D_2 | φ_k) dφ_k }
            ≈ f(D_2 | φ̂_{j1}) p(M_j) / { Σ_{k=1}^{J} p(M_k) f(D_2 | φ̂_{k1}) }   (3.8)

Rust and Schmittlein take φ̂_{j1} to be the maximum likelihood estimator, but one might also take the posterior average of the parameters under model j, φ̄_{j1}. Another option, given equal prior model probabilities, would involve comparing the average likelihoods of the validation data (evaluated over a long run of T iterations) under model j,

T^{-1} Σ_{t=1}^{T} f(D_2 | φ_{j1}^{(t)})

Fornell and Rust consider the measure (3.8) for a single random split of the original data into sets {D_1, D_2}, but one might consider a large number r = 1, .., R of such splits and carry out such comparisons by averaging over the sets {D_1r, D_2r}. Thus, Alqallaf and Gustafson (2001) consider cross-validatory checks based on repeated two-fold data splits into training and validation samples. Another option is k-fold validation, where the data are split into a small number of groups (e.g. k = 5) of roughly equal size, and cross-validation is applied to each of the k partitions obtained by leaving each group out at a time.

Kuo and Peng (1999) use such an approach to obtain the predictive likelihood for each omitted group (the validation group), and suggest a product of these likelihoods over the k partitions as a marginal likelihood approximation.

Example 3.1 Nodal involvement

To illustrate model choice under marginal likelihood and cross-validation approaches, consider binary outcomes on prostatic cancer nodal involvement and four possible predictors (Collett, 1992). These data illustrate a frequent problem in model fitting: simple unpenalised fit measures such as the deviance may show a slight gain in fit as extra predictors (or other parameter sets) are added, but penalised fit measures, or marginal likelihood type measures, show that any improvement is offset by extra complexity. Cross-validation approaches may also favour the less complex model in such situations.

In the nodal involvement example, the predictors are x_1 = log(serum acid phosphate), x_2 = result of X-ray (1 = +ve, 0 = −ve), x_3 = size of tumour (1 = large, 0 = small) and x_4 = pathological grade of tumour (1 = more serious, 0 = less serious). A probit regression model including all predictors is then

y_i ~ Bern(π_i)
probit(π_i) = β_0 + β_1x_{1i} + ... + β_4x_{4i}

The model may be fitted directly or by introducing latent continuous and Normally distributed responses⁴ as in Albert and Chib (1993). Using the latter method, Chib (1995) shows that a model with x_1–x_3 only included has a worse log-likelihood than a model including all four predictors (−24.43 as against −23.77) but a better marginal likelihood (−34.55 vs. −36.23). Letting M_2 denote the full model (four predictors) and M_1 the reduced model, the approximate Bayes factor B_12 obtained by Chib is exp(1.68) = 5.37. By conventional criteria for interpreting Bayes factors (Kass and Raftery, 1996), this counts as 'positive evidence' for the reduced model, though far from conclusive.

Here alternative approaches to marginal likelihood approximation and model choice, as discussed above and in Chapter 2, are considered. As noted in Chapter 2, the marginal likelihood identity

m(y | M_j) = p(y | β, M_j) p(β | M_j)/p(β | y, M_j)

applies at any value of β, and in particular at points such as the posterior mean β̄. So (omitting M_j), one may estimate log[m(y)] as

log{m(y)} = log p(y | β̄) + log p(β̄) − log p(β̄ | y)   (3.9)

⁴ Thus, suppose X_i denotes a set of predictors, with regression parameter β. If y_i is 1 then the latent response z_i is constrained to be positive and sampled from a Normal with mean βX_i and variance 1. This is equivalent to assuming a probit link for the probability that y_i = 1. For the observed outcome y_i = 0, z_i is sampled in the same way, but constrained to be negative (with 0 as a ceiling value). The latent variable approach is especially advantageous for the probit link, and has benefits (e.g. for residual analysis). In BUGS, the following code (for observations y[] and one covariate x[]) might be used

for (i in 1:n) {z[i] ~ dnorm(mu[i], 1) I(low[y[i]+1], high[y[i]+1]);
   mu[i] <- b[1] + b[2]*x[i]}
# sampling bounds
low[1] <- -20; low[2] <- 0; high[1] <- 0; high[2] <- 20;

A logit link is approximated by sampling z from a Student t with 8 degrees of freedom; this is equivalent to Normal sampling, as in the above code, but with the precision of 1 replaced by case-specific precisions sampled from a Gamma density with shape and index both equal to 4.

The main issue is then how p(β̄ | y) is approximated. If augmented data z is part of the model definition, and there is just one other parameter block (e.g. as here, where there are latent Normal responses underlying an observed binary response, and the only other parameters are regression coefficients), then one may estimate p(β̄ | y) using draws z^{(t)} from a sampling run where all parameters are updated, and the probability of the fixed value β̄ (obtained from an earlier run) is evaluated against the full conditional density for β given z^{(t)}. The latter density is derived essentially by considering z as dependent variable in a Normal errors model (Chib, 1995, 2000). If the prior for β = (β_0, β_1, .., β_4) is denoted β ~ N_p(c_0, C_0^{-1}), then the relevant full conditional is

β | z ~ N_p(β̂, C^{-1})

where C = C_0 + X′X, with X of dimension n × p, and

β̂ = C^{-1}[C_0c_0 + X′z]   (3.10)

with z being the vector of sampled z_i. If C_0 is diagonal, then the prior reduces to a set of univariate Normals. Multivariate Normal or Student approximations to the posterior p(β | y) may also be adopted to estimate the posterior ordinate in Equation (3.9), with covariance matrix based on correlations between the sampled β_k^{(t)}, where k = 0, 1, .., 4 for Model 2 and k = 0, .., 3 for Model 1. Then p(β̄ | y) is the value of the multivariate Normal or Student density evaluated at its mean β̄.

Harmonic mean and importance sample estimators of the marginal likelihood (see Chapter 2) may also be considered. To examine stability in these estimators, the directly estimated probit model (without augmented data z) is fitted, and seven batches of 2000 iterations are taken from a single run of 15 000 after excluding the first 1000 (see Program 3.1(A), Model A). Priors are as in Chib (1995). The importance sample estimate is as in (2.9):

m̂(y) = 1/[T^{-1} Σ_t g^{(t)}/{L^{(t)}p^{(t)}}] = T/[Σ_t g^{(t)}/{L^{(t)}p^{(t)}}]   (3.11)

where L^{(t)} and p^{(t)} are the likelihood and prior densities evaluated at iterations t = 1, .., T, and g^{(t)} is the value of an importance function intended to approximate the posterior density p(β | y). This function is provided by a multivariate Normal approximation.

Table 3.1 accordingly shows the sort of fluctuations that occur in the harmonic mean estimates (Equation (2.8) in Chapter 2) of the marginal likelihood. This involves monitoring the total log-likelihood (L in Program 3.1A, Model A), using the coda facility to extract runs of 2000 values, exponentiating −L^{(t)} to give H^{(t)}, taking the average H̄ of the H^{(t)}, and then taking minus the log of H̄. Comparison of the average marginal likelihood estimates (averaging over batches) gives B_12 = 1.81. The importance sample estimates of the marginal likelihoods, as calculated in Equation (3.11), are −35.20 (Model 1) and −36.89 (Model 2), and so B_12 = 5.42. These estimates of the log marginal likelihood are virtually identical to those obtained from the application of the marginal likelihood identity (3.9) in Model C of Program 3.1(A) (at the posterior mean), using the MVN approximation to the posterior of β. At the posterior mean, the estimates are −35.19 and −36.89 for Models 1 and 2.
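The harmonic mean calculation just described can be reproduced from monitored total log-likelihoods. The Python sketch below (simulated values stand in for the monitored chain; it is not Program 3.1A) uses a log-sum-exp form to avoid numerical overflow and shows how the estimate fluctuates across batches of 2000 iterations.

import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(7)
# stand-in for the monitored total log-likelihood over 14 000 kept iterations
loglik = rng.normal(-31.0, 2.0, size=14000)

def harmonic_mean_log_ml(ll):
    # log m(y) = -log( mean( exp(-ll) ) ), computed stably
    return -(logsumexp(-ll) - np.log(len(ll)))

for b in range(7):
    batch = loglik[b * 2000:(b + 1) * 2000]
    print("batch %d: %.2f" % (b + 1, harmonic_mean_log_ml(batch)))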

Table 3.1 Probit models for nodal involvement, regression coefficients and marginal likelihood approximations

Model 1 with 3 predictors
        Mean     St. devn.   2.5%     Median   97.5%
β_0     −0.74    0.40        −1.50    −0.74     0.07
β_1      1.42    0.67         0.17     1.40     2.79
β_2      1.30    0.48         0.40     1.29     2.22
β_3      1.08    0.43         0.25     1.08     1.92

Model 2 with 4 predictors
β_0     −0.79    0.42        −1.64    −0.78     0.01
β_1      1.63    0.70         0.30     1.63     3.07
β_2      1.26    0.49         0.32     1.26     2.21
β_3      0.96    0.45         0.10     0.96     1.84
β_4      0.55    0.45        −0.33     0.54     1.45

Marginal likelihood estimates by iteration batch
                  Harmonic mean             CPO method
Iterations        Model 1    Model 2        Model 1    Model 2
1001–3000         −29.86     −28.85         −29.20     −29.85
3001–5000         −28.41     −30.83         −29.18     −30.03
5001–7000         −28.89     −28.47         −28.90     −29.88
7001–9000         −28.55     −30.23         −29.07     −29.87
9001–11000        −28.47     −28.60         −29.15     −29.50
11000–13000       −28.33     −29.35         −28.91     −29.78
13000–15000       −29.25     −29.58         −29.28     −29.86

Program 3.1(B) follows Albert and Chib (1993), and takes the latent data z from the distribution z | y. Then β̄ is evaluated against N_p(β̂, C^{-1}) for the samples z^{(t)} by substituting in Equation (3.10). This gives marginal likelihood estimates of −34.04 for Model 1 and −35.28 for Model 2, a Bayes factor of 3.46; see Model B in Program 3.1(B). Model A in Program 3.1(B) illustrates the basic truncated Normal sampling needed to implement the Albert–Chib algorithm for probit regression.

The CPO estimate (Equation (2.13) in Chapter 2) is obtained by taking minus logs of the posterior means of the inverse likelihoods (the relevant quantities in Program 3.1A, Model A), and then totalling over all cases. This estimator leads to B_12 = 2.06, and is more stable over batches than the harmonic mean estimate.

A Pseudo Bayes Factor (PsBF) is also provided by the Geisser–Eddy cross-validation method based on training samples of n − 1 cases and prediction of the remaining case. Program 3.1(C) (Nodal Involvement, Cross-Validation) evaluates 53 predictive likelihoods of cases 1, 2, 3, .., 53 based on models estimated from cases {2, .., 53}, {1, 3, 4, .., 53}, {1, 2, 4, .., 53}, ..., {1, 2, ..., 52}, respectively. Each omitted case y_i provides a predictive likelihood and Bayes factor under model j (where here j = 1, 2), based on the components

f_j(y_i | β), p_j(β | y_(−i))

where y_(−i) denotes the set of cases omitting the i-th, namely

y_1, y_2, .., y_{i−1}, y_{i+1}, .., y_n

The ratios of the predictive likelihoods for case i (when it is the omitted case) under Models 1 and 2 may be denoted B[i]_12, and their product B_12 is a form of Bayes factor. Since B_12 is skew, one may sample log(B_12) in an MCMC run with validation models running in parallel, and then take the exponential of the posterior mean of log(B_12) as an estimate of the PsBF. Additionally, if it is possible to run the n validation models in parallel and monitor log(B_12) to find it exceeds 0 with a high probability, then one may say Model 1 is preferred to Model 2. For numeric stability, a logit rather than probit link is used in Program 3.1C.

From iterations 500–2000 of a run of 2000 iterations (pooling over three chains), Pr(log(B_12) > 0) converges to around 67%, while the posterior mean of log(B_12) stabilises at around 1.77 (i.e. PsBF_12 = 5.87). One may also monitor the total predictive likelihoods involved (TL[] in Program 3.1C), and compare their posterior averages or medians to obtain an alternative estimate of B_12. The mean values of these likelihoods are −32.33 and −34.07, giving PsBF_12 = 6.4. Overall, the cross-validation methods provide slightly greater support for the reduced model than the approximate marginal likelihood methods do⁵, but both approaches support a more parsimonious model, whereas a simple deviance comparison might lead to adopting the complex model.

⁵ A test of the cross-validation approach with the probit link provided similar differentials between Models 1 and 2 (e.g. in terms of predictive likelihoods), but is subject to occasional numerical errors when the linear regression term becomes too large.

Example 3.2 Hald data

A data set from Hald (1952) refers to the heat evolved in calories per gram of cement, a metric outcome for n = 13 cases. The outcome is related to four predictors describing the composition of the cement ingredients. These data may be used to illustrate predictor selection in regression modelling. Options include:

• the SSVS scheme of George and McCulloch (1993), and related approaches;
• true cross-validation methods based on omitting portions of the data as validation samples; and
• predictive model assessment based on sampling 'new data' Z from a model fitted to all cases, and seeing how consistent the new data are with the observations.

Here we adopt the third approach, and undertake a simultaneous analysis of the eight models by declaring repeat versions of the outcome, and evaluate them using a predictive criterion derived from the work of Laud and Ibrahim (1995). As discussed in Chapter 2, for model k and associated parameter set θ_k, the predictive density is

p(Z | y, M_k) = ∫ f(Z | θ_k) p(θ_k | y) dθ_k

The predictive criterion of Laud and Ibrahim is then

C² = Σ_{i=1}^{n} [{E(Z_i) − y_i}² + var(Z_i)]
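Given replicate data Z sampled from the fitted model at each iteration, the criterion is simply a sum of squared bias and predictive variance terms. A short Python sketch with invented outcome data and invented posterior predictive draws (not Program 3.2):

import numpy as np

rng = np.random.default_rng(8)
n, T = 13, 4000
y = rng.normal(95.0, 15.0, size=n)                  # observed outcome (invented)
Z = y + rng.normal(0.0, 3.0, size=(T, n))           # stand-in for posterior predictive draws

C2 = np.sum((Z.mean(axis=0) - y) ** 2 + Z.var(axis=0))
print("Laud-Ibrahim criterion C = %.2f" % np.sqrt(C2))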


Better models will have smaller values of C² or its square root, C. Table 3.2 compares the eight (2³) possible models in terms of C.

Laud and Ibrahim employ a specialised prior, namely a guess at the fitted regression outcome for each case in terms of the available predictors; this is then translated into priors on the regression coefficients. Here non-informative N(0, 10⁵) priors are assumed on the regression coefficients themselves, and a G(1, 0.001) prior for the conditional variance. Despite this, similar C measures to those reported by Laud and Ibrahim (1995, Table 1, p. 255) are obtained. Taking into account variability in the criterion C, there is no overwhelming evidence against any model.

However, on the Occam's Razor principle that a Model 1 which is embedded in Model 2 (i.e. is nested within it and less heavily parameterised) and also has a better average C than Model 2 is to be preferred, we can eliminate the models {x_1, x_2, x_3, x_4} and {x_2, x_3, x_4}.

Comparisons of the predictive criteria C_i and C_j between models i and j over repeated samples are carried out in Program 3.2 via the step() command; thus Comp[i,j] is 1 if model j has a worse criterion than model i in a particular iteration. On this basis, it can be confirmed that there is essentially no clear advantage in fit for any model in Table 3.2. For example, Comp[1,2] averages 0.51 from the second half of a run of 20 000 iterations over three chains, so that Model 2 has a worse fit than Model 1 in 51% of 30 000 iterations. Of the 28 model comparisons, the highest (comparing models 1 and 7) is 0.70 and the lowest is 0.46.

Table 3.2 Predictive fit of models for Hald data (iterations 10 000–20 000, three chains)

Model (included predictors)   Mean C   s.d. C   Median C
1,2,4                         11.40    2.73     11.04
1,2,3                         11.45    2.71     11.09
1,3,4                         11.79    2.82     11.41
1,2,3,4                       11.85    2.89     11.45
1,2                           11.98    2.72     11.65
1,4                           13.60    3.07     13.22
2,3,4                         14.11    3.33     13.68
3,4                           13.57    3.08     13.20

Example 3.3 Ship damage

Both Winkelmann and Zimmermann (1991) and Dey et al. (1997) consider the ship damage data of McCullagh and Nelder (1989, p. 205). These data illustrate approaches to overdispersion in discrete outcomes, and the questions of model selection involved in introducing extra parameters to account for overdispersion. While a gamma-Poisson mixture is a standard option, this is not the only choice, and the goal is to use as parsimonious a model as possible to ensure overdispersion is effectively allowed for. Hence model choice is better based on penalised measures of fit than on unmodified likelihoods and deviances.

Excluding null observations, the data consist of counts y_i of wave damage to 34 cargo ships according to ship type t_i (A–E), year of construction c_i (1960–64, 1965–69, 1970–74, 1975–79), and period of observation o_i (1960–74, 1975–79). Most analyses
treat these as categorical factors, and take the months of service total s_i as an offset with coefficient 1. Thus, the model for damage counts is

y_i ~ Poi(μ_i)
log(μ_i) = log(s_i) + α + β(t_i) + γ(c_i) + δ(o_i)

First, consider a standard Poisson regression, with N(0, 10³) priors on the regression coefficients φ = {α, β, γ, δ}, and taking appropriate corner constraints on {β, γ, δ}. Unpenalised fit measures are not ideal for model choice, but do indicate whether over-dispersion is effectively corrected for. They include the mean Poisson deviance D_1, and a chi-square measure based on standardised residuals,

D_2 = Σ R(y_i, μ_i) = Σ [(y_i − μ_i)²/V(y_i)]

We can also calculate the deviance at the posterior mean φ̄ and obtain the effective parameters p_e, and so obtain the DIC

D_3 = D_1(φ̄) + 2p_e
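Given MCMC output, p_e and the DIC follow directly from the sampled deviances and the deviance at the posterior mean. A Python sketch (not one of the book's programs); the stand-in numbers are chosen to echo the values quoted for the baseline Poisson model below.

import numpy as np

rng = np.random.default_rng(9)
dev_samples = rng.gamma(shape=21.0, scale=2.0, size=10000)  # stand-in for sampled deviances D1^(t) (mean about 42)
dev_at_mean = 33.6                                          # deviance evaluated at the posterior mean (stand-in)

D_bar = dev_samples.mean()
p_e = D_bar - dev_at_mean        # effective number of parameters
DIC = dev_at_mean + 2 * p_e      # equivalently D_bar + p_e

print("mean deviance %.1f, p_e %.1f, DIC %.1f" % (D_bar, p_e, DIC))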

Another penalised fit measure is provided by the deviance criterion (for Poisson data) under minimum predictive loss (Gelfand and Ghosh, 1998), which, like the criterion used in Example 3.2, involves⁶ sampling new data Z_i from the model means μ_i.

The baseline Poisson regression model, with nine parameters (and 25 degrees of freedom), is estimated using three chains⁷ taken to 100 000 iterations, with the summary excluding 5000 burn-in iterations. This model suggests over-dispersion, albeit not pronounced, with the mean value of D_1 standing at 42 and that of D_2 at 55. The deviances are highest for ships 19, 20 and 31, with 6, 2 and 7 damage incidents, respectively; two of these ships (19, 31) have relatively short exposure periods. The evaluation of the deviance at the posterior mean gives D_1(φ̄) = 33.6, so p_e = 8.4 and D_3 = 33.6 + 2(8.4) = 50.4. The deviance criterion under minimum predictive loss is 55.4.

The Poisson regression (see Table 3.3) shows lower damage rates for ship types B and C (type A is the first level in the categorical factor and has coefficient zero). Damage rates are higher for later construction and observation periods.

McCullagh and Nelder (1989) suggest a variance inflator (based implicitly on a linear variance model with κ = 0 in (3.5)) of 1.69, namely V(y_i) = 1.69μ_i. To formalise this, one may adopt the Normal errors model of the pseudo-likelihood (3.4), as set out in Model B in Program 3.3. Thus the damage counts are taken as Normal with mean μ_i and variance

V(y_i) = σ²μ_i

where σ² is assigned an E(1) prior. This formulation in fact allows for under-dispersion, as well as for overdispersion. σ² then has a posterior mean of 1.73, a median of 1.59 and 95% interval 0.96 to 3.1, consistent with the analysis of McCullagh and Nelder. This model enhances the coefficients for year of construction, but diminishes the size and 'significance' of the coefficients for ship types B and C. The mean value of D_1 is slightly increased at 49, while the mean of D_2 is now 36.

⁶ Let M_i denote the posterior average of Z_i and t_i the posterior average of the Poisson deviance component t(Z_i) = Z_i log Z_i − Z_i. Define Q_i = (M_i + y_i k)/(1 + k), where k is positive; then the Gelfand–Ghosh measure is 2Σ_i[t_i − t(M_i)] + 2(k + 1)Σ_i[{t(M_i) + t(y_i)k}/(1 + k) − t(Q_i)].

⁷ Starting values are provided by null values, values based on the posterior mean of a trial run, and values based on the upper 97.5% point of the trial run.

Then in Model C of Program 3.3, a general variance⁸ function (3.5) is assumed, in conjunction with a Poisson-gamma mixture. Following Winkelmann and Zimmermann (1991), this is achieved as

y_i ~ Poi(μ_i)
μ_i ~ G(ζ_{1i}, ζ_{2i})
ζ_{1i} = exp(βx_i)^{1−κ}/(σ² − 1)
ζ_{2i} = exp(βx_i)^{−κ}/(σ² − 1)

Three parallel chains are taken to 50 000 iterations, with different starting values for κ and σ². An E(1) prior on κ + 1 and a gamma prior, G(1, 0.01), on σ² − 1 are adopted. Note that the latter prior amounts to 'forcing' overdispersion. Convergence is relatively slow in κ, and hence in σ², and posterior summaries are based on iterations 15 000–50 000.

⁸ Fitted in version 1.2 of WinBUGS.

This approach leads to posterior means for the statistics D_1 and D_2 of around 26 and 19, respectively. Effective parameters are estimated as approximately 14, the deviance at the posterior mean being 11.9, so D_3 = 11.9 + 28 = 39.9. This improves over the simple Poisson regression. The deviance criterion under minimum predictive loss is reduced to 49.8.

There is clear evidence of overdispersion in the sense that σ² averages 2.94, though in this model the departure from the Poisson assumption is explicitly modelled. The posterior median of the power parameter κ is −0.59, though with a long tail of values that straddle zero; this corresponds approximately to a square root transform in Equation (3.5). Figure 3.1 shows how the posterior for this parameter (based on iterations 20 000–50 000 over three chains, with sub-sampling every tenth iteration) focuses on negative values near the minimum threshold.

Figure 3.1 Smooth for power parameter, κ

The credible intervals for the regression coefficients are little affected by this reformulation as compared to the Poisson regression, despite the allowance for overdispersion (Table 3.3). Note that Winkelmann and Zimmermann obtain a value for κ of −0.74 and for σ² of 1.85.

Another possible modelling strategy, considered by Dey et al. (1997) and others, introduces a dependence of the variance for case i on predictors W_i. These models may be substantively of interest, but may pose identifiability and interpretation problems if, say, there are common predictors of both the mean and variance. Following Dey et al., the variance is a function of the log of the months of observation (Model D in Program 3.3). This variable is also in the model for the mean, and is usually taken to have a coefficient of unity (this means it becomes an 'offset', in line with a Poisson process).

Here a gamma-Poisson mixture is again assumed, with the inverse variances φ_i of the gamma mixture linked to months of service. Thus, Model D involves the steps

y_i ~ Poi(μ_i)
μ_i = ν_iδ_i
δ_i ~ G(φ_i, φ_i)
log φ_i = α_0 + α_1 log(months service)
log ν_i = log(months service) + effects of ship type, etc.

This is consistent with a quadratic variance function, with

V(y_i) = μ_i + μ_i²/φ_i

In fitting this model, N(0, 1) priors are specified on α_0 and α_1 for numerical stability and to improve convergence.

Table 3.3 Ship damage data models

                           Poisson (Model A)            General variance function (Model C)   Variance regression (Model D)
                           Mean     2.5%     97.5%      Mean     2.5%     97.5%               Mean     2.5%     97.5%
Regression parameters
Intercept                  −6.424   −6.863   −6.013     −6.367   −6.732   −5.920              −6.43    −7.16    −5.72
Type B                     −0.536   −0.876   −0.167     −0.574   −0.953   −0.091              −0.53    −1.14     0.10
Type C                     −0.714   −1.398   −0.080     −0.751   −1.601    0.042              −0.65    −1.60     0.31
Type D                     −0.098   −0.683    0.470     −0.170   −1.005    0.551              −0.16    −1.11     0.79
Type E                      0.320   −0.143    0.807      0.252   −0.335    0.844               0.39    −0.43     1.28
Built 1965–69               0.699    0.416    0.991      0.676    0.298    1.047               0.69     0.24     1.16
Built 1970–74               0.820    0.499    1.147      0.787    0.380    1.222               0.86     0.38     1.43
Built 1975–79               0.445   −0.024    0.900      0.430   −0.142    0.966               0.49    −0.19     1.20
Period of obs'n 1975–79     0.388    0.150    0.624      0.387    0.119    0.683               0.39     0.01     0.76
Other parameters
κ                                                        −0.52    −0.99    0.57
σ²                                                        2.94     1.09    9.16
α_0                                                                                             1.25     0.27     2.35
α_1                                                                                             0.88     0.07     1.71

The last 15 000 iterations of a 20 000 iteration run with three chains show that there is a positive (negative) effect α_1 of months service on the precision (variance). The most marked over-dispersion (as expressed also in the deviances at observation level in the simple Poisson regression above) occurs when damage incidents are relatively high in relation to exposure periods. Posterior means of coefficients in the model for the means are similar to those in the baseline Poisson model, though with wider credible intervals. The average deviance D_1 and chi-squared sum D_2 are reduced to 25.5 and 18, and all damage counts are well predicted.

So in this example, different model approaches with similar fit (in simple deviance terms) have effectively accounted for, or modelled, the over-dispersion, but have different implications for the form of the variance function and the regression effects. A suggested exercise is to assess (a) the effective parameters for the DIC, and (b) the deviance based criteria under minimum predictive loss, in Models C and D, and so attempt to discriminate between them. It should also be noted that the Poisson regression analysis may be improved by allowing an interaction between ship type and year of construction (Lawless, 1987).

Example 3.4 Sales territory

While many goodness of fit criteria are 'internal' approaches comparing predictions with actual data, cross-validation methods involve out-of-sample predictions as a way of model assessment. Rust and Schmittlein (1985) consider the application of the split sample Bayesian Cross-Validation (BCV) method to 40 observations relating to marketing territory data for sales of click ball point pens (y, in $000s), advertising spots (X_1), number of sales representatives (X_2), and a wholesale efficiency index (X_3). Two-fold cross-validation is one option; other options are n-fold cross-validation, or k-fold cross-validation (k small but exceeding two).

Two alternative sales prediction models are to be assessed via the split sample or two-fold cross-validation, namely linear and linear-in-logs forms:

Y = α_0 + α_1X_1 + α_2X_2 + α_3X_3 + ε_1
Y = β_0 + β_1log(X_1) + β_2log(X_2) + β_3log(X_3) + ε_2

As Rust and Schmittlein note, these are non-nested models, the relative performance of which may be difficult to assess with classical methods. Note that we take sales in thousands of dollars divided by 100 to provide a scaled outcome Y. This means likelihoods will differ from those cited by Rust and Schmittlein.

Assuming Normal errors ε, we compare penalised likelihood criteria for model choice involving a model fitted to all observations with a BCV check based on a single random split of the 40 observations into an estimation sample of 20 and a validation sample of 20. This split is provided by Rust and Schmittlein, though Program 3.4 includes code to derive alternative random split halves of the 40 observations.
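The structure of this split-half check can be sketched in a few lines of Python (this is not Program 3.4: the data are simulated, and plug-in least squares estimates stand in for the estimates from the training half, in the spirit of the maximum likelihood plug-in in Equation (3.8)).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
n = 40
X1, X2, X3 = rng.uniform(1, 10, (3, n))
Y = 0.2 + 0.15 * X1 + 0.35 * X2 + 0.1 * X3 + rng.normal(0, 0.8, n)   # invented 'sales' data

def fit_and_validate(design_train, y_train, design_valid, y_valid):
    # fit by least squares on the training half, return validation log-likelihood
    beta, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    resid = y_train - design_train @ beta
    sigma = resid.std(ddof=design_train.shape[1])
    return norm.logpdf(y_valid, design_valid @ beta, sigma).sum()

idx = rng.permutation(n)
train, valid = idx[:20], idx[20:]

D_lin = np.column_stack([np.ones(n), X1, X2, X3])
D_log = np.column_stack([np.ones(n), np.log(X1), np.log(X2), np.log(X3)])

ll_lin = fit_and_validate(D_lin[train], Y[train], D_lin[valid], Y[valid])
ll_log = fit_and_validate(D_log[train], Y[train], D_log[valid], Y[valid])

# approximate posterior model probability for the linear model, as in Equation (3.8)
B = np.exp(ll_lin - ll_log)
print("validation log-likelihoods: linear %.1f, linear-in-logs %.1f" % (ll_lin, ll_log))
print("P(linear | validation data) approx %.2f" % (B / (1 + B)))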

The BCV approach yields results as in Table 3.4, based on a fit to the 20 training cases and a log-likelihood comparing actual and predicted values of the outcome for the remaining 20 validation cases. The latter suggests a small advantage to the linear model: a posterior model probability of 2.46/(1 + 2.46) = 0.71, where 2.46 = exp(−21.7 + 22.6). A suggested exercise is to:

(a) estimate and validate in the 'reverse' direction, noting that there is no reason in particular to regard one sub-sample as the training sample and the other the validation sample; and

(b) split the data randomly into four sub-samples and estimate the two models four times over cases in three sub-samples, validating each time over cases in the remaining sub-sample, and accumulating the predictive log-likelihoods.

To derive standard penalised fit measures, the log-likelihoods at the posterior means of the parameters (i.e. the regression parameters and error variance) are calculated under the two models and penalised according to p = 5 parameters under the AIC, or by 0.5p log(40) = 9.22 under the BIC. The respective log-likelihoods, following runs of 10 000 iterations under the linear and log-linear models, are −22.2 and −25.7, so the linear model is also preferred on the basis of a full data set analysis. Since both models have the same number of parameters, the use of AIC or BIC makes no difference to this conclusion. Hence, there is unanimity between model choice methods in this example. It is apparent from Table 3.4 that the inclusion of X_3 is doubtful under either model, and the approach illustrated in Example 3.2 might be used to compare reduced models.

Table 3.4 Click ball points: BCV analysis

                   Mean     St. devn.   2.5%     Median   97.5%
Linear
log-likelihood     −21.7    3.0         −28.7    −21.4    −16.8
α_0                 0.18    0.98        −1.81     0.19     2.10
α_1                 0.16    0.09        −0.02     0.16     0.35
α_2                 0.35    0.26        −0.16     0.35     0.87
α_3                 0.10    0.23        −0.36     0.10     0.56
Linear-in-logs
log-likelihood     −22.6    3.1         −29.8    −22.2    −17.6
β_0                −2.93    1.36        −5.60    −2.93    −0.26
β_1                 1.87    0.97         0.01     1.86     3.75
β_2                 1.63    1.27        −0.86     1.66     4.13
β_3                 0.06    0.55        −1.04     0.06     1.16

Example 3.5 Laplace approximation to marginal likelihood: counts of schizophrenia

Example 3.3 considered simple and penalised deviance measures of fit for a Poisson count outcome, analogous to classical approaches. Here we consider the Laplace approximation as applied to model choice for Poisson regression, approximating formal Bayes model choice based on the marginal likelihood. In this approximation, the parameter dispersion matrix Σ in Equation (3.6) is estimated using the second order differentials of the log-likelihood of the data.

Consider counts y_i of inpatient hospitalisations of schizophrenic patients for 44 small areas (electoral wards) in two London boroughs, with expected cases denoted E_i (see Congdon et al., 1998). These are to be 'explained' in terms of measures of social deprivation and community structure. Thus, Model 1 relates psychiatric morbidity to a single predictor, the Townsend deprivation index (X_1), and Model 2 combines the Townsend and anomie indices (X_1 and X_2, respectively). Univariate Normal priors are adopted for the regression coefficients, namely β_j ~ N(0, 100).

Then y_i ~ Poi(μ_i), where the Poisson means μ_i are related to predictors via a log-link:

log(μ_i) = log(E_i) + β_0 + β_1X_{1i} + β_2X_{2i} + ...

Omitting constants, the log likelihood for area i is

L_i(β | y) = −μ_i + y_i log(μ_i)

and the second-order differentials of L with respect to β_j and β_k have the form

−Σ_i X_{ij} X_{ik} μ_i

We fit Model 1 and obtain posterior means and standard deviations β_0 = −3.1 (0.35) and β_1 = 1.27 (0.14). The total Poisson log-likelihood (F[1] in Program 3.5, Model B) is −137.3, while the marginal likelihood, as approximated via the Laplace method⁹, is −147.2.

Adding the predictor anomie produces a gain in likelihood which would be judged significant by usual methods, with L now at −133.8. The posterior means and standard deviations are

β_0 = −3.24 (0.33), β_1 = 0.81 (0.21) and β_2 = 0.55 (0.21)

The size of the Townsend score coefficient is much reduced and its precision lessened, and the marginal likelihood suggests in fact no gain in fit, standing at −147.5. So there is in effect nothing to choose between the models in Bayesian terms, with the (approximate) Bayes factor close to unity.

Support for a simple model including X_1 only is provided by applying the Kuo–Mallick dependent priors predictor selection method (Kuo and Mallick, 1998, p. 72). Here the model is extended to include a third index, the mental illness needs index, or mini[] in Program 3.5. Priors on the three regression coefficients are set so that a value γ_j = 0 on the binary index for the j-th coefficient results in an effectively zero coefficient. With three chains, convergence on the coefficients is apparent from around iteration 9000. The second half of a run of 40 000 iterations shows γ_1 to have value 1 (X_1 included in every iteration), whereas the mean of γ_2 is 0.09 and that of γ_3 is only 0.006.

⁹ The BUGS coding for the Laplace approximation involves the inverse() and logdet() commands. For a Poisson outcome y[], covariates X[,1:p], posterior mean parameter estimates b[1:p] from an earlier fit involving N(0, 100) priors, and expected counts E[], the coding for a non-iterative program could be:

for (i in 1:N) {
# Poisson mean
log(mu[i]) <- log(E[i]) + sum(C[i, 1:p])
# log-likelihood
L[i] <- -mu[i] + y[i]*log(mu[i]) - logfact(y[i])
for (j in 1:p) {C[i,j] <- b[j]*X[i,j]
for (k in 1:p) {D[i,j,k] <- X[i,j]*X[i,k]*mu[i]}}}
# inverse Hessian
for (j in 1:p) {for (k in 1:p) {H[j,k] <- sum(D[,j,k])
S[j,k] <- inverse(H[,], j, k)}}
# log probs of posterior mean values under Normal prior densities
for (j in 1:p) {tau.b[j] <- 0.01; mu.b[j] <- 0;
Pr[j] <- 0.5*log(tau.b[j]/6.28) - 0.5*tau.b[j]*pow(b[j] - mu.b[j], 2)}
# Laplace approxn
F <- 0.5*p*log(6.28) + 0.5*logdet(S[,]) + sum(L[]) + sum(Pr[])

3.3 POLYTOMOUS AND ORDINAL REGRESSION

Many outcomes relating to political or religious affiliation, labour force or social status, or choice (e.g. of travel mode to work) involve ordered or unordered polytomous variables (Amemiya, 1981). The underlying model for such outcomes typically involves a latent continuous variable, which may be conceptualised as an attitudinal, prestige
