
Fault detection, identification and diagnosis using CUSUM based PCA


M.A. Bin Shams, H.M. Budman*, T.A. Duever

Department of Chemical Engineering, University of Waterloo, Ontario, Canada N2L 3G1

Article info

Article history:
Received 26 October 2010
Received in revised form 1 May 2011
Accepted 17 May 2011
Available online 23 June 2011

Keywords:
Parameter identification
Process control
Safety
Systems engineering
Fault detection and diagnosis
PCA

Abstract

In this paper, a cumulative sum based statistical monitoring scheme is used to monitor a particular set of the Tennessee Eastman Process (TEP) faults that could not be properly detected or diagnosed with other fault detection and diagnosis methodologies previously reported.

T² and Q statistics based on the cumulative sums of all available measurements were successful in observing these three faults. For the purpose of fault isolation, contribution plots were found to be inadequate when similar variable responses are associated with different faults. Fault historical data is then used in combination with the proposed CUSUM based PCA model to unambiguously characterize the different fault signatures. The proposed CUSUM based PCA was successful in detecting, identifying and diagnosing both individual as well as simultaneous occurrences of these faults.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Fault observability and distinguishability are desirable properties for any detection and diagnosis system. Fault observability is relevant to the detection phase and can be viewed as the ability of a detection system to detect, using the available process measurements, abnormal process operation due to the occurrence of one or more faults. On the other hand, distinguishability is related to the ability of a monitoring system to diagnose or isolate a particular fault from the available measurements, especially when the system exhibits similar responses in these measurements for different faults (Qin, 2003; Benjamin et al., 2008). Thus, observability refers to the ability to detect abnormal operation whereas distinguishability refers to the ability to identify the particular fault or faults causing abnormal operation. Different methods have been proposed in the literature for fault detection and fault diagnosis (Venkatasubramanian et al., 2003). These methods can be broadly categorized into three main classes: (1) analytical methods, which are solely based on first-principles models, e.g. observer based techniques; (2) empirical methods, e.g. univariate and multivariate statistical methods; and (3) semi-empirical methods, which combine empirical models with prior knowledge about the system under consideration, for example through the use of expert systems or fuzzy rules (Chiang et al., 2001; Bhushan and Romagnoli, 2008). Each of these methods has its own advantages and disadvantages depending on the problem.

A number of researchers suggest combining these methods to improve detection. For example, Chiang and Braatz (2003) and Lee et al. (2004) have observed that data driven analysis is enhanced by incorporating fundamental causal relationships among variables. Analytical methods require the use of first-principle models that may often be complex and difficult to obtain and calibrate, thus making them less attractive for large scale systems. Therefore, this work focuses on the use of empirical methods for detection and isolation. Since data in chemical processes generally exhibit high correlation in time and cross-correlation among variables, multivariate statistical methods such as latent variable methods, e.g. Principal Components Analysis, have been proposed for fault detection and diagnosis since they can deal effectively with these problems (MacGregor and Kourti, 1995).

The terms fault isolation and fault diagnosis are used interchangeably in the literature. The difference is that when the classification problem is resolved with the help of historical fault data, the corresponding procedure is referred to as diagnosis; otherwise, it is referred to as isolation. The most widely used method for fault isolation is the contribution plot (Miller et al., 1998), which does not use historical fault data. It depicts the contribution of each process variable to the monitored statistics; that is, it identifies those variables that are most correlated with the fault in question. For that reason the term fault identification is sometimes used when fault isolation is accomplished using a contribution plot. Its effectiveness is limited to simple faults, e.g. sensor and actuator faults (Yoon and MacGregor, 2001; Qin, 2003). Researchers have used contribution plots combined with other methods to enhance the fault diagnosis procedure. For example, Dunia and


Chemical Engineering Science

0009-2509/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.ces.2011.05.028

* Corresponding author. Tel.: +1 51******** x36980; fax: +1 51********. E-mail address: hbudman@uwaterloo.ca (H.M. Budman).

Chemical Engineering Science 66 (2011) 4488–4498

Qin (1998) proposed a fault identification index based on the fault reconstruction square prediction error (FRSPE). The smallest FRSPE is obtained for the reconstructed fault. Raich and Cinar (1997) proposed distance and angle metrics to diagnose process disturbances. Yoon and MacGregor (2001) proposed an angle based metric called the joint angle plot. Ku et al. (1995) used different dynamic principal component analysis (DPCA) based models to characterize each fault. All these methods need historical fault data to precisely diagnose faults, e.g. fault vector directions. It will be shown in this work that the contribution plot is not sufficient to accurately isolate faults in specific situations where the measured variables behave similarly during the occurrence of different faults.

This paper proposes the application of cumulative sum (CUSUM) based models in combination with PCA for the detection and diagnosis of faults in the Tennessee Eastman problem (TEP) (Downs and Vogel, 1993). Location CUSUM (LCS) and scale CUSUM (SCS) in combination with PCA based models were used to detect three particular faults that have been found unobservable by other algorithms previously applied to the TEP (Cheng et al., 2010; Chiang et al., 2003; Chiang and Braatz, 2003; Ding et al., 2009; Ku et al., 1995; Zhang, 2009).

In a previous study the authors have shown that an algorithm that combines CUSUM transformations with Hotelling's T² statistic is able to detect abnormal operation following the occurrence of faults (Bin Shams et al., 2010). However, the previous study had two key limitations. First, the proposed methodology required a priori selection of a subset of measurements highly correlated with each particular fault; these variables were selected using process knowledge. Second, the previous study did not consider the diagnosis problem associated with the detected faults.

The current study expands the previous work by proposing a detection algorithm that uses all the measurements available at the plant, thus bypassing the need to select a priori a particular set of measurements for detection. Then, the paper presents an algorithm based on the combination of PCA and CUSUM for detection, identification and diagnosis.

This paper is organized as follows: definitions and an overview of the faults considered in the Tennessee Eastman Process (TEP) are given in Section 2. A description of the proposed CUSUM based techniques for detection and diagnosis is given in Section 3. The results of applying the proposed CUSUM based strategy are discussed in Section 4. Conclusions are given in Section 5.

2. Definition and method

2.1. Out-of-control average run length (ARL_o.c)

Most data driven monitoring techniques are based on the statistical hypothesis-testing principle. Two types of errors occur when performing hypothesis testing, namely type I and type II errors. A type I error occurs when a control chart indicates a fault in the absence of one, whereas a type II error occurs when a control chart fails to identify the occurrence of a fault (Montgomery, 1997). Observability of a fault refers to the ability to detect the fault from the chosen set of measurements. As proposed in a previous study, the out-of-control average run length (ARL_o.c) is used as a statistical measure to gage observability (Bin Shams et al., 2010). The subscript (o.c) stands for out of control. The ARL_o.c is defined as the average number of points that must be sampled or plotted before the chart signals the occurrence of a fault, and it is a function of the probability β of a type II error occurring, that is

ARL_o.c = f(β)    (1)

For example, if in response to a certain fault the estimated ARL_o.c = 1, the fault would be detected, on average, after the first sample following the onset of the fault. On the other hand, an ARL_o.c equal to infinity or a very large number implies that the fault is unobservable or that it takes a long time to observe it. The value of the ARL_o.c depends on the type of chart that is used for monitoring. Several analytical expressions are available for simple statistical charts (Montgomery, 1997). For other types of charts, different approaches to estimate the ARL_o.c based on the Markov chain approach have been proposed, e.g. Brook and Evans (1972), but in practice the ARL_o.c is usually estimated from simulations conducted with random realizations of the disturbances (Woodall and Ncube, 1985). The latter approach is adopted in the current study.
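The simulation-based estimate of ARL_o.c can be sketched as follows. This is an illustrative Python example, not the authors' code: it assumes unit-variance Gaussian noise, a one-sided CUSUM chart, and arbitrary illustrative values for the shift, slack value and threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_length(fault_shift, threshold, n_max=10000):
    """Number of samples until a one-sided CUSUM chart signals,
    for one random realization of the disturbance (mean shift)."""
    k = 0.5          # slack value: half the expected shift (in sigma units)
    c_plus = 0.0
    for i in range(1, n_max + 1):
        x = fault_shift + rng.standard_normal()   # faulty observation
        c_plus = max(0.0, c_plus + x - k)         # upper CUSUM recursion
        if c_plus > threshold:
            return i
    return n_max     # censored: chart never signalled

# Estimate ARL_o.c by averaging run lengths over many noise realizations
runs = [run_length(fault_shift=1.0, threshold=5.0) for _ in range(200)]
arl_oc = np.mean(runs)
```

Averaging over independent realizations, rather than a single run, is what makes the estimate of the mean run length statistically meaningful.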

2.2. The cumulative sum (CUSUM) based control charts

A key disadvantage of Shewhart-like control charts often used for detection is that they only use current time-interval information while not accounting for the time history. Hence, those charts are relatively insensitive to small shifts in the process variables, especially for small signal to noise ratios. These shortcomings motivate the use of alternatives such as the univariate or multivariate versions of the CUSUM based charts. Four types of statistics are combined in the current study for either fault detection or fault isolation, specifically the location cumulative sum (LCS), the scale cumulative sum (SCS) and the PCA based statistical measures T² and Q. The current study proposes the use of a combined version of these statistics as described in the following section. The LCS and SCS algorithms are examples of univariate statistics while T² and Q are multivariate statistics. Both the LCS and SCS are computed using the following two statistics, corresponding to a two-sided hypothesis test (Hawkins and Olwell, 1998):

C_i^+ = max[0, C_(i−1)^+ + x_i − (μ_i.c + k)]    (2)

C_i^− = max[0, C_(i−1)^− + (μ_i.c − k) − x_i]    (3)

C_0^+ = C_0^− = 0

where k, μ_i.c, C_i^+ and C_i^− are the slack variable, the in-control mean, and the upper and lower CUSUM statistics, respectively. The role of the slack variable is to introduce robustness with respect to noise. At every new sample, the statistics in Eqs. (2) and (3) account for the accumulated sums of small deviations. These summations are corrected using the slack variable and compared to zero using the max operation. When either one of the two statistical measures in Eqs. (2) and (3) exceeds a threshold H, the process is considered to be out of control. Following their respective definitions, the LCS is especially effective for detecting changes in the average whereas the SCS is suitable for detecting changes in variability. Guidelines for the selection of k and H have been reported (Hawkins and Olwell, 1998; Montgomery, 1997). Typically k is selected to be half of the expected shift in either μ or σ, and H is determined so that a prespecified ARL_o.c is achieved. It should be noticed that when using Eqs. (2) and (3), the LCS uses the original raw data x_i, whereas the SCS uses the following standardized quantity:

x_i* = (√|x_i| − 0.822)/0.349    (4)

A derivation of the quantities in Eq. (4) is given in Appendix A. Although LCS and SCS can be applied to individual measurements,


there are many situations in which a single representative statistic for more than one variable is necessary. This is especially important when it is desired to present the operators with information in a compact form to simplify their monitoring activities during operation. In that case multivariate statistical measures can be used based on the univariate CUSUMs, as explained in the following section.
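The two-sided tabular CUSUM of Eqs. (2) and (3) can be sketched in a few lines of Python. This is an illustrative implementation (function and variable names are our own), assuming unit-variance data and the conventional textbook choices k = 0.5 and H = 5:

```python
import numpy as np

def tabular_cusum(x, mu_ic, k, h):
    """Two-sided tabular CUSUM (Eqs. (2) and (3)).
    Returns the upper/lower statistics and the first alarm index (or None)."""
    c_plus = np.zeros(len(x) + 1)   # C_0^+ = 0
    c_minus = np.zeros(len(x) + 1)  # C_0^- = 0
    alarm = None
    for i, xi in enumerate(x, start=1):
        c_plus[i] = max(0.0, c_plus[i - 1] + xi - (mu_ic + k))
        c_minus[i] = max(0.0, c_minus[i - 1] + (mu_ic - k) - xi)
        if alarm is None and (c_plus[i] > h or c_minus[i] > h):
            alarm = i
    return c_plus[1:], c_minus[1:], alarm

# In-control data followed by a small mean shift at sample 101
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(1.0, 1.0, 100)])
cp, cm, alarm = tabular_cusum(x, mu_ic=0.0, k=0.5, h=5.0)
```

For the SCS, the same recursion would be applied to the standardized quantity of Eq. (4), i.e. `(np.sqrt(np.abs(x)) - 0.822) / 0.349`, instead of the raw data.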

2.3. Statistical monitoring with principal component analysis (PCA)

For a process with n measured variables, one alternative is to use n univariate control charts to monitor the process. As mentioned above, to simplify the presentation of information, a second alternative consists of using a principal component analysis (PCA) model to produce T² and Q charts for monitoring the n variables simultaneously. PCA involves the computation of loadings and scores using the covariance matrix of the data X ∈ R^(m×n), where n is the number of variables and m is the total number of samples. If the original variables are correlated, it is possible to summarize most of the variability present in the n-dimensional variable space in terms of a lower p-dimensional subspace (p ≪ n), where p represents the number of principal components retained to explain the majority of the variability in the data. If only two principal components are found to be significant, two-dimensional score plots are used (i.e. T1 versus T2). For more than two principal components, the Hotelling T² and Q statistics are usually used to monitor the process. The T² statistic, based on the first p PCs, is defined as

T² = Σ_(i=1)^p t_i²/λ_i    (5)

where λ_i is the i-th eigenvalue of the covariance matrix of the original data matrix. Confidence limits for T² at confidence level (1−α) are related to the F-distribution as follows:

T²_(m,p) = [(m−1)p/(m−p)] F_(p,m−p)    (6)

where F_(p,m−p) is the upper 100α% critical point of the F-distribution with p and (m−p) degrees of freedom. Monitoring the process variables by the T² values based on the first p principal components is often not sufficient, since it only helps to detect whether or not the variation is within the plane defined by the first p principal components, which generally capture the steady state correlations corresponding to normal operation of the process. If a new event occurs that is not present in the calibration data used to identify the reference model, then additional principal components may become significant and the new observation vector x_i will move off the calibrated plane. Such new events can be detected by computing the squared prediction error or Q statistic. Let x_i ∈ R^n denote the i-th multivariate observation vector, whose corresponding score is t_i = x_i P. The prediction from the PCA model for x_i is given by x̂_i = t_i P^T = x_i P P^T. Then the n-dimensional error vector is given by e_i = x_i − x̂_i and the corresponding Q is defined as follows:

Q = e_i e_i^T    (7)

Accordingly, Q can be viewed as a measure of plant-model mismatch. The confidence limits for Q are given by Jackson (1991). This test suggests the existence of an abnormal condition when Q > Q_α, where Q_α is defined as follows:

Q_α = Θ_1 [1 + c_α h_0 √(2Θ_2)/Θ_1 + Θ_2 h_0 (h_0 − 1)/Θ_1²]^(1/h_0)    (8)

Θ_i = Σ_(j=p+1)^n λ_j^i,  for i = 1, 2, 3    (9)

h_0 = 1 − 2Θ_1Θ_3/(3Θ_2²)    (10)

where c_α is the (1−α) percentile of the standard normal distribution. These confidence limits are calculated based on the assumptions that the measurements are time independent and multivariate normally distributed.

Most of the available statistical monitoring techniques use static PCA as their basic building block (Bakshi, 1998; Wang and Romagnoli, 2005; Kramer, 1991).
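The T² and Q monitoring procedure of Eqs. (5)–(10) can be sketched as follows. This is an illustrative Python implementation on simulated correlated data, not the authors' code; the function names, the autoscaling step, and the significance level are our own choices.

```python
import numpy as np
from scipy import stats

def fit_pca_monitor(X, p, alpha=0.01):
    """Fit a PCA monitoring model on normal-operation data X (m x n);
    return T^2 and Q functions and their (1-alpha) control limits."""
    m, n = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                              # autoscale
    lam, P = np.linalg.eigh(np.cov(Z, rowvar=False))
    lam, P = lam[::-1], P[:, ::-1]                 # descending eigenvalues
    Pp = P[:, :p]                                  # retained loadings

    # T^2 limit from the F-distribution (Eq. (6))
    t2_lim = p * (m - 1) / (m - p) * stats.f.ppf(1 - alpha, p, m - p)

    # Q limit from Jackson's approximation (Eqs. (8)-(10))
    theta = [np.sum(lam[p:] ** i) for i in (1, 2, 3)]
    h0 = 1 - 2 * theta[0] * theta[2] / (3 * theta[1] ** 2)
    c_a = stats.norm.ppf(1 - alpha)
    q_lim = theta[0] * (1 + c_a * h0 * np.sqrt(2 * theta[1]) / theta[0]
                        + theta[1] * h0 * (h0 - 1) / theta[0] ** 2) ** (1 / h0)

    def t2(x):                                     # Eq. (5)
        t = (x - mu) / sd @ Pp
        return np.sum(t ** 2 / lam[:p])

    def q(x):                                      # Eq. (7)
        z = (x - mu) / sd
        e = z - (z @ Pp) @ Pp.T                    # residual off the PCA plane
        return e @ e

    return t2, q, t2_lim, q_lim

# Simulated normal-operation data: 6 variables driven by 2 latent factors
rng = np.random.default_rng(2)
latent = rng.standard_normal((500, 2))
X = latent @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((500, 6))
t2, q, t2_lim, q_lim = fit_pca_monitor(X, p=2)
```

A sample inside the calibrated plane gives a small T² and Q, while a sample that breaks the latent correlation structure inflates Q even when T² stays in bounds.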

2.4. Tennessee Eastman process: previous detection and diagnosis approaches

The Tennessee Eastman process has been widely used as a benchmark problem to compare various monitoring solutions (Cheng et al., 2010; Chiang et al., 2001; Ding et al., 2009; Ku et al., 1995; Lee et al., 2004; Zhang and Zhang, 2010). The process is open loop unstable due to its exothermic reaction. Besides the reactor, the process has four main unit operations, as shown in Fig. 1: condenser, compressor, separator and stripper. The process produces two liquid products (G and H) and one by-product (F) from four gaseous reactants (A, C, D, E) and an inert (B). The original open loop FORTRAN code was provided by Downs and Vogel (1993). The simulations of the plant were done with the second decentralized control structure proposed in Lyman and Georgakis (1995).

Different monitoring techniques have been tested and reported for the TEP. These techniques have shown different capabilities in detecting the majority of the 20 faults generally assumed for the process. However, all of these previously reported techniques have consistently failed in detecting the three particular faults described in Table 1. Almost all of the methods previously applied to the TEP were of a multivariate nature. Fig. 2 shows the results of the application of static PCA to the TEP using the T² and Q statistics. The bounds of normal operation corresponding to the 95% and 99% confidence levels for the no-fault situation are shown as dotted lines in

Fig. 1. Tennessee Eastman Process (TEP) with the second control structure described in Lyman and Georgakis (1995). The circles indicate the locations of the three faults described in Table 1.

Table 1
The unobservable faults of the TEP process.

Fault no.   Description                      Characteristic
IDV(3)      D feed temperature               Step change
IDV(9)      D feed temperature               Random variation
IDV(15)     Condenser cooling water valve    Valve stiction


Fig. 2 and are calculated by Eqs. (6) and (8). The meaning of these bounds is that if either T² or Q rises above them after the occurrence of the fault, then the fault has been successfully detected. For the plots in Fig. 2 the corresponding faults were introduced at time = 160 samples. However, as shown in Fig. 2, both the T² and Q statistics fail to surpass the thresholds after the onset of any of the three faults in Table 1, i.e. IDV(3), IDV(9) and IDV(15). Hence, it can be concluded that these faults cannot be detected by static PCA. It should be noticed that when PCA is used, p is replaced with a in Eq. (6), where a is the number of principal components retained in the PCA model.

Dynamic principal component analysis (DPCA), proposed by Ku et al. (1995), was also used for the detection and diagnosis of the TEP faults. DPCA has the advantage of taking into account information along several time intervals, in contrast to conventional static PCA, which is based solely on data collected at the current time. Accordingly, DPCA is more suitable for dynamic systems. However, it was also unsuccessful in observing these three faults. The results of the application of DPCA are not shown here for brevity.

The inability of previous techniques to detect the three faults given in Table 1 motivates the use of cumulative sum measures. The multivariate cumulative sum (MCUSUM) algorithm using all of the available TEP measurements was also initially tested (MacGregor and Kourti, 1995; Woodall and Ncube, 1985), but this technique was also unable to detect these three faults. The results are not shown for brevity.

The resulting lack of observability when using specific techniques is attributed to the statistically insignificant changes in the process mean, the process variance or a combination of both exhibited by the system when these faults occur.

3. CUSUM based multivariate statistics

3.1. Fault detection using CUSUM based PCA

In a previous study, three faults of the TEP process, namely IDV(3): a step change in the D feed stream's temperature, IDV(9): a random variation in the D feed stream's temperature and IDV(15): stiction in the condenser cooling water valve, have been observed using univariate CUSUM based Hotelling T² (Bin Shams et al., 2010). These faults were especially chosen since it was shown in other studies that they could not be detected by other fault detection techniques applied to the TEP problem. Fig. 3 illustrates the difficulty associated with detecting and diagnosing these three faults using the space of the three measurements that are most affected by their presence, namely XMV(10): reactor cooling water flow, XMV(11): condenser cooling water flow, and XMEAS(21): reactor cooling water outlet temperature. As stated in the introduction, the previous study was based on a subset of measurements that were identified, using process knowledge, to have significant correlation with respect to each one of the faults. It can be seen from Fig. 3 that there is significant overlap between the responses of these three variables both during the presence and absence of the fault, thus making it very difficult to both detect and isolate these three faults. The detection algorithm in the previous work was based on a particular set of variables that had to be chosen a priori. Since there is no systematic way to identify these variables and since there may be information loss when using a subset of the available measurements, all the available measurements are used in the current work for detection and diagnosis. As schematically described in Fig. 4, two matrices are initially formed: one contains in its columns the LCS CUSUMs and the other the SCS CUSUMs of all the measurements. For example,

Fig. 3. The three variables which are most affected by the presence of the three faults of the TEP process. The strong overlap between the three faulty states, i.e. IDV(3), IDV(9), IDV(15), and the normal condition IDV(0) makes the detection and diagnosis of these three faults a challenging task.

Fig. 4. The proposed CUSUM based statistics. The LCS and the SCS are computed for each sample vector. PCA is performed on the augmented matrix. The score and residual spaces are monitored using the T² and Q statistics.

Fig. 2. Monitoring the TEP faults using the T² statistic based on PCA. Top: IDV(3), middle: IDV(9), bottom: IDV(15).


for the TEP problem, 52 measured variables are available. Thus each one of the two aforementioned matrices contains 52 columns and a number of rows corresponding to the number of time intervals at which the measurements were collected. Then a new matrix is formed by appending the SCS and LCS matrices side by side, resulting in a matrix with 104 columns. To account for collinearity across the variables, the principal components are calculated. Principal components that contribute significantly to the total variability of the data are determined using parallel analysis (Chiang et al., 2001). The T² statistic is then used to monitor the space spanned by these components. In addition, the Q statistic is used to monitor the space spanned by those components not included in the calibrated model. Thus, any variation within the space defined by the PCA model is monitored using T², whereas lack of the correlations represented by the PCA model is detected using the Q statistic. Either or both statistics are used to detect the presence of any abnormality whenever their instantaneous values exceed the corresponding critical values at a given significance level.
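The augmentation step described above (Fig. 4) can be sketched as follows. This is an illustrative Python sketch on simulated data, not the authors' code: it assumes the upper (one-sided) CUSUM is used for each column, a detail the text does not specify, and the function names are our own.

```python
import numpy as np

def location_cusum(x, mu_ic, k):
    """Upper location CUSUM (Eq. (2)) of a single measurement series."""
    c = np.zeros_like(x)
    prev = 0.0
    for i, xi in enumerate(x):
        prev = max(0.0, prev + xi - (mu_ic + k))
        c[i] = prev
    return c

def scale_cusum(x, k):
    """Upper CUSUM of the standardized quantity in Eq. (4),
    sensitive to changes in variability."""
    v = (np.sqrt(np.abs(x)) - 0.822) / 0.349
    return location_cusum(v, mu_ic=0.0, k=k)

def cusum_pca_features(X, k=0.5):
    """Append the LCS and SCS of every column of X side by side,
    giving the m x 2n matrix on which PCA is then performed."""
    lcs_cols = [location_cusum(X[:, j], X[:, j].mean(), k)
                for j in range(X.shape[1])]
    scs_cols = [scale_cusum(X[:, j] - X[:, j].mean(), k)
                for j in range(X.shape[1])]
    return np.column_stack(lcs_cols + scs_cols)

# 52 simulated measurement series stand in for the TEP variables
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 52))
F = cusum_pca_features(X)          # 300 x 104 augmented matrix
```

The T² and Q charts of Section 2.3 are then built on the principal components of `F` rather than on the raw measurements.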

3.2. Fault isolation/diagnosis using CUSUM based PCA

The contribution plot has been the main tool used for fault isolation (Miller et al., 1998; Qin, 2003). Two contribution plots are usually used to identify those variables affected by the presence of faults, namely the contribution plots related to either the T² or the Q statistic. For X ∈ R^(m×n), the total contribution of variable j to the Q statistic at each sampling instant i is given by

Cont_ij = e_ij²    (11)

On the other hand, the contribution of variable j to the T² statistic for a ≤ n principal components at each sampling instant i is given by

Cont_ij = Σ_(k=1)^a (1/λ_k) [ p_jk² x_ij² + 2 p_jk x_ij ( Σ_(r=1, r≠j)^n p_rk x_ir ) + ( Σ_(r=1, r≠j)^n p_rk x_ir )² ]    (12)

where p_jk and λ_k are the (j,k) element of the loading matrix and the k-th eigenvalue, respectively. To obtain the total contribution of variable j within a specific time period, the corresponding Cont_ij is summed over the required time window. As shown in Eq. (12), the contribution of variable j to the T² statistic consists of three terms. The first term includes solely variable j and the second term contains a cross-product between variable j and the rest of the variables. The last term does not contain x_ij and therefore does not affect the conclusion drawn about the variable contribution; in practice it is omitted. Although very simple to build, a fundamental drawback associated with the contribution plot is its lack of precision in isolating the correct fault, as illustrated later in the case study. The main reason is that contribution plots are based on a non-causal statistical correlation model that does not take into consideration the cause and effect relationships between the process variables. The alternative to contribution plots proposed in the current study is the use of a set of models based on the CUSUM transformation combined with PCA statistics, as follows. Assume a model PCA_j is designed to detect a particular fault f_j. This model is trained using data generated when fault f_j occurs. The faulty data can be obtained from a historical database or using designed experiments. These data characterize the steady state correlation structure between the process measurements when fault f_j occurs. Then, if the critical limits determined for the T² and Q statistics are exceeded, this can be interpreted as a situation where fault f_j is not active. Fig. 5 depicts the proposed CUSUM based PCA strategy. The misdetection rate (MR%) is defined as the percentage of samples below the control limit after the fault occurrence, i.e. n_b/n_t × 100%, where n_b and n_t are the number of points below the threshold and the total number of samples following the fault onset, respectively. Accordingly, when a low misdetection rate is associated with the T² or Q statistics of the CUSUM-PCA model corresponding to fault f_j, this implies that the process is either operating under normal conditions or experiencing a fault different from that particular fault f_j. Conversely, a higher misdetection rate indicates that the acquired measurements are in accordance with the model of fault f_j, that is, fault f_j has occurred. Assuming a total of n_f faults, a total of n_f CUSUM based PCA models are required to isolate each one of these faults. In brief, if a model j is built for a particular fault f_j, misdetection or lack of detection using model j implies that the data do not exceed the thresholds of model j, therefore indicating that fault f_j has occurred.
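The misdetection-rate decision rule can be sketched as follows. This is an illustrative Python example with hypothetical T² trajectories and limits, not values from the paper; the helper names are our own.

```python
import numpy as np

def misdetection_rate(stat_values, limit):
    """MR% = percentage of post-fault samples BELOW the control limit.
    A high MR% under fault model j's limits means the data agree with
    model j, i.e. fault f_j is the likely active fault."""
    stat_values = np.asarray(stat_values)
    return 100.0 * np.mean(stat_values < limit)

def diagnose(stats_per_model, limits):
    """Pick the fault whose CUSUM based PCA model gives the highest MR%."""
    rates = {f: misdetection_rate(s, limits[f])
             for f, s in stats_per_model.items()}
    return max(rates, key=rates.get), rates

# Hypothetical T^2 trajectories of the same data under three per-fault models
stats_per_model = {
    "IDV(3)":  [0.5, 0.8, 0.6, 0.7],    # stays below its limit -> fits model
    "IDV(9)":  [5.0, 6.2, 5.5, 7.1],    # exceeds its limit -> does not fit
    "IDV(15)": [4.0, 3.9, 5.2, 6.0],
}
limits = {"IDV(3)": 1.0, "IDV(9)": 1.0, "IDV(15)": 1.0}
fault, rates = diagnose(stats_per_model, limits)
print(fault)   # IDV(3)
```

In practice the same comparison would be run on both the T² and the Q statistics of each per-fault model, as done in the results below.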

4. Results and discussion

The sampling frequency for the CUSUM based charts is (1/180) Hz (3 min time intervals). In all the following simulations, the

Fig. 5. The proposed CUSUM based diagnosis strategy.


faults are introduced after 160 samples, that is, after 8 h of normal operation. Different noise realizations were tested and used to calculate the average run lengths (ARL_o.c).

For comparison, Fig. 6 depicts the normal condition. Figs. 7–10 depict the successful detection of IDV(3), IDV(9), IDV(15) and the simultaneous occurrence of IDV(3) and IDV(15), respectively, when the CUSUMs of all the measurements are used as explained in Section 2.3. The vertical dashed line represents the onset of the fault whereas the horizontal dashed line represents the 99% confidence limit. Both the T² and Q statistics are calculated; the first serves to identify a departure from the variables' normal condition values, while the second serves to

Fig. 6. The T² and Q statistics based on the CUSUM based PCA for the normal condition; horizontal and vertical lines represent the statistical limit and the fault onset, respectively.

Fig. 7. The T² and Q statistics based on the CUSUM based PCA for IDV(3); horizontal and vertical lines represent the statistical limit and the fault onset, respectively.

Fig. 8. The T² and Q statistics based on the CUSUM based PCA for IDV(9); horizontal and vertical lines represent the statistical limit and the fault onset, respectively.

Fig. 9. The T² and Q statistics based on the CUSUM based PCA for IDV(15); horizontal and vertical lines represent the statistical limit and the fault onset, respectively.

Table 2
The estimated ARL_o.c for the CUSUM based T² and Q statistics.

Fault no.          Statistics   ARL_o.c (h)^a
IDV(3)             T² (Q)       467.6 (222.9)
IDV(9)             T² (Q)       143.8 (127.9)
IDV(15)            T² (Q)       0.6 (0.6)
IDV(3) & IDV(15)   T² (Q)       0.6 (0.6)

^a All ARL_o.c are calculated after the onset of the faults (i.e. after 8 h).

Fig. 10. The T² and Q statistics based on the CUSUM based PCA for IDV(3) & IDV(15); horizontal and vertical lines represent the statistical limit and the fault onset, respectively.


indicate a departure from the steady state correlation. As can be seen from these figures, different faults affect the two monitored statistics differently. For example, it is clear in Fig. 8 that the fault is better detected by the Q statistic, since T² alone is not a sufficiently accurate indicator for this fault. This illustrates the need to use both the T² and Q statistics to identify the presence of a fault with certainty. Table 2 shows the ARL_o.c associated with the detection of each one of the three faults. As can be seen, there is a long delay associated with the detection of these faults, especially for IDV(3) and IDV(9). However, it can be argued that slow detection is preferable to no detection at all. The ability of the proposed detection strategy to tackle all of the TEP faults has been tested as well. Table 3 summarizes the performance results for all TEP faults in terms of the ARL_o.c. As can be seen, smaller ARL_o.c values are achieved for the rest of the TEP faults.

Once any of the three faults is detected, it is desired to isolate the fault that occurred, i.e. to identify those variables most correlated with it.

Figs. 11–13 depict the CUSUM based T² and Q contribution plots. Fig. 11 shows the significant contribution of measurement 51, the reactor cooling water flow, when IDV(3) occurs. Fig. 12 shows that

Fig. 11. Contribution plot for IDV(3).

Fig. 12. Contribution plot for IDV(9).

Fig. 13. Contribution plot for IDV(15).

Fig. 14. Contribution plot for IDV(4).

Fig. 15. The diagnosis results for IDV(3); first row: IDV(3), second row: IDV(9), third row: IDV(15). The higher misdetection rate in the first row indicates the occurrence of IDV(3).


measurement 21, the reactor cooling water outlet temperature, contributes significantly to the Q statistic when IDV(9) occurs. Fig. 13 depicts the contribution of measurement 22, the separator cooling water outlet temperature, in the presence of IDV(15).

Although the contribution plots emphasize those variables most related to the corresponding faults, there are situations where the contribution plots may be misleading. To demonstrate this situation, IDV(4), a step change in the reactor cooling water inlet temperature, is used (Downs and Vogel, 1993).

As can be seen from Figs. 11 and 14, the CUSUM based contribution plots are not actually helpful in isolating the root cause of their corresponding faults, i.e. IDV(3) and IDV(4). In fact a fault misclassification is inevitable, since both contribution plots, i.e. Figs. 11 and 14, select the same variable, measurement 51 (reactor cooling water flow), as a possible cause for two completely different faults.

In addition, since contribution plots are based on a non-causal correlation model, pinpointing the variables that are correlated with the occurring fault is all that contribution plots can provide. The determination of the type of fault remains ambiguous.

Enhanced fault diagnosis can be achieved by identifying PCA models based on the CUSUM transformed measurements, as explained in Section 3.2. Four PCA models are identified: three for the individual occurrence of faults IDV(3), IDV(9) and IDV(15), plus one corresponding to the scenario where IDV(3) and IDV(15) occur simultaneously.

Fig. 15 depicts the diagnosis results for IDV(3). The model is calibrated using the faulty data collected when IDV(3) occurs. The figure is composed of a total of six subplots. Each row of subplots shows the T2 and Q responses corresponding to each one of the three faults, namely IDV(3), IDV(9) and IDV(15), respectively. It can be seen in the first row that the measurements are in accordance with the calibrated model, i.e. the responses are within the limits of the model, indicating the occurrence of IDV(3). On the other hand, in the second and third rows, the T2 and Q critical limits of the models corresponding to IDV(9) and IDV(15), respectively, are exceeded.

From these plots it can be concluded that IDV(3), a step change in the D feed temperature, is most likely the experienced fault, whereas the other two faults, IDV(9) and IDV(15), are not active. Compared with the contribution plots for distinguishing between IDV(3) and IDV(4), a better diagnosis performance is obtained when the proposed PCA models are used, as illustrated in Fig. 16. In the latter, the CUSUM based PCA model is calibrated using data generated when IDV(3) is active. It can be seen from Fig. 16 that misdetection rates of 94.66% and 96.67% are obtained for T2 and Q, respectively; that is, the corresponding thresholds are not exceeded for the CUSUM based PCA model trained with IDV(3) active, whereas these same thresholds are left unexceeded only 2.52% and 0.78% of the time for T2 and Q, respectively, when IDV(4) is active, implying that IDV(3) is most likely the active fault.
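This diagnosis logic — evaluating the data against a bank of fault-specific CUSUM based PCA models and selecting the model whose control limits are least violated — might be sketched as follows. The fitting routine, the control limits and the data in the usage example are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def in_limit_rate(S, model, t2_lim, q_lim):
    """Fraction of samples inside BOTH the T2 and Q limits of a
    fault-specific PCA model (the 'misdetection rate' of that model)."""
    mean, P, lam = model
    hits = []
    for x in S:
        xc = x - mean
        t = P.T @ xc
        t2 = np.sum(t ** 2 / lam)
        q = np.sum((xc - P @ t) ** 2)
        hits.append(t2 <= t2_lim and q <= q_lim)
    return float(np.mean(hits))

def diagnose(S, models, limits):
    """Label the data with the fault whose calibrated model best
    explains it, i.e. the one with the highest in-limit rate."""
    scores = {name: in_limit_rate(S, m, *limits[name])
              for name, m in models.items()}
    return max(scores, key=scores.get), scores
```

The model trained on the active fault keeps most samples inside its limits (a high misdetection rate), while the other models are violated most of the time.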

Fig. 17 shows the results for data generated when IDV(15) occurs. It can be seen in the third row of subplots that the T2 and Q critical limits are not exceeded, indicating that IDV(15), i.e. stiction in the condenser cooling water valve, is the active fault. Fig. 18 illustrates the precise diagnosis of the simultaneous occurrence of IDV(3) and IDV(15) using the proposed approach. The necessity of considering both the T2 and Q statistics in diagnosing these faults can be explained as follows. The first column of subplots in Fig. 17 or 18 alone would suggest that all of the faults are possible fault candidates; in that case the Q statistic is used to assess which fault is active. The need for the Q statistic is further reinforced by the fact that not all of the variation caused by a fault can be observed within the score space captured by the CUSUM based PCA model.

Fig. 16. The fault historical data of IDV(3) clears the ambiguity associated with the contribution plots, Figs. 10 and 13. First row: IDV(3); second row: IDV(4).

Fig. 17. The diagnosis results for IDV(15); first row: IDV(3); second row: IDV(9); third row: IDV(15). The higher misdetection rate in the third row indicates the occurrence of IDV(15).

Tables 4–6 summarize the diagnosis results for fault IDV(3), fault IDV(15) and the simultaneous occurrence of both faults, respectively. Finally, the results obtained with the proposed CUSUM based PCA are compared with the fault diagnosis results obtained when static PCA is applied directly to the data, without CUSUM transformation of the measurements.

Typically, it would be desirable for the decision column to have only one entry equal to one, in the row corresponding to the active fault, indicating a clear diagnosis of this fault. An entry of one in the decision column was assigned when the misdetection rate of the corresponding fault, i.e. the

Table 3
The estimated ARL_o.c for all TEP faults using the CUSUM/PCA T2 statistic.

Fault no.   Description                                                Type               ARL_o.c (hr), T2
IDV(1)      A/C feed ratio, B composition constant (stream 4)          Step               0.05
IDV(2)      B composition, A/C ratio constant (stream 4)               Step               1.05
IDV(3)      D feed temperature (stream 2)                              Step               467.60
IDV(4)      Reactor cooling water inlet temperature                    Step               13.7
IDV(5)      Condenser cooling water inlet temperature                  Step               0.10
IDV(6)      A feed loss (stream 1)                                     Step               0.5
IDV(7)      C header pressure loss - reduced availability (stream 4)   Step               0.15
IDV(8)      A, B, C feed composition (stream 4)                        Random variation   0.05
IDV(9)      D feed temperature (stream 2)                              Random variation   143.8
IDV(10)     C feed temperature (stream 4)                              Random variation   11.65
IDV(11)     Reactor cooling water inlet temperature                    Random variation   0.15
IDV(12)     Condenser cooling water inlet temperature                  Random variation   1.05
IDV(13)     Reaction kinetics                                          Slow drift         3.70
IDV(14)     Reactor cooling water valve                                Valve stiction     0.25
IDV(15)     Condenser cooling water valve                              Valve stiction     0.60
IDV(16)     Unknown                                                    Unknown            49.75
IDV(17)     Unknown                                                    Unknown            0.10
IDV(18)     Unknown                                                    Unknown            2.60
IDV(19)     Unknown                                                    Unknown            0.15
IDV(20)     Unknown                                                    Unknown            4.90

Table 4
Misdetection rates; calibration model: IDV(3).

Fault active   PCA T2 (%)   PCA Q (%)   Decision   CUSUM-PCA T2 (%)   CUSUM-PCA Q (%)   Decision
IDV(0)         99.11        99.17       1          19.49              11.50             0
IDV(3)         99.12        99.20       1          93.65              96.33             1
IDV(9)         98.89        99.01       1          24.92              6.79              0
IDV(15)        19.38        55.01       0          0.39               0.08              0
IDV(3&15)      18.50        55.38       0          0.39               0.08              0

Fig. 18. The diagnosis results for the simultaneous occurrence of IDV(3) & IDV(15); first row: IDV(3) & IDV(15); second row: IDV(3); third row: IDV(9). The higher misdetection rate in the first row indicates the simultaneous occurrence of IDV(3) & IDV(15).


number of samples for which the response variables do not exceed the T2 and Q thresholds, exceeds 90% of the total number of samples following the onset of the fault. This 90% value was found reasonable, since much lower values were typically obtained when discrepancies occurred. Any additional entry equal to one for a fault other than the active one (IDV(3) in Table 4) thus implies misclassification of the fault under consideration. It can be seen from Tables 4–6 that when static PCA models are used to characterize each of the three faults, the occurring fault is always misclassified, as shown by the decision column, i.e. multiple entries equal to one. For example, in Table 4, when IDV(3) has occurred, the diagnosis scheme based solely on static PCA indicates the presence of all of the faults, which is, in fact, a wrong decision. The application of the proposed CUSUM based statistics (T2 and Q) shows a much superior capability compared to static PCA, as indicated by the single entry in the decision column. Similar results are obtained for IDV(15) as well as for the simultaneous occurrence of both faults, as can be seen in Tables 5 and 6. Although the proposed CUSUM based PCA algorithm is able to detect and diagnose the TEP faults, it is worth highlighting some of its shortcomings. From the detection perspective, a main disadvantage of the proposed method is the increased number of parameters to be estimated: the statistical thresholds for the multivariate T2 and Q statistics, as well as the LCS and SCS parameters K and H. As with any other detection algorithm, there is uncertainty associated with estimating these parameters. From the diagnosis perspective, a main disadvantage is the need for fault historical data to train each faulty model. Although such data are difficult to obtain, fault historical data were found necessary to resolve the ambiguity that results when contribution plots are used for diagnosis.
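The decision rule just described (an entry of one whenever both the T2 and Q misdetection rates exceed 90%) reduces to a few lines of Python; the numbers in the test below are taken from the CUSUM-PCA columns of Table 4.

```python
def decision_column(misdetection_rates, threshold=90.0):
    """Assign 1 to every fault model whose T2 AND Q misdetection
    rates (percent of samples inside both limits) exceed the
    threshold; a single 1 marks an unambiguous diagnosis."""
    return {fault: int(t2 > threshold and q > threshold)
            for fault, (t2, q) in misdetection_rates.items()}
```

Note that requiring both rates to exceed the threshold is what rules out IDV(15) in Table 6, where the T2 rate (93.82%) passes but the Q rate (32.18%) does not.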

5. Conclusion

A new approach based on the combination of univariate CUSUMs and multivariate PCA statistics is proposed for the fault detection and diagnosis problem. The approach has been demonstrated using a subset of the Tennessee Eastman Process faults that have typically been found unobservable or indistinguishable with other detection and diagnosis techniques. Following successful detection of the faults, the variables related to the faults are identified. Due to the overlapping nature of the measured variables strongly correlated with these three faults, the contribution plots were found inadequate in some instances to precisely locate the root cause of the faults. Instead, the use of a family of PCA models, trained with CUSUM transformations of all the available measurements collected during individual or simultaneous occurrences of the faults, was found effective in correctly diagnosing these faults.

Appendix A. Scale CUSUM (SCS) parameters
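As a quick numerical sanity check, the constants derived below in Eqs. (A2)–(A4) can be reproduced with Python's standard library. This check is illustrative only and is not part of the original derivation.

```python
import math

# k-th moment of y = |x/sigma|**lam for x ~ N(0, sigma^2), Eq. (A2):
# E[y^k] = 2**(0.5*lam*k) * Gamma(0.5*(lam*k + 1)) / sqrt(pi)
def moment(k, lam=0.5):
    return (2 ** (0.5 * lam * k)
            * math.gamma(0.5 * (lam * k + 1)) / math.sqrt(math.pi))

mu = moment(1)               # Eq. (A3): approximately 0.82218
var = moment(2) - mu ** 2    # Eq. (A4): approximately 0.34915**2
```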

The parameters of the scale CUSUM (SCS) have been derived as follows. Let x_i ~ N(0, σ²), i = 1, 2, …, n, and y_i = |x_i/σ|^λ. The characteristics of the distribution of y_i are easily worked out from the standard normal distribution. That is,

Pr[y_i < c] = 2Φ(c^(1/λ)) − 1,   (A1)

where Φ(·) denotes the standard normal distribution function. Furthermore, the k-th moment of y_i is

E[y_i^k] = 2^(0.5λk) Γ(0.5(λk + 1)) / √π,   (A2)

where Γ(·) is the gamma function. With λ = 0.5, the transformed variate y_i has a distribution that is very close to normal. In particular, using (A2), the first and second moments give

E[y_i] = μ = 2^(0.25) Γ(3/4) / √π = 0.82218,   (A3)

V(y_i) = √(2/π) Γ(1) − μ² = (0.34915)².   (A4)

References

Bakshi, B.R., 1998. Multiscale PCA with application to multivariate statistical process monitoring. AIChE Journal 44, 1596–1610.

Benjamin, J.O., De La Pena, D.M., Davis, J.F., Christofides, P.D., 2008. Enhancing data based fault isolation through nonlinear control. AIChE Journal 54, 223–241.

Bhushan, B., Romagnoli, J.A., 2008. Self-organizing self-clustering network: a strategy for unsupervised pattern classification with its application to fault diagnosis. Industrial and Engineering Chemistry Research 47, 4209–4219.

Bin Shams, M., Budman, H., Duever, T., 2010. Fault detection using CUSUM based techniques with application to the Tennessee Eastman Process. In: Proceedings of the 9th International Symposium on Dynamics and Control of Process Systems (DYCOPS), Leuven, Belgium.

Brook, D., Evans, D.A., 1972. An approach to the probability distribution of CUSUM run length. Biometrika 59, 539–549.

Cheng, C.Y., Hsu, C.C., Chen, M.C., 2010. Adaptive kernel principal component analysis (KPCA) for monitoring small disturbances of nonlinear processes. Industrial and Engineering Chemistry Research 49, 2254–2262.

Chiang, L.H., Russell, E.L., Braatz, R.D., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer, London.

Chiang, L.H., Braatz, R., 2003. Process monitoring using causal map and multivariate statistics: fault detection and identification. Chemometrics and Intelligent Laboratory Systems 65, 159–178.

Downs, J.J., Vogel, E.F., 1993. A plant-wide industrial process control problem. Computers and Chemical Engineering 17, 245–255.

Ding, S.X., Zhang, P., Naik, A., Ding, E.L., Huang, B., 2009. Subspace method aided data-driven design of fault detection and isolation systems. Journal of Process Control 19, 1496–1510.

Dunia, R., Qin, J., 1998. Joint diagnosis of process and sensor faults using principal component analysis. Control Engineering Practice 6, 457–469.

Hawkins, D.M., Olwell, D.H., 1998. Cumulative Sum Charts and Charting for Quality Improvement. Springer-Verlag, New York.

Jackson, J., 1991. A User's Guide to Principal Components. Wiley, New York.

Kramer, M.A., 1991. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal 37, 233–243.

Ku, W., Storer, R.H., Georgakis, C., 1995. Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems 30, 179–196.

Table 5
Misdetection rates; calibration model: IDV(15).

Fault active   PCA T2 (%)   PCA Q (%)   Decision   CUSUM-PCA T2 (%)   CUSUM-PCA Q (%)   Decision
IDV(0)         99.94        99.67       1          86.00              13.80             0
IDV(3)         99.95        99.68       1          76.81              12.82             0
IDV(9)         99.90        99.58       1          85.95              13.55             0
IDV(15)        99.00        99.08       1          98.60              99.65             1
IDV(3&15)      99.17        99.07       1          90.06              20.86             0

Table 6
Misdetection rates; calibration model: IDV(3&15).

Fault active   PCA T2 (%)   PCA Q (%)   Decision   CUSUM-PCA T2 (%)   CUSUM-PCA Q (%)   Decision
IDV(0)         99.90        99.62       1          89.81              14.06             0
IDV(3)         99.90        99.60       1          89.88              13.51             0
IDV(9)         99.89        99.53       1          89.74              13.96             0
IDV(15)        98.72        98.96       1          93.82              32.18             0
IDV(3&15)      99.05        99.10       1          98.49              98.25             1


Lee, J.M., Yoo, C., Lee, I., 2004. Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chemical Engineering Science 59, 2995–3006.

Lyman, P.R., Georgakis, C., 1995. Plant-wide control of the Tennessee Eastman problem. Computers and Chemical Engineering 19, 321–331.

MacGregor, J.F., Kourti, T., 1995. Statistical process control of multivariate processes. Control Engineering Practice 3, 403–414.

Miller, P., Swanson, R.E., Heckler, C.F., 1998. Contribution plots: the missing link in multivariate quality control. Applied Mathematics and Computer Science 8, 775–792.

Montgomery, D.C., 1997. Introduction to Statistical Quality Control. Wiley, New York.

Qin, S.J., 2003. Statistical process monitoring: basics and beyond. Journal of Chemometrics 17, 480–502.

Raich, A., Cinar, A., 1997. Diagnosis of process disturbances by statistical distance and angle measures. Computers and Chemical Engineering 21, 661–673.

Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., Yin, K., 2003. A review of process fault detection and diagnosis part III: process history based methods. Computers and Chemical Engineering 27, 327–346.

Wang, D., Romagnoli, J.A., 2005. Robust multi-scale principal components analysis with applications to process monitoring. Journal of Process Control 15, 869–882.

Woodall, W., Ncube, M., 1985. Multivariate CUSUM quality-control procedures. Technometrics 27, 285–292.

Yoon, S., MacGregor, J., 2001. Fault diagnosis with multivariate statistical models part I: using steady state fault signatures. Journal of Process Control 11, 387–400.

Zhang, Y., 2009. Enhanced statistical analysis of nonlinear process using KPCA, KICA and SVM. Chemical Engineering Science 64, 801–811.

Zhang, Y., Zhang, Y., 2010. Fault detection of non-Gaussian processes based on modified independent component analysis. Chemical Engineering Science 65, 4630–4639.

