
DAISY: A database for identification of systems

K.U.Leuven

Department of Electrical Engineering (ESAT), SISTA

Technical report 97-70

DAISY: A database for identification of systems

B. De Moor, P. De Gersem, B. De Schutter, and W. Favoreel

If you want to cite this report, please use the following reference instead:

B. De Moor, P. De Gersem, B. De Schutter, and W. Favoreel, “DAISY: A database for identification of systems,” Journal A, vol. 38, no. 3, pp. 4–5, Sept. 1997.

ESAT-SISTA

K.U.Leuven

Leuven, Belgium

phone: +32-16-32.17.09 (secretary)

fax: +32-16-32.19.70

URL: http://www.esat.kuleuven.ac.be/sista-cosic-docarch

DAISY: A Database for Identification of Systems*

Bart De Moor†, Peter De Gersem‡, Bart De Schutter§, Wouter Favoreel¶

ESAT/SISTA, Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium

tel.: (+32)-(0)16-32.17.09, fax: (+32)-(0)16-32.19.70

{bart.demoor,peter.degersem,bart.deschutter,wouter.favoreel}@esat.kuleuven.ac.be

web: http://www.esat.kuleuven.ac.be/sista

Abstract

We point out the existence of a disturbing deficiency in the field of system identification, namely the fact that many results published in papers are not reproducible. In many cases, the datasets and time series used to illustrate identification methods and algorithms in these publications are not freely available. We propose to remedy this serious deficiency by setting up a publicly accessible website, called DAISY, to which authors can submit the datasets that are used to illustrate certain claims and algorithms in their papers. Several additional benefits are discussed as well.

Keywords: System identification, signal processing, time series analysis, data analysis, modeling, datasets.

1 To measure is to know...

Reproducibility is one of the most basic characteristics of scientific research. Yet in the fields of data analysis, system identification and signal processing, this very aspect is often neglected or even completely ignored. By this we mean the following: too often, papers under review or papers published in conference proceedings and journals contain an illustration of a certain algorithm applied to a given dataset. A typical statement is then that 'such and such method works well on such and such dataset'. The problem is that this specific dataset is almost always unavailable and inaccessible. The critical reader is confronted with the paradox that the theoretical derivation of the algorithm in the paper seems to be right, but that verifying its behavior on the real dataset is simply impossible. Therefore, the value of such an illustration of a method applied to a real dataset is scientifically void and may be only aesthetic.

However, in many cases the author of the paper had to go through a lot of trouble to obtain the given dataset. Everyone who has been active in experimental work knows how difficult and time-consuming it is to set up an experiment, obtain measurements, decide on filters, sampling frequencies, sensors, data acquisition and logging, etc. When all of this is done, there remains the confrontation of reality with the theoretical framework, which is always based on assumptions and hypotheses that never seem to be satisfied in practice. The central challenge in system identification and signal processing is precisely this confrontation between experimentally obtained measurements and mathematically derived algorithms. Yet what we see in most papers is an emphasis on the mathematics and the algorithmic derivations, for which (at least in good papers) all necessary details are provided, so that the algorithm can be understood and reproduced without much difficulty. When it comes to the data or time series to which these algorithms are applied, only nice pictures or generic statistics on the performance of the algorithm are provided, and these are barely reproducible.

* Work supported by the Flemish Government (BOF (GOA-MIPS), AWI (Bil. Int. Coll.), FWO (projects, grants, res. comm. (ICCoS)), IWT (IWT-VCST (CVT), ITA (ISIS), EUREKA (Sinopsys))), the Belgian Federal Government (IUAP IV-02, IUAP IMechS), the European Commission (HCM (Simonet), TMR (Alapedes), ACTS (Aspect), SCIENCE (ERNSI)), NATO (CRG) and industry (Electrabel). For more details on these projects, see web-coordinates.

† Research Associate with the FWO (Fund for Scientific Research - Flanders), Associate Professor K.U.Leuven;

‡ Research Assistant with the FWO;

§ Senior Research Assistant with the FWO;

¶ Research Assistant supported by the IWT.

2 Turn an art into science...

Of course, this lack of reproducibility basically originates in practical considerations, as one could not expect papers to contain the complete dataset, especially when it is huge. A scientifically acceptable solution would be to make datasets publicly available on floppy disks or CD-ROMs. However, besides being rather expensive, this solution would also have practical limitations, because data formats are not always compatible between different measurement and computing environments.

It goes without saying that the World Wide Web can contribute significantly to solving the reproducibility problem hinted at in the introductory section. We propose to construct a website, which we have called DAISY, which stands for DAtabase for Identification of SYstems. The key idea is that authors who have published a paper on system identification or signal processing submit the dataset that was used as an illustration to DAISY, hence making it publicly available.¹

The best way to get acquainted with DAISY is to consult it at its World Wide Web URL:

http://www.esat.kuleuven.ac.be/sista/daisy

The central objects in DAISY are datasets, which, once submitted, undergo a (moderate) review procedure (to filter out 'impossible' or low-quality datasets) and, when accepted, are publicly available on the Web.² Datasets are grouped according to data categories, which at the time of writing consist of process industry systems (e.g. ethane-ethylene distillation column, glass furnace, ...), electrical systems, mechanical systems (e.g. wing flutter data, CD-player arm data, ...), biomedical systems (e.g. fetal ECG measurements, ...), biochemical systems, econometric data, environmental systems, 'classical' datasets, and thermal datasets.

¹ The issue of reproducibility requires that we agree on how to refer to DAISY in papers that will use some of its datasets. We propose the following reference: De Moor B. (ed.), DAISY: Database for the Identification of Systems, Dept. of Electrical Engineering, ESAT/SISTA, K.U.Leuven, Belgium, URL: http://www.esat.kuleuven.ac.be/sista/daisy, + date of visit, name of dataset, name of section and code number.

² We take it for granted that all submitted datasets have been cleared from any confidentiality agreement between the owner of the system on which the data were obtained and the person and/or organization that submitted the dataset to DAISY.
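The reference format proposed in the footnote above can be assembled mechanically. The following is only a minimal sketch in Python: the function name `daisy_reference`, the ISO date format and the example dataset name, section and code number are hypothetical placeholders, not entries taken from DAISY.

```python
from datetime import date

# URL as given in the text of this report.
DAISY_URL = "http://www.esat.kuleuven.ac.be/sista/daisy"


def daisy_reference(visited, dataset, section, code):
    """Build a citation string following the proposed reference format.

    The fixed prefix is copied from the proposal; the four arguments are
    the parts the proposal leaves to the citing author.
    """
    prefix = (
        "De Moor B. (ed.), DAISY: Database for the Identification of "
        "Systems, Dept. of Electrical Engineering, ESAT/SISTA, "
        "K.U.Leuven, Belgium, URL: " + DAISY_URL
    )
    return f"{prefix}, {visited.isoformat()}, {dataset}, {section}, {code}."


# Hypothetical values, for illustration only.
reference = daisy_reference(
    date(1997, 9, 1), "Example dataset", "Example section", "00-000"
)
print(reference)
```

Stating the date of visit explicitly matters because a web database can change over time, while the citation should identify the data as it was when used.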

There is an automatic submission procedure in which some characteristic parameters of the dataset need to be described (sampling frequency, number of data points, number of inputs and outputs and their units, references, etc.); see the website for details. Also available are an extended bibliography of more than 100 books on system identification and signal processing, and a survey with World Wide Web hyperlinks to existing software packages and to existing databases of datasets on the Web.
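To make the kind of description requested at submission concrete, the sketch below models it as a small Python record with a plausibility check. It is an illustration under stated assumptions: the class `DatasetDescriptor`, its field names and the checks are hypothetical and do not reproduce the actual DAISY submission form.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDescriptor:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    category: str
    sampling_frequency_hz: float   # samples per second
    n_samples: int                 # number of data points per channel
    inputs: list                   # (label, unit) pairs, one per input
    outputs: list                  # (label, unit) pairs, one per output
    references: list = field(default_factory=list)

    def problems(self):
        """Return a list of detected problems; an empty list means none."""
        found = []
        if not self.name.strip():
            found.append("name is empty")
        if self.sampling_frequency_hz <= 0:
            found.append("sampling frequency must be positive")
        if self.n_samples <= 0:
            found.append("number of samples must be positive")
        if not self.outputs:
            found.append("at least one output channel is required")
        for label, unit in self.inputs + self.outputs:
            if not label or not unit:
                found.append("channel needs both a label and a unit")
        return found


# Made-up example values, not a real DAISY entry.
example = DatasetDescriptor(
    name="example-thermal-rig",
    category="thermal datasets",
    sampling_frequency_hz=10.0,
    n_samples=1000,
    inputs=[("heater voltage", "V")],
    outputs=[("temperature", "degC")],
)
print(example.problems())
```

Recording units and the sampling frequency alongside the raw numbers is what lets a later reader rebuild the time axis and interpret the signals without contacting the original author.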

3 Making it work: L'appétit vient en mangeant!

While providing a basic solution to the problem of reproducibility in system identification, we achieve other benefits as well. Some of the datasets in DAISY will evolve in due time into real benchmarks³ that will facilitate a comparison of the performance of algorithms. More generally, DAISY can become instrumental in establishing comparisons of concepts, methods and algorithms. One and the same dataset could be used to assess the quality of several variations of the same method or, more generally, to compare the performance of different methods derived in different frameworks (e.g. 'classical' system identification, prediction error methods, subspace methods, maximum likelihood methods, structured total least squares, time versus frequency domain approaches, linear versus nonlinear, neural, fuzzy, etc.) or different software environments (like Matlab, Xmath, etc.).

DAISY will also stimulate collaboration and interaction between researchers, organizations and companies active in system identification. In particular, such collaboration might enhance the cost-effectiveness of experiments, since measurement set-ups will not have to be repeated. DAISY will also be instrumental in providing inspiration to people in industry, when they see how certain datasets are reminiscent of the application they have in mind. And why not use DAISY as a didactical tool, by inviting students to apply the methods taught in system identification courses to real datasets in their homework. Last but not least, a dataset submitted to DAISY will, on average, remain available longer than it would with the author, who might have moved to another address or started a career in a domain other than system identification. While experimental set-ups cease to exist at a certain moment in time, the datasets that were obtained from them will remain available in DAISY.
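The idea of judging several model variants on one and the same dataset can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the data are synthetic stand-ins generated by `simulate` (no DAISY dataset is used), the two candidate structures are a first-order FIR model and a first-order ARX model fitted by ordinary least squares, and the one-step-ahead RMSE on a held-out half is only one of many possible comparison criteria.

```python
import random


def simulate(n, seed=0):
    """Synthetic stand-in for a measured record:
    y[k] = 0.8*y[k-1] + 0.5*u[k-1] + small Gaussian noise."""
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [0.0]
    for k in range(1, n):
        y.append(0.8 * y[k - 1] + 0.5 * u[k - 1] + rng.gauss(0.0, 0.05))
    return u, y


def fit_fir1(u, y):
    """Least-squares fit of y[k] = b*u[k-1]; returns (a, b) with a = 0."""
    ks = range(1, len(y))
    num = sum(u[k - 1] * y[k] for k in ks)
    den = sum(u[k - 1] ** 2 for k in ks)
    return (0.0, num / den)


def fit_arx1(u, y):
    """Least-squares fit of y[k] = a*y[k-1] + b*u[k-1],
    solved through the 2x2 normal equations."""
    ks = range(1, len(y))
    syy = sum(y[k - 1] ** 2 for k in ks)
    suu = sum(u[k - 1] ** 2 for k in ks)
    syu = sum(y[k - 1] * u[k - 1] for k in ks)
    ry = sum(y[k - 1] * y[k] for k in ks)
    ru = sum(u[k - 1] * y[k] for k in ks)
    det = syy * suu - syu ** 2
    a = (ry * suu - ru * syu) / det
    b = (ru * syy - ry * syu) / det
    return (a, b)


def one_step_rmse(params, u, y):
    """Root-mean-square one-step-ahead prediction error."""
    a, b = params
    errors = [y[k] - (a * y[k - 1] + b * u[k - 1]) for k in range(1, len(y))]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


# Same record for both candidates: first half for fitting, second for scoring.
u, y = simulate(400)
split = 200
train = (u[:split], y[:split])
valid = (u[split:], y[split:])
scores = {
    "FIR(1)": one_step_rmse(fit_fir1(*train), *valid),
    "ARX(1,1)": one_step_rmse(fit_arx1(*train), *valid),
}
for name, rmse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: validation RMSE = {rmse:.3f}")
```

Because both candidates see exactly the same samples and the same split, any difference in the score can be attributed to the model structure rather than to the data, which is the point of a shared benchmark.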

4 Conclusions

We have described how an important deficiency in the field of system identification, namely the lack of reproducibility of results obtained on datasets and time series, can be cured via DAISY, a Database for Identification of Systems. We would like to invite all researchers active in system identification and in data and time series analysis to submit datasets and to provide us with feedback, suggestions for improvement, etc. DAISY can be consulted at

http://www.esat.kuleuven.ac.be/sista/daisy

³ All accesses to datasets in DAISY are logged. A dataset will evolve into a benchmark when it becomes a leader in the hit statistics. These can be consulted in DAISY at the push of a button.
