Boston University Computer Science Tech. Report No. 2005-009, April 6, 2005.

To appear in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2005.

Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures

Vassilis Athitsos, Jonathan Alon, and Stan Sclaroff

Computer Science Department

Boston University

111 Cummington Street

Boston, MA 02215

email: {athitsos, jalon, sclaroff}@cs.bu.edu

Abstract

This paper proposes a method for efficient nearest neighbor classification in non-Euclidean spaces with computationally expensive similarity/distance measures. Efficient approximations of such measures are obtained using the BoostMap algorithm, which produces embeddings into a real vector space. A modification to the BoostMap algorithm is proposed, which uses an optimization cost that is more appropriate when our goal is classification accuracy as opposed to nearest neighbor retrieval accuracy. Using the modified algorithm, multiple approximate nearest neighbor classifiers are obtained that provide a wide range of trade-offs between accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower and more accurate approximations only to the hardest cases. The proposed method is experimentally evaluated in the domain of handwritten digit recognition using shape context matching. Results on the MNIST database indicate that a speed-up of two to three orders of magnitude is gained over brute force search, with minimal losses in classification accuracy.

1. Introduction

Nearest neighbor classifiers are appealing because of their simplicity and their ability to model a wide range of complex, non-parametric distributions. However, finding the nearest neighbors of an object in a large database can take too long, especially in domains that employ computationally expensive distance measures.

This problem is illustrated in [3], where a computationally expensive image similarity measure, called shape context matching, is used for optical character recognition. A three-nearest-neighbor classifier using shape context matching and 20,000 training objects yields an error rate of only 0.63%, but at a cost of about 22 minutes of classification time per test object.

Figure 1: Some examples from the MNIST database of images of handwritten digits.

In our experiments with handwritten digit recognition using the MNIST database [15], we obtain speed-ups of two to three orders of magnitude compared to exact shape context matching, with minimal loss in classification accuracy (about 0.1% or less). Our method also yields significant speed-ups compared to using BoostMap, a state-of-the-art method for approximating a similarity measure.

2. Related Work

Typically, the problem of efficient nearest neighbor classification is addressed by using indexing methods that can find the nearest neighbor or neighbors of the test object (also called query object) without having to exhaustively compare the test object with every single training object. Several hashing and tree structures [4, 11, 23, 24] have been proposed for indexing. However, the performance of such methods degrades in high dimensions. Locality sensitive hashing (LSH) [14] is a method for approximate nearest neighbor retrieval that scales better with the number of dimensions, but LSH is only applicable for specific metrics, and cannot be applied to arbitrary distance measures.

In domains where the distance measure is computationally expensive, embedding methods can be used for speeding up nearest neighbor retrieval. Such methods include Lipschitz embeddings [12], FastMap [7], MetricMap [22], SparseMap [13], and BoostMap [1]. Embedding methods define functions that map every object into a vector of real numbers. Given a query object, instead of computing its exact distance to each training object, we simply map the query to a vector and compare that vector with the vectors corresponding to the training objects. Typically vectors are compared using a Euclidean (L_2) or L_p distance metric, which can be several orders of magnitude faster to evaluate compared to the exact distance measure in the original space.

The goal of existing embedding methods is efficient and accurate nearest neighbor retrieval. The main difference of our method is that it aims at efficient and accurate classification. Our method builds on top of existing embedding methods, and shows how to improve classification efficiency by constructing a cascade of approximate nearest neighbor classifiers.

The efficiency of nearest neighbor classifiers can also be improved using condensing methods [6, 8, 10]. Those methods try to identify and remove training objects whose removal does not negatively affect classification accuracy. In the experiments we compare our method to the condensing method described in [10].

Shape context matching [3] is a computationally expensive non-Euclidean similarity measure. It is based on the shape context feature, a rich local feature that describes the distribution of points around a given location. Shape context has been successfully applied in various recognition domains, like trademark recognition [3], hand shape detection and classification [19], and body pose estimation [9, 18].

Methods that improve the efficiency of shape context matching can be useful in a wide range of domains. In [17], efficient retrieval is attained by pruning based on comparisons of a small subset of shape context features, and also using vector quantization on the space of those features. In [9], shape context features are matched based on the Earth Mover's Distance (EMD). The EMD is efficiently approximated using an embedding, and then Locality Sensitive Hashing is applied on top of the embedding. In [25], a discriminative classifier is learned based on correspondences of shape context features between the test object and a small number of prototypes per class. In experiments on the MNIST database, [25] reports an error rate of 2.55% by comparing the test object to only 50 prototypes. In comparison, the original brute force method in [3] is much slower (20,000 distance computations per test object), but achieves a significantly lower error rate (0.63%).

Our algorithm constructs a cascade of nearest neighbor classifiers. Cascades of classifiers have been very popular in recent years [16, 19, 21]. However, typically cascades are applied to a binary "detection" problem, where the goal is to determine whether a given image window contains an instance of some class. Our method produces a cascade of classifiers for a multi-class problem, using approximations of the original nearest neighbor classifier as building blocks.

3. Tuning BoostMap for Classification Accuracy

BoostMap [1] is a method for constructing embeddings that are optimized for preserving the similarity structure of the original space. In this section we introduce a modified version of the algorithm that is more appropriate when our goal is nearest neighbor classification accuracy. We will first give a very basic description of BoostMap (the reader is referred to [1] for the details). Then we will motivate and describe our modification to the algorithm.

3.1. Embeddings and BoostMap

We use X to denote a set of objects, and D_X(x1, x2) to denote a distance measure between objects x1, x2 ∈ X. In our example application, X is a set of images of handwritten digits (Fig. 1), and D_X is shape context matching as defined in [3]. However, any X and D_X can be plugged into the formulations described in this paper.

An embedding F: X → R^d is a function that maps any object x ∈ X into a d-dimensional vector F(x) ∈ R^d. Distances in R^d are measured using some L_p metric, most commonly the Euclidean (L_2) or Manhattan (L_1) metric. It is assumed that measuring a single L_p distance between two vectors is significantly faster than measuring a single distance D_X between two objects of X. This assumption is obeyed in our example application: on our PC, over a million L_1 distances between high-dimensional vectors in R^100 can be measured in one second, whereas only 15 shape context distances can be evaluated per second.

In BoostMap [1], the building blocks used to construct an embedding F are simple, one-dimensional (1D) embeddings. Any object r ∈ X can be used to define a one-dimensional embedding F_r: X → R, as follows:

F_r(x) = D_X(x, r).    (1)

In plain terms, F_r maps each object of X to a single real number. Furthermore, if x1 is very similar to x2 under D_X, then in many domains we can expect (and in metric spaces we can guarantee) that the distances D_X(x1, r) and D_X(x2, r) will be similar, meaning that F_r will map x1 and x2 to nearby points on the real line [1].

Each 1D embedding F_r acts as a classifier for the following binary classification problem: given three objects q, a, b ∈ X, is q closer to a or to b? F_r provides an answer by simply checking if F_r(q) is closer to F_r(a) or to F_r(b). Since F_r is a simple 1D embedding, it is expected to act as a weak classifier [1, 20], i.e., it will probably have a high error rate, but it is still expected to be more accurate than a random guess, which would have an error rate of 50%.

The BoostMap algorithm uses a training set S of triples (q, a, b), picked randomly from the available training objects. The algorithm constructs, using AdaBoost [20], an embedding F: X → R^d optimized for classification accuracy on triples of objects. In BoostMap, distances in R^d are measured using a weighted L_1 metric.
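
As an illustration of Eq. 1 and of how a 1D embedding acts as a weak classifier on triples, the following sketch shows one possible implementation; the helper names and the distance parameter d_X are assumptions, not the authors' code. AdaBoost then combines many such F_r into a d-dimensional embedding compared with a weighted L_1 metric.

```python
# Minimal sketch (not the authors' code): the 1D reference-object embedding of
# Eq. 1, and its use as a weak classifier on a triple (q, a, b). The distance
# function d_X is left as a parameter; in the paper's application it would be
# shape context matching.

def make_1d_embedding(r, d_X):
    """Return F_r, the 1D embedding defined by reference object r (Eq. 1)."""
    return lambda x: d_X(x, r)

def triple_vote(F, q, a, b):
    """Weak classifier: does embedding F place q closer to a or to b?
    Returns +1 for a, -1 for b, and 0 for a tie."""
    dist_a = abs(F(q) - F(a))
    dist_b = abs(F(q) - F(b))
    if dist_a < dist_b:
        return +1
    if dist_a > dist_b:
        return -1
    return 0
```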

3.2. The Modified BoostMap Algorithm

The original BoostMap algorithm aims at preserving similarity rankings, and minimizes the fraction of triples (q, a, b) where q is closer to a than to b, but F(q) is closer to F(b) than to F(a). If our end goal is classification accuracy, then for some triples of objects the minimization criterion of BoostMap is either irrelevant or problematic. For example, suppose that q is an image of the digit "2", and a and b are images, respectively, of digits "0" and "1". In that case, for the purposes of classification, it is irrelevant whether the embedding maps q closer to a or to b.

A more interesting example is the following: suppose q and b are both images of the digit "2", and a is an image of the digit "0". Furthermore, suppose that a is the nearest neighbor of q among training objects according to shape context matching. In that case, we would actually prefer an embedding that maps q closer to b than to a, although the original BoostMap algorithm would penalize for that.

To address these problems, we propose the following guidelines for selecting training triples and measuring the optimization cost:

• Useful training triples are triples (q, a, b) such that a is one of the nearest neighbors of q among objects of the same class as q, and b is one of the nearest neighbors of q among objects of some other class.

• The optimization cost that should be minimized is the number of training triples (q, a, b) such that the output embedding F maps q closer to b than to a, regardless of whether q is actually closer to b than to a in terms of the distance measure D_X. Since a is of the same class as q, we want q to be mapped closer to a than to b.

In our example application, we apply these guidelines as follows: we choose a subset X′ of 5,000 training objects, and for every object q ∈ X′, we form 20 triples (q, a, b) (for a total of 100,000 training triples), such that a is the nearest neighbor of q among all objects in X′ of the same class as q, and b is one of the five nearest neighbors of q among objects in X′ that belong to some class different than that of q. In other words, in order to choose b given q, we perform the following three steps:

1. Choose (randomly) a class C different than that of q.

2. Find the five nearest neighbors of q among objects of class C in X′.

3. Set b randomly to one of those five objects.

Instead of finding the five nearest neighbors, we experimented with different values, ranging from two to 20, without observing much difference in the results.
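
The triple-selection procedure above can be sketched as follows; this is an illustration under assumed data structures (the names make_training_triples, objects, labels, and d_X are placeholders, not the authors' implementation).

```python
# Sketch of the triple-selection guidelines of Sec. 3.2. For each object q in the
# chosen subset, a is its nearest same-class neighbor, and b is drawn from the
# five nearest neighbors of q within a randomly chosen different class.
# Assumes each class contains at least two objects.
import random

def make_training_triples(objects, labels, d_X, triples_per_q=20, knn_other=5):
    triples = []
    classes = sorted(set(labels))
    for qi, q in enumerate(objects):
        # a: nearest neighbor of q among objects of the same class as q
        same = [j for j, y in enumerate(labels) if y == labels[qi] and j != qi]
        a = min(same, key=lambda j: d_X(q, objects[j]))
        for _ in range(triples_per_q):
            # choose (randomly) a class C different than that of q
            C = random.choice([c for c in classes if c != labels[qi]])
            members = [j for j, y in enumerate(labels) if y == C]
            # b: one of the knn_other nearest neighbors of q within class C
            nearest_in_C = sorted(members, key=lambda j: d_X(q, objects[j]))[:knn_other]
            b = random.choice(nearest_in_C)
            triples.append((qi, a, b))
    return triples
```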

Overall, our implementation of the BoostMap algorithm is exactly as described in [1], except for the way we choose training triples, the optimization cost that we use, and the fact that we use only 1D embeddings of the type defined in Eq. 1.

4. Cascading Approximate Classifiers

Suppose that we have used some embedding method, like BoostMap, and we have obtained an embedding F. In this section we will use the filter-and-refine retrieval framework [12] to define, based on F, multiple approximations of the nearest neighbor classifier corresponding to the distance measure D_X. These approximations cover a wide range of tradeoffs between accuracy and efficiency. We will also show how to automatically combine such approximations into a cascade classifier, which can approximate the accuracy of the slowest approximation while being much more efficient than the slowest approximation.

4.1. Filter-and-refine Classification

We can use embedding F for nearest neighbor classification in a filter-and-refine framework [12]. In particular, given parameters p and k, and given a query object q, we perform the following steps:

• Embedding step: compute the embedding F(q) of the query. If F is a d-dimensional embedding obtained with BoostMap, then F(q) = (F_{r_1}(q), ..., F_{r_d}(q)), where each F_{r_i} is defined using some object r_i ∈ X, according to Eq. 1. Computing F(q) consists of computing d distances D_X between q and the objects r_i.

• Filter step: compute the L_1 distances between F(q) and the embeddings of all training objects. Rank all database objects in order of decreasing similarity to the query, based on these distances. Mark the p highest-ranked objects. Typically this step is very fast, because it only involves measuring L_1 distances.

• Refine step: rerank the p objects marked by the filter step, by evaluating the D_X distances between q and each of those p objects.

Given the final ranking produced by the filter and refine steps, we can then classify q based on majority voting among its k nearest neighbors. Note that we can have p = 0. In that case no refine step is performed; we simply use the ranking produced by the filter step.
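
The three steps can be sketched as follows. This is an illustration under assumed data structures, not the paper's implementation: emb_train is an (N, d) array of precomputed training embeddings, embed(q) returns the query's d-dimensional embedding (at the cost of d exact distances), and d_X is the exact distance measure. The paper uses a weighted L_1 metric for BoostMap embeddings; plain L_1 is used here, since the weights can be folded into the coordinates.

```python
from collections import Counter
import numpy as np

def filter_and_refine_classify(q, embed, d_X, train_objects, emb_train,
                               train_labels, p, k):
    F_q = np.asarray(embed(q))                    # embedding step: d exact distances
    l1 = np.abs(emb_train - F_q).sum(axis=1)      # filter step: cheap L1 distances
    order = np.argsort(l1)                        # approximate similarity ranking
    if p > 0:                                     # refine step: p exact distances
        top = order[:p]
        exact = [d_X(q, train_objects[i]) for i in top]
        order = top[np.argsort(exact)]
    knn = [train_labels[i] for i in order[:k]]
    return Counter(knn).most_common(1)[0][0]      # majority vote among k neighbors
```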

The performance of filter-and-refine classification is measured in terms of accuracy and efficiency. In many domains, the running time is dominated by the number of exact distances measured at the embedding step and the refine step. In our example application, the filter step, i.e., comparing the query to the entire database using the L_1 metric, takes less time than evaluating the shape context matching distance D_X between a single pair of objects. Overall, increasing embedding dimensionality and the parameter p for the filter step tends to improve accuracy, but at the cost of computing more D_X distances per query object.

4.2. Constructing a Cascade

Let F be the embedding obtained using BoostMap (or some other embedding method), and let d′ be the dimensionality of F. For any d ∈ {1, ..., d′}, we define embedding F_d to be the embedding consisting of the first d dimensions of F. Given positive integers d ≤ d′ and p, we define filter-and-refine process F_{d,p} to be the filter-and-refine process that uses F_d as the embedding, and p as the parameter for the filter step. Naturally, given a query object q, as d and p increase, the approximate similarity ranking obtained using process F_{d,p} will get closer to the correct ranking. For example, if p is equal to the number of training objects, then process F_{d,p} becomes equivalent to brute force search: at the refine step we simply compare the query object with every training object. On the other hand, small d and p allow the filter-and-refine process F_{d,p} to give results very fast.

With appropriate choices of d_i and p_i we can construct a sequence P = (P_1, ..., P_s) of s filter-and-refine processes P_i = F_{d_i,p_i}, such that each successive process P_i is less efficient and more accurate than the preceding process P_{i-1}. Table 1, in the experiments, provides an example of such a sequence P. We want to use these processes to construct a cascade classifier that can approach the accuracy of the slowest and most accurate process, while being significantly more efficient than the slowest process.
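
One way such a sequence of processes could be derived from a single d′-dimensional embedding is sketched below; make_process and the (d, p) values in the trailing comment are illustrative assumptions, not the paper's code. Each process returns the full approximate ranking of training-object class labels for a query, which is what the cascade sketch below consumes.

```python
import numpy as np

def make_process(d, p, embed_full, d_X, train_objects, emb_train_full, train_labels):
    """Return run_process(q) -> ranked class labels, i.e. the process F_{d,p}.
    A real implementation would compute only the first d coordinates of the
    query embedding instead of all d' of them."""
    emb_d = emb_train_full[:, :d]                 # training embeddings restricted to F_d
    def run_process(q):
        F_q = np.asarray(embed_full(q))[:d]
        order = np.argsort(np.abs(emb_d - F_q).sum(axis=1))
        if p > 0:                                 # rerank the top p by exact distance
            top = order[:p]
            exact = [d_X(q, train_objects[i]) for i in top]
            order = np.concatenate([top[np.argsort(exact)], order[p:]])
        return [train_labels[i] for i in order]
    return run_process

# e.g. processes = [make_process(d, p, ...) for (d, p) in [(10, 0), (40, 100), (100, 700)]]
# (placeholder (d, p) values, not the ones used in the paper)
```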

In order to construct such a cascade, we need to answer the following question: how can we identify, for each process P_i, the test objects for which P_i provides sufficient information for classification? In other words, given the similarity ranking that P_i produces for test object q, how can we tell whether we can reliably classify q using that ranking, or whether we need to pass on q to P_{i+1}, i.e., the next filter-and-refine process?

We will answer this question by defining a quantity K(q, P_i), which will be a measure of confidence in the classification result for object q obtained using process P_i. We define K(q, P_i) as follows: K(q, P_i) is the highest integer k such that all the k nearest neighbors of q retrieved using process P_i belong to a single class. For example, suppose that, according to P_i, the 50 nearest neighbors of test object q belong to class "1", and the 51st neighbor belongs to class "2". Then K(q, P_i) = 50.

We use the quantity K(q, P_i) to define a criterion for when P_i should be used to classify q, as follows: given a threshold t_i, if K(q, P_i) ≥ t_i then we assign to q the class label of the t_i nearest neighbors found using P_i. Otherwise, we pass on q to P_{i+1}, i.e., the next filter-and-refine process in the sequence P.

The intuition behind this criterion is that if, for some test object, process P_i reports that that test object is surrounded by a large number of training objects of a single class, then we can confidently assign that class to the test object, without needing to spend any additional computation. Note that we are not concerned about actually finding the true nearest neighbor of the test object: we stop as soon as we have sufficient information about the class label of the test object.
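
The confidence measure K(q, P_i) and the resulting cascade rule could be sketched as follows; this is illustrative only. Each element of stages pairs a process (as built above) with its learned threshold t_i; the final process has no threshold and always classifies. For simplicity the last stage here returns the class of the single nearest neighbor, whereas the paper votes among k nearest neighbors.

```python
def confidence_K(ranked_labels):
    """K(q, P_i): the largest k such that the k nearest neighbors share one class."""
    k = 1
    while k < len(ranked_labels) and ranked_labels[k] == ranked_labels[0]:
        k += 1
    return k

def cascade_classify(q, stages):
    """stages: list of (run_process, threshold); threshold is None for the last stage."""
    for run_process, threshold in stages:
        ranked = run_process(q)
        if threshold is None or confidence_K(ranked) >= threshold:
            return ranked[0]          # class shared by the top-ranked neighbors
```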

Algorithm 1: threshold selection for the cascade (excerpt).

for q ∈ X_validation do
    compute K_q and the predicted class C_q of q using the current process
X_1 = {q ∈ X_validation | K_q ≥ t}
X_2 = {q ∈ X_1 | Y_q ≠ C_q}
if size(X_2) ≤ e then
    ...

Dimensions (d)   Threshold
10               50
20               56
40               51
60               51
80               50
100              41
100              41
100              41
100              17
100              20
100              20
100              24
100              11
100              4
100              5
100              NA

Table 1: The sequence P = (P_1, ..., P_16) of filter-and-refine processes that was passed as input to Algorithm 1. We used this sequence both with BoostMap embeddings and with BoostMap-C embeddings. The dimensions column specifies the dimensionality of the embedding, and p is the parameter specifying the number of distances to measure in the refine step. We also show the threshold chosen by the cascade learning algorithm, using embeddings from BoostMap-C and setting e = 0. Naturally, no threshold is needed for the final step in the cascade.

To pick the threshold t_1 for the first process P_1, we use a validation set of objects with known class labels and a parameter e: we choose the smallest threshold t_1 such that, among the validation objects q with K(q, P_1) ≥ t_1, the number of misclassified objects is at most e. For example, if e = 2, then the threshold must be such that P_1 misclassifies no more than two objects.

After we have determined the right threshold for process P_1, we proceed to select an appropriate threshold for P_2, using the same parameter e, and using only the validation objects q that P_1 does not classify (i.e., for which K(q, P_1) < t_1). We proceed in the same way to choose thresholds for the remaining processes; no threshold is needed for the final process, which classifies every object passed to it.

A slight problem with the above procedure for determining thresholds is that there may be some validation objects q that even the final process P_s will misclassify, and for which K(q, P_i) is very high for all processes P_i. In our experiments, such objects were identified in practice. These objects are essentially outliers that look very similar to a large number of objects from another class. These objects are likely to influence the threshold choice t_i for every P_i, so that t_i is large enough to avoid misclassifying those objects, even though they will end up being misclassified anyway at the final step P_s. We use a simple method to avoid this problem: before choosing thresholds, we identify all validation objects that P_s misclassifies. We then remove those objects from the validation set, and proceed with threshold selection.

The exact algorithm for picking thresholds for each step in a cascade is described in Algorithm 1.
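
A sketch of this threshold-selection procedure is given below; it is a reconstruction for illustration, not the published pseudocode of Algorithm 1. It reuses confidence_K from the cascade sketch above, takes validation as a list of (q, true_label) pairs and processes as the ordered run_process functions, and, as a simplification, treats a process's prediction as the class of its top-ranked neighbor.

```python
def learn_thresholds(validation, processes, e):
    """Pick one threshold per process except the last, allowing at most e
    misclassified validation objects at each stage."""
    final = processes[-1]
    # remove outliers that even the final process misclassifies
    remaining = [(q, y) for (q, y) in validation if final(q)[0] == y]
    thresholds = []
    for run_process in processes[:-1]:
        scored = []
        for (q, y) in remaining:
            ranked = run_process(q)
            scored.append((confidence_K(ranked), ranked[0], y))
        # smallest t such that the objects accepted at this stage (K >= t)
        # include at most e misclassified objects
        t = 1
        while sum(1 for (k, pred, y) in scored if k >= t and pred != y) > e:
            t += 1
        thresholds.append(t)
        # pass on only the objects this process does not classify
        remaining = [pair for (k, _, _), pair in zip(scored, remaining) if k < t]
    return thresholds
```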

5. Experiments

We applied our method to define a cascade of approximate similarity measures, using shape context matching as the original distance measure D_X. We evaluated our method on the publicly available MNIST database [15] of handwritten digits. MNIST consists of 60,000 images that are used as training objects, and 10,000 images that are used as test objects. Fig. 1 shows some of those images. The exact shape context matching error rates, obtained by comparing the test object to all training objects, as described in [3], were 0.63% for 20,000 training objects and 0.54% for 60,000 training objects, with classification time per object equal to about 22 minutes and 66 minutes respectively.

We used the original BoostMap algorithm [1] and BoostMap-C, i.e., the modified algorithm proposed in this paper, to learn a 100-dimensional embedding F. Filter-and-refine classification with p = 700 gave error rates of 0.74% for BoostMap and 0.72% for BoostMap-C, for 20,000 training objects. For 60,000 training objects, both methods had an error rate of 0.58%. These error rates were obtained at the cost of computing 800 exact distances per test object, i.e., spending about 53 seconds to classify each test object. In Fig. 2 we plot error rate vs. number of exact distance evaluations per test object, using 60,000 training objects. We see that BoostMap-C attains its peak accuracy at around 300 exact distance computations per object, but it takes BoostMap about 800 exact distance computations per object to reach the same accuracy.

We applied Algorithm 1 to construct a cascade of classifiers, using different values of e, ranging from 0 to 4. The training and validation sets passed to the algorithm were disjoint sets of 20,000 and 10,000 objects respectively, randomly picked from the original MNIST training set of 60,000 objects. The sequence P of filter-and-refine processes that was passed as an input to Algorithm 1 is shown in Table 1. We constructed the sequence by hand, i.e., we manually picked d and p for each process. The thresholds were learned automatically by the algorithm. We did not experiment with any other sequence of processes, so we have no reason to believe that the sequence we manually constructed is particularly good or bad with respect to other possible sequences. Our guideline in constructing the sequence was simply to provide an adequate number of steps, ranging from really fast and inaccurate to really slow and accurate, with the constraint that each cascade step could reuse the work done at the previous steps.

For 20,000 objects, and passing e = 0 to Algorithm 1, using BoostMap we obtained an error rate of 0.75%, at an average cost of measuring about 149.0 distances per test object, which translates to an average classification time of 9.9 seconds per test object.

Figure 2: Error rates attained using BoostMap and BoostMap-C, without a cascade, vs. number of exact D_X evaluations per test object. We note that, for each number of D_X evaluations per object, BoostMap-C gives either as good or somewhat better results, compared to BoostMap. 60,000 training objects were used.

Using the modified algorithm BoostMap-C, the resulting cascade yielded an error rate of 0.74%, at an average cost of measuring 92.5 distances per test object, which translates to an average classification time of 6.2 seconds per test object. Overall, using a cascade speeds up average recognition time significantly, compared to using BoostMap or BoostMap-C without a cascade, and by over two orders of magnitude over brute force search, at the cost of about a 0.1% increase in the error rate. We also see that the BoostMap-C cascade gives faster average classification time for essentially the same accuracy obtained with the BoostMap cascade.

Finally, we evaluated the same cascades using as a training set all 60,000 training objects of the MNIST database. For each cascade, the thresholds were set to the same values that were employed in the experiments with 20,000 training objects. The results, for parameter e ranging from 0 to 4, are shown in Fig. 3. For e = 0 and using BoostMap we got an error rate of 0.66% at an average cost of 123 exact distance computations per test object. For e = 0 and using BoostMap-C we got an error rate of 0.61% at an average cost of only 77.3 distance computations per test object. This is a speed-up of almost three orders of magnitude over brute-force search, which achieves an error rate of 0.54%. As seen in Fig. 3, cascades using BoostMap-C achieve better tradeoffs of accuracy versus efficiency compared to cascades using BoostMap.

Note that increasing the training size from 20,000 to 60,000 objects improved both the accuracy and the efficiency of the cascade classifier. This result may seem surprising at first glance, and is in stark contrast to traditional nearest-neighbor methods, where recognition time increases as training set size increases.


Figure 3: Error rates attained by cascade classifiers vs. average number of exact D_X evaluations per test object. Cascade and Cascade-C correspond to the five cascades that were learned, using embeddings constructed with BoostMap and BoostMap-C respectively. For each of these two embedding methods, we obtained five cascades by running Algorithm 1 with parameter e set respectively to e = 0, 1, 2, 3, 4. 60,000 training objects were used.

By taking a closer look at the results, we found that, as training size increased and the processes P_i and thresholds t_i remained fixed, the quantity K(q, P_i) tended to increase, meaning that more objects were classified at earlier steps in the cascade.

In [25], a discriminative classifier is trained using shape context features, and achieves an error rate of 2.55% on the MNIST dataset while measuring only distances D_X between the test object and 50 prototypes. That method is not a nearest-neighbor method, so after learning the classifier only the 50 prototypes are needed, and the rest of the training set is discarded. Overall, the cost of classifying a test object using the method in [25] is the cost of evaluating 50 D_X distances. Using BoostMap-C and the full training set of 60,000 objects, with parameter e = 1 we obtain a cascade that yields an error rate of 0.83%, while measuring on average 49.5 evaluations of D_X per test object. Also, as Fig. 3 shows, several cascades obtained using BoostMap and BoostMap-C achieve error rates under 1.2% at an average cost ranging from 35 to 50 D_X evaluations per test object.

We also compared our method to two well known methods for speeding up nearest neighbor classification: the condensed nearest neighbor (CNN) method [10] and vp-trees [24]. For the vp-trees, the pivot object for each node was selected randomly. We evaluated these methods using the smaller training set of 20,000 objects. Both methods achieved significantly worse tradeoffs between accuracy and efficiency compared to our method. CNN selected 1060 out of the 20,000 training objects, speeding up classification time by approximately a factor of 20.

Method            Exact D_X evaluations   Seconds per query   Speed-up factor   Error rate
Brute force [3]   20,000                  1232                1                 0.63%
vp-trees [24]     8594                    -                   2.3               0.66%
CNN [10]          1060                    70.6                ~20               2.40%
Zhang [25]        50                      -                   400               2.55%
BoostMap          800                     53.3                25                0.74%
BoostMap-C        800                     53.3                25                0.72%
Cascade           149                     9.9                 ~134              0.75%
Cascade-C         92.5                    6.2                 216               0.74%

Table 2: Speeds and error rates achieved by different methods on the MNIST dataset, using 10,000 test objects and 20,000 training objects. We also show the number of exact shape context distance evaluations per query object for each method.

However, the error rate using CNN increased from 0.63% to 2.40%. With vp-trees the error rate was 0.66%, but the attained speed-up was only a factor of 2.3; an average of 8594 exact distances needed to be measured per test object.

Table 2 summarizes the results of all the different methods on the smaller training set of 20,000 objects. As we can see from those results, vp-trees, BoostMap and the cascade methods are the only methods that achieve accuracy comparable to brute force search. The speed-up obtained using vp-trees is pretty minor compared to using BoostMap or using a cascade. The cascade methods, and especially Cascade-C, achieve by far the best trade-offs between accuracy and efficiency.

6. Discussion

We have presented two improvements to the state of the art with respect to efficient nearest neighbor classification under computationally expensive similarity measures. The first contribution is BoostMap-C, a modified version of the BoostMap embedding algorithm, that constructs embeddings using an optimization cost that is more relevant to nearest neighbor classification accuracy. The second contribution of this paper is a method for constructing a cascade of approximations of the nearest neighbor classifier corresponding to the original similarity measure. This cascade method can be applied on top of BoostMap or any other appropriate embedding method that may well work in a particular domain, like FastMap [7] or SparseMap [13].

The method proposed in this paper is domain-independent, i.e., its formulation is appropriate for arbitrary spaces with computationally expensive distance measures. This is in contrast to methods like LSH [14] that can only be applied to specific distance measures. At the same time, no theoretical guarantees can be provided that the proposed method, or any other alternative domain-independent method (including the methods proposed in [6, 7, 8, 10, 13, 22, 24]), will actually give useful results in an arbitrary domain, in terms of accuracy and efficiency on previously unseen test objects.

Some knowledge about the domain, like the presence of Euclidean or metric properties, is necessary in order to provide such guarantees.

A limitation of the cascade method, at least as formulated in this paper, is that it cannot be applied if there is only one training object per class. Overall, the more training examples per class we have, the more we expect the cascade to improve classification efficiency. A direction for future work is to formulate alternative cascade methods that overcome this limitation.

In our experiments with the MNIST database of handwritten digits and shape context matching as the underlying distance measure, our method yielded a speed-up of about two to three orders of magnitude with respect to brute-force search, and yielded significant speed-ups over using BoostMap without a cascade, with negligible loss in accuracy. Given the good experimental results and the generality of the formulation, we believe that the proposed method can be useful in a variety of classification tasks employing large databases of training objects and computationally expensive distance measures.

References

[1] V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios. BoostMap: A method for efficient approximate similarity rankings. In CVPR, 2004.

[2] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf. Parametric correspondence and chamfer matching: Two new techniques for image matching. In IJCAI, pages 659-663, 1977.

[3] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509-522, 2002.

[4] C. Böhm, S. Berchtold, and D. A. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys, 33(3):322-373, 2001.

[5] T. J. Darrell, I. A. Essa, and A. P. Pentland. Task-specific gesture analysis in real-time using interpolated views. PAMI, 18(12), 1996.

[6] V. S. Devi and M. N. Murty. An incremental prototype set building technique. Pattern Recognition, 35(2):505-513, 2002.

[7] C. Faloutsos and K. I. Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In ACM SIGMOD, pages 163-174, 1995.

[8] G. W. Gates. The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18(3):431-433, 1972.

[9] K. Grauman and T. J. Darrell. Fast contour matching using approximate earth mover's distance. In CVPR, pages I:220-227, 2004.

[10] P. E. Hart. The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14(3):515-516, 1968.

[11] G. R. Hjaltason and H. Samet. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, 28(4):517-580, 2003.

[12] G. R. Hjaltason and H. Samet. Properties of embedding methods for similarity searching in metric spaces. PAMI, 25(5):530-549, 2003.

[13] G. Hristescu and M. Farach-Colton. Cluster-preserving embedding of proteins. Technical Report 99-50, CS Department, Rutgers University, 1999.

[14] P. Indyk. High-dimensional Computational Geometry. PhD thesis, Stanford University, 2000.

[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

[16] S. Z. Li and Z. Q. Zhang. Floatboost learning and statistical face detection. PAMI, 26(9):1112-1123, 2004.

[17] G. Mori, S. Belongie, and J. Malik. Shape contexts enable efficient retrieval of similar shapes. In CVPR, volume 1, pages 723-730, 2001.

[18] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In ECCV, volume 3, pages 666-680, 2002.

[19] E. J. Ong and R. Bowden. A boosted classifier tree for hand shape detection. In Face and Gesture Recognition, pages 889-894, 2004.

[20] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.

[21] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, volume 1, pages 511-518, 2001.

[22] X. Wang, J. T. L. Wang, K. I. Lin, D. Shasha, B. A. Shapiro, and K. Zhang. An index structure for data mining and clustering. Knowledge and Information Systems, 2(2):161-184, 2000.

[23] D. A. White and R. Jain. Similarity indexing: Algorithms and performance. In Storage and Retrieval for Image and Video Databases (SPIE), pages 62-73, 1996.

[24] P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In ACM-SIAM Symposium on Discrete Algorithms, pages 311-321, 1993.

[25] H. Zhang and J. Malik. Learning a discriminative classifier using shape context distances. In CVPR, volume 1, pages 242-247, 2003.
