搜档网
当前位置:搜档网 › The Temporal and Topological Characteristics of BGP Path Changes

The Temporal and Topological Characteristics of BGP Path Changes

The Temporal and Topological Characteristics of BGP Path Changes
The Temporal and Topological Characteristics of BGP Path Changes

The Temporal and Topological Characteristics of BGP Path Changes Di-Fa Chang Ramesh Govindan John Heidemann

USC/Information Sciences Institute,4676Admiralty Way,Marina del Rey,CA90292,USA

Abstract

BGP has been deployed in Internet for more than a decade.However,the events that cause BGP topological changes are not well understood.Although large traces of routing updates seen in BGP operation are collected by RIPE RIS and University of Oregon RouteViews,previous work examines this data set as individual routing updates. This paper describes methods that group routing updates into events.Since one event(a policy change or peering failure)results in many update messages,we cluster up-dates both temporally and topologically(based on the path vector information).We propose a new approach to ana-lyzing the update traces,classifying the topological impact of routing events,and approximating the distance to the the Autonomous System originating the event.Our analy-sis provides some insight into routing behavior:First,at least45%path changes are caused by events on transit peerings.Second,a signi?cant number(23–37%)of path changes are transient,in that routing updates indicate tem-porary path changes,but they ultimately converge on a path that is identical from the previously stable path.These ob-servations suggest that a content provider cannot guarantee end-to-end routing stability based solely on its relationship with its immediate ISP,and that better detection of transient changes may improve routing stability.

1.Introduction

BGP[17]is a policy-based path-vector routing proto-col deployed in Internet for inter-domain routing.The In-ternet is divided into tens of thousands of autonomous rout-ing domains,of which over15thousand are currently asso-ciated with Autonomous System Numbers(ASNs)for the purpose of interdomain routing.BGP routers in each AS transmit routing messages to other BGP routers in the same AS and other ASes through internal and external BGP con-nections,respectively.Routing messages containing reach-ability information are called BGP updates.To facilitate the study on the operational use of BGP,there are public BGP routing message collection sites such as RIPE’s RRCs[1](Remote Route Collectors)and Oregon University’s Route-Views[2]that collect BGP updates and routing tables from tens of BGP routers located in various ASes.These data sets provide researchers and operators a local perspective on the visible Internet BGP routing status.Researchers have been using the collected routing tables and routing messages to study the Internet topology at AS-level[19,20,6],moni-tor the Internet growth[8],examine the inter-domain rout-ing stability[18],investigate BGP router miscon?guration [16,21],and derive the model for BGP traf?c[15].In this paper we present a systematic approach to decompose the stream of BGP updates into small sequences of path ad-vertisements with the purpose of distinguishing the routing events that cause the BGP routing changes.The goal of this study is to answer the following questions:

When a BGP router observes a path(AS

Most path advertisements(93%)in a stream of BGP updates have a topologically relation with a few oth-ers.To our knowledge,this is the?rst quanti?cation on the topological relation among the path advertise-ments in a BGP update stream.It can help design a more realistic BGP traf?c model for lab use in addi-tion to those metrics considered in[15].

At least45%of path changes are caused by routing events in transit peerings,where transit peerings are the peerings that transit traf?c to the destinations not inside the peering ASes.

A signi?cant number of path changes(23–37%)are

transient which means that BGP AS

SET.That is,,,may denote a set of ASes.For exam-ple,the path(2914354919548[12393356784319094]) means that can be one of AS1239,AS3356,AS7843, and AS19094.However,we observed that only a very small number(0.02%)of pre?xes in Internet have paths including

Figure1.Example of AS topology.Two van-

tage points and,in ASes and re-

spectively,observe an event that changes

their paths from old paths(solid lines)to new

paths(dashed lines).

AS

PATH attribute is AS prepending[17].For exam-ple,the path(1103354970174747474747474767570) means that AS7474announces this path by prepending its AS number three times.AS prepending is used to increase the path length for traf?c engineering purpose,since a path with longer length will be less likely selected by other ASes. As a consequence,it is possible that

for.We observe that the number of AS prepending varies from1to14in our data set.Our clus-tering algorithms can deal with the AS

Policy changes at:AS may add an input?lter or de-

crease the local

PATH,timestamp,the router ID of the router that

transmitted it.

Pattern Set:A pattern set is the set of patterns to be clus-

tered and is denoted by.For exam-

ple,a stream of BGP updates is a pattern set.

Cluster:A cluster is a subset of the pattern

set such that the patterns in a cluster are more similar

to each other than patterns in different clusters.

Similarity:A measure of similarity between two patterns

is calculated based on their features.The de?nition of

similarity plays an essential role in the interpretation of

the generated clusters.For example,the similarity de-

?ned by computing the difference of the origin ASes of

two updates will result in clusters that consists of up-

dates propagated from the same origin AS.

Since the partitional approach requires a pre-speci?ed

number of clusters which has no justi?able approximation,

we take the hierarchical approach.Two hierarchical clus-

tering algorithms are employed in this study:agglomer-

ative single-link and complete-link clustering algorithms.

The original algorithms require to process the pattern set

many passes until all patterns are in one cluster,hence con-

sume enormous computation power and storage.We mod-

ify the algorithms such that they terminates as soon as the

desired partitions are generated.

The algorithms can be found in[9,10],and are stated be-

low for completeness.

Agglomerative Single-Link Clustering Algorithm:In

this method,two clusters can be merged into one clus-

ter if there exists a pair of patterns in the two clus-

ters having a similarity measure above some threshold.

It has a tendency to generate clusters that are strag-gly or elongated.

1.Place each pattern in its own cluster.Calculate

the measure of similarity between any two pat-

terns and sort the list of similarity measures in

ascending order.

2.Step through the sorted list of similarity mea-

sures,forming a graph where pairs of patterns

closer than a pre-speci?ed threshold of similar-

ity are connected by a graph edge.

3.Each maximally connected subgraph forms a

cluster.

Agglomerative Complete-Link Clustering Algorithm: In this method,two clusters can be merged into one cluster if all pairwise similarity measures between pat-terns in the two clusters are above the threshold.

It produces more tightly bound or compact clus-ters than the single-link method.

1.Place each pattern in its own cluster.Calculate

the measure of similarity between any two pat-

terns and sort the list of similarity measures in

ascending order.

2.Step through the sorted list of similarity mea-

sures,forming a graph where pairs of patterns

closer than a pre-speci?ed threshold of similar-

ity are connected by a graph edge.

3.Each maximally completely connected subgraph

forms a cluster.

2.4.Identifying Events from Routing Updates

In this section,we describe the methods to identify rout-ing events in a stream of BGP updates.The method of iden-

tifying events is to decompose the stream into small se-quences of path advertisements,i.e.,clusters.A BGP up-date consists of path advertisements for various pre?xes. The path advertisement can be an announcement of new path,a withdrawal of the current path,or an update that changes attributes of the current path such as MED,LO-CAL

PATH at-tribute that represents at the time. The similarity function is de?ned as:

otherwise is a parameter of the similarity function.When apply-ing this similarity function to the single-link clustering algo-rithm,we set the threshold such that any pat-

tern has a non-negative similarity measure with at least one pattern in the same cluster,while patterns in different clus-ters have negative similarity measure.In other words,path advertisements within a cluster are sent by the same van-tage point for the same pre?x and each of them has the re-ceiving time within seconds of the receiving time of at least one path advertisement in the same cluster.The ratio-nale behind this type of clustering is described as follows.

When a routing event happens,the BGP routing system can take minutes to converge to a stable routing state.During the converging period,many path advertisements are trans-mitted due to transient path changes.Since these path adver-tisements are all resulted from a single routing event,they should be grouped into a single cluster.The parameter represents our assumption that no two routing events occur within seconds and cause path changes of the same pre-?x.However,the routing convergence time may be longer than.In this case,it is unable to determine whether two path advertisements that are for the same pre?x,transmit-ted by the same vantage point,and within seconds are caused by different routing events.Thus,we just make the assumption that these two path advertisements are caused by the same routing event.This makes the single-link algo-rithm the appropriate one for pre?x-based clustering.

After the clustering?nished,we assign each cluster the path-change type and identify the possible peerings where the routing event may occur.The possible peerings are the peerings that are not shared by and.The path-change type is one of long,short,and equal,and is deter-mined by comparing the lengths of old path and the new path as described in Sec.2.2.Several things are worth noting:

is the path of the last path advertisement in the cluster,while is the path of the path advertise-ment before the?rst path advertisement in the cluster.

If a path advertisement has an empty path,i.e.,it’s a path withdrawal,then we say its path length is in?nity.

Thus,if,then the cluster has a long path-change type.Conversely,if,then the cluster has a short path-change type.

Sometimes,is the same as.This is be-cause two routing events(e.g.,a peering failure fol-lowed by a peering repair,or a short BGP session re-set)occur close in time such that the path advertise-ments they triggered are falsely grouped into one clus-ter.We call this path change as a transient path change.

In this case,we want to select a transient path that can help determine the cause of path change.If the clus-ter consists of only two path advertisements,then the choice is obvious since there is only one transient path.

If many path advertisements are contained in the clus-ter,the transient path is selected according the duration it remains unchanged—a similar heuristic as we se-lect the converged paths.Speci?cally,we set to the?rst transient path that is different from and lives longest.

For peering-based clustering,a pattern is a pre?x-based cluster denoted by=(,,,), where is the time of the?rst path advertisement in and is-1,1,or0corresponding to the long,short,or equal path-change types.In computing and ,we remove from them those shared peerings, i.e.,

The similarity function is de?ned as:

and

and

otherwise is a parameter of the similarity function.When ap-plying this similarity function to the complete-link algo-rithm,we set the threshold such that any two

patterns in the same cluster have a non-negative similarity measure,while patterns in different clusters have negative similarity measures.The?rst condition in states that if(or)is of type long and(or)is of type short,then they have a negative similarity measure,hence the two clusters containing them cannot be merged together. The intuition behind this type of clustering is described as follows.When a routing event causes long path changes, it may also cause equal path changes,but less likely the short path changes.Based on our observation,we also as-sume that this routing event is more likely to take place in than in.Thus,we only consider the

in the second condition.Similarly,when a rout-ing event causes short path changes,it may also cause equal path changes,but less likely the long path changes.And we assume that this routing event is more likely to take place in than in,which is stated in the third condition.The forth condition states that the?rst path ad-vertisements in and must be transmitted within sec-onds.

We also assign each peering-based cluster a path-change type and identify the possible peerings where the routing event may occur.If the peering-based cluster contains at least one pre?x-based cluster of type long,then it has a long path-change type.If the peering-based cluster contains at least one pre?x-based cluster of type short,then it is of type short.Otherwise,it is of type equal.For peering-based clusters of type long,the possible peerings are the inter-section set of of all pre?x-based clusters in it. For peering-based clusters of type short,the possible peer-ings are the intersection set of of all pre?x-based clusters in it.For peering-based clusters of type equal,the possible peerings include the intersection set of

and the intersection set of.

2.5.Identifying Where Events Happen

This paper provides analysis on the generated clusters to estimate the distances from the event originator to both ends of the path.We de?ne as the number of AS hops from the event originator to the origin AS,and as the number of AS hops from the event originator to the van-tage point.We use a conservative heuristic to determine their lower bound.The heuristic works as follows.Given a pre?x-based cluster,?nd the set of peerings

.The ASes in these

AS peerings are candidates of the event originator.De?ne as the minimum number of AS hops between the candi-

dates and the origin AS.Similarly,is set to the minimum number of AS hops between the candidates and the AS con-

taining the vantage point.

In the scenario of Figure1,pre?x-based clustering will

generate two clusters:one for the path change between and,another for the path change between and.In the ?rst cluster,the candidates of event originators include,, ,,and,hence the distance estimates are. On the other hand,in the second cluster the candidates of

event originators include,,,,and,hence the dis-tance estimates are and.

This heuristic does not work well for the cases that one of and is empty,which always results in .We use another heuristic for these cases.The idea is to use other vantage point’s stable path to infer the .For example,vantage point observes a path change at time for pre?x:from to

which is empty.Suppose that vantage point has a path

that remains unchanged during the period from to,then is a stable path. Then,we?nd the common peerings in both

and to de?ned.Distance is always set to 0for these cases.If there are no stable paths to this pre-?x or other vantage points don’t even have paths to this pre-?x.we de?ne which means the heuristics are unable to determine the values due to lack of informa-tion.

For peering-based clusters,we set(and)to the mean value of the’s(and’s)of the pre?x-clusters in them.

3.Trace of BGP Updates

Our analysis is based on a year-long trace of BGP up-dates collected from31vantage points from July2002to June2003.This trace was collected by RouteViews[2],and contains about1,680million path advertisements.Of these, 62%(about1,040M)change the AS

PATH and will discard the duplicates(38%in our data set).

4.Analysis on Pre?x-based Clusters

We conduct three analyses of the one-year trace by ap-plying the clustering algorithms with parameter set to60, 120,and240seconds,respectively.The numbers of pre?x-based clusters generated are shown in Table1(a).

4.1.Cluster Duration

The number of clusters generated by the clustering algo-rithm is a function of the value of.The clustering with large may group the path advertisements caused by mul-tiple related events if the events occur closely in time.For example,path advertisements caused by a peering failure followed by a repair can be grouped into one cluster if the failover time is smaller than,where is the convergence time of the failure.Convergence time of an event is the interval from the time when the event hap-pens to the time when all ASes in Internet observe the new paths resulted from the https://www.sodocs.net/doc/0417960221.html,bovitz et al[11]demon-strated that most routing events converge within180sec-onds.Selecting a smaller than this value will therefore reduce the possibility of clustering multiple events as one.

We de?ne cluster duration as the time interval between the?rst path advertisement and the last path advertise-ment in the same cluster.Figure2plots the complimen-tary cumulative distribution function(CCDF)of the dura-tion of pre?x-based clusters in log-log scale.As shown in Table1(a),more than97%of pre?x-based clusters have du-ration seconds as,while the percentage drops to89%as.This justi?es the suggestion of setting the parameter.

Figure3shows the size of pre?x-based clusters in num-bers of path advertisements.About half of pre?x-based clusters consist of only one path advertisement,hence have duration=0.Part of the reason for a large number of sin-gleton advertisements is the BGP rate limiting mechanism, where a router won’t send more than one path announce-ment for the same pre?x within MinRouteAdver time.As

(a)

Number of Pre?x-based Clusters

448,991,021(65%)

323,466,254(54%)237,397,186(46%)

with duration

sec

30seconds 60seconds 90seconds 39,312,537

32,476,73529,499,547with duration

sec

37,739,035(96%)

30,201,353(93%)

26,844,578(91%)

Table 1.Results of clustering:(a)number of pre?x-based clusters ;(b)number of peering-based clusters .

1

10010000

1e ?061e ?041e ?021e +00

Prefix Cluster Duration (seconds)

1 ? C u m u l a t i v e P r o b a b i l i t y T1=60T1=120T1=240

Figure 2.Distribution of the duration of pre?x-based cluster.1e ?071e ?03

Number of Advertisements

1 ? C u m u l a t i v e P r o b a b i l i t y T1=60

T1=120T1=240

Figure 3.Distribution of the number of path advertisements in a pre?x-based cluster.

explained in previous section,the convergence time of path changes is typically small due to the small number of al-ternate paths.Thus,if the path change converges before the vantage point’s MinRouteAdver timer expires,then only one path advertisement is transmitted by the vantage point.To understand know how many path changes are caused by origin AS changing,we calculate the number of ori-gin ASes appeared in a pre?x-based cluster.Table 2shows the results.represents the path changes resulted from

origin AS changing.The cases of

may indicate mul-tihoming or miscon?guration in network [16].

Type of Path Change

P r o b a b i l i t y Figure 4.Type of path change for pre?x-based cluster .

Type of Path Change

C o n t r i b u t e d R o u t i n g L o a d Figure 5.Routing load contributed by each type of path change.

4.2.Types of Path Changes

For each pre?x-based cluster,we determine its type of path change using the heuristic described in Section 2.4.Figure 4plots the probability of each type,where long ,short ,and equal are denoted by L,S,and E,respectively.The ?gure also separates the transient path changes from non-transient path changes.Transient path change means that the paths before the events are the same as the paths after the events.In the ?gure,Tx1means there is only one transient path in the cluster,and Tx2means the clusters have more than one transient paths.The number of long pre?x-based cluster (L)is roughly the same as that of short pre?x-based cluster (S).This complies to the intuition that a path failed will be repaired some time in the future.

It is interesting that 23–37%of pre?x-based clusters are

60seconds

586,192,742(97.86%)11,860,429(1.98%)898,517(0.15%)59,801101none 240seconds

AS Hops

P r o b a b i l i t y

AS Hops

P r o b a b i l i t y 0.0

0.2

0.40.60.8

1.0

0.00.20.40.60.81.0

Distance Ratio

C u m u l a t i v e P r o b a b i l i t y T1=60T1=120T1=240

Figure 6.(a)Lower bound of the distance from event originator to origin AS.(b)Lower bound of the

distance from event originator to vantage point.(c)Distance ratio .

1

101001000100001e ?071e ?03

Number of Prefix?based Clusters

1 ? C u m u l a t i v e P r o b a b i l i t y T2=30

T2=60T2=901

101001000100001e ?071e ?03

Number of Advertisements

1 ? C u m u l a t i v e P r o b a b i l i t y

T2=30T2=60T2=90

1

10100100010000

1e ?061e ?041e ?021e +00

Cluster Duration (seconds)

1 ? C u m u l a t i v e P r o b a b i l i t y T2=30

T2=60T2=90

Figure https://www.sodocs.net/doc/0417960221.html,DF of (a)the number of pre?x-based clusters,(b)the number of the path advertisements,and (c)the duration in a peering-based cluster.Type Long

15M (39.11%)

13M (39.42%)

12M (40.01%)

Equal

1

10010000

1e ?071e ?03

Number of Unique Prefixes

1 ? C u m u l a t i v e P r o b a b i l i t y T2=30

T2=60T2=90

AS Hops

P r o b a b i l i t y 1

101001000100001e ?051e ?031e ?01

Number of Peering?based Clusters

1 ? C u m u l a t i v e P r o b a b i l i t y T2=30

T2=60T2=90

Figure 8.Peering-based cluster:(a)CCDF of the number of unique pre?xes.(b)Approximate distance ()from an event to the origin.(c)CCDF of the number of peering-based clusters caused by events in a single peering.

An approach to approximating the distance between the events and the observer and originator of the pre-?x.This analysis suggests that at least 45%of path changes occur outside the origin AS.

Our analysis results suggest several directions for future work:

We provide an upper bound on the number of BGP path-change events.However,the large number of can-didates for event originator prevent us to do further clustering.Thus,estimating the exact number of events remains an open question.

We observed that many path advertisements (35–52%)result from transient path changes.This result suggests that the number of routing updates can be noticeably reduced by reducing these transient events.

References

[1]RIPE.Routing Information Service .

https://www.sodocs.net/doc/0417960221.html,/.[2]University of Oregon.Route Views Project .

https://www.sodocs.net/doc/0417960221.html,/route-views/.

[3]M.R.Anderberg.Cluster analysis for applications .Aca-demic Press,1973.

[4] D.Andersen,N.Feamster,S.Bauer,and H.Balakrish-nan.Topology inference from BGP routing dynamics.In Proceedings of the ACM SIGCOMM Internet Measurement Workshop ,2002.

[5] D.-F.Chang,https://www.sodocs.net/doc/0417960221.html,indan,and J.Heidemann.The temporal

and topological characteristics of BGP path changes.Tech-nical report,USC/Information Sciences Institute,Aug.2003.https://www.sodocs.net/doc/0417960221.html,/?difac/isi-tr-2003.ps.gz.

[6]M.Faloutsos,P.Faloutsos,and C.Faloutsos.On power-law relationships of the Internet topology.In Proceedings of the ACM SIGCOMM ,pages 251–262,1999.

[7]N.Feamster, D.Andersen,H.Balakrishnan,and M. F.

Kaashoek.Measuring the effects of Internet path faults on

reactive routing.In Proceedings of the ACM SIGMETRICS ,2003.[8]G.Huston.BGP Table Statistics.https://www.sodocs.net/doc/0417960221.html,/ops/bgp .

[9] A.K.Jain and R.C.Dubes.Algorithms for clustering data .Prentice Hall,1988.

[10]

A.K.Jain,M.N.Murty,and P.J.Flynn.Data clustering:a review.ACM Computing Surveys ,31(3):264–323,Sept.1999.

[11]

https://www.sodocs.net/doc/0417960221.html,bovitz,A.Ahuja,A.Abose,and F.Jahanian.An ex-perimental study of delayed Internet routing convergence.In Proceedings of the ACM SIGCOMM ,pages 175–187,2000.[12]

https://www.sodocs.net/doc/0417960221.html,bovitz,A.Ahuja,and F.Jahanian.Experimental study of internet stability and wide-area network failures.In Pro-ceedings of the FTCS ,1999.

[13]

https://www.sodocs.net/doc/0417960221.html,bovitz,G.R.Malan,and F.Jahanian.Internet rout-ing instability.IEEE/ACM Transactions on Networking ,6(5):515–528,Oct.1998.

[14]

https://www.sodocs.net/doc/0417960221.html,bovitz,G.R.Malan,and F.Jahanian.Origins of In-ternet routing instability.In Proceedings of the IEEE INFO-COM ,pages 218–226,1999.

[15]O.Maennel and A.Feldmann.Realistic BGP traf?c for test labs.In Proceedings of the ACM SIGCOMM ,2002.

[16]

R.Mahajan,D.Wetherall,and T.Anderson.Understand-ing BGP miscon?guration.In Proceedings of the ACM SIG-COMM ,2002.

[17]Y .Rekhter,T.Li,and S.Hares.A Border Gateway Protocol 4(BGP-4),draft-ietf-idr-bgp4-20.txt edition,Apr.2003.[18]

J.Rexford,J.Wang,Z.Xiao,and Y .Zhang.BGP routing stability of popular destinations.In Proceedings of the ACM SIGCOMM Internet Measurement Workshop ,2002.

[19]

L.Subramanian,S.Agarwal,J.Rexford,and R.H.Katz.Characterizing the Internet hierarchy from multiple vantage points.In Proceedings of the IEEE INFOCOM ,June 2002.[20]

H.Tangmunarunkit,https://www.sodocs.net/doc/0417960221.html,indan,S.Jamin,S.Shenker,and https://www.sodocs.net/doc/0417960221.html,work topology generators:Degree-based vs.structural.In Proceedings of the ACM SIGCOMM ,2002.[21]

X.Zhao,D.Pei,L.Wang,D.Massey,A.Mankin,S.Wu,and L.Zhang.An analysis of BGP multiple origin as (MOAS)con?icts.In Proceedings of the ACM SIGCOMM Internet Measurement Workshop ,pages 31–35,2001.

相关主题