Internet Topology Discovery
Sat, 04/28/2007 - 00:37 by Damien Leroy • Categories:
Systems for active measurements in the internet are undergoing a radical shift. Whereas the present generation of systems operates on largely dedicated hosts, numbering between 20 and 200, a new generation of easily downloadable measurement software means that infrastructures based on thousands of hosts could spring up literally overnight. Unless carefully controlled, these new systems have the potential to impose a heavy load on parts of the network that are being measured. They also have the potential to raise alarms, as their traffic can easily resemble a distributed denial of service (DDoS) attack. Our research aims at examining the problem, and proposing and evaluating an algorithm for controlling one of the most common forms of active measurement: traceroute.
There is a number of systems active today that aim to elicit the internet topology at the IP interface level. The most extensive tracing system, CAIDA's skitter, uses 24 monitors, each targeting on the order of one million destinations. Some other well known systems, such as the RIPE NCC's TTM service and the NLANR AMP, have larger numbers of monitors (between one- and two-hundred), and conduct traces in a full mesh, but avoid tracing to outside destinations.
However, recent studies have shown that reliance upon a relatively small number of monitors to generate a graph of the internet can introduce unwanted biases. Removing spatial bias is not the only reason to employ measurement systems that use a larger number of monitors. With more monitors to probe the same space, each one can take a smaller portion and probe it more frequently. Network dynamics that might be missed by smaller systems can more readily be captured by the larger ones while keeping the workload per monitor constant.
Given that much large scale network mapping is on the way, contemplating such a measurement system demands attention to efficiency, in order to avoid generating undesirable network load. Unfortunately, this issue has not been yet successfully tackled by the research community. Traceroutes emanating from a large number of monitors and converging on selected targets can easily appear to be a DDoS attack. Whether or not it triggers alarms, it clearly is not desirable for a measurement system to consume undue network resources. A traceroute@home system, as we label this class of applications, must work hard to avoid sampling router interfaces and traversing links multiple times, and to avoid multiple pings of end systems.
Until now, we achieved several steps towards an network-friendly topology discovery system. Our first contribution was to evaluate the extent to which classical topology discovery systems involve duplicated effort. By classical topology discovery, we mean those systems, such as skitter, tracing from a small number of monitors to a large set of common destinations, such as skitter. Duplicated effort in such systems takes two forms: measurements made by an individual monitor that replicate its own work, and measurements made by multiple monitors that replicate each other's work. We termed the first intra-monitor redundancy and the second inter-monitor redundancy.
Our analysis of the nature of redundant probing suggested more efficient algorithms for topology discovery. We firstly proposed a family of efficient tracing algorithms that allow one to probe from a single source. These algorithms aims at reducing the intra-monitor redundancy. On the contrary to standard traceroute, these algorithms works backwards (i.e., starting far from the probing source with a decreasing TTL) and stops probing when encountering an interface previously discovered. These algorithms were evaluated using skitter data, implemented in Java and deployed on the PlanetLab testbed. Secondly, we proposed and evaluated an algorithm called Doubletree that aims at reducing both redundancy at the same time. Doubletree takes advantage of the tree-like structure of routes, either emanating from a single source to multiple destinations or routes converging from multiple sources to a single destination, to avoid duplication of effort. Unfortunately, general strategies for reducing these two kinds of redundancy are in conflict. On the one hand, intra-monitor redundancy is reduced by starting probing far from the monitor, and working backward along the tree-like structure that is rooted at that monitor. Once an interface is encountered that has already been discovered by the monitor, probing stops. On the other hand, inter-monitor redundancy is reduced by probing forwards towards a destination until encountering a previously-seen interface. It further requires cooperation between probing monitors. We showed a means of balancing these conflicting strategies in Doubletree. In Doubletree, probing starts at a distance that is intermediate between monitor and destination. We further proposed several improvements to Doubletree in order to reduce the communication cost and the impact on end-hosts. Doubletree has been implemented in a prototype written in Java and evaluated on the PlanetLab testbed.
Measuring the internet is an ongoing task. Until now, we provided several contributions towards an eventual large-scale topology discovery infrastructure. Our results opened new paths, on which further research could be done and other issues explored. We are currently working probing algorithm guided by BGP. We believe that it could be useful for Doubletree, or any kind of traceroute-like probing system, to make use of higher level information, such as BGP information. We are also improving the Doubletree prototype in order to make the prototype scalable, robust, flexible and reliable.
Short Bibliography
- V. Jacobsen et al. Traceroute. Unix man pages. 1989.
- B. Huffaker, D. Plummer, D. Moore and k. Claffy. Topology Discovery by Active Probing in Proc. Symposium on Applications and Internet. 2002.
- A. McGregor, H. W. Braun and J. Brown. The NLANR Network Analysis Infrastructure. in Proc. IEEE Communcation Magazine. 2000.
- F. Georgatos, F. Gruber, D. Karrenberg, M. Santcroos, A. Sunsanj, H. Uijterwaal and R. Wilhelm. Providing Active Measurements as a Regular Service for ISPs in Proc. PAM. 2001.
- Y. Shavitt. DIMES. Distributed Internet Measurements & Simulations.
- N. Spring, R. Mahajan and D. Wetherall. Measuring ISP Topologies with Rocketfuel. in Proc. ACM Sigcomm. 2002.
- R. Govindan and H. Tangmunarunkit. Heuristics for the Internet Map Discovery. in Proc. IEEE Infocom. 2000.
- A. Broido and k. Claffy. Internet Topology: Connectivity of IP Graph. in Proc. SPIE International Symposium on Convergence of IT and Communication. 2001.
- M. Faloutsos, P. Faloutsos and C. Faloutsos. On Power-Law Relationships of the Internet Topology. in Proc. ACM Sigcomm. 1999.
- A. Lakhina, J. Byers, M. Crovella and X. Pie. Sampling Biases in IP Topology Measurements in Proc. IEEE Infocom. 2003.
- A. Clauset and C. Moore. Traceroute Sampling Makes Random Graphs Appear to Have Power-Law Degree Distributions. arXiv:cond-mat/0312674 v3 8 Feb. 2004.
- R. Siamwalla, R. Sharma and S. Keshav. Discovering Internet Topology. Cornell University. 1998.