On Network-Aware Clustering of Web Clients

Mon, 04/14/2008 - 23:23 by Damien Saucez

The paper "On network-aware clustering of web clients", presented at SIGCOMM ’00 by Krishnamurthy and Wang, proposes clustering techniques to move content closer to the clients that generate most of the traffic.

Before the paper, clusters of clients were simply /24 networks. The authors propose to group clients by BGP prefix instead.

The paper focuses on web traffic. To determine the quality of the BGP clustering, the authors analyze web server logs and map the client IP addresses found in them to BGP prefixes.

The idea behind this decomposition is that a /24 is not network-aware and does not reflect the real decomposition of the Internet, while BGP prefixes should be more closely related to the effective decomposition of the IP address space. From the logs, it is possible to cluster most of the clients (99.9%).
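The difference between the two groupings can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes that a client is assigned to the most specific (longest) matching prefix, and the function names, addresses and prefixes are made up for the example rather than taken from a real BGP table.

```python
import ipaddress
from collections import defaultdict

def longest_prefix(ip, prefixes):
    """Return the most specific network in `prefixes` that contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in prefixes if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None

def cluster_by_prefix(client_ips, prefix_strings):
    """Group clients by longest matching prefix (assumed matching rule)."""
    prefixes = [ipaddress.ip_network(p) for p in prefix_strings]
    clusters = defaultdict(list)
    for ip in client_ips:
        net = longest_prefix(ip, prefixes)
        # Clients that match no prefix end up under the key None.
        clusters[str(net) if net else None].append(ip)
    return dict(clusters)

def cluster_by_slash24(client_ips):
    """Baseline: group clients by the /24 network that contains them."""
    clusters = defaultdict(list)
    for ip in client_ips:
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        clusters[str(net)].append(ip)
    return dict(clusters)

# Hypothetical clients and prefixes, for illustration only.
clients = ["10.1.2.5", "10.1.3.9", "10.1.200.1", "172.16.0.1"]
table = ["10.1.0.0/16", "10.1.2.0/23"]
by_prefix = cluster_by_prefix(clients, table)
by_slash24 = cluster_by_slash24(clients)
```

Here 10.1.2.5 and 10.1.3.9 fall into the same /23 cluster, whereas the /24 baseline puts them in two different clusters. A linear scan over the prefixes is enough for a sketch; a real table would call for a trie or a similar structure.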

The paper proposes to validate this clustering with two techniques. (i) nslookup makes it possible to determine whether all the hosts within the same cluster belong to the same organization. However, as the DNS name cannot be identified for around 50% of the clients, (ii) the authors also propose to use a simplified traceroute to check whether the nodes of a cluster share the same routing path suffix.
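The traceroute check can be illustrated as follows. This is a minimal sketch under my own assumptions, not the paper's exact procedure: each path is a list of router hops towards a client with the client itself left out, and the min_shared_hops threshold is a hypothetical parameter.

```python
def common_suffix(paths):
    """Return the longest sequence of hops shared at the end of every path."""
    if not paths:
        return []
    suffix = []
    # Walk all paths backwards in lockstep, starting from the last router.
    for hops in zip(*(reversed(p) for p in paths)):
        if len(set(hops)) != 1:
            break
        suffix.append(hops[0])
    return suffix[::-1]

def cluster_passes(paths, min_shared_hops=2):
    """Assumed criterion: the cluster passes if enough final hops are shared."""
    return len(common_suffix(paths)) >= min_shared_hops

# Hypothetical router names, for illustration only.
paths_same = [["a", "b", "r1", "r2"], ["c", "r1", "r2"]]
paths_diff = [["a", "b", "r1", "r2"], ["c", "r3", "r4"]]
shared = common_suffix(paths_same)
```

With these inputs, the first cluster shares the suffix r1, r2 and passes, while the second shares nothing and fails.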

With the nslookup and traceroute tests, around 90% of the hosts pass the test for their cluster. To improve these results, and mostly to cluster the 0.1% of hosts left unresolved, the authors propose adaptive clustering based on nslookup and traceroute.

This paper is related to our IDIPS research (http://inl.info.ucl.ac.be/idips).