Characteristics of Destination Address Locality in Computer Networks: A Comparison of Caching Schemes

Thu, 02/05/2009 - 00:40 by Damien Saucez

“Characteristics of Destination Address Locality in Computer Networks: A Comparison of Caching Schemes,” R. Jain, Computer Networks and ISDN Systems, vol. 18, pp. 243–254, 1989/1990.

This paper is probably the first to think about locality in network traffic. At that time, most networks were built on purely proprietary architectures, yet some of the conclusions are still applicable today. However, as the analysis was performed on non-IP networks, we'll not discuss the experimental part in this summary.

The author's idea was to apply the concept of caching, well studied for computer virtual memory and file systems, to network addresses.

The paper thus recalls the concept of temporal locality, which covers the probability that if we observe address k at time t, we'll observe k again at time t+1, but also the notion of spatial locality, which covers the probability that if we observe k at time t, we'll observe k+1 at time t+1.

Naturally, the paper asks the question "what is k+1 for an address?". This question is still open today, i.e., it is very difficult to determine what a neighbor of an IP address is.
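As an illustration, here is a minimal sketch (mine, not the paper's) of how temporal locality can be estimated from an address trace: the empirical probability that the address referenced at time t is referenced again at time t+1. The trace is hypothetical.

```python
def temporal_locality(trace):
    """Fraction of consecutive reference pairs that repeat the same address."""
    if len(trace) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(trace, trace[1:]) if a == b)
    return repeats / (len(trace) - 1)

trace = ["A", "A", "B", "A", "A", "A", "C", "B", "B"]
print(temporal_locality(trace))  # 0.5: 4 repeated pairs out of 8
```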

The two important concepts presented in the paper are persistence and concentration.

Persistence is the tendency to repeat the use of the same address (measured by counting the consecutive references to the same address in the data set).

Concentration is the tendency of the traffic to be limited to a small subset of the whole address space (measured by computing the fraction of the address space actually used).

A strong concentration means that even small caches allow a high hit rate. Persistent addresses, on the other hand, are hit repeatedly while they are in the cache.
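Both measurements fit in a few lines of Python; this is my own illustration on a made-up trace, not code from the paper: persistence taken as the average run length of consecutive references to the same address, and concentration as the fraction of the address space actually referenced.

```python
from itertools import groupby

def persistence(trace):
    """Average number of consecutive references to the same address."""
    runs = [len(list(group)) for _, group in groupby(trace)]
    return sum(runs) / len(runs)

def concentration(trace, address_space_size):
    """Fraction of the whole address space seen in the trace."""
    return len(set(trace)) / address_space_size

trace = ["A", "A", "B", "A", "A", "A", "C", "B", "B"]
print(persistence(trace))         # 9 references in 5 runs -> 1.8
print(concentration(trace, 256))  # 3 distinct addresses / 256 ~ 0.012
```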

As for traditional caches, the three important metrics are (computed in the sketch after the list):
- miss probability
- interfault distance, i.e., the number of hits between two misses
- normalized search time: the search time with the cache divided by the search time without it
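A hedged sketch of how these three metrics could be computed from a simulated hit/miss sequence; the boolean trace and the two timing numbers are made up for illustration, and the interfault distance is taken here as the average number of hits per miss.

```python
def cache_metrics(hits, t_cache, t_no_cache):
    """hits: list of booleans, True for a cache hit, False for a miss."""
    misses = hits.count(False)
    miss_probability = misses / len(hits)
    # Interfault distance: hits observed between two consecutive misses,
    # averaged here as total hits spread over the observed misses.
    interfault_distance = hits.count(True) / misses if misses else float("inf")
    normalized_search_time = t_cache / t_no_cache
    return miss_probability, interfault_distance, normalized_search_time

hits = [False, True, True, False, True, False, True, True, True]
print(cache_metrics(hits, t_cache=1.2, t_no_cache=4.0))
# (0.333..., 2.0, 0.3)
```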

The traffic analysis is done with the four traditional cache replacement algorithms (an LRU simulation is sketched after the list):
- MIN: remove the entry that has the lowest probability of being hit in the future (an offline optimum, impossible to implement without knowing the future at the time of the paper)
- LRU: Least Recently Used, based on a stack implementation
- IRM: Independent Reference Model: the probability of referencing k is the number of times k is observed divided by the total number of observations
- RANDOM: just remove a random entry
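As an illustration of the stack-based LRU policy, here is a minimal simulation sketch (my own, not the paper's code) that replays an address trace against a fixed-size cache and reports the miss probability. An OrderedDict plays the role of the stack: a hit moves the address to the top, a miss on a full cache evicts the least recently used entry.

```python
from collections import OrderedDict

def simulate_lru(trace, cache_size):
    """Return the miss probability of an LRU cache over an address trace."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency
        else:
            misses += 1                    # miss: insert, evicting if full
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # drop least recently used entry
            cache[addr] = True
    return misses / len(trace)

trace = ["A", "B", "A", "A", "B", "C", "B", "A"]
print(simulate_lru(trace, cache_size=2))  # 0.5: 4 misses out of 8
```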