From: Simon Krueger [skruege2 SpamElide] Sent: Thursday, February 10, 2011 11:59 AM To: Gupta, Indranil Subject: 525 review 02/10 A Scalable Content-Addressable Network, Ratnasamy, Francis, Handley, and Karp The core idea of this paper is to form a distributed map (aka key-value store) called a Content Addressable Network (CAN) using a logical spatial partitioning scheme. Keys in the map are stored in nodes that occupy a range/partition of the logical space. To determine which nodes store which keys, a hash function is applied to the keys, mapping each key to a point in the logical space. The node that owns a particular region of the logical space is responsible for all the points in that region. If a node receives a request for a point that it does not own, it routes the request to its neighboring nodes. The advantage of CAN is that it applies novel spatial partitioning techniques to a distributed system. Effectively, the authors take the benefits of spatial partitioning and apply them to the keys in the data set by constructing a logical space. This lets CAN handle churn and load balancing well: when a node enters, the most heavily loaded node can easily be split, and when a node leaves, a neighbor can take over its zone. The disadvantage of CAN is that this logical spatial partition is laid over a physical network, so nodes that are neighbors in the logical space may be far away in the physical space, which affects the latency and possibly the bandwidth of the system. The authors have some heuristics to combat this, like choosing locations for new nodes based upon latency to neighboring nodes rather than picking a random location, but this is effectively a trade-off between uniformly partitioning the logical space and transfer time in the physical network. My thoughts about CAN are that it is a decent idea but could only be effectively applied in a relatively small physical network (i.e., one country or less), because once neighboring nodes in the logical space are far apart, or you have to make compromises on where you place new nodes, the system loses its performance. Additionally, I would have liked to see the experiments performed on a real network rather than in a simulation. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems, Rowstron and Druschel The core idea of this paper is to form a key-value store named Pastry that is similar to Chord (a system studied earlier in the course): like Chord, Pastry assigns a 128-bit nodeId to each node in the system, each node is responsible for handling and routing requests for numeric keys to the node with the closest nodeId, and the expected number of routing steps is O(log n). However, unlike Chord, Pastry keeps network locality in mind when routing requests, so each node's routing table tracks proximity information about its candidate next hops. My thoughts about Pastry are that it is at its core a mash-up between Chord and a routing protocol like distance vector: the idea of assigning data to nodes based upon key values and nodeIds is very similar to Chord, and the idea of routing messages based on network locality and hop counts seems very similar to distance vector. Because of this, I feel that you do get better performance than in a system like Chord, but at the cost of a more complex implementation.
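Both the CAN descriptions above boil down to the same lookup flow: hash the key to a point, then greedily forward toward whichever neighbor's zone is closest to that point. A minimal sketch in Python; the names, the 2-d setting, the fixed four-node layout, and the center-distance greedy rule are illustrative assumptions, not the paper's implementation:

    # Sketch of a CAN-style lookup (illustrative, not the paper's code).
    # Real CAN forwards along coordinates toward the target; the toy
    # center-distance rule below behaves the same on this small layout.
    import hashlib

    def key_to_point(key):
        # Hash a key to a point in the unit square [0,1) x [0,1).
        h = hashlib.sha1(key.encode()).digest()
        x = int.from_bytes(h[:4], "big") / 2**32
        y = int.from_bytes(h[4:8], "big") / 2**32
        return (x, y)

    class Node:
        def __init__(self, name, zone):
            self.name = name        # illustrative identifier
            self.zone = zone        # ((x_lo, x_hi), (y_lo, y_hi))
            self.neighbors = []     # nodes sharing a zone border

        def owns(self, p):
            (xl, xh), (yl, yh) = self.zone
            return xl <= p[0] < xh and yl <= p[1] < yh

        def center_dist(self, p):
            # Distance from this zone's center to the target point.
            (xl, xh), (yl, yh) = self.zone
            cx, cy = (xl + xh) / 2, (yl + yh) / 2
            return ((cx - p[0]) ** 2 + (cy - p[1]) ** 2) ** 0.5

    def route(start, key):
        p, node, path = key_to_point(key), start, []
        while not node.owns(p):
            node = min(node.neighbors, key=lambda nb: nb.center_dist(p))
            path.append(node.name)
        return node, path

    # Four nodes, each owning one quadrant of the unit square.
    a = Node("A", ((0.0, 0.5), (0.0, 0.5)))
    b = Node("B", ((0.5, 1.0), (0.0, 0.5)))
    c = Node("C", ((0.0, 0.5), (0.5, 1.0)))
    d = Node("D", ((0.5, 1.0), (0.5, 1.0)))
    a.neighbors, b.neighbors = [b, c], [a, d]
    c.neighbors, d.neighbors = [a, d], [b, c]
    owner, hops = route(a, "some-file.mp3")
    print("owner:", owner.name, "path:", hops)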
Simon Krueger From: mdford2 SpamElide Sent: Thursday, February 10, 2011 11:49 AM To: Gupta, Indranil Subject: 525 review 02/10 Resilient Overlay Networks A Resilient Overlay Network (RON) is an application-level overlay for wide-area, small-scale networks. Current Internet routing protocols only store one routing entry per address, or partial address. When links are broken, BGP-4, the inter-Autonomous System routing protocol, can take up to several minutes before converging to a consistent form. By maintaining multiple routing entries per node, a RON is able to re-route messages through other RON nodes and decrease downtime. In order to accomplish this, each node maintains link quality information. The standard values are latency, packet loss rate, and throughput. A routing path is then determined using these values to compute a metric. Included options are minimum latency, minimum loss, and maximum TCP throughput; or the user can provide their own metric. Extensions allow for policy-based routing and dynamic membership management. A RON is intended to handle between 2 and 50 nodes. There could be more scalable options that approximate the routing metrics without guaranteeing their fulfillment. Also, the implementation described is intended for a rather static group, and will not handle churn well. If a node becomes isolated, it is considered a member until an entire hour has passed. Certainly in a mobile environment, where a node could drop without notice and be forced to re-join under a different IP, churn becomes an issue even if the total number of users does not change. A Scalable Content-Addressable Network Ratnasamy et al. define a Content-Addressable Network (CAN) as a distributed infrastructure for hash-table functionality. Their main focus is to completely decentralize file indexing. They use a logical coordinate structure to form an overlay mesh on nodes and determine neighbors. To locate a file, a node hashes the file name, converts the value to a location in the logical coordinate space, and forwards its request along the logical mesh in the direction of the target location. Once the request reaches its destination, the destination node can reply with information about the actual location of the file. There are several optimizations that make this implementation scalable, including multi-dimensional coordinate spaces, multiple overlays, and overloading coordinate zones. Overloading coordinate zones is especially useful, as nodes then only need to communicate with one node in a neighboring zone. Thus, they can choose the node with the lowest latency to communicate with. The paper also describes methods for bootstrapping new nodes and removing nodes, which is essential in peer-to-peer systems. However, there is no reshuffling method, so the overlay could become extremely unbalanced. It would have been interesting to see the effect of periodically unsubscribing and resubscribing as an effort to keep the coordinate space evenly populated. From: das17 SpamElide Sent: Thursday, February 10, 2011 11:35 AM To: Gupta, Indranil Subject: 525 review 02/10 1. A Scalable Content-Addressable Network The paper introduces the concept of a Content-Addressable Network (CAN) as a scalable, fault-tolerant hash table for distributed infrastructure. CAN was developed to address the scalability issues of prior peer-to-peer networks like Napster and Gnutella. Its design is based on a virtual d-dimensional Cartesian coordinate space.
The total coordinate space is partitioned into multiple zones (regions), where each zone is assigned to a particular node in the system. All key-value pairs belonging to a zone are stored in the node responsible for that zone. For this, a uniform hash function is used to map keys to points belonging to different zones. In CAN each node maintains O(d) adjacent neighbors so that it can route requests to the desired destination. CAN uses a greedy forwarding algorithm to route messages towards the destination. For a d-dimensional space with n nodes the average path length was shown to be O(n^(1/d)). CAN also allows nodes to dynamically join and leave. When a node joins the system, CAN allocates it a zone of its own in the coordinate space by splitting an existing zone. Similarly, when a node leaves, CAN merges two zones into a single zone and hands over the storage responsibility to the node occupying the newly merged zone. In each case the neighbor lists are updated to reflect the true scenario. The paper also investigates eight possible improvements on the basic design. The improvements include: increasing the dimension of the coordinate space; maintaining multiple, independent coordinate spaces (realities); including network latency in the routing metric; overloading coordinate zones; using multiple hash functions; constructing CAN topologies that are congruent with the underlying IP topology; uniform partitioning; and using caching and replication techniques for "hot spot" management. Though the paper proposes various improvements, the improvements themselves have tradeoffs. All evaluations have been done with ns (a network simulator). It would have been nice if some real-life experiments were done. There is also no cost analysis of the repair algorithm associated with churn. Using RTT to determine topologically close nodes is not always reliable, because RTT depends not only on geographic distance but also on the underlying network capacity/bandwidth. It may very well be true that two nodes have similar RTTs with respect to a given landmark but are geographically distant from each other. Moreover, the routing latency in CAN gets worse as the number of nodes increases, especially considering the fact that a hop in the CAN network is an application-level virtual hop, not an IP hop. 2. Resilient Overlay Network This paper presents an application-layer overlay architecture called Resilient Overlay Network (RON) which provides end-hosts and applications with the ability to recover from path failures and poor network performance very quickly. The main objective of RON is to improve upon the poor failure behavior of the underlying Internet routing protocol, BGP, and provide a higher level of flexibility to end-hosts in choosing the desired path to the destination. BGP achieves scalability at the cost of reduced fault-tolerance. As a result, when a path/link failure occurs it requires a significant amount of time before it can converge to a consistent topology. RON tries to address this issue by creating a small overlay network among a subset of the available nodes, each of which uses an application-level protocol for communicating with the other nodes in the RON. In fact, RON aggressively probes and monitors the links among RON nodes based on three metrics: latency, packet loss, and throughput. This enables RON to have a tighter integration with applications, as it allows the application/user to decide which routing metric to use.
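A minimal sketch of this metric-driven path choice, assuming a full mesh with single-intermediate detours as described above; the link numbers, node names, and the rules for composing metrics over a two-link path are illustrative assumptions, not RON's code:

    # Sketch of RON-style path selection by application-chosen metric.
    # links[(src, dst)] holds probed values for one directed overlay link.
    links = {
        ("A", "B"): {"latency": 40, "loss": 0.20, "throughput": 300},
        ("A", "C"): {"latency": 30, "loss": 0.01, "throughput": 900},
        ("C", "B"): {"latency": 25, "loss": 0.02, "throughput": 800},
    }

    def combine(metric, m1, m2=None):
        # Compose a metric over a two-link path; a single link passes through.
        if m2 is None:
            return m1[metric]
        if metric == "latency":    # latencies add
            return m1["latency"] + m2["latency"]
        if metric == "loss":       # loss compounds: 1 - (1-p1)(1-p2)
            return 1 - (1 - m1["loss"]) * (1 - m2["loss"])
        return min(m1["throughput"], m2["throughput"])  # bottleneck link

    def best_path(src, dst, metric, intermediates):
        better = max if metric == "throughput" else min
        candidates = {}
        if (src, dst) in links:
            candidates[(src, dst)] = combine(metric, links[(src, dst)])
        for mid in intermediates:
            if (src, mid) in links and (mid, dst) in links:
                candidates[(src, mid, dst)] = combine(
                    metric, links[(src, mid)], links[(mid, dst)])
        return better(candidates.items(), key=lambda kv: kv[1])

    print(best_path("A", "B", "loss", ["C"]))     # indirect path via C wins
    print(best_path("A", "B", "latency", ["C"]))  # direct path wins (40 vs 55)

Note how the preferred path flips with the metric, which is exactly the application-level knob the review describes.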
To evaluate RON the authors performed two distinct experiments, one with a 12-node and the other with a 16-node structure. They demonstrated that RON was able to recover from path outages 60-100% of the time, with a recovery period of around 18 seconds. All in all, the paper argues that end-host-controlled RONs provide a good platform for distributed applications to transmit data with greater flexibility and robustness. However, there are several issues that need to be discussed. RON sacrifices scalability for the sake of reliability, but it achieves reliability at the cost of high network overhead. In fact, an N-node RON requires O(N^2) probes, which is significantly high. For example, a 50-node RON requires 33 Kbps of bandwidth for active probing, which in some contexts could be really costly. So RON should be applied to a small set of applications rather than to all possible applications. Nevertheless, one observation is that if every application selfishly uses its own RON to improve its performance, the overall network performance might degrade significantly due to bandwidth consumption by the various RONs. Security and privacy become an issue when RON peers exchange route information, since this could cause information leaks among different ISPs against their policies. RON also provides sub-optimal routing in the case where both end hosts are behind NATs. This could become a problem as most end hosts today are behind NATs. ---Anupam From: anjalis2006 SpamElide on behalf of Anjali Sridhar [sridhar3 SpamElide] Sent: Thursday, February 10, 2011 11:34 AM To: Gupta, Indranil Subject: 525 review 02/10 Resilient overlay networks, D. Andersen et al, SOSP 2001. A Resilient Overlay Network is an overlay over the existing Internet framework designed to provide the nodes that are a part of the RON with paths that recover from outages or degraded performance within several seconds, in contrast to a few minutes. RON nodes maintain information about the state of the links between themselves and distribute this information among them. When the current Internet path degrades in performance or there is an outage, the RON nodes route the packet over the overlay, with forwarding carried out by intermediate nodes. The results of the experiments showed that RON1, with 12 nodes, recovered from 100% of the observed outages, and RON2, with 16 nodes, had a 60% recovery rate. The main goal of RON is to make sure that the application that is using RON is unaffected by network outages and experiences reduced latency by using alternate routing paths. RON is useful for applications that are delay sensitive, like video conferencing. The average time to route a packet when there was a failure was 18 s, which is much less than a reroute using BGP. Moreover, it was seen that most of the paths required only a single RON node in between to reroute the packet to the destination. One of the goals was the tight integration of the routing protocol with the application. The application can specify metrics that should be prioritized when routing its packets, like throughput, loss rate, and delay jitter. Though the results obtained by deploying RON provide an application with resilient network connections, there are some disadvantages associated with it. RON networks are designed to be limited in size, from 2 to 50 nodes. The scalability of an application using RON is limited due to this. If a large number of nodes were used, the traffic encountered due to active probing might be high, as the quick calculation below suggests.
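A quick back-of-envelope for that probing cost; the pair count is simple arithmetic, and the 33 Kbps figure for 50 nodes is the one quoted from the paper above:

    # Each of n RON nodes probes the n-1 others, so probe traffic
    # grows quadratically with membership.
    def probe_pairs(n):
        return n * (n - 1)

    for n in (10, 50, 100):
        print(n, "nodes ->", probe_pairs(n), "directed probe pairs")
    # 10 -> 90, 50 -> 2450, 100 -> 9900: doubling the nodes roughly
    # quadruples probe traffic (the paper reports ~33 Kbps at 50 nodes).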
The active probes sent out might cause some congestion as the number of nodes increases. Another issue might be that the nodes need to employ some security mechanism to prevent malicious attacks when forwarding packets. The trustworthiness of all RON nodes is assumed here, which might not be accurate. A scalable content addressable network, S. Ratnasamy et al, SIGCOMM 2001 The paper presents a distributed hash table infrastructure called a Content Addressable Network. The content addressable network maps the hash of the data onto a coordinate space and assigns nodes to the space as they join and leave a p2p network. Each node is assigned a portion of the hash map called a zone. Each node only keeps track of the nodes that have been assigned the adjacent zones. When looking for a particular key, the key is mapped to the coordinate space of the specified set of dimensions using a particular hash function. Routing is then carried out in a point-to-point fashion across the nodes. The analysis is then extended by having multiple nodes assigned to each zone, having multiple coordinate spaces, and having multiple dimensions for the coordinate space. CAN is presented as a distributed architecture in which nodes store limited data about their neighbors, O(d), and path lengths among nodes are limited, O(d n^(1/d)), where d is the number of dimensions of the coordinate space. The joining and leaving of nodes requires updating the tables of the neighbors of that particular zone. The algorithms for nodes taking over a zone and routing around failures make this fault tolerant. The CAN network can also improve routing by measuring the RTT to its neighbors and routing accordingly. The nodes form an overlay, and keeping track of the underlying network improves the time taken for routing messages to a node. Though CAN provides a non-centralized infrastructure which is fault tolerant and can route around failures, knowledge of the underlying network topology might better help in mapping nodes to the hash map zones. The nodes that are closer together are mapped to adjacent zones on the coordinate space, providing a topological mapping of nodes onto the coordinate space. However, this might result in some unequal distribution of nodes, with a concentration of nodes in a particular zone. The overloading of coordinate zones provides improved fault tolerance since more than one node is responsible for a zone. However, the data needs to be replicated across all peers. Moreover, we know that some keys might index larger amounts of data than others. Hence it might be interesting to see the performance when multiple nodes have to reference such a zone. From: Jason Croft [croft1 SpamElide] Sent: Thursday, February 10, 2011 10:55 AM To: Gupta, Indranil Subject: 525 review 02/10 Hi Professor Gupta, Below is my review for the 2/10 papers on RON and CAN for CS525. Thanks, Jason Croft ---------------------------------------------------- RESILIENT OVERLAY NETWORKS Inter-domain routing in the Internet lacks strong fault-tolerance in order to achieve high scalability with BGP. Andersen et al.'s Resilient Overlay Network (RON) aims to achieve more robust and resilient routing by connecting a set of nodes and monitoring the paths between them. By focusing on a small number of nodes comprising an "overlay network", scalability becomes less of a concern than in BGP, and tighter coupling with the performance requirements of the application can be achieved.
This is accomplished through monitoring and probing of paths between nodes in a RON to collect metrics on these paths. RON can then provide various knobs for choosing optimal paths for routing based on the metrics, e.g., to reduce latency or packet loss rate, or to maximize throughput. RON achieves noticeable performance and reliability increases over BGP, and can recover quickly from failures (quicker than BGP, which is slow to converge after link failures). With 12 nodes, a RON can recover from all outages, and with 16 nodes, it can recover from 60%. In addition, recovery after failures requires less than 30 seconds. Loss rates are improved by 5%, and in a small number of samples, throughput doubles. In this evaluation, the authors note their design uncovers a large number of Internet paths, so their design may prove useful in Internet measurement research. The authors hint several times at a trade-off between scalability and fault-tolerance. BGP's goal is high scalability, while RON's is improved performance (lower loss rates and higher throughput) and robustness. As such, RON is not highly scalable, especially in scenarios where there is a need for aggressive probing and monitoring of paths between nodes. More aggressive probing incurs higher bandwidth overhead, and as this overhead grows quadratically, it can quickly become a significant limitation as the number of nodes increases. The authors' experiments do not test RONs with a large number of nodes (one with 12 and one with 16 nodes), but the monetary and performance costs of probing overhead may become limiting factors with 100 or more nodes. Additionally, the authors do not study the impact of RONs interfering with each other. If multiple RONs are running on a network, each attempting to maximize performance based on its own application's requirements, overall network performance (and thus each RON's performance) may degrade. On a network without a RON, path selection of routes is controlled by BGP policies, and performance tuning and reliability are implemented higher in the stack than path selection (e.g., TCP). By allowing RONs to control path selection, nodes may violate BGP routing policies, or adversely affect routing patterns that ISPs have set up for optimal routing. Furthermore, the authors only briefly mention communication issues with RON nodes if each node is behind a NAT. In such a case, the nodes may not be able to communicate, affecting optimal routing. This is not necessarily a trivial issue, especially for users that may use RONs for multimedia conferencing, which the authors suggest as an example application that could use their design. A SCALABLE CONTENT-ADDRESSABLE NETWORK Ratnasamy et al.'s paper presents a design for a distributed hash table (DHT) similar to Chord and Pastry, but one that instead uses zones on a d-torus to store keys and values. Routing in their content-addressable network (CAN) is performed by maintaining a table of IP addresses and coordinate zones. Their design can scale to a large number of nodes (which they use for their simulations), and with large increases in the number of nodes, the path length grows slowly. The authors predict a potential maximum of almost a billion nodes. The authors take into account several scenarios that may hinder performance, such as geographical location of nearby nodes, uniform partitioning of the coordinate space, and hot spot management. This was a strong point of their paper; however, it required some compromises that introduced several weaknesses.
For example, to ensure that nearby nodes in the coordinate space are also geographically close, the coordinate space is no longer uniformly populated. Another possible weakness in the paper is the lack of "real world" evaluation. The authors achieve positive results in their simulation, but deploying this on the Internet for a service like Gnutella may give vastly different results for a number of reasons, including node heterogeneity and churn. For example, regarding node heterogeneity, not all nodes possess the same amounts of storage (or they may have vastly different read times for the stored data), or they can have highly variable Internet connection speeds. This could potentially be mitigated since data is replicated, so if a node with a popular file has a slow connection, this would not affect other nodes since they could access it through another node (assuming the routing algorithm could take this into account). Perhaps another option could be to leverage a method of speculatively re-executing lookups (e.g., as in Zaharia's "Improving MapReduce in Heterogeneous Environments") using a different node. For example, if node A requests a file from node B, but B has a slow connection, A could perform another lookup and download the file from a second node if the transfer time from B passes some timeout period. Churn could be another problem, in that nodes frequently leaving or joining could add overhead in replicating data to new nodes or transferring data away before nodes leave. One method to reduce this, as well as possibly solve some issues related to heterogeneity, could be to assign some value to nodes based on metrics (as in RON: latency, throughput, etc.) and the amount of uptime of the node within the DHT. A node that has been part of the DHT for a long period and has low latency or high throughput could store more frequently accessed data. From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman [mrahman2 SpamElide] Sent: Thursday, February 10, 2011 10:27 AM To: Gupta, Indranil Subject: 525 review 02/10 A Scalable Content Addressable Network Summary: CAN provides hash table functionality for Internet-scale systems. CAN partitions a d-dimensional space and assigns each node a unique partition. Nodes sharing a (d-1)-dimensional hyperplane are considered neighbors. As a result the routing state is constant; however, the average routing path length (based on nearest-neighbor search) is proportional to (d/4) * n^(1/d) (a quick illustration follows below). The routes become shorter with increasing d, but at the same time the state increases. A new node joins the system by randomly choosing a point and splitting the zone of the node that contains that random point. CAN also handles node departure, and recovery from failures. The paper also describes some design improvements including multiple higher-dimensional spaces, better routing metrics based on underlay latency measures, multiple nodes per zone, and multiple hash functions. The proposed extensions offer better routing and availability. Pros and Cons: CAN offers scalability, fault-tolerance and autonomy. However, CAN is more of an abstract design paper than a concrete implementation of a DHT. In my opinion, both Chord and Pastry can be thought of as instantiating CAN. Although put forward as a DHT for the Internet, CAN has not been tested in any Internet-scale testbed (e.g. PlanetLab), which makes it hard to believe its claims. CAN routing ignores underlying latency, which can result in poor routing performance in the real space.
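To make the (d/4) * n^(1/d) scaling from the summary above concrete, a quick back-of-envelope; illustrative arithmetic only, not a simulation from the paper:

    # Average CAN path length: (d/4) * n^(1/d) overlay hops.
    n = 1_000_000
    for d in (2, 4, 8):
        hops = (d / 4) * n ** (1 / d)
        print(f"d={d}: ~{hops:.0f} overlay hops for n={n:,} nodes")
    # d=2: ~500 hops; d=4: ~32; d=8: ~11. More dimensions buy much
    # shorter routes at the price of more neighbor state per node, O(d).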
Although the authors argue that the design extensions enhance performance, there is no concrete evidence. Future Work: The first requirement is a realistic evaluation on a testbed like PlanetLab to confirm the claims of Internet-scale performance. Some analytical modeling of the design parameter trade-offs would also be insightful. The authors consider a regular d-dimensional torus. It might be interesting to see whether there are other higher-dimensional metric spaces that could offer performance benefits. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Summary: Pastry is yet another solution in the design space of DHTs. Like Chord, Pastry uses hash functions to generate random nodeIds that are distributed in a circular fashion. It guarantees logarithmic routing by implementing a nearest-neighbor search based on prefix matching. A key feature of Pastry is that it takes locality into account when a new node joins the system. As a result, the possibility of two nearby overlay nodes being very far apart in the underlay is minimized. Pros and Cons: It seems to me that Pastry is easier to implement in real-world systems than Chord. This could be due to the structure of the routing table. However, this comes at the cost of increased routing table space. This is also evident from the fact that, at the time of writing, Pastry had already been used as a building block for a number of applications including SCRIBE, PAST and so on. Pastry takes locality into account, as opposed to Chord. On the other hand, Chord has better stabilization against churn, whereas Pastry has no explicit stabilization protocol. Pastry cannot handle path failures due to DoS attacks. Comments: The choice of the simulation parameters in the paper has not been justified. As an example, for b=4, the results indicate that 4-hop paths are more likely than other path lengths. This seems like an interesting observation, but the authors did not give any explanation. Future Work: It would be nice to see some theoretical proofs of the performance guarantees presented in the paper. Also, extensions of Pastry that can handle DoS attacks would be interesting. - Muntasir Raihan Rahman From: wzhou10 SpamElide Sent: Thursday, February 10, 2011 9:54 AM To: Gupta, Indranil Subject: 525 review 02/10 Professor Gupta, Here are my reviews on Pastry and RON: Review on "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems" Core idea: The paper presents Pastry, a scalable, decentralized object location and routing substrate for large-scale p2p networks. Pastry is able to serve a variety of p2p applications at the application layer on top of the current IP structure of the Internet. The goal of the design is to minimize the distance messages travel. Specifically, every node in Pastry is assigned a unique ID, and keeps track of its logical neighbors (those with IDs numerically close to its ID) as well as its physical neighbors. Once a message with a key arrives, the receiving node forwards the message toward the node with an ID numerically closest to the key. In this way, Pastry ensures O(logN) routing hops to any node in the overlay network in the absence of a large number of simultaneous node failures, where N is the number of Pastry nodes. The points I appreciate about this paper are: 1. Pastry takes into account network locality, which I think is the main contribution of this work compared with previous work on p2p systems, for example, Chord. 2.
Scalability: Routing messages to any node requires only a logarithmic number of hops, which means the scheme provides good scalability, an essential property in large-scale networks. 3. Pastry is robust to faults and to nodes joining and departing, by making use of replication and decentralization. However, I think there are some weaknesses, including: 1. Some assumptions in the paper are a little vague. a. When choosing the proximity metric, it is assumed that the triangle inequality always holds, which I'm afraid is not the case in the Internet. b. Another limitation is that they only consider this one metric, assuming a node with a lower distance value is more desirable, while distance/latency is not always the most important performance factor for applications. c. Also, they expect applications to provide a function to allow Pastry nodes to determine distances among themselves, and traceroute is suggested as a possible tool. But ICMP filters and the fact that hop counts are limited may cause a problem. 2. In the experiments, nodes are uniformly distributed, but in the real world, especially the Internet, they are not. So it would be more convincing if the authors performed experiments in a more realistic environment/topology. Review on "Resilient Overlay Networks" Motivated by the delayed convergence of traditional BGP, the Resilient Overlay Network (RON) is proposed to allow distributed Internet applications to detect and recover from path outages within tens of seconds. RON is an application-layer overlay network built on the existing Internet routing stack, like Pastry. A RON consists of a group of nodes deployed in different domains. Between every two RON nodes there's an overlay link, and routing information is exchanged among nodes. When a RON node gets packets from applications, it decides to forward the packet either on the normal Internet path or through intermediate RON nodes, according to the routing metrics specified by applications. Also, whenever link failures are detected, packets are forwarded on alternative paths. First of all, I truly appreciate RON's novelty in applying an overlay network to solve the failure detection and recovery problem. Second, RON achieves better performance, such as faster recovery, lower latency, lower loss rate, as well as fine-grained policy routing, compared with traditional routing. Moreover, the authors implemented RON on the real Internet, which is much more convincing than simulations, and from the results of the experiment, we can see RON is able to route around most significant failures and recover in less than 20 seconds. However, the good properties of RON described above are achieved at the cost of heavy computation and storage on RON nodes. The reason RON can detect problems quicker is that RON nodes do active probing as well as passive observation together, and routing information is frequently exchanged among nodes. These probably cause great overhead, let alone the fact that there exist multiple routing metrics, such as latency, throughput, and packet loss rate. This cost is not analyzed clearly, and the complexity makes implementation hard. Another weakness of this work, I think, is that it doesn't explicitly show how much the distribution of RON nodes matters. Besides, the security issue isn't considered much. I wonder whether or not the exchange of information among RON nodes would cause information leaks among different ASes, which is originally hidden when using BGP, for instance, when some RON nodes are compromised. Thanks.
Wenxuan From: Agarwal, Rachit [rachit.ee SpamElide] Sent: Thursday, February 10, 2011 4:17 AM To: Gupta, Indranil Subject: 525 review 02/10 ------ A Scalable Content-Addressable Network The paper takes an initial stab at solving the scalability problems in peer-to-peer systems. The authors introduce the notion of a "content-addressable" network, which has recently attracted a lot of attention in the context of Internet routing on flat labels, name- and location-independent routing, etc. The major contribution of the paper is to demonstrate the applicability of hash tables in designing scalable, distributed peer-to-peer networks. Adding to the basic construction, the authors present several additional architectural designs that improve the performance of the system significantly. In terms of the architecture, CAN successfully overcomes the shortcomings of the peer-to-peer networks in use at that point in time: it is completely decentralized, and it scales due to the limited state maintained at the nodes in the overlay and the absence of any requirement to flood content requests. In terms of the routing scheme, CAN provides guarantees on the routing state maintained at nodes in the overlay and on the stretch in terms of number of hops. The optimizations over the basic scheme are interesting, in particular the one exploiting locality. Some extensions/modifications could have made the paper significantly more interesting. (1) While it is interesting to have a bound on the "number of hops", what really matters in practice is to have a bound on the actual network-layer latency -- a difficult task in general. (2) A more important design choice that CAN seems to have made is to tightly couple the replication of objects within the architecture. In particular, a given node cannot arbitrarily replicate data at itself; instead, the nodes are forced to replicate the data. This is so tightly coupled within the architecture that it seems quite restrictive in terms of possible optimizations. (3) The simulations show that the mean user-perceived query latency is around 3 seconds. While this may be an artifact of the network topology in 2001, I believe this latency is at the very high end of expected performance for such a system. (4) Finally, the paper would have very much benefited from more simulations, more intuition on the interplay between the various optimizations, a study of churn (which is frequent in peer-to-peer systems), and some discussion on how to handle network policies within this framework, if at all possible. ------ Resilient Overlay Networks A RON architecture is presented that allows the application layer to detect the performance of a path and select the most desirable paths among the available ones. The main advantage of such an approach, in my opinion, is that an application can perform path selection based on its requirements and perceived performance. The paper shows that having multiple paths at the disposal of the application can also significantly improve the failure resilience of the network. The paper presents several interesting observations. A particularly interesting one is the sufficiency of having one intermediate overlay hop to overcome most of the failures. As mentioned above, the whole idea of having the applications choose the most suitable path based on perceived performance is very interesting. However, moving the routing logic away from the network poses several implementation and security challenges. What happens if an application behaves maliciously?
In terms of implementation, there is a fundamental trade-off: too much overhead in probing the network if fine-grained path selection is performed; path performance may not be as expected if probing is done at infrequent intervals. In general, the trade-off between performance, scalability and route maintenance/monitoring starts playing a role in such systems. I am curious about two aspects: (1) How does RON distinguish between failure-induced outages (which may be long term) and congestion-induced outages (which may be short term)? Does it really help to distinguish between the two, especially when RON uses UDP? (2) How can I probe the network if I do not have a public IP address? Maybe we can use the same ideas as discussed in class in the context of Gnutella. ------ -- Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Web: http://agarwa16.wikidot.com ---- Sing as if No one is Listening, Dance as if No one is Watching, Dream as if You'll Live Forever, Live as if You'll Die Today !! From: Andrew Harris [harris78 SpamElide] Sent: Thursday, February 10, 2011 4:06 AM To: Gupta, Indranil Subject: 525 review 02/10 Review of "A Scalable Content-Addressable Network" and "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems" The first paper, from Ratnasamy et al in 2001, describes the use of a "content-addressable network", or CAN, in peer-to-peer communication. Specifically, the group introduces a particular type of distributed hashing system, wherein existing clients become responsible for increasingly smaller pieces of the overall hash table as new clients connect. Clients are mapped via a bootstrap node to a logical topology corresponding to the hash table in use, which is independent of their actual network topology. A new client's neighbor splits its share of the hash table in two, and the new client takes one of these halves. As an infinite number of possible paths between two clients exists in this model, the network topology is regularly polled to determine the best path at a given time. Dead clients are handled in a countdown manner among the dead client's neighbors, with priority implicitly going to the node which first noticed the dead client (nodes proactively send heartbeat messages to their coordinate neighbors). The research group offers some preliminary improvements on their proposed CAN model, and shows empirically that message latency remains roughly proportional to overall network latency. At first pass, this model does not seem to tolerate large-scale network failures. For instance, consider the scenario of a network partition where all bootstrap node(s) happen to fall outside of the partitioned area. Nodes outside this area begin to scramble to fill the hash table void, creating disproportionately large key spaces for some clients. Nodes within the partition no longer receive information on entering clients, and they lose the ability to reference content in key spaces held by nodes outside the partition. In other words, two networks are created: one that is dramatically unbalanced in its key distribution, and another that will slowly wither and die as its remaining nodes disconnect. The self-organizing nature of the CAN model relies upon accessing the bootstrap node(s); without this initial seeding, a CAN cannot function. Pastry, from Rowstron and Druschel in 2001, is a similar self-organizing distributed network, though not a distributed hashing network as with CAN.
Hashing is used in Pastry to perform quick logical topology routing and insertion of clients, though Pastry relies more on network topology in its organization. A new client connects to an existing Pastry node, and the existing node updates its set of neighbors ("leafs"). The new node also announces itself to all nodes on the path to the existing node, and the existing node announces the new node's existence as needed. Nodes also have information on the hash sets of their neighbors, to handle faults in the network; a node that detects a failure in one of its neighbors will automatically seek out the hashes from that node's neighbors, and will propagate changes to its own neighbors as needed. Pastry seems to suffer from the reverse issue of CAN: large-scale network joins. Since Pastry's join scheme is entirely decentralized, and since Pastry is topologically sensitive, it could easily be the case that a massive influx of new clients would undermine the ability of each new client to find the most efficient route through the network. Not only would route traces be incomplete, as nodes would still be joining, but the network itself would saturate with routing tests as each client (new and existing) struggles to find the best paths through the routing network. The result of such an event would be a network whose client-to-client latency varies wildly regardless of overall network latency. It may also lead to race conditions among entering nodes, which puts newer or existing nodes at a disadvantage. A solution to CAN's problem of hidden centralization would be to allow for ad-hoc bootstrap nodes. Let each node have a list of bootstrap nodes and, in the event of a network partitioning or similarly catastrophic event, let each node have a chance at becoming a bootstrap node. Propagate this information through the existing network when updates are needed, and finally allow each node to accept incoming connections from new nodes (if only to query the location of existing bootstrap nodes). This would allow for fully decentralized bootstrapping, and would mitigate the effects of a partition or similar event. On the other hand, a solution to Pastry's problem of information propagation would be to update the information about new clients in a lazy fashion. That is, do not immediately propagate the information about the existence of a new client, and use its connected node as a surrogate for passing queries. Unfortunately, this has the cost of high initial latency for new nodes, but latency will smooth out as the node recalculates its routes. This gives new nodes time to connect into the network, and could allow new client information to be cached and pushed as one bundle of changes versus sequential updates and route calculations. -------------------- Andrew Harris PhD Student, Social Spaces Group Dept. of Computer Science University of Illinois at Urbana-Champaign harris78 SpamElide From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng [ltseng3 SpamElide] Sent: Wednesday, February 09, 2011 10:26 PM To: indy SpamElide Subject: 525 review 02/10 Dear Professor, Here is my review for tomorrow: CS 525 - Review: Overlays and DHTs A scalable content addressable network, S. Ratnasamy et al, SIGCOMM 2001 Content-Addressable Network (CAN) is an indexing service similar to a Distributed Hash Table (DHT) in the sense that it provides the main hash-table functionalities in a distributed manner. CAN also has three important and desired features: scalability, fault-tolerance, and self-organization.
Due to these features, CAN can potentially improve peer-to-peer systems by acting as an underlying substrate. The core idea of CAN is to partition a logical Cartesian d-dimensional space so that each node owns a sub-space (called a "zone"). Because CAN tries to partition the space into equal-size zones, the average path length is O(d n^(1/d)) (application-level) hops. To ensure scalability, each node only stores its immediate neighbors' information, which contains O(d) entries. To find the desired destination, the sender uses a hash function that hashes keys into a coordinate in the d-dimensional space. With this coordinate, it is easy to route the message to the destination: the sender and each forwarder just use a greedy approach that forwards to the neighbor whose coordinates are closest to the destination. Then the paper discusses how to handle node joins and departures to maintain the overall CAN structure. In addition, the paper mentions some design decisions that usually incur trade-offs. For example, overloading zones (assigning a single zone to multiple nodes) improves robustness and path latency, but system complexity increases, too. In the end, the paper backed up CAN's performance by simulation. The system is rather simple and seems to be easy to implement. Moreover, though without in-depth or more theoretical discussion of CAN's features, these features seem convincing. However, there are several areas that can be improved. First, the choice of parameters, such as dimensionality and number of realities, deserves more discussion. In particular, it would be interesting to see whether certain combinations of parameters fit a certain type of traffic or network better, or whether there is a "best" set of parameters. Second, some approaches seem a little arbitrary and not intuitively clear; maybe the authors need to provide a sounder basis for these approaches. For example, when a node joins CAN, how does it randomly select an active node to send the JOIN request to? Since every node has only partial information about the network, would this "randomness" really be uniform sampling? Otherwise, the claim about more uniform partitioning might be wrong. Third, network delay might be worth more attention. This paper mainly focused on the virtual hops in the d-dimensional space, and only discussed real network-level round-trip time (RTT) in one subsection. However, if RTTs in a neighborhood differ a lot, then it is not clear whether the system can maintain its performance. In particular, large discrepancies in RTT might cause frequent inconsistent CAN states among nodes, which would significantly reduce performance, because nodes would frequently need to perform an expanding ring search to recover the information. The other issue due to differing RTTs is that the takeover algorithm used when a node leaves or fails might result in non-uniform partitions when a single node has a much quicker connection and always takes over its neighbors' zones. There are also many other directions that can be extended based on CAN. The first direction is security. The paper only considers benign faults, i.e., crash failures, in the system, so it is an interesting issue to examine how much performance would be degraded in the presence of malicious nodes, and to construct a more robust system if the degradation is intolerable. Second, efficiency in a real application deserves attention; for instance, transmitting and retrieving large files is very different from simply routing messages as discussed in the paper.
For example, combining network coding with the multiple hash functions mentioned in the paper might increase efficiency and robustness at the same time, because replicas of a file are then distributed among neighbors, and a node can retrieve different pieces at the same time. Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems, A. Rowstron et al, Middleware 2001. Pastry acts as a routing and location infrastructure for other peer-to-peer applications. For example, PAST and SCRIBE use Pastry as an overlay to provide file-storage and publish/subscribe services, respectively. Unlike CAN, Pastry takes proximity and network locality into account, and it aims to minimize the distance represented by a scalar proximity metric, such as path length (real hops, not virtual hops as in CAN). Unless |L|/2 neighboring nodes crash simultaneously, the sender is always able to deliver a message to the node numerically closest to a given key in less than ceil(log_{2^b} N) hops (|L| and b are configuration parameters, where b is typically 4 and |L| is typically 2^b or 2*2^b). To ensure locality and efficiency of routing, each Pastry node has to store three tables: a Leaf Set storing the |L| numerically closest nodes; a Routing Table storing (nodeId) prefix-based routing entries; and a Neighborhood Set storing the |M| physically closest nodes (|L| is roughly the same size as |M|); these tables are of size O(logN). The first two tables are used for routing by greedily forwarding the message to a numerically closer node. Because of the construction of nodeIds and the Routing Table, (almost) every time a message is forwarded, it goes to a node with a longer prefix match with the key, and the set of remaining nodes with such a match is reduced by a factor of 2^b, so the path length is roughly O(log_{2^b} N). The Neighborhood Set is used for transmitting heartbeats to prove presence and for exchanging and updating state information about the tables. A mechanism for self-organizing node joins and node departures is also discussed. Furthermore, the paper shows that when the triangle inequality holds for the proximity metric, this approach maintains the following property: entries in the Routing Table contain only physically near nodes. Therefore, the number of hops in Pastry is closely related to the distance defined by the proximity metric. In the end, the paper conducted a batch of simulations involving up to 100,000 Pastry nodes. Though Pastry has been through intensive simulations, the results were not that convincing, because little theoretical analysis was presented and some of the key analysis was skipped. For example, the paper only provided a high-level explanation of routing performance and did not present a proof. In particular, I would expect more explanation for claims that are not so intuitive, such as the claim that the likelihood that an appropriate prefix entry does not exist is fairly low. Moreover, the locality property and network proximity performance should also be backed up by theoretical analysis. The other major weakness is the use of unrealistic assumptions. First, the performance seems to be based on a relatively low churn rate, which is not true in most cases. Second, the paper did not address performance when the triangle inequality does not hold for the proximity metric. One last thing that deserves more discussion is the choice of configuration parameters. It is not clear how these parameters are chosen. One possible extension is also about security.
The authors briefly mentioned that randomization in selecting routes can increase robustness, but this practice does not prevent Denial of Service attacks or other more delicate attacks. The other thing to improve is a quicker node-join mechanism. In Pastry, every time a node joins, it needs to contact a physically close node and route a message to a numerically close node, which takes roughly O(logN) units of time to construct its Routing Table. It would be interesting to see whether storing extra information would expedite this process. For example, caching the node and neighbors' state might be helpful. Thanks, Lewis Tseng From: Qingxi Li [cs.qingxi.li SpamElide] Sent: Wednesday, February 09, 2011 9:47 PM To: Gupta, Indranil Subject: 525 review 02/10 A Scalable Content-Addressable Network (CAN): This paper introduces a scalable peer-to-peer indexing mechanism; for a peer-to-peer system, the hardest part is finding the peer which holds the files. CAN is completely distributed, scalable, and nodes can route around failures. Each node in CAN stores a chunk of records (called a zone) and information about a small number of adjacent zones. The basic design is that when some node wants to find which node contains file K, it uses the uniform hash function to map K to some point P and routes to P to find the record (K,V), in which V is the information about where K is stored. For routing, CAN maps nodes into a d-dimensional coordinate space (d is determined by the system operator) and uses greedy routing. For a new node joining, it first finds a CAN node and sends a JOIN message into the CAN; the first node that receives this message splits its zone in half for the new node. After that, both these nodes inform their neighbors of their new zones. The main problems of CAN are as follows: firstly, each node must also store some information about its neighbors besides the information it stores for itself. Secondly, nodes must periodically send messages to their neighbors to make sure they are alive. Besides these, for the routing algorithm, since the whole system is built on an overlay network, even if a route is the shortest path in the overlay network, in the real network it may be far longer. To solve the problems mentioned before, CAN offers some tricks, but there are tradeoffs in choosing whether to use them. 1. Increasing the dimensions of the CAN coordinate space. This will decrease the hops of the routing path and hence the latency. The disadvantage is that there will be more neighbors, so nodes must send more messages and store more information. 2. Having multiple independent coordinate spaces. This will decrease the hops of the routing, but not as efficiently as (1). Besides the problem (1) has, each node will also store more zones. 3. Maintaining the RTT matrix of neighbors and using it when applying greedy routing. This can minimize the latency of each hop, but it cannot guarantee the minimum RTT for the whole path. 4. Multiple nodes sharing one zone. This will reduce the path length, won't increase the neighbor information stored in one node, and achieves fault tolerance. But it increases the zone information stored in each node, and when the information changes, e.g., a deletion, multiple copies of the zone must be changed. 5. Multiple hash functions, similar to (4). 6. Topologically-sensitive construction. CAN has x well-known landmark nodes, and the whole coordinate space is separated into x! regions. Each node that wants to join the CAN orders the x landmarks by latency and randomly chooses a point in the region corresponding to that ordering.
This can put nodes which are physically close into the same area of the overlay network. 7. Uniform partitioning. When a new node joins, instead of splitting the zone of the first node that handles the JOIN message, the least-loaded zone among that node and its neighbors is chosen for splitting. This may still be affected by hot spots. 8. Caching and replication: this will cause update delays. Resilient Overlay Networks (RON): This architecture can detect faults in the network and find an alternate path in a short time. RON detects problems by aggressively probing and monitoring the paths connecting its nodes. Each node maintains matrices of latency, loss rate, and throughput (by default) for all the other nodes, built from aggressive probing and observation of received data. For throughput, instead of finding the optimal throughput, it avoids low-throughput paths if a better one exists. The whole system routes over the overlay network based on these matrices using a link-state routing protocol. RON, compared with BGP, can detect and recover from faults in the Internet faster, since BGP hides much of the topological information between ASes. Besides this, RON runs at the application layer, which lets applications independently define which metric to use when choosing the path, by tagging packets with different tags. Each node only calculates the route for the first packet, and the following packets are routed as the first one was. This decreases the routing time, but at the same time, as conditions in the Internet change, especially loss rate and throughput, the route may not be the optimal one for the following packets. Besides this, RON nodes need to aggressively probe all the other nodes in the RON, which causes a lot of traffic. The other thing that should be mentioned is that the whole architecture runs at the application layer, which may add latency. But since the authors argue that, from their experiments, most paths only need one RON intermediate node, this may not be a problem. From: iitb.ankit SpamElide on behalf of Ankit Singla [singla2 SpamElide] Sent: Wednesday, February 09, 2011 9:41 PM To: Gupta, Indranil Subject: 525 review 02/10 Review 1: Resilient Overlay Networks (RON) -------------------------------------------------------------------------- Summary -------------- A RON is an overlay with the primary goal of using overlay connectivity to side-step failures in underlying routing. RON nodes probe the overlay to test connectivity of the network paths and can choose routes through each other in case the direct Internet routes do not offer good performance. Active probing and passive monitoring of traffic are both used to evaluate available paths for metrics like latency, loss rate and available bandwidth. This information is made available to applications running at RON nodes so they can pick the paths that offer the best trade-offs for performance as suited to them. RON also enables the use of expressive policies for control of overlay usage by applications. The paper discusses the design and evaluation of a RON spread over 16 nodes across the Internet. In the experiments described, one intermediate overlay node was sufficient to route around most failures in less than a minute. Pros ------- Simplicity of concept as well as implementation, making path characteristics available to applications, and allowing policy control to be imposed on individual applications rather than just over an overlay connection as a whole.
Also, the evaluation is careful to avoid use of the Internet2 connectivity in its demonstration of routing around failures. Cons ------- How would multiple RONs interact - would they run into oscillatory behavior? Also, limiting the RON to a few tens of hosts is a design choice which makes it unsuitable for wide-area grid-like applications, but this is a conscious design choice, so not really a 'con'. Not so much a con, but a question: does this work remain relevant today, i.e., has detection/resolution of failures in the Internet improved since 2001? One also wonders if there are other applications for such collaborative overlays: measuring inter-domain routing performance, detecting security attacks etc. (I think there is some paper relying on such distributed monitors to detect prefix hijacking.) Review 2: A Scalable Content Addressable Network (CAN) -------------------------------------------------------------------------- Summary ------------- Given that content-centric networking is gaining traction again, this is a very important paper. Knowing a few things about this area, I think this might have been amongst the first to have articulated clearly the advantages of such an approach together with a *scalable* scheme for doing this. The paper centers on mapping the entire coordinate space onto a d-torus. This coordinate space is partitioned dynamically and in a distributed fashion between network nodes. Keys are assigned to nodes for storage based on hashing them to the coordinate space. The deliberate choice of a structured topology like a torus allows efficient routing (with average path lengths growing as n^(1/d)) with very little routing state (O(d)) per node. The rationale for the choice of a fixed 'd' instead of log(n) connectivity like Chord is based on the difficulty of maintaining Chord-like connectivity in the face of the high expected churn in the network. Pros ------ The authors provide a neat summary of various design choices made possible through their architecture and the trade-offs each exposes. Even though the design is simple in hindsight (which is good), it seems very carefully synthesized. Cons ----- Lack of support for mutable content and keyword search is a key limitation for a P2P network like CAN. Also, for CAN to be truly content-addressable, it would need to not impose storing a key at a designated node (via a hash function), but instead allow nodes to make available any content they like. Thanks, Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/ From: Tony Huang [tonyh1986 SpamElide] Sent: Wednesday, February 09, 2011 4:40 PM To: Gupta, Indranil Subject: 525 review 02/10 Zu C Huang SID: zuhuang1 Review for Resilient Overlay Networks In this paper, the authors propose a system designed to provide a close group of related network users with more robust network performance, better integration with applications, and more powerful, finer user control over routing policies. The core idea is to provide a set of user-level protocols and services to programmers so that applications can control some of the network functionalities that were previously not controllable by them, including limited source routing, user-defined routing metrics, and better failure detection mechanisms. By doing so, the system achieves performance improvements in some key areas, including fault tolerance, throughput and latency.
The nodes in the network form a complete graph. Each node continuously measures link conditions to all its neighbors, broadcasts this information to all other nodes, and maintains a performance database. When a node needs to make a routing decision, it uses its local knowledge, the packet type, and the selected routing metric to choose a path, then encodes the path in the header, which the other nodes in the network abide by.

Pros:
1. Provides users with source-routing capability within the network.
2. Active probing prevents routing over a bad link.
3. Completely distributed design once a node has joined the network: each node makes routing decisions from its local database, and the lack of a central server makes the network more fault tolerant.

Cons:
1. Since the nodes in the network form a clique, it is not scalable.

RON is a useful service for applications looking for better performance within a close group of nodes that lacks underlying infrastructure support. One doubt I have about RON is that its path-selection policy is based on its own measurements; however, since RON still uses the underlying physical network to transfer packets, transfers still suffer from the BGP outages mentioned in the paper. I wonder how tolerant RON would be to a real-life BGP outage. Also, the interaction and dynamics between multiple RONs sharing a set of nodes would be a practical issue if RONs were to be widely deployed.

Review for A Scalable Content-Addressable Network

In this paper, the authors propose a system providing a distributed hash table service. The design includes a scheme to divide the key space among all participating nodes, mechanisms to bootstrap a new network, to add new nodes, and to handle node departure and failure, and support for data replication and fault tolerance. CAN divides the key space by mapping nodes into a virtual coordinate system. Each time a node joins the network, an existing zone is chosen according to a heuristic and split in a deterministic fashion; the new node and the original node each become responsible for one of the newly created zones. When a node leaves the network, its zone is merged with a nearby zone to form a bigger one. Routing inside the network is done in a greedy fashion: a node forwards a packet to a neighbor closer to the destination along what would be a straight line to the destination in the coordinate space (a small sketch of this greedy step follows the cons below). To cope with dense regions of the key space, each zone in CAN can be served by multiple nodes. Mechanisms including multiple hash functions and multiple coordinate spaces allow data to be replicated across the network.

Pros:
1. Multiple nodes may serve the same zone of the key space.
2. Automatic data replication across the network.
3. Multiple hash functions help disperse data across different parts of the network.

Cons:
1. There is no correlation between zone assignment and node geographic location; therefore, under rare circumstances, an outage in one geographic region may make data unavailable.
2. The takeover process relies heavily on the ordering of takeover messages and on timers. I would like to see more analysis of the network's transition behavior.
3. The numbering scheme is mathematically rigorous but unintuitive, which may cause real-life implementation and debugging issues.
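The greedy forwarding step mentioned above is simple enough to sketch. Below is a minimal 2-d illustration in Python; the zone and neighbor structures are simplified (zone centers instead of zone boundaries for neighbor choice) and all names are hypothetical, not the paper's implementation:

    # Minimal sketch of CAN-style greedy forwarding in a 2-d unit square
    # treated as a torus.
    def torus_dist(a, b, size=1.0):
        # Per-axis distance with wraparound, then squared Euclidean distance.
        d = 0.0
        for x, y in zip(a, b):
            dx = abs(x - y)
            dx = min(dx, size - dx)   # the coordinate space wraps (d-torus)
            d += dx * dx
        return d

    def contains(zone, point):
        # zone: ((x_lo, y_lo), (x_hi, y_hi)) rectangle in the unit square.
        lo, hi = zone
        return all(l <= p < h for l, p, h in zip(lo, point, hi))

    def next_hop(my_zone, neighbors, target):
        # neighbors: dict mapping neighbor ID -> center point of its zone.
        # Forward toward whichever neighbor's zone center is closest to the
        # target; the real CAN checks zone boundaries, centers suffice here.
        if contains(my_zone, target):
            return None               # we own the point: deliver locally
        return min(neighbors, key=lambda n: torus_dist(neighbors[n], target))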
--
Regards
--
Tony

From: Nipun Sehrawat [sehrawa2 SpamElide]
Sent: Tuesday, February 08, 2011 12:05 PM
To: Gupta, Indranil
Subject: 525 review 02/10

Pastry

The core idea of this paper is to obtain O(log n)-hop routing in an overlay network of geographically distributed nodes, while trying to minimize the "physical" length of the path using a greedy approach. Pastry does so by maintaining a routing table with O(log n) entries at each participating node. This routing table converges an incoming query toward the destination node (the node with nodeId closest to the queryId), as the next hop selected for the query shares a common prefix with the query that is at least one digit longer than the current node's. Of the multiple possible routing entries sharing a fixed-length prefix, the node with the closest physical proximity is selected for storage. In addition, each node maintains a set of nodes that are near to it in overlay-ID space (the leaf set) and a set of nodes that are physically close to it (the neighborhood set). The leaf set is useful in the last stage of routing, to send the query to the numerically closest nodeId; the neighborhood set is useful in achieving Pastry's physical-proximity routing goal. Pastry deals with node arrivals elegantly: besides costing only O(log n) messages per arrival, the information sent to the new node helps it set up routing tables adhering to the proximity criteria. To tackle malicious nodes (which might drop incoming queries), routing can be modified so that instead of always choosing the physically closest matching node, a random node among the satisfying entries is chosen when the closest one misbehaves.

Pros:
- O(log n) routing steps, with O(log n) routing information maintained at each node.
- Physical proximity is treated as an important factor in routing.
- Node arrivals involve O(log n) message exchanges and the overlay stabilizes soon. Initially the new node appears only in the leaf set of one node, making it reachable; when it sends back updates to the nodes that helped it construct its routing table, it starts appearing in their routing tables, depending on its physical distance from them.
- Misbehaving nodes are contained by randomized selection among matching nodes. This can also be used to load-balance query traffic among multiple nodes.
- In one of the experiments, Pastry-style routing tables (O(log n) entries) achieved physical path lengths only about 1.3 times those obtained with perfect knowledge of the topology (O(n) routing entries).

Cons:
- If a node with a longer matching prefix is not found in the routing table, all the nodes in RoutingTable U LeafSet U NeighborhoodSet are searched for an appropriate next hop. Thus, individual hops can be expensive, and routing can degrade beyond O(log n) steps in the worst case.
- The routing takes a greedy approach to finding the shortest physical path; however, as noted in the paper, physical proximity may not be transitive.
- The emulated network used in the experiments is not representative of a typical Internet topology.

Suggestions:
- Physical proximity could be replaced by other, more relevant metrics such as a node's response time or the time it takes to process and forward queries. This could be achieved by sending probe queries and gathering information on the performance of the nodes visited.
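To make the prefix-routing step concrete, here is a minimal sketch of Pastry-style next-hop selection; the hex nodeIds, the simplified leaf-set check, and the table layout are all illustrative assumptions rather than the paper's exact procedure:

    # Minimal sketch of Pastry-style next-hop selection (hex nodeIds, so each
    # digit covers b = 4 bits).
    def shared_prefix_len(a, b):
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def numeric_dist(a, b):
        return abs(int(a, 16) - int(b, 16))

    def next_hop(my_id, key, routing_table, leaf_set):
        # routing_table maps (row p, digit d) -> a nodeId sharing p digits with
        # my_id whose next digit is d; leaf_set holds numerically close nodes.
        if my_id == key:
            return None                       # we are the destination
        closest = min(leaf_set, key=lambda n: numeric_dist(n, key), default=None)
        if closest is not None and numeric_dist(closest, key) < numeric_dist(my_id, key):
            return closest                    # last stage: hand off within the leaf set
        p = shared_prefix_len(my_id, key)
        return routing_table.get((p, key[p]))  # entry one digit "closer", or None

In the full protocol, a miss on the routing-table lookup falls back to scanning the union of all known nodes, which is exactly the expensive case noted in the cons above.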
Resilient Overlay Networks

This paper presents the idea of forming an overlay network over a set of nodes to help them achieve better QoS in communication among themselves. Communication from one participating node to another can be routed through multiple routes, thanks to the high redundancy present in the underlying Internet. However, such redundancy is not always visible at the IP layer, because the BGP protocol for inter-AS routing presents information in a summarized form. With RON deployed, the routing decision is not simply delegated to the IP layer. Instead, each RON node actively maintains routing tables for determining the best possible paths to other RON nodes. Diverse metrics such as latency, throughput and loss rate are used to rank one path above another. This gives applications running on top of RON the flexibility to choose the best routing paths to other nodes based on their specific requirements; for example, a file server might choose maximum throughput and minimum loss rate as its two routing parameters. Maintaining updated routing tables at RON nodes requires O(n^2) periodic active probes to find the best possible paths between all pairs of nodes. When an outage hits the path being followed by traditional IP routing, it can take a long time until a better path becomes visible to IP; the continually updated routing tables, by contrast, can quickly provide a better route to the destination that bypasses the failed link, making the overlay resilient.

Pros:
- Improves the QoS of communication between a set of nodes.
- Exploits the redundancy present in the Internet to quickly change the communication route between nodes in case of an outage or poor performance on the existing path.
- Provides multiple metrics for weighing paths; applications can choose among them according to their needs.
- In most cases, bypassing an outage/failure requires only a single additional intermediate hop.

Cons:
- Scalability is restricted to about fifty nodes, primarily because of the O(n^2) periodic probe messages and the size of the routing tables at each participating node.
- The paper only briefly touches on the concern of handling malicious RON nodes, suggesting administrative action as the only tool.

Thoughts/Suggestions:
- Applications could be allowed to define a combination of metrics for choosing the best route. This could take the form of a weighted sum of the metrics relevant to the application (a small sketch follows below).
- Running a P2P network over RON might be interesting, with routing decisions taken by RON. However, RON's limited scalability would bottleneck the size of the P2P network built on top of it.
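A minimal sketch of that weighted-sum suggestion; the metric names, weights, and normalization constants are all invented for illustration:

    # Hypothetical weighted-sum path scoring: the application supplies weights
    # over RON's metrics and paths are ranked by the combined cost.
    def score(stats, weights):
        # Lower is better: throughput is inverted so all terms are "costs".
        return (weights.get("latency", 0)    * stats["latency_ms"]
              + weights.get("loss", 0)       * stats["loss_rate"] * 100
              + weights.get("throughput", 0) * 1e5 / max(stats["throughput_kbps"], 1))

    def best_path(paths, weights):
        # paths: list of (hop_list, stats_dict) from the performance database.
        return min(paths, key=lambda p: score(p[1], weights))[0]

    # A bulk-transfer application might weight throughput and loss heavily:
    paths = [(["A", "D"],      {"latency_ms": 40, "loss_rate": 0.08, "throughput_kbps": 300}),
             (["A", "B", "D"], {"latency_ms": 70, "loss_rate": 0.01, "throughput_kbps": 1200})]
    print(best_path(paths, {"throughput": 1.0, "loss": 0.5}))   # -> ['A', 'B', 'D']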