From: Chi-Yao Hong [cyhong1128 SpamElide]
Sent: Tuesday, March 29, 2011 12:41 PM
To: Gupta, Indranil
Subject: 525 review 3/29

---- Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining, ACM TOCS'03 ----

One of the major functions provided by distributed systems is data monitoring. To monitor data from different nodes, it is common to apply data aggregation, collecting data from multiple distributed machines at a single coordinator. The most naive way to achieve this is to manually configure an aggregation tree in which each node runs precompiled code. That approach is simple but not efficient. This paper targets the problem with the following four design principles. First, the Astrolabe system constructs a tree based on the physical network topology, i.e., hierarchical zones, which helps it scale to large networks. Second, Astrolabe adopts mobile code in the form of SQL queries, which allows more flexible deployment: users can easily customize their aggregation functions. Third, Astrolabe uses a gossip protocol to provide highly fault-tolerant communication. Fourth, it uses a PKI with digital signatures to ensure that important messages are not corrupted.

Pros:
1. With the gossip protocol, latency is small. The expected number of rounds necessary to infect all participants is approximately logarithmic in the number of members, and the number of messages each member generates is likewise logarithmic in the number of members. This suggests that Astrolabe can scale well (a small simulation sketch at the end of this email illustrates the logarithmic spread).

Cons:
1. This paper failed to convince me that gossiping is the right approach. From the evaluation perspective (Sec. 7.1 & 7.2), 30 rounds correspond to 30 x 5 = 150 seconds (according to the simulation setup specified in Sec. 7.2). One could achieve a better latency/load/fault-tolerance tradeoff by using other epidemic protocols; some deterministic techniques might also help in this case.

---- Reliable On-Demand Management Operations for Large-scale Distributed Applications, SIGOPS OSR'07 ----

This paper proposes the novel idea of forming a management overlay on demand. Most management overlays today are formed persistently, so the user pays the control overhead even when no one is actually using the service. For example, in a small-scale network the user might issue query commands only infrequently, and maintaining a standing overlay wastes resources during periods of low utilization. The paper proposes MON, which constructs an overlay for each query. In particular, MON uses a locality-aware two-stage tree construction algorithm that provides a high probability of coverage with a small fanout (logarithmic in the network size). The authors perform a back-of-the-envelope analysis showing that the on-demand approach reduces bandwidth consumption when the average interval between queries is larger than 40 seconds.

Pros:
1. The on-demand overlay is a cool idea. I think it might bring even more benefit in wireless networks (we have on-demand routing in wireless networks but not in wired ones).

Cons:
1. Section 3 does not provide enough motivation for the on-demand approach. In particular, even with lower bandwidth consumption, the on-demand approach has considerable setup latency. It is therefore worth studying in which scenarios the on-demand approach wins over persistent approaches; i.e., the applicability of this work could be analyzed further.
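To make the logarithmic-spread claim in the Astrolabe pros above concrete, here is a small simulation of plain uniform push gossip. It is only an illustrative sketch: the function push_gossip_rounds and its parameters are my own naming, and plain uniform gossip ignores Astrolabe's hierarchical zones.

import math
import random

def push_gossip_rounds(n, seed=0):
    """Simulate simple push gossip: each round, every informed member tells
    one member chosen uniformly at random. Returns the number of rounds
    until all n members are informed."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        informed |= {rng.randrange(n) for _ in informed}
        rounds += 1
    return rounds

for n in (100, 1000, 10000):
    print(f"n={n:6d}  rounds={push_gossip_rounds(n):3d}  log2(n)={math.log2(n):.1f}")

For n from 100 to 10,000 the simulated round count stays within a small constant factor of log2(n), which is the behavior the pro point above relies on.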
--
Chi-Yao Hong
PhD Student, Computer Science
University of Illinois at Urbana-Champaign
http://hong78.projects.cs.illinois.edu/

From: Andrew Harris [harris78 SpamElide]
Sent: Tuesday, March 29, 2011 12:26 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Review of "Epidemic Algorithms for Replicated Database Maintenance", Demers et al, and "Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks", Miller et al

The Demers paper, while a bit dated, seems to be among the seminal papers in distributed systems research. Coming out of Xerox PARC, the group tackles the problem of database replication across thousands of nodes in a heterogeneous, dynamic, unreliable network. Today, a connection could be drawn between this work and work in efficiently pushing updates across regionally distributed content delivery networks, or work in pushing control packets across wireless mesh networks (which is what happens in the Miller paper). The group at Xerox PARC takes the simple but inefficient and sometimes unreliable direct mail method of propagating updates and builds on it with two probabilistic techniques: anti-entropy and rumor mongering. Anti-entropy involves nodes randomly contacting their connected partners, comparing database checksums (or some other quick metric, though this isn't a requirement, in that nodes could compare their entire contents in more naive implementations), and, upon discovering a mismatch, synchronizing the difference from the more recently updated server. The group notes that this will reliably converge with low overhead, but only does so after a considerably longer period than direct mail. Rumor mongering relies on nodes knowing whether they have "heard" about an update, and comes with several variants (push, pull, push-pull, etc.). A node with fresh information passes it along to a neighbor that has not yet "heard" it. This process continues until the infecting node has synchronized a certain number of neighbors, at which point the node goes dormant. This is faster than full database synchronization, as it works on the update messages only; however, the group notes thresholds for message passing that must be met for all nodes to have reliably received the update.

The group also presents some empirical data for propagation of information using knowledge of the geographic distribution of nodes. Their example uses a link cost metric, and they show the results of testing their algorithms over a network within the United States only, as compared to testing over a network including nodes in the US and Europe. Here, anti-entropy provides some benefits in minimizing comparison costs (traffic), while update costs were relatively similar. The group also tackles the problem of pushing deletion messages using death certificates, though this is an imperfect solution. Consider that, in a network configured with a lower transmission rate (and thus a higher chance of failing to fully propagate a message), a node that fails to receive a death certificate may continue to attempt to synchronize its data with its neighbors. These neighbors would have to either repeatedly reject the synchronization or re-absorb the deleted data. An example of such a network now would be a mesh network, where the cost of transmission is prohibitively high (although this is unrealistic in practice, as wireless mesh networks do not usually deal with storage and deletion).
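For concreteness, here is a minimal sketch of the anti-entropy exchange described above: a cheap digest comparison first, then timestamp-based push-pull reconciliation of individual entries. This is toy code under my own naming, not the Clearinghouse implementation.

import hashlib

def digest(db):
    """Cheap whole-database digest used for the quick comparison step."""
    h = hashlib.sha1()
    for key in sorted(db):
        value, ts = db[key]
        h.update(f"{key}={value}@{ts};".encode())
    return h.hexdigest()

def anti_entropy(db_a, db_b):
    """Push-pull reconciliation: if digests differ, keep the entry with the
    newer timestamp for every key, on both replicas."""
    if digest(db_a) == digest(db_b):
        return  # replicas already agree
    for key in set(db_a) | set(db_b):
        a, b = db_a.get(key), db_b.get(key)
        if a is None or (b is not None and b[1] > a[1]):
            db_a[key] = b        # pull the newer entry from B
        elif b is None or a[1] > b[1]:
            db_b[key] = a        # push the newer entry to B

# Toy example: replica A saw an update to "x" that B has not seen yet.
A = {"x": ("new", 7), "y": ("same", 3)}
B = {"x": ("old", 2), "y": ("same", 3)}
anti_entropy(A, B)
print(A == B)   # True: both replicas now hold ("new", 7) for "x"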
Regarding wireless mesh networks, the Miller paper describes the use of probabilistic message passing and listening for nodes in a mesh. The approach here is to set two parameters p and q, the former being the probability of an immediate naive broadcast and the latter the probability of a node listening for an extra cycle when it should be asleep. Power consumption for full message transfer is the main metric of success in this paper, and the group shows that they are able to scale the power consumption overhead linearly with q. The group also presents a series of message transfer completion curves, where for some value of p a suitable minimum q is given that guarantees a lower bound on the fraction of nodes having received a given message (and, conversely, for some q a minimum p is given). If Figure 7, on the relationship between p and q, is correct, q can be made arbitrarily small while still ensuring fast transmission of messages with minimal power consumption overhead compared to naive sleep cycles. This observation is curious, though, in that it suggests that simple random rebroadcasting is an efficient message-passing method, which it certainly shouldn't be given the cost of transmitting a message. If q is minimized, this becomes a naive rebroadcasting method. With increasing q, the primary benefit realized is a shorter update propagation time, which the group points out (the energy-latency tradeoff). However, with increasing p, there seems to be an unaccounted power cost of heavy rebroadcasting across a network. It seems odd to me that this wasn't discussed in further depth. At any rate, the ability to tune a network's transmissions is a boon to mesh network implementers, as the needs of meshes will certainly vary. It would be interesting to see the real-world robustness of this system, as the assumptions in the paper include an ideal MAC and physical layer with no collisions or interference (though the group later notes that their simulations do take collisions and interference into account). For instance, a higher p and/or q could combat transmission problems in a network with high outside radio interference, such as a nuclear disaster site. The ability to tune the network in this powerful way appears understated, though it may simply have been outside the scope of the paper to consider it in depth.

From: Long Kai [longkai1 SpamElide]
Sent: Tuesday, March 29, 2011 12:22 PM
To: Gupta, Indranil
Subject: 525 review 03/29

CS 525 Reviews 03/29
Long Kai (longkai1)

Epidemic Algorithms for Replicated Database Maintenance

Summary: This paper examines several randomized algorithms for distributing updates and driving the replicas toward consistency. Three methods are examined: direct mail, anti-entropy, and rumor mongering. Anti-entropy and rumor mongering are epidemic processes that require few guarantees from the underlying communication system. They improve performance both in achieving consistency and in reducing network traffic overhead.

Pros: The algorithms are straightforward and easy to implement. Epidemic algorithms reduce network overhead while improving performance in achieving consistency. They do not require a special network topology.

Cons: It is still possible that not all machines in the system receive the update.

Bimodal Multicast

Summary: This paper presents a new multicast protocol that has throughput stability guarantees. This reliability can be rigorously quantified.
It shows that a multicast protocol with bimodal delivery guarantees can be built in many realistic network environments and that the protocol is also scalable and gives stable throughput. The protocol is composed of two sub-protocols. The first is an unreliable, hierarchical broadcast that makes a best-effort attempt to efficiently deliver each message to its destinations. The second is a two-phase anti-entropy protocol that operates in a series of unsynchronized rounds.

Cons: As the system scales, the protocol still faces a tension between reliability and efficiency; how to guarantee both scalability and reliability remains a problem.

--
Larry

From: anjalis2006 SpamElide on behalf of Anjali Sridhar [sridhar3 SpamElide]
Sent: Tuesday, March 29, 2011 12:16 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Epidemic algorithms for replicated database maintenance, A. Demers et al, PODC 1987.

The paper presents randomized algorithms that drive replicated databases toward an eventually consistent state. The three methods used to distribute updates to the various replicas are:
1) Direct mail: each update is sent to all replicas by mail.
2) Anti-entropy: each site randomly picks another site to exchange its database contents with.
3) Rumor mongering: a site that sees a new update picks another site at random and shares the new update with it.
Anti-entropy and rumor mongering are both examples of epidemic processes. Each site can be susceptible (has not received the update), infective (has seen the update and is willing to share it with other replica sites), or removed (has received the update but is unwilling to share it). The paper presents the three methods with extensions that might improve their efficiency. For example, in anti-entropy, checksums can be exchanged before all the contents of the database are exchanged; a further extension is a recent-update list that can be exchanged before the checksum. Rumor mongering can be adapted by choosing the probability with which a site stops spreading a rumor; a counter can also be used instead. The deletion of old items needs to be consistent across all database replicas, so deleted items, along with timestamps, are propagated as updates to be applied at all replicas. Dormant death certificates are kept so that an obsolete update for a deleted data value that is seen at a later point can still be suppressed.

The paper provides three methods for propagation of updates with various improvements, and it analyzes their behavior using simulations. In order for the snapshot algorithm for clearing death certificates of deleted items to work, no site should be down, but this is not realistic to expect. It would be interesting to find out what update methods are currently used in industry. Since the paper deals with simulations and a single topology, some real-world implementations would be a good indication of how well the various theoretical ideas and extensions mentioned in the paper work.

Exploring the energy-latency trade-off for broadcasts in energy-saving sensor networks, M. Miller, C. Sengul, I. Gupta, ICDCS 2005

The paper presents a broadcast protocol for wireless sensor networks. The proposed protocol aims to minimize energy usage while maintaining performance. Probability-Based Broadcast Forwarding varies certain parameters to explore the tradeoff between energy and latency. The protocol can be integrated with any MAC layer that uses a sleep schedule. The aim of this protocol is to give the application designer more control.
There are two parameters that the application designer can modify. The first parameter, p, is the probability that a node rebroadcasts a packet immediately without checking whether any of its neighbors are awake. The second parameter, q, is the probability that a node stays awake at an instant when it is supposed to be asleep according to the sleep schedule. Tweaking both of these parameters helps the application obtain the energy-latency tradeoff that it requires. Parameter p provides a tradeoff between latency and reliability: as the number of immediate rebroadcasts increases, the latency decreases, but the number of nodes actually listening to them depends on q. Parameter q provides a tradeoff between reliability and energy: the more nodes that listen when they should be asleep, the higher the reliability, but also the higher the energy consumption. This paper gives the application designer more control, letting them decide on the best combination of p and q when the sensors are being deployed in the field. Since it can be integrated into the existing MAC layer, it provides an easy way of augmenting the present broadcast mechanism.

When dealing with applications that require aggregation, like TAG, all nodes might not have the same amount of power remaining; I am not sure how the parameters p and q would be chosen then. It is also seen that the update latency decreases as the density increases. This high density might be momentary, and nodes might be scattered again depending on the application this is being used for. What kind of adaptive mechanism can be used then? How often can the values of p and q be changed?

From: trowerm SpamElide on behalf of Matt Trower [mtrower2 SpamElide]
Sent: Tuesday, March 29, 2011 12:09 PM
To: indy SpamElide
Subject: 525 review 03/29

Exploring the Energy-Latency Trade-Off

This paper focuses on the problem of broadcast in a wireless medium, where a node's energy expenditure is directly related to its lifetime. Previous work in the area focused on two different strategies: using a coordinated sleep schedule and using a low-power out-of-band channel. The authors' proposed work focuses on the sleep-schedule family of protocols because it has less stringent hardware requirements. PBBF is a layer built on top of other energy-aware MAC protocols such as S-MAC or 802.11 PSM that allows the application to finely control the trade-off between energy expenditure and latency.

This work is a nice generalization of a problem that WSN research has focused on. Like other probabilistic works, it shows nice average-case performance but no guaranteed latency bounds. There also seems to be a strong argument against doing broadcast operations in a low-power wireless medium; a nicer approach might be to aim for reaching only a majority of nodes.

Epidemic Algorithms

Demers et al.'s paper from 1987 gives a well-defined look at the early field of gossip protocols. Their paper focuses on three different strategies for propagating updates through a system, all of which are based on ideas from epidemiology. The three models they looked at are direct mail, anti-entropy, and rumor mongering. The ideas are as follows: in direct mail, a node that receives an update sends it to all other sites; in anti-entropy, a node randomly chooses a partner and compares updates with it; and in rumor mongering, nodes share recent updates more often than older updates. Real-world implementations and motivations for these ideas are discussed throughout the paper.
Furthermore, the authors discuss implementation issues for all of these protocols, such as using an unreliable transmission medium and exploiting the spatial locality of the machines. My main critique of the paper is that all of the propagation models discussed are taken directly from epidemiology, and none takes advantage of the fact that how updates propagate can be controlled at a very fine-grained level. The paper does present a very nice model within which to describe the system, which is most likely why it has remained a classic.

From: kevin larson [kevinlarson1 SpamElide]
Sent: Tuesday, March 29, 2011 12:08 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Miller et al explore the trade-offs between energy and broadcast latency in sensor networks. They present their new protocol, Probability-Based Broadcast Forwarding (PBBF), which is used in conjunction with a sleep scheduling protocol. It is based around the use of two parameters, p and q. The p parameter is the probability that a node rebroadcasts a packet in the current active time, and q represents the probability that a node remains active during the period it normally would sleep. They evaluate PBBF with both theoretical methods (percolation) and simulations, using a variety of values for p and q. They obtain results comparable to other protocols, and can additionally provide considerable benefits in either power or latency.

The paper was very thorough in its evaluation, and demonstrated a wide variety of relationships between other protocols and PBBF with respect to p and q. It was also interesting to see both percolation theory and simulation evaluated significantly; many authors lean heavily towards one form of evaluation at the expense of the other. It was a bit unfortunate not to see the project evaluated in a non-ideal environment. I have trouble fully accepting PBBF without seeing something to show the effects of collisions and interference. Many papers have shown that, without careful design decisions, these factors can destroy scalability, especially in dense clusters (where PBBF seemed to perform unreasonably well).

Demers et al present the use of epidemic algorithms for maintaining consistency between databases. They explore a variety of strategies using combinations of direct mail (broadcast updates to all other sites), anti-entropy (random contacts exchange databases), and rumor mongering (random contacts are sent updates). They also explore complexities such as blind vs. feedback and counter vs. coin loss of interest. They evaluate their methods considering three main factors: residue (the remaining susceptible individuals when infective individuals reach zero), traffic (total update traffic divided by the number of sites), and delay (average update time, as well as the time for the last site to be updated). Evaluation showed a variety of results with various balances of residue and traffic, usually with fairly consistent delays.

The motivations for their work were strongly built and led right into their evaluation. The corner cases that come to mind were addressed (death certificates). It was also nice to see that it was implemented in an actual environment, showing real improvements.

From: harshitha.menon SpamElide on behalf of Harshitha Menon [gplkrsh2 SpamElide]
Sent: Tuesday, March 29, 2011 12:05 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Epidemic Algorithms for Replicated Database Maintenance

When a database is replicated at many sites, maintaining mutual consistency in the face of updates is a problem.
Important aspects that need to be considered are the time required for an update to propagate to all sites and the network traffic generated as a result. The paper describes various randomized algorithms. There are three main strategies:
- Direct mail: each new update is immediately mailed from the entry site to all other sites.
- Anti-entropy: every site regularly chooses another site at random and resolves differences between the two.
- Rumor mongering: while a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has the new updates.

Pros:
- Rumor mongering is efficient in terms of network delay and traffic generated.
- Simple and reliable.
- To reduce network traffic and delay, they have considered the spatial distribution and optimizations based on it.

Cons and suggestions:
- This is an eventual consistency model, and in some cases that won't be acceptable.
- The push/pull fractions could be varied depending on the condition of the node and the network.
- They could use Merkle trees to identify the changed portion of the database and exchange only that information.
- The updates could be propagated based on the underlying topology, though repairs would incur overhead; a routing scheme based on topology would reduce delay and network traffic.
- Instead of a totally randomized scheme, we could have team-based update propagation, where all nodes in a team are updated when any one of them gets infected, and each team randomly picks another team to talk to and update.

Exploring the energy-latency tradeoff for broadcasts in energy-saving sensor networks

This paper presents a new protocol, Probability-Based Broadcast Forwarding, to minimize resource usage and optimize performance. It exploits the redundancy in broadcast communication and forwards packets using a probability-based approach. The protocol has two parameters: one is the probability that a node rebroadcasts immediately, and the other is the probability that a node remains on. Through these two parameters PBBF provides a trade-off among energy, latency, and reliability.

Pros:
- We can tune the parameters p and q, which gives good control over choosing a scheme based on the desired trade-off.
- Results indicate that this is an efficient broadcast mechanism.
- The energy-latency tradeoff has been studied in detail, and the paper provides detailed information on how the protocol works.

Cons and suggestions:
- Interesting work could be done to make these parameters adaptive based on past patterns.

From: Curtis Wang [cwang89 SpamElide]
Sent: Tuesday, March 29, 2011 12:00 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Curtis Wang (wang505)
3/29/11
Epidemics

Epidemic Algorithms for Replicated Database Maintenance

The paper describes algorithms for maintaining mutual consistency among database replicas. The three methods described are direct mail, anti-entropy, and rumor mongering. Direct mail involves sending an update message from the entry site to all other sites. Anti-entropy has each site periodically compare its database with that of a randomly chosen site and resolve the differences. Rumor mongering uses a gossip-based approach to spread the update.

Pros
- Simple, randomized algorithms with a high probability of succeeding (if the correct parameters are chosen)

Cons
- Complex epidemics are not guaranteed to result in consistency. The paper suggests anti-entropy as a backup method.

It's interesting to read about their proposition of eventual consistency in this paper, which is what Amazon opted to use for Dynamo. Also, it would be interesting to read about how these algorithms would apply to or have affected modern datacenters.
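To get a feel for the residue these reviews mention (the fraction of sites a complex epidemic never reaches), here is a small push rumor-mongering simulation with coin-style loss of interest. It is an illustrative sketch only; the function name and parameter choices are mine, not the paper's implementation.

import random

def rumor_mongering(n, k=4, seed=1):
    """Push rumor mongering with 'coin' loss of interest: each cycle an
    infective site gossips to one random other site; if that site already
    knew the rumor, the sender loses interest with probability 1/k.
    Returns the residue: the fraction of sites that never hear the rumor."""
    rng = random.Random(seed)
    knows = [False] * n
    knows[0] = True
    infective = {0}
    while infective:
        next_infective = set(infective)
        for site in infective:
            target = rng.randrange(n - 1)      # pick a site other than the sender
            if target >= site:
                target += 1
            if knows[target]:
                if rng.random() < 1.0 / k:
                    next_infective.discard(site)   # sender becomes 'removed'
            else:
                knows[target] = True
                next_infective.add(target)
        infective = next_infective
    return knows.count(False) / n

for k in (1, 2, 4):
    print(f"k={k}: residue = {rumor_mongering(10000, k):.4f}")

Increasing k drives the residue down quickly but never to a guaranteed zero, which is why the paper falls back on occasional anti-entropy rather than relying on rumor mongering alone.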
Exploring the Energy-Latency Trade-offs

The paper describes a new protocol for broadcasts in WSNs that works at the MAC layer. It adds two new parameters, p and q, which determine the probability that a node rebroadcasts immediately and the probability that a node stays awake to receive a broadcast, respectively. These parameters can be tweaked to affect the energy and latency of the system, which were found to be inversely related.

Pros
- Can be added to existing protocols by adding two parameters
- Allows the application programmer to choose the values of p and q for their purposes
- Presents both theory and simulations

Cons
- Broadcast is only effective above threshold values of q, which may be difficult to determine

From: Anupam Das [anupam009 SpamElide]
Sent: Tuesday, March 29, 2011 11:42 AM
To: Gupta, Indranil
Subject: 525 review 03/29

i. Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

This paper presents a Probability-Based Broadcast Forwarding (PBBF) protocol which explores the energy-latency trade-off for broadcasts in multi-hop WSNs. In the original IEEE 802.11 PSM, a node needs to broadcast an ATIM (announcement) packet before sending the real packet, to ensure that the packet is actually received by its neighbors. So each time a node wants to send anything it has to wait for an ATIM window, causing large latency. Thus, even though IEEE 802.11 PSM saves energy, it adds latency. PBBF tries to reduce this latency (while conserving energy) by introducing a probabilistic scheme that governs when a node may send a packet immediately in the current active window (despite the fact that some of its neighbors might not be awake) and when a node may decide to remain active even though it should ideally have slept. These two probabilities are represented by p and q in the paper, and they are to be set according to application demands based on the energy-latency trade-off. The paper provides both analytical and experimental evaluation of energy, latency, and the energy-latency trade-off. From both the analytical and the experimental evaluation the paper concludes that energy increases linearly with the value of q (p has no impact) and that latency is inversely related to both q and p.

Pros:
1. The paper provides both analytical and experimental evaluation. This is one of its strong points.
2. PBBF provides application-level flexibility to tune parameters like p and q.
3. PBBF may be easy to implement, as we only need to tweak a few parameters and timing windows. No additional protocol or hardware is needed.

Cons:
1. The paper assumes an ideal, collision-free physical layer, which is unrealistic in WSNs.
2. It also assumes perfect synchronization in the network, which might not always be the case.
3. The simulation is done on a grid-based network topology. How would it perform on other topologies?
4. How would the protocol perform as the environment changes (say, some of the neighbors become unavailable or new neighbors join the network)? Maybe dynamic adjustment of p and q could help it perform better in this scenario.

-----------------------------------------------------------------------------------------------------------------------------------------------

ii. Epidemic Algorithms for Replicated Database Maintenance

This paper concentrates on the problem of ensuring mutual consistency among multiple replicas of a database.
The paper presents multiple randomized algorithms (all analogous to epidemics) for distributing updates to replica sites to maintain consistency. The paper mainly highlights three types of algorithms, namely direct mail, anti-entropy, and rumor mongering. In direct mail, each site mails every new update to all the other sites. This approach ensures timely updates but generates an intensive amount of update traffic, and some updates can be lost. Anti-entropy is a more relaxed approach in which a site randomly selects another site and resolves any differences between them; it detects the differences with the help of checksums. However, this approach is costly, as the whole database may need to be compared. In the rumor-mongering approach, the sites are initially ignorant. When a site receives an update for the first time, it treats it as a "hot rumor" and passes it to some other sites. After learning that the update has already been received by multiple other sites, it stops treating it as a hot rumor. Rumor mongering is efficient and propagates updates relatively fast; however, there is a small probability that not all the sites receive all the updates. The paper also proposes various refinements to improve these algorithms.

Pros:
1. The paper provides both analytical and experimental results.
2. Using epidemic concepts to ensure eventual database consistency was novel at that time and is a really interesting idea.
3. The algorithms are very easy to implement, yet they are very effective.
4. The algorithms were deployed in the Xerox Clearinghouse.

Cons:
1. These algorithms provide eventual consistency, so they are limited to particular types of applications. For example, they could not be applied to databases requiring strong consistency, such as banking.
2. The paper does not discuss how the topology of the network impacts the performance of the algorithms.
3. The paper assumes that the sites are loosely synchronized (with respect to clocks). What would the impact be if they were not synchronized at all?

----Anupam

From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman [mrahman2 SpamElide]
Sent: Tuesday, March 29, 2011 10:53 AM
To: Gupta, Indranil
Subject: 525 review 03/29

Epidemic Algorithms for Replicated Database Maintenance

Summary: This paper discusses various epidemic algorithms for propagating updates in distributed systems. Gossiping via epidemics is an effective alternative to flooding, since it can reduce the number of redundant transmissions while retaining universal coverage and minimal state. Gossip protocols are usually probabilistic, local, lightweight, and fault-tolerant. The authors first point out the limitations of the direct mail and anti-entropy methods and subsequently show how complex epidemics can overcome these problems. The direct mail method notifies all sites when an update occurs. This can be unreliable when queues overflow or destinations are unreachable, and the generated traffic is high. The anti-entropy method periodically exchanges content with a random site. Complex epidemic algorithms like rumor mongering, on the other hand, differentiate between infective sites that have new updates, susceptible sites that are yet to be updated, and removed sites that have been infected but are no longer spreading. Epidemics are evaluated based on residue, traffic, and delay. The paper also discusses some variations of rumor-spreading epidemics, and combining rumor spreading with anti-entropy in the background.
It seems that anti-entropy works better with spatial distributions than rumor spreading does. The authors also investigate the issue of deleted items via death certificates, which makes things really messy.

Pros:
(1) The proposed algorithms require no guarantees from the underlying communication infrastructure.
(2) The protocols are randomized, so they are usually easy to implement using simple data structures.
(3) The algorithms eliminate state information from the servers.
(4) The mathematical theory of epidemiology can be readily applied to analyze the performance of the algorithms.

Cons:
(1) Dealing with death certificates is messy; it introduces more problems than it solves.
(2) The paper does not provide a definite answer as to which method is better. However, it does discuss the various trade-offs in detail.

Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

Summary: The paper presents PBBF (Probability-Based Broadcast Forwarding), a new protocol which explores the energy-latency trade-off in sensor networks via two probabilistic parameters. p is the probability that a node rebroadcasts a packet immediately, even if no neighbors are active. q is the probability that a node remains awake during its sleep cycle. By tuning p and q, PBBF allows the application to achieve the desired level of reliability and energy efficiency. The protocol is analyzed mathematically by mapping it to a bond percolation model. In the corresponding infinite graph, two nodes are connected via an open edge with probability pq + (1 - p). The bond percolation threshold is then derived, which specifies the edge probability required for reliability. Concretely, PBBF has high reliability when the probability that an edge is open is above the critical probability. This is in contrast to the site percolation models used in ad-hoc network gossip protocols. The authors show that energy consumption increases linearly with q, whereas latency is inversely related to q, which means that in general energy and latency are at odds for PBBF.

Pros:
(1) The protocol is simple and tunable.
(2) It is elegantly analyzed using percolation models, a trait absent in many systems research papers.
(3) The simulation results agree nicely with the theoretical analysis.
(4) Since PBBF is probabilistic, it is fault-tolerant and lightweight.
(5) It can be added at the MAC layer without changing the MAC protocol.

Cons:
(1) PBBF assumes perfect synchronization, which may not be realistic.

Future work:
(1) It would be interesting to see how sensor network algorithms like Trickle and synopsis diffusion perform on top of PBBF.
(2) A characterization of the workloads suitable for PBBF would be interesting.
(3) Dynamically tuning p and q to cater to application needs could be investigated.
(4) Can similar probabilistic ideas be used for in-network aggregation tasks?

--
Muntasir

From: Shen LI [geminialex007 SpamElide]
Sent: Tuesday, March 29, 2011 12:58 AM
To: Gupta, Indranil
Subject: 525 review 03/29

Name: Shen Li

Epidemic Algorithms for Replicated Database Maintenance

This paper examines the benefits and costs of several different distributed database consistency maintenance methods. The authors propose to view the update propagation procedure from an epidemiological standpoint, which allows them to benefit from the existing mathematical theory of epidemiology. More specifically, their system contains three different roles: infective, susceptible, and removed. An infective is a site that has seen an update.
Susceptibles are sites that have not yet been informed about the update. Removed sites are formerly infective nodes that no longer propagate the update they heard. The paper provides several rules for an infective to decide whether it should flip to removed: 1) each infective becomes removed with a constant probability whenever it makes a redundant contact; 2) each infective performs a constant number of redundant contacts and then turns removed. With these basic designs, they conduct several experiments examining performance metrics such as traffic and convergence.

Pros: By introducing epidemiology theory, this paper provides a theoretical view of several randomized consistency maintenance algorithms. They also show the effects of changing various parameters, which can be useful for real system designers.

Cons:
1. Their methods share a drawback with rumor-mongering techniques: they cannot guarantee that all sites will receive the updates. They show that the number of susceptibles at first drops dramatically as the "flip to removed" probability (1/k) decreases, but given the exponential form of the residue, a relatively large k is still needed to reduce the number of susceptibles to a negligible value.

Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

This paper proposes Probability-Based Broadcast Forwarding (PBBF) to explore the energy-latency trade-off in sleep scheduling mechanisms. In PBBF, there are two tunable parameters: p and q. p denotes the probability that a node rebroadcasts a packet in the current active time, regardless of whether its neighbours are awake or sleeping. q represents the probability that a node remains on after the original active time, when it should be sleeping according to the scheduling algorithm. The paper then presents various simulation results on how p and q affect different performance metrics.

Pros:
1. The paper introduces randomness into sleep scheduling mechanisms by adding two probability parameters. In this way, people can tune existing algorithms to satisfy the needs of specific applications.

Cons:
1. To me, it is a little bit weird to see a hybrid algorithm that combines deterministic and random parts. PBBF actually changes the sleep and wake-up strategy of the hosting scheduling algorithm; in this case, will the performance guarantees and bounds of those existing algorithms still hold?
2. For computer systems, turning a machine on usually requires extra energy. I am not sure whether that is also the case for wireless sensor nodes.

From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng [ltseng3 SpamElide]
Sent: Tuesday, March 29, 2011 12:37 AM
To: indy SpamElide
Subject: 525 review 03/29

CS 525 - Review: 03/29 Epidemics

Epidemic algorithms for replicated database maintenance, A. Demers et al, PODC 1987

This paper targets the mutual consistency problem for replicated databases. Two methods, direct mail and anti-entropy, had been used in the Xerox Clearinghouse servers. Though these were easy to implement and had provided satisfying performance in conjunction, the overhead was too high. Therefore, this paper proposes several randomized gossip-based protocols to achieve efficient, scalable, and reliable consistency maintenance. The paper calls these protocols rumor mongering, and they are based on studies in the epidemiology literature. The first contribution of the paper is the rigorous analysis of direct mail and anti-entropy through equations and simulation.
One rather surprising result to me is that pull converges more quickly (and is thus more efficient) than push, and that push-pull behaves essentially like pull. The second contribution is the study of rumor-mongering protocols. In particular, the paper proposes and analyzes many metrics and design choices and tries to tune them to best fit the Xerox workload: residue, traffic, delay, blind vs. feedback, counter vs. coin, push vs. pull, minimization, connection limit, and hunting. Finally, the study of the spatial distribution of spreading rumors is also valuable, since it differs from what has been studied in the epidemiology literature, and potentially many heuristics can be used in different situations.

Comments/questions/critiques: I am quite curious about why push-pull would essentially behave like pull. Intuitively, that might be because pull dominates the performance. However, if nodes have rough information about how fresh the rumor is, i.e., some estimate of p_i, "the probability of a node remaining susceptible after the i-th cycle", then a node could just use push when p_i is low and pull when p_i is high. Wouldn't this scheme outperform the simple pull strategy? As for topology, I am wondering why the paper did not focus on finding a single "best" pair of topology and spatial distribution for the Xerox servers, since their environment was quite controlled and reliable, and they could simply have pre-determined the topology. As for heterogeneity, not only hierarchical structure but also different spatial distributions can be exploited.

Exploring the energy-latency trade-off for broadcasts in energy-saving sensor networks, M. Miller, C. Sengul, I. Gupta, ICDCS 2005

This paper proposes a new MAC-level protocol called Probability-Based Broadcast Forwarding (PBBF), which considers the energy-latency trade-off for the broadcast problem under an application-specific level of reliability in multi-hop wireless sensor networks (WSNs). Moreover, in theory, PBBF can be integrated with any sleep scheduling protocol. Last but not least, simulation and theory show that PBBF has the same bimodal property as gossip-based protocols; thus, users can easily tune the parameters of PBBF to fit the application requirements.

The paper has several contributions. First, it proposes a simple gossip-like protocol based on two parameters, p and q. p is the probability that a node (in active mode) immediately rebroadcasts a packet even though not all of its neighbors may be awake. q is the probability that a node scheduled to sleep is still awake. Thus, PBBF can run on top of any sleep scheduling protocol; the only changes are that with probability p a node rebroadcasts immediately and with probability q a node remains active during sleep time, regardless of the original protocol. Second, the paper has solid theoretical analysis. The reliability analysis, based on a bond percolation model, shows threshold behavior, which means that a user can tune p and q easily and expect high reliability. The trade-off analysis shows the inverse relationship between energy and latency. Third, the paper conducts two kinds of experiments: fast Monte Carlo simulations of a grid network topology in an idealized environment where no collisions or interference happen, and ns-2 simulations of PBBF on top of IEEE 802.11 PSM (with perfect clock synchronization). Both sets of simulations support the theoretical analysis of the trade-off and reliability, which suggests that PBBF is practical and efficient.
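As a tiny numeric illustration of the threshold behavior described above: a given link carries the broadcast either via an immediate rebroadcast that only awake neighbors hear (probability p*q) or via the normal rebroadcast in the next active time (probability 1 - p), so the analysis treats an edge as open with probability pq + (1 - p). The sketch below simply evaluates that expression against a critical value; the 0.5 used here is the classical bond-percolation threshold for the infinite 2-D square lattice, included only as a stand-in for whatever threshold the paper's exact model yields.

def edge_open_probability(p, q):
    """P(broadcast crosses a given link under PBBF): an immediate
    rebroadcast is heard only if the neighbor stayed awake (p*q); otherwise
    the packet waits for the next scheduled active time (1 - p)."""
    return p * q + (1 - p)

P_CRITICAL = 0.5   # classical 2-D square-lattice bond threshold, for illustration only

for p in (0.2, 0.5, 0.8):
    for q in (0.0, 0.2, 0.5):
        prob = edge_open_probability(p, q)
        verdict = "above" if prob > P_CRITICAL else "below"
        print(f"p={p:.1f} q={q:.1f} -> P(edge open)={prob:.2f} ({verdict} threshold)")

Note that p = 0.2 with q = 0 still comes out above the threshold, simply because most packets fall back to the reliable next-active-time rebroadcast; reliability is preserved but latency suffers, which is exactly the p trade-off discussed above.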
Comments/questions/critiques: It seems that in the broadcast problem, reliability means whether every node receives the message. This is easy to confuse with reliability (or robustness) against node failures. Since gossip-based protocols should work well under node failures, I am wondering how robust PBBF is. In a grid network, since the number of neighbors is fixed, node failures might be devastating. However, in a random network, as in the ns-2 simulation, node failures might not be that serious. In this case, if we assume some random pattern of node failures, is studying them identical to studying the impact of node density (which the paper does), or is it different? The adaptive and heuristic version of PBBF mentioned in the future work sounds interesting. However, the example used might not be a great one (when a node overhears more nodes involved in communication, p could be increased), since in this case the chance of collision also increases. One thing I wonder about is the following question on the optimal frame length and active time. Given certain traffic behavior and node density, is there a "best" combination of frame length and active time for the sleep scheduling problem? If yes, how would PBBF affect it?

From: Agarwal, Rachit [rachit.ee SpamElide]
Sent: Tuesday, March 29, 2011 12:03 AM
To: Gupta, Indranil
Subject: 525 review 03/29

Rachit Agarwal

---- Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

The paper studies a fundamental trade-off between energy, latency, and reliability for broadcasts in multi-hop wireless networks. In order to do so, the authors propose a probability-based broadcast forwarding protocol with two knobs that allow the system designer to choose a desired operating point.

My questions/thoughts on the paper:
1. I am wondering how power-saving active-sleep protocols work if they require synchronization. Is it easy to provide synchronization in sensor motes?
2. How does PBBF perform in the presence of collisions and interference? Does the analysis take channel contention into account? The paper claims that the general trends are shown to hold using simulations; I am wondering about cases when these trends would not hold.
3. The paper considers a very specific topology. I am wondering how the network topology would affect the results. What if there are more sources generating data at the same point in time?
4. Finally, the packet rate considered in the paper is very low. My understanding is that a lot of sensor network applications may generate bursty data. How does PBBF perform in such scenarios?

---- Epidemic Algorithms for Replicated Database Maintenance

The paper studies the problem of maintaining consistency among several replicated databases. It proposes a set of randomized algorithms, with the intention of simplifying a distributed implementation while providing eventual consistency with high probability. The basic algorithms are loaded with various tools to handle object deletions, bounded connectivity, etc. Overall, the paper presents a thorough analysis of various randomized algorithms that can be used to maintain consistency. I liked the fact that, despite the theoretical touch, the authors took several practical issues into account in presenting the algorithms.

My ideas/thoughts on the paper:
1. The anti-entropy algorithm could be very inefficient for large databases. In particular, it requires exchange of the complete database in case of inconsistencies, which may result in high network traffic.
It would have been interesting to explore an incremental mechanism to achieve this with lower load.
2. From the perspective of applications today, these algorithms may be useful for data-center replicated data and backup data. However, these data centers often have very specific topologies, and the operators have vastly more control over a data center than over a communication network, for instance. Can we exploit this knowledge to improve the performance of these algorithms?
3. I am wondering if there is a significant difference in the analysis/results if there are multiple updates at multiple replicas. What if there are multiple changes at a single replica within a short time interval?

----
--
Siebel Center for Computer Science, University of Illinois at Urbana-Champaign
Web: http://agarwa16.wikidot.com
----
Sing as if No one is Listening, Dance as if No one is Watching, Dream as if You'll Live Forever, Live as if You'll Die Today !!

From: Tengfei Mu [tengfeimu SpamElide]
Sent: Monday, March 28, 2011 10:41 PM
To: Gupta, Indranil
Subject: 525 review 03/29

1. Epidemic algorithms for replicated database maintenance

The paper presents several algorithms that aim to maintain mutual consistency of updates in a distributed, replicated database. Specifically, the three algorithms are:
- Direct mail: send updates to all nodes; timely and efficient, but unreliable because it cannot handle missing messages.
- Anti-entropy: every node periodically chooses another node at random and resolves any differences in state; reliable, but slower than direct mail and uses more resources.
- Rumor mongering: infected nodes periodically choose a node at random and spread the rumor, stopping after encountering a certain number of nodes that already have it; less reliable than anti-entropy, but uses fewer resources.
The paper also discusses and analyzes possible improvements to these algorithms, such as death certificates, dormant death certificates, spatial distributions, etc.

Pros:
1. Although anti-entropy and rumor mongering are slower, they can guarantee that the servers finally reach a consistent state.
2. The experimental performance on the Xerox servers is impressive.
Cons:
1. The algorithms do not provide a 100% guarantee that all nodes will have the updates.

2. Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

The paper presents and analyzes a new class of Probability-Based Broadcast Forwarding (PBBF) protocols that work at the MAC layer for multi-hop WSNs and can be integrated with any sleep scheduling protocol. PBBF takes in p and q values, which are respectively the probabilities of a node rebroadcasting a message in the current active time and of a node staying awake during its sleep interval. By tuning these values, a designer can push the network past the reliability threshold and achieve the most appropriate energy-latency-reliability trade-off for broadcast. After implementing and simulating the protocol in ns-2, the authors conclude that p represents the trade-off between latency and reliability and q the trade-off between energy and reliability.

Pros:
1. The general idea of PBBF is simple but efficient.
2. The paper allows designers to manually adjust the p and q values for a specific system to achieve the best trade-offs.
Cons:
1. How will the energy-latency trade-off be affected as the number of nodes in the WSN increases?
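For concreteness, here is a minimal sketch of where the two PBBF decisions would hook into a sleep-scheduling MAC. The class and the MAC hooks (send_now, queue_for_next_active_time, sleep) are hypothetical names for illustration, not the API of the paper's implementation.

import random

class PBBF:
    """Sketch of the two PBBF decision points layered on a sleep-scheduling
    MAC. 'mac' is assumed to expose send_now(), queue_for_next_active_time()
    and sleep() -- hypothetical hooks, not a real MAC interface."""

    def __init__(self, mac, p, q, rng=None):
        self.mac = mac
        self.p = p          # P(rebroadcast immediately, ignoring sleeping neighbors)
        self.q = q          # P(stay awake through a scheduled sleep slot)
        self.rng = rng or random.Random()

    def on_broadcast_received(self, packet):
        if self.rng.random() < self.p:
            self.mac.send_now(packet)                    # low latency, may miss sleepers
        else:
            self.mac.queue_for_next_active_time(packet)  # reliable but slower

    def on_sleep_scheduled(self):
        if self.rng.random() < self.q:
            return               # stay awake: may catch an immediate rebroadcast
        self.mac.sleep()         # otherwise follow the normal sleep schedule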
From: Nipun Sehrawat [sehrawa2 SpamElide]
Sent: Monday, March 28, 2011 8:36 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Epidemic Algorithms for Replicated Database Maintenance

This paper looks into the case of maintaining consistency among different replicas of a single database by applying results from epidemiology theory. The authors aim for a strategy in which:
1. No guarantees are required from the underlying network.
2. Sites make independent decisions.
3. Eventual consistency suffices.
They consider three strategies for achieving consistency:
1. Direct mail: a site receiving an update sends it to all other nodes. This approach relies on a node's knowledge of all other nodes in the system and on reliable delivery of update messages.
2. Anti-entropy: this follows a "simple epidemic" model where a node is either susceptible (has not yet received the update) or infected (has received an update and is willing to spread it). In this scheme, a site randomly selects another site to compare contents with; if they differ, both replicas coordinate to reach a consistent state. This takes O(log n) time, where n is the population size, to spread an update to all sites starting from a single node. There are two possibilities for exchanging updates: a. push model: the initiating site pushes its updates to the site being contacted; b. pull model: the initiating site pulls updates from the site being contacted, which outperforms the push model. Exchanging checksums and maintaining a list of recently updated replicas reduce the overhead of the anti-entropy approach.
3. Rumor spreading: this follows a "complex epidemic" model where sites can be in an additional state called removed (has received the update but is not willing to spread it). A site can transition from infective to removed, for example by reducing its tendency to remain infective by some amount, governed by a parameter k, each time it tries to infect an already infected site. This is cheaper than anti-entropy but can't guarantee that updates reach all the sites (the probability that all sites receive an update increases with k). This can be countered by running anti-entropy infrequently.
Deletion from a replicated database is complex and is handled by treating deletion as an update message (a "death certificate") marking a piece of data as dead.

Pros:
- Achieves, with high probability, eventual consistency without centralized control or a reliable network.
- Death certificates convert a delete into an update.
Cons:
- Requires (perfect?) clock synchronization, as database entries are compared according to their timestamps.
Comments/Suggestions:
- Merkle trees could reduce the overhead of the anti-entropy scheme.
- How would the epidemic approach fit in with quorum-based update protocols?

--

Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

This paper investigates the trade-offs between energy (consumed by sensor nodes), latency (of communication), and reliability (of communication) in multi-hop WSNs that use broadcast frequently. A new protocol called PBBF is proposed; it can be integrated with any sleep scheduling protocol at the MAC layer. PBBF provides fine-grained control over the desired trade-off between energy, latency, and reliability by introducing two probabilities into the sleep protocol:
p: the probability with which a node rebroadcasts a packet immediately, without ensuring that its neighbors are active (and can receive the update). p provides control over the trade-off between latency and reliability.
A high p value decreases latency, as nodes rebroadcast immediately, but decreases reliability (measured in terms of the nodes receiving the update), as nodes broadcast irrespective of whether their neighbors are active or not.
q: the probability with which a node stays active during its scheduled sleep time. q provides control over energy and reliability. A high value of q keeps nodes awake longer, thus increasing energy consumption, but at the same time it increases reliability, as nodes are active longer and the probability of receiving an update increases.
An important distinction made in this paper is between site and bond percolation models of update spread. Gossip protocols, where nodes broadcast with some probability (an energy-reliability trade-off), fall under the site percolation model, whereas PBBF is a bond percolation model, where each bond (link between nodes) is open (communication can occur) with some probability.

Pros:
- PBBF provides fine-grained control over the energy, latency, and reliability trade-offs.
- It seems easy to integrate into existing sleep protocols.
- Analytical as well as simulation results are presented.

From: david.m.lundgren SpamElide on behalf of David Lundgren [lundgre4 SpamElide]
Sent: Monday, March 28, 2011 6:43 PM
To: Gupta, Indranil
Subject: 525 review 03/29

Epidemic Algorithms for Replicated Database Maintenance:

Demers et al. describe three epidemic algorithms for maintaining consistent replicas of a database across a network. The algorithms' correctness depends simply on the eventual delivery of messages in the system. The authors distinguish between non-epidemic (direct mailing), simple epidemic (anti-entropy), and complex epidemic (rumor mongering) approaches. Direct mailing sends a message from an updated node to all other nodes in the network. The anti-entropy epidemic periodically chooses a node at random and compares database contents; updates are then pushed or pulled. In anti-entropy epidemics all nodes are either infected or susceptible. In a rumor-mongering epidemic all nodes are initially susceptible; when a node is infected it infects other nodes for a certain time and is then removed (being neither infectious nor susceptible). The problems of consistent key deletion, guaranteed consistency, and more appropriate spatial distributions are discussed. Epidemics vary along blind/feedback, counter/coin, and push/pull mechanisms, and are evaluated based on residue, traffic, and delay. The algorithms are then evaluated on the Xerox Corporate Internet, where rumor mongering with a non-uniform spatial distribution is shown to improve average network traffic and critical-link traffic.

Pros, Cons, Comments, and Questions:
- The biggest question left standing is how generalizable these maintenance algorithms are to other network topologies. The assumptions made by the authors narrow the scope of applicable networks. Interesting extensions of the paper would include an analysis of dynamic network topologies with various churn rates.
- Is it possible to remove the assumption of clock synchronization? Removing this assumption would be important in analyzing the effectiveness of such algorithms on power-constrained networks (e.g., sensor networks).
- I would have liked to see a more detailed examination of the update time window tau, and how it should be chosen.
- The theoretical guidance in the design of the algorithms and the system implementation was rigorous and thorough.
- The use of dormant death certificates as "antibodies" and the experimentation with different spatial distributions were insightful.
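On the death-certificate point, here is a minimal sketch of the idea: deletion is recorded as just another timestamped update, so an anti-entropy merge cannot resurrect a deleted item. This is toy code of my own (ignoring certificate expiry and the dormant-certificate refinement), not the paper's implementation.

class Replica:
    """Toy store where a delete is just another timestamped update (a death
    certificate), so merging with a stale replica cannot resurrect deleted items."""

    DELETED = object()   # sentinel value carried by a death certificate

    def __init__(self):
        self.entries = {}                  # key -> (value_or_DELETED, timestamp)

    def update(self, key, value, ts):
        cur = self.entries.get(key)
        if cur is None or ts > cur[1]:
            self.entries[key] = (value, ts)

    def delete(self, key, ts):
        self.update(key, Replica.DELETED, ts)   # write the certificate

    def get(self, key):
        cur = self.entries.get(key)
        if cur is None or cur[0] is Replica.DELETED:
            return None
        return cur[0]

    def merge(self, other):
        """One direction of an anti-entropy exchange."""
        for key, (value, ts) in other.entries.items():
            self.update(key, value, ts)

a, b = Replica(), Replica()
a.update("x", "v1", ts=1)
b.merge(a)                    # both replicas now hold x = v1
a.delete("x", ts=2)           # death certificate at time 2
b.update("x", "v1", ts=1)     # a stale copy still sitting at b
b.merge(a)
print(b.get("x"))             # None: the certificate wins, x stays deleted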
-------------------------------------------------------------------------

Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks

Miller et al. introduce Probability-Based Broadcast Forwarding (PBBF), a MAC-layer protocol for broadcast in wireless sensor networks that works with any sleep scheduling protocol. Two "tuning knobs" are introduced: p, the probability that a node broadcasts immediately after the receipt of a message; and q, the probability of a node remaining awake (in hopes of receiving an otherwise missed transmission) during its sleep cycle. These "knobs" are inserted into the sleep-decision and broadcast-receive handlers of any existing protocol's code. The authors note a direct relationship between p and q for a given level of reliability. The energy consumption overhead of PBBF grows linearly in q and is not a function of p. Latency is inversely related to p and q. With these two results, energy and latency are shown to have an inverse relationship to each other. The algorithm is then tested as a code distribution application using ns-2. Network density is shown to affect latency, but to have little effect on the ratio of updates received to total updates sent. All observations assume a grid network.

Pros, Cons, Comments, and Questions:
- The fact that PBBF is MAC-agnostic gives system designers control over their MAC choice (BMAC, SMAC, etc.) and allows for refined tuning of these protocols with limited overhead.
- This "MAC-gnosticism" allows PBBF to sidestep the MAC-related problems of Trickle, another code propagation algorithm that we discussed in class.
- Framing the problem as a bond percolation problem rather than a site percolation problem is an insightful choice that allows the authors to discuss energy-latency-reliability trade-offs inaccessible to site percolation models.
- The paper might have benefited from using more than one MAC protocol in the empirical section, as well as from discussing different network topologies.
- The addition of the p and q "knobs" to any sleep scheduling protocol is trivial, requiring only a small number of lines of code.

From: wzhou10 SpamElide
Sent: Monday, March 28, 2011 3:28 PM
To: Gupta, Indranil
Subject: CS525 Review 03/29

CS525 Review 03/29
Wenxuan Zhou

Review of "Epidemic Algorithms for Replicated Database Maintenance"

Core idea: Considering the problem of maintaining mutual consistency in a database with multiple replicas, this paper discusses several epidemic-based randomized algorithms as solutions. The three basic epidemic algorithms are:
1. Direct mail: whenever there is a new update, it is immediately mailed from the entry site to all other sites.
2. Anti-entropy: a node exchanges its entire database with a randomly chosen node.
3. Rumor mongering: a gossip-based protocol in which a node sends a hot rumor to random nodes; interest in the rumor decreases as the contacted nodes turn out to already know it.
The first one isn't reliable. The second one is reliable, but the overhead is too high both in terms of messages and time. The third one is probabilistic, but highly efficient. The last two are inspired by epidemic algorithms. There are three ways epidemic algorithms can be run, based on how updates are exchanged: push, pull, and push-pull. How an item is deleted using death certificates is discussed too. Also, spatial distribution is taken into account.

Pros:
1. Using an epidemic algorithm to maintain database consistency is a novel idea.
Randomized algorithms can avoid flooding the entire network, and thus are efficient both in time and in the number of messages exchanged. 2. No particular underlying network structure is required. 3. Experiments were conducted on relatively large systems, such as the Xerox corporate network, and the results look convincing.

Cons: 1. The topology considered in the paper is static, so deterministic algorithms may be more suitable in this situation. Also, if the network topology is dynamic, rumor mongering might not be efficient. 2. The algorithm is probabilistic, so it cannot guarantee that 100% of the nodes receive every update. 3. The anti-entropy algorithm requires comparing a new update against an old one, so clock synchronization is needed, which is quite difficult in a distributed system. 4. It's not clear to me how to decide which rumor is hot when several updates occur simultaneously.

Review on "Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks"

Core idea: This paper presents a new protocol, Probability-Based Broadcast Forwarding (PBBF), to balance energy, latency, and reliability for broadcast in multihop wireless sensor networks. PBBF uses parameter p to trade off latency against reliability, and q to trade off energy against reliability. Specifically, parameter p is the probability that a node re-broadcasts a packet immediately, regardless of whether any of its neighbors are awake; parameter q is the probability that a node stays awake when it is scheduled to sleep.

Pros: 1. PBBF provides a flexible interface to applications, which can adjust p and q to achieve the best trade-offs. The idea is simple but effective. 2. PBBF works within the standard MAC protocol and doesn't require extra hardware.

Cons: 1. The values of p and q are tuned manually. It would be much nicer to make this dynamic. 2. No environmental parameters are taken into account. 3. The evaluation is conducted only in a simulator; no real-world experimental results are provided. 4. The relationship between p and q is not clear. They are treated as independent in the paper, which might not yield the best results.

Thanks. Wenxuan

From: mark overholt [overholt.mark SpamElide] Sent: Monday, March 28, 2011 2:51 PM To: Gupta, Indranil Subject: 525 review 03/29

Mark Overholt CS525 Review 03/29/2011

Epidemic Algorithms for Replicated Database Maintenance

Summary: The paper deals with the problem of maintaining consistency among several replicated databases, all of which are intended to store the same data and any of which can initiate an update of a particular entry. It explores the use of simple epidemic algorithms for this purpose and the trade-off they impose between the reliability with which an update is sent, the time for the update to propagate through the entire database, and the network traffic that results from such propagation. In particular, the authors deal with both simple and complex epidemic algorithms: (a) the simple epidemic, or anti-entropy, in which every site regularly and randomly chooses another site and exchanges its database contents with it; (b) the complex epidemic, or rumor mongering, in which an update is introduced as a hot rumor (which spreads rapidly). A site that receives a hot rumor periodically chooses another site and sends the rumor to it; if it chooses several sites that have already seen the rumor, the rumor is no longer hot and its spreading stops. The paper further explores different aspects of these basic algorithms, including push, pull, and push-pull variations (a small rumor-mongering sketch follows).
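To make the counter-based rumor-mongering variant concrete, here is a hedged Python sketch. The Site class, the push-only exchange, and the threshold K are assumptions made for illustration; they are not the paper's code.

import random

K = 3  # a site loses interest in a rumor after K redundant contacts (counter variant)

class Site:
    def __init__(self):
        self.known = {}   # rumor id -> value
        self.hot = {}     # rumor id -> count of contacts that already knew it

    def receive(self, rumor, value):
        # Returns True if the rumor was new to this site (the site becomes infective).
        if rumor not in self.known:
            self.known[rumor] = value
            self.hot[rumor] = 0
            return True
        return False

def gossip_round(sites):
    # Push variant: every site forwards each of its hot rumors to one random peer.
    for s in sites:
        for rumor in list(s.hot):
            peer = random.choice([t for t in sites if t is not s])
            if not peer.receive(rumor, s.known[rumor]):
                s.hot[rumor] += 1
                if s.hot[rumor] >= K:
                    del s.hot[rumor]   # rumor is no longer hot; stop spreading it

Swapping the counter test for a coin flip gives the coin variant, and the same loop can be run as pull or push-pull by having the peer request rumors instead of only receiving them.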
The paper also defines how death/dormant certificates can be used to propagate the deletion of items from the system. It also deals with the practical constraint of bounded connectivity and tries to come up with schemes that take the spatial distribution of nodes into account.

Discussion: Pros: The paper presents a strong mathematical formulation of each problem and of the proposed solutions, which provides a sound theoretical foundation. It takes practical issues like spatial distribution into account and derives a generalized optimization for choosing the probabilities with which various nodes are contacted. Cons: Though the paper gives a sound theoretical background, it lacks extensive experimentation with the ideas presented. It would have been good to see how the different variations perform with respect to each other. This work assumes that backups are somewhat small and not too frequent, which is a limitation.

Bimodal Multicast

Summary: This classic paper identifies some reliability issues in multicast protocols, discussing two different ends of the spectrum. One end has to do with atomicity guarantees, referred to as all-or-nothing delivery, delivery ordering, etc. The other end relates to "best effort" reliability, where local repair strategies are used to overcome packet loss. This paper introduces the idea of bimodal multicast, where the reliability model relates to bimodal probability distributions. The authors aim to show that bimodal multicast has stable performance as well as reliability and scalability.

Discussion: Pros: Rather than focusing on pure performance, this paper rightly takes the viewpoint that performance stability must be a priority in multicast protocols. The paper discusses six different optimizations for pbcast, each of which is applicable in a general sense to any network or infrastructure. This adds to the strength and applicability of the pbcast protocol, and suggests further optimizations that could be made as well. Cons: While the authors introduce the two ends of the spectrum very clearly, it's not clear how they can tune their design to find a good balance between the two. They suggest just one protocol as the common ground, but being able to tune the parameter space would make the paper much more applicable. While the experimentation evaluates the system using many different parameters, is there a way to use all these parameters together so as to tune it effectively? They do discuss additional parameters such as fanout, but this is left as a theoretical discussion point in the appendix. Is fanout not an important part of the experimentation in Section 7? The paper's experimentation is primarily based on ns-2 simulations. Would these simulations apply to more real-world scenarios? Are the authors justified in making the claims about real applications that they do in Section 8? They also use fault injection, which previous papers we have read have shown to be less effective for distributed systems, and it is not convincing that these faults are representative of faults in real applications.

From: Ankit Singla [iitb.ankit SpamElide] Sent: Monday, March 28, 2011 11:03 AM To: Gupta, Indranil Subject: 525 review 03/29

1.
Epidemic algorithms for replicated database maintenance ------------------------------------------------------------------------------

Summary: The paper discusses three kinds of information-propagation algorithms in a distributed system: a) direct mail, where the source of new information sends it to everyone else; b) anti-entropy, where each site regularly syncs content with another, randomly chosen site; c) gossip, where sites gossip the new information among themselves randomly until they figure out that the rumor has become old. Direct mail suffers from the obvious issues of poor fault tolerance and high load on the source. Implementing anti-entropy efficiently requires inverted indexes (time-stamp-ordered database contents), which is expensive. It turns out that the random gossip schemes perform very well in terms of both network messaging and time to convergence. The analysis is based on fairly simple differential equations.

Comments: I liked the discussion of the variants of the gossip schemes, for instance the idea that push-based epidemics suit more quiescent systems while pull-based epidemics converge faster once the infection has a critical mass. The deletion of data seems to be a fairly hairy problem, due to some nodes trying to spread the deleted information again. Also, with the shift in the networking community towards organized dissemination of information (via multicast trees, DHTs, etc.), I wonder how significant these algorithms are today and whether they are commonly applied in practice.

2. Energy-Latency Trade-off for Broadcasts in WSNs -----------------------------------------------------------------------------

Summary: The paper tackles the energy-latency-reliability trade-off for broadcasts in sensor networks by proposing a new MAC-layer protocol, PBBF (Probability-Based Broadcast Forwarding). Given that the same or similar motes need to be applied to diverse applications, the application designer should be allowed the flexibility to pick a suitable resource-performance trade-off; this is what PBBF enables. The paper starts out by showing that, for a given level of reliability, the energy required is inversely proportional to the latency seen. The core idea in PBBF is to allow nodes to (probabilistically) broadcast their data without waiting to make sure that other nodes are awake to receive it and, simultaneously, to keep nodes awake (probabilistically) to receive such out-of-schedule messages. The probabilities of these two events represent trade-offs between latency and reliability, and between energy and reliability, respectively (a minimal sketch of this p/q logic appears after the comments below).

Comments: The idea of their approach being compatible with any sleep scheduling protocol is a very powerful and useful one; without that property, any such approach would not be very practical. Another interesting idea that the paper briefly touches upon is having nodes adjust their probability parameters based on the messages they see. This could have significant benefits, but it also seems to involve non-trivial aggregate behavior: what would be the effect on the system as a whole of nodes making these choices independently?
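As a rough illustration of the two knobs described in the summary above, here is a minimal Python sketch layered on a generic sleep-scheduled MAC. The mac object and its hooks (broadcast_now, queue_for_active_period, stay_awake, enter_sleep) are hypothetical placeholders, not the protocol's real API, and the class only sketches the decision logic.

import random

class PBBF:
    def __init__(self, mac, p, q):
        self.mac = mac
        self.p = p  # prob. of immediate rebroadcast: trades latency vs. reliability
        self.q = q  # prob. of staying awake in a sleep slot: trades energy vs. reliability

    def on_broadcast_received(self, packet):
        if random.random() < self.p:
            # Rebroadcast right away, without waiting for neighbors' next active period.
            self.mac.broadcast_now(packet)
        else:
            # Fall back to the underlying sleep-scheduled broadcast path.
            self.mac.queue_for_active_period(packet)

    def on_sleep_decision(self):
        if random.random() < self.q:
            self.mac.stay_awake()   # may catch an out-of-schedule rebroadcast
        else:
            self.mac.enter_sleep()  # follow the normal sleep schedule

Setting p = 0 and q = 0 reduces to the underlying sleep protocol, while p = 1 and q = 1 approaches always-on flooding; intermediate values give the energy-latency-reliability trade-offs discussed above.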
Cheers, Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/

From: Tony Huang [tonyh1986 SpamElide] Sent: Monday, March 28, 2011 3:56 AM To: Gupta, Indranil Cc: Tony Huang Subject: 525 review 03/29

Paper: Epidemic algorithms for replicated database maintenance

Core Idea: In this paper, the authors investigate different schemes of epidemic algorithms for database synchronization; they introduce various system evaluation parameters and different engineering design points for such a system; and they give solid theoretical proofs of time bounds on the algorithms. The algorithms described in detail in this paper are the randomized anti-entropy algorithm and the rumor-mongering algorithm. In the anti-entropy algorithm, each machine periodically selects a neighbor and reconciles the differences between the pair. They show that, under this algorithm, the probability of a machine remaining unsynchronized converges to 0, so it serves as an ideal backup for the more complex epidemic algorithms. The authors also introduce an idea similar to Merkle trees to save bandwidth and speed up the operation. The authors then introduce complex epidemic algorithms. In such an algorithm, a machine periodically sends the new message to a set of neighbors, and it stochastically stops spreading once the number of contacted neighbors that have already seen the message exceeds a certain threshold. The authors show that complex epidemics have a non-zero probability of failing; as a result, they propose backing up the epidemic with an anti-entropy algorithm. The authors also describe schemes for performing deletion and for implementing the algorithm in a non-uniform network.

Pros: * A randomized algorithm that is simple to implement and simple to analyze. * Strong theoretical analysis showing that the system eventually converges to a uniform state, with a theoretical bound on the convergence time.

Cons: * It is well known that an actual network can fail randomly and suffer from multiple outages. The paper lacks the actual experimental results needed to back up the theoretical analysis.

Comments: * This algorithm sits at the completely decentralized end of the design space. We could introduce some hierarchy into the system to further improve its performance; this idea is explored in later papers.

Paper: Bimodal Multicast

Core Idea: This paper introduces the idea of focusing on reliability in multicast systems. There are two ends of the design spectrum. At one end, the protocol guarantees atomicity: delivery either succeeds at all nodes or at none, and message ordering is guaranteed. At the other end of the spectrum, the system makes only a best-effort delivery attempt. The paper then introduces a new protocol, the "probabilistic broadcast" (pbcast) protocol. It provides a probabilistic guarantee under which the probability of intermediate outcomes approaches zero, which the authors term an "almost all or almost none" guarantee. It guarantees stable throughput, ordering, and detection of lost messages. The protocol consists of an optimistic, unreliable, hierarchical broadcast that makes a best-effort delivery attempt, followed by an anti-entropy process running in the background to reconcile discrepancies. A key difference in the pbcast protocol is that it guarantees a common suffix of the message stream rather than a common prefix; that is, it guarantees receipt of the latest messages rather than of older ones.
Very old messages are marked as lost if they cannot be recovered.

Pros: * This paper introduces an interesting perspective on the multicast design spectrum, and it identifies an important point on that spectrum that focuses more on stability. * The protocol combines two robust protocols, both of which have been thoroughly studied.

Cons: * The protocol's emphasis on achieving a common suffix may not be practical for a lot of applications. It sounds like a UDP-like multicast protocol that may be suitable for video streaming; however, it is not applicable for applications that require absolute ordering guarantees.

-- Regards -- Tony