From: Igor Svecs Sent: Tuesday, April 12, 2011 12:31 PM To: Gupta, Indranil Subject: 525 review 04/12

Igors Svecs CS 525 2011-04-12

Understanding Availability

The authors argue that the traditional notion of availability as a single fraction in [0,1] no longer applies to distributed systems. They claim that a node's availability may in fact be a function of several arguments, such as time and the availability of other nodes. This paper studies Overnet, a DHT-based file-sharing peer-to-peer network that was popular when the paper was published (2003). The measurement was performed by crawlers that discovered node IDs and probers that periodically probed each discovered node to determine its availability. The first finding of the paper is that using IP addresses alone as host identifiers is not accurate, as many hosts (~40% in one day) have varying IP addresses, most likely because of DHCP. The second finding is that host availability varies with the time of day, which is rather intuitive. Finally, nodes are found to be largely independent (in contrast to the streaming multimedia study).

COMMENTS This paper overlooks some other factors that may affect reliability. For example, availability correlation varies significantly with the application: we would likely see correlated availability in a datacenter distributed storage system, as opposed to Overnet. It also surprised me that the ratio of unique hosts to IP addresses is 1:4 and not the other way around, given that there is a shortage of IP addresses and many hosts are hidden behind NATs. Perhaps the situation was different in 2003 when the paper was published. As a general criticism, I would argue that the results are not especially significant; the authors could have studied more factors that affect availability and/or suggested an analytical model.

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

SUMMARY This measurement study examines the PPLive multimedia peer-to-peer network and focuses on its overlay-specific characteristics. PPLive is an IPTV protocol that lets users subscribe to various channels, first by contacting membership servers and then by getting neighbor lists from peers. The authors argue that various characteristics differ depending on the application running on the overlay; for example, the structure of PPLive overlay graphs is close to random, in contrast to file-sharing overlays. They also show that the average node out-degree is independent of channel size, likely because streaming media protocols do not need to maximize bandwidth the way file-sharing networks do. Other findings include: nodes in the same snapshot have correlated availability, while random node pairs have independent availability. This supports the argument that the specific application and its users determine the characteristics of an overlay. They also confirm the intuitive idea that PPLive users are impatient (shorter average sessions on a channel) and that channel population varies widely over a day (due to the live nature of the served content).

COMMENTS This study makes a valuable contribution by examining streaming-multimedia-specific characteristics of overlays and comparing them to other applications (such as file sharing); in particular, the authors attempt to interpret their measurement results. It is unclear why the authors assumed that it is unknown how a node returns a list of its neighbors: is it not specified in the protocol, or do we not trust that implementations follow the specification?
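For concreteness, the kind of breadth-first snapshot crawl both measurement studies rely on can be sketched as follows (Python; get_partners() is a hypothetical stand-in for the actual partner-list query, whose exact semantics the paper leaves open):

    from collections import deque

    def snapshot(seed_peers, get_partners, max_peers=10000):
        """Breadth-first crawl of an overlay: ask each known peer for its
        partner list and add newly discovered peers to the frontier. The
        overlay keeps changing while the loop runs, so the result is only
        an approximate snapshot."""
        seen = set(seed_peers)
        frontier = deque(seed_peers)
        edges = []
        while frontier and len(seen) < max_peers:
            peer = frontier.popleft()
            try:
                partners = get_partners(peer)   # hypothetical RPC; may time out
            except OSError:
                continue                        # peer left or did not respond
            for p in partners:
                edges.append((peer, p))
                if p not in seen:
                    seen.add(p)
                    frontier.append(p)
        return seen, edges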
Their snapshot algorithm is not consistent, but it is probably impossible to ensure consistency without control of the nodes. This paper also raises a general question: can some properties of overlays (for example, node degrees) be studied analytically, or do they actually need to be measured?

From: w SpamElide on behalf of Will Dietz Sent: Tuesday, April 12, 2011 12:29 PM To: Gupta, Indranil Subject: CS525 Review 4/12

Will Dietz 4-12-2011 cs525 -- Availability

"Understanding Availability" Ranjita Bhagwan, et al, UCSD

This paper tackles the problem of defining "availability" in a peer-to-peer system, arguing that existing measurements and availability models are insufficient. They argue that availability shouldn't just be a single metric, a single value in [0,1), but rather a composition of individual host availability, time-of-day effects, and particularly host availability interdependence. One important contribution is pointing out that existing measurements often ignore the effects of host aliasing (same host, different IP), which lead to incorrect conclusions about degraded host availability and system churn. Additionally, they present traces from a 7-14 day analysis, which mostly show what you'd expect (but are still hugely useful to demonstrate). Host availability is a strange metric (measured here as the percentage of hosts the crawler has ever seen that are currently available) in that, due to things like nodes permanently leaving, it goes down as the time frame over which the measurement is conducted grows. Time-of-day is interesting in that, given the trends in arrivals and departures throughout the day, one might want to reconsider models that assume less transient node 'failure'; it also highlights what might look like churn but perhaps isn't. They also noted that the system had fewer hosts over time (I'd imagine that over a much longer period you'd expect it to be more or less stable, if not growing), which has implications for file-storage systems that might not deal well with decreasing host capacity (they point out OceanStore, but Pastry comes to mind as well).

"An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS" Simson L. Garfinkel, Harvard

My first reaction was that it seemed strange for a Harvard professor to be writing about the availability of Amazon's services, and I am curious about the business implications of studies such as this. Would Amazon be aware of the study? Might they bias it somehow, and how would we know? Just a thought... Anyway, this work investigates Amazon's various 'cloud' services, such as EC2, S3, and SQS, and their respective availability. The author points out that Amazon provides no SLA, but makes various claims indicating high availability and suitability for various applications. This, of course, makes one wonder exactly how reliable the services are, and how suitable they might be for your business use. Frankly, I am a tad surprised such studies weren't more readily available; before I used these services for something in my business, I'd want some sense of what I was getting myself into. They summarize their results by saying that while Amazon mostly lives up to its claims, its systems suffer in strange and unpredictable ways.
They also go into good detail analyzing the various aspects one might use to determine the suitability or quality of Amazon's offering with respect to particular needs, from DNS issues to the failure policies (S3 essentially says 'retry until it works!'). One strange note on the study is that the EC2 analysis (which, by the way, largely uses S3 for its storage, as I understand it) was done during the EC2 beta. They report great results, but you can imagine two biases here: during a beta there are fewer users, so it might work better for that reason, but it is also still, well, *in beta* and might have issues that they intend to resolve later. For that reason, while I'm fond of the EC2 services personally, I'm not sure how to take these results. All in all, this is a paper I will keep around and re-read closely if I'm to use any of Amazon's services for something important in the future. Of course, one has to ask how well measurements like this hold up over time (it's been at least 4 years, and that's a fairly long time for such things). ~Will Dietz

From: kevin larson Sent: Tuesday, April 12, 2011 12:25 PM To: Gupta, Indranil Subject: 525 review 04/12

Bhagwan et al set out to explore and understand the deeper nuances behind 'availability'. Design decisions based on inaccurate or incorrect assumptions about failure and churn could impose significant overhead on a peer-to-peer system. They chose the Overnet peer-to-peer file-sharing system for two reasons: its unique user IDs and its size and deployment. Their implementation involved two components, a crawler and a prober. The crawler was used to get a snapshot of the current user IDs in the network. The prober was responsible for probing a random subset of the users gathered by the crawler (at a much higher rate). They crawled and probed Overnet for 15 days, measuring availability and modeling it in a variety of ways. They demonstrated the variance of host availability over time. They also modeled daily patterns, the relations between host pairs, and the arrivals and departures of hosts. The authors not only motivated their decision to use user IDs over IP addresses, but actually analyzed how their results would have differed had they instead used IP addresses. They analyzed the effects of multiple users on a single IP as well as single users on non-static IPs. They were also thorough in the rest of their evaluation and demonstrate the effects these results could have on overhead. Although the evaluation was good, the time frame of only 15 days seemed weakly explained and motivated. The authors wrote it off as short-term results and left longer studies to future work.

Vu et al measured and modeled the PPLive peer-to-peer streaming service. They crawled PPLive channels, took snapshots of the peers in each channel, and traced traffic between PPLive nodes and servers. They then modeled their data in a variety of ways, usually with graphs of user connections. They examined metrics such as node degree and showed that it is independent of channel size. They also showed how increasing channel sizes make the user graphs less random and more clustered. They demonstrated differences between peer-to-peer file-sharing systems and PPLive, showing availability correlations between pairs of users in the same channel. PPLive was similar to other peer-to-peer systems when observing users outside of channels, in that availability was independent. They also showed how channel sizes varied with the time of day.
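To make the time-of-day effect concrete, the diurnal channel-population curve can be fitted with a polynomial much as the paper does; the sketch below runs numpy.polyfit on made-up hourly counts (the paper fits a higher-degree polynomial to its real PPLive traces):

    import numpy as np

    # Hypothetical hourly channel population over one day (made-up numbers,
    # shaped to peak around noon and in the evening as the paper reports).
    hours = np.arange(24)
    population = np.array([210, 180, 150, 130, 120, 140, 200, 320, 450, 560,
                           640, 700, 690, 620, 560, 540, 580, 660, 740, 780,
                           750, 640, 480, 320])

    # Fit a low-degree polynomial to the diurnal curve (the paper uses a
    # 9th-degree fit on much longer traces).
    coeffs = np.polyfit(hours, population, deg=5)
    model = np.poly1d(coeffs)

    for h in (0, 6, 12, 18, 23):
        print(f"hour {h:2d}: observed {population[h]:4d}, fitted {model(h):6.1f}")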
The authors used their results to demonstrate varying behaviors in peer-to-peer systems and make a strong argument that most systems and models will not work for all peer-to-peer systems. Due to the number of potential applications, they propose building general systems that can be adapted to the respective peer-to-peer applications. The authors wrote a particularly interesting conclusion section, which both clearly laid out the important findings and proposed how to apply them to systems and how to further extend the results in future work.

From: Jason Croft Sent: Tuesday, April 12, 2011 12:18 PM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

This paper examines the availability of hosts in a peer-to-peer storage system, arguing that availability in these systems is poorly understood compared to devices like disks. The authors collect data over 15 days from the Overnet P2P file-sharing system, which uses immutable IDs. They implement two components: a crawler that collects snapshots of the IDs of active hosts, and a prober that periodically checks if hosts are available. Over 15 days, 1468 unique hosts and 5867 unique IPs responded to the prober. The authors believe IP address aliasing is responsible for this large difference, for example through DHCP or NATs. Host availability follows a diurnal pattern, and the difference between the maximum and minimum number of hosts is about 100. The authors also observe high churn, with 6.4 joins and leaves per host per day. New nodes (that is, new IDs) join at roughly the same rate that nodes permanently leave, comprising about 20% of the system. There is also significant independence between the availability of host pairs. From the collected data, the authors draw some interesting observations. For example, using IP addresses to identify hosts would underestimate availability by a factor of four and would result in more replicas of files than required, thus wasting storage space. Since host availability decreases over time, periodic file refreshes would be needed in the system. Given the high rate of churn, systems that replicate objects as nodes leave or join would incur high overhead to transfer data between hosts. A significant number of host IDs used multiple IP addresses: 32% used five or more, and 12% used 10 or more. I think this observation could have been studied more, though it may have been outside the scope of the paper. An interesting comparison that the authors did not look into would have been the number of hosts that share the same IP address. Also, as the authors note, IDs are not completely immutable; a node can easily change its ID by deleting a configuration file. Thus, any clients that reinstalled Overnet or manually deleted the file would have slightly affected the statistics.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

Utility computing is a large infrastructure built by an organization and rented out for computation, storage, and bandwidth on an as-needed basis; Amazon Web Services (AWS) is one example. While examining Amazon's different services, the author finds that AWS mostly delivers on availability but suffers in performance. AWS security uses credentials that can be downloaded from the dashboard, so anyone with access to the account can gain access to the resources linked with that account. SSL is used to prevent man-in-the-middle attacks, but there is no guarantee of privacy for information stored in S3.
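Clients therefore end up responsible for protecting their own data; for the integrity side, a check can be built from Python's standard library alone. A minimal sketch (the key is a placeholder, and the encryption step, which would need a separate library, is omitted):

    import hmac
    import hashlib

    SECRET_KEY = b"replace-with-a-real-secret"   # kept by the client, never uploaded

    def seal(data: bytes) -> bytes:
        """Append an HMAC-SHA256 tag so tampering in storage can be detected."""
        tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
        return data + tag

    def unseal(blob: bytes) -> bytes:
        """Verify the tag on data read back from storage; raise if it fails."""
        data, tag = blob[:-32], blob[-32:]
        expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        return data

    blob = seal(b"object contents to store in S3")
    assert unseal(blob) == b"object contents to store in S3"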
Therefore, tenants are advised to encrypt data, and should also verify integrity with an HMAC or a digital signature, as in the sketch above. AWS does not offer any type of snapshot or versioning of user data or backups, and redundancy is left to the client. Moreover, the Web Service License Agreement (WSLA) protects Amazon from tenants attempting to sue the provider if any damage arises out of using the services. When using S3, the client is responsible for retrying failed requests, since S3 is designed to quickly fail requests that encounter problems. Clients must similarly check for write and read errors, though the author did not experience such errors in his experiments. There is no method of copying, renaming, or moving an object to a different bucket without incurring data transfer charges. The experiments show S3 performs better as transaction size grows, but little additional performance is gained for objects beyond 16 MB. Performance is between 10 and 50 transactions per second, with writes more likely than reads to fall below 10 TPS. The lack of a Service Level Agreement for Amazon's services seems a bit odd considering the customers they target. The author discusses the WSLA, but only vaguely describes how it differs from a standard SLA. As this paper is several years old, it does not discuss the S3 outage in 2008, which caused Amazon to provide less than the 99.99% availability it claims for stored data. Many of the points regarding S3 reinforce the need for a system similar to RACS (Redundant Array of Cloud Storage). The author mentions that crashed EC2 instances continue to accrue charges until rebooted, which seems to be a significant design flaw. This requires companies using EC2 to have an effective monitoring system for their instances in place, to prevent additional charges from instances that crashed but were not rebooted.

From: anjalis2006 SpamElide on behalf of Anjali Sridhar Sent: Tuesday, April 12, 2011 12:17 PM To: Gupta, Indranil Subject: 525 review - 4/12

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007

The paper attempts to study the P2P streaming characteristics of the PPLive system. It presents the PPLive overlay characteristics and some models that explain session length and channel size. The authors present a more inclusive definition of active peers: active peers are peers that appear either in the list received from the membership server or in the partner list of another active peer. The paper also focuses only on the out-degree of nodes, called the k-response degree. The authors use a crawler that joins a PPLive channel and queries the peers also viewing that channel. The metrics studied in this paper are 1) overlay node degree, 2) overlay clustering coefficient, 3) availability correlation among nodes in the overlay, 4) overlay population size, and 5) node session length per overlay. These metrics are studied using the snapshot operation and partner discovery performed by the crawler. The snapshot algorithm consists of querying an initially received peer list for partner lists and appending newly discovered peers onto the peer list. The paper is able to provide information about all of the metrics mentioned above. PPLive overlays are random graphs when there are a small number of nodes; otherwise they are highly clustered. While some peer pairs have highly correlated availability, others have none.
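That bimodal pattern can be checked pair by pair with the same conditional-probability test Bhagwan et al. use for Overnet, comparing P(Y up | X up) against P(Y up) over a series of snapshots; a sketch on made-up 0/1 availability traces:

    def availability_correlation(x, y):
        """x and y are 0/1 lists: whether each peer appeared in snapshot t.
        Returns (P(Y=1), P(Y=1 | X=1)); a large gap between the two suggests
        correlated availability, a small gap suggests independence."""
        assert len(x) == len(y) and len(x) > 0
        p_y = sum(y) / len(y)
        x_up = sum(x)
        both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
        p_y_given_x = both / x_up if x_up else float("nan")
        return p_y, p_y_given_x

    # Made-up traces: peer B follows roughly the same diurnal schedule as A,
    # while peer C comes and goes at unrelated times.
    a = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
    b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
    c = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    print(availability_correlation(a, b))   # conditional probability well above the marginal
    print(availability_correlation(a, c))   # conditional probability close to the marginal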
The session lengths and the channel population size are modeled with a geometric distribution and a polynomial model, respectively. Unlike P2P file-sharing users, PPLive users are impatient with the start-up time of the streaming service. By differentiating between P2P file-sharing systems and P2P Internet streaming services, the paper provides a starting point for improving current streaming protocols. Correlation among peers can be used to decrease the start-up time of a channel: if we know the top k peers likely to be up at a point in time, we can query them instead of the membership server. The fact that the average node degree remains constant is explained from the view of a node gaining partners. However, is there a limit on how many partner requests a node will accept? The new nodes that want to view the channel might increase the out-degree of the already present nodes. More explanation by the authors might help clarify this point.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS, Simson Garfinkel, Harvard TechRep

The paper addresses the EC2, S3 and SQS web services provided by Amazon from different directions. It considers ease of use, security, throughput, and latency of operations. AWS does not provide any backup or versioning of the data stored, and the data is stored unencrypted; it leaves these responsibilities to the user. Amazon employs load-balancing techniques in which it advertises different IPs to different hosts at different locations. EC2 was tested by the author using a maximum of 100 instances at a time. S3 was tested using objects of several sizes: 1 byte, 1 KB, 1 MB, 16 MB and 100 MB. Throughput was tested either with a series of successive probes separated by a random delay, or with simultaneous queries from up to 6 client threads with no explicit delay between requests. The paper focused its probe collection on the path between EC2 and S3, but it would have been fairer to have an equal number of results from both data sets; I am not sure if there is another reason the author chose this. The load balancing carried out by Amazon seems to be detrimental for large objects. Amazon also charges for outgoing and incoming bandwidth and for each transaction. Hence Amazon seems to make backing up data costly while not providing backups itself. Some cost calculations might be useful to see what fraction of the total cost of using AWS goes toward backing up data.

From: Long Kai Sent: Tuesday, April 12, 2011 12:11 PM To: Gupta, Indranil Subject: 525 review 04/12

CS 525 Reviews 04/12 Long Kai (longkai1)

Understanding Availability

Summary: This paper shows several characteristics of host availability in the Overnet peer-to-peer file-sharing system. Previous measurements of P2P systems were dramatically biased because of the IP address aliasing problem. The authors empirically characterize the availability of a large peer-to-peer system, Overnet, over a period of 7 days, showing that host availability is not well modeled as a single stationary distribution, but instead is a combination of a number of time-varying functions, ranging from the most transient to the most permanent. Pros: The IP address aliasing problem is demonstrated when measuring the availability of hosts in a P2P system. Also, this paper takes a step toward measuring the complicated availability of P2P systems, and shows that availability is significantly affected by short-term joins and leaves of individual hosts and long-term host arrivals and departures.
Cons: Clients may delete files after a while, and this effect has not been taken into account in the paper. Also, the speed of transferring files is important to measure too: some files are available, but the transfer speed can be very slow.

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Summary: In this paper, the authors analyze a 200-day trace of Kazaa P2P file-sharing traffic collected at the University of Washington. The results reveal dramatic differences between P2P file sharing and Web traffic. Because peers voluntarily provide resources as well as consume them, the system must dynamically adapt to maintain service continuity as individual peers come and go. Also, file sharing is used predominantly to distribute multimedia files, and as a result file-sharing workloads differ substantially from Web workloads. Multimedia files are large and immutable, so the vast majority of objects are fetched at most once per client. The measurements also show that the popularity distribution of Kazaa objects deviates substantially from the Zipf curves commonly seen for the Web. As for the driving forces, the primary ones in Kazaa are the creation of new objects and the addition of new users. The paper further analyzes how to improve the efficiency of file sharing in P2P systems based on the features revealed in the study. The authors state that a large percentage (86%) of externally downloaded bytes in their workload could be avoided by exploiting locality. Pros: The Kazaa trace covers a substantially longer time period than most other peer-to-peer file-sharing studies, which allows it to draw conclusions about long-term behavior. Cons: The measurement ignored the problem of IP address aliasing, which may have made the results inaccurate. -- Larry

From: Curtis Wang Sent: Tuesday, April 12, 2011 12:09 PM To: Gupta, Indranil Subject: 525 review 04/12

Curtis Wang (wang505) 4/12/11 Measurement Studies

Evaluation of Amazon's Grid Computing Services

The paper presents an overview of the features of Amazon's AWS, a suite of Amazon services for utility computing, which includes EC2 (computing), S3 (storage), and SQS (messaging). In addition, the author describes his experience with the services and what he feels are the main limitations.
Paper Pros
- Summarizes the features and capabilities of AWS
- Provides suggestions to improve the shortcomings of the services in AWS
AWS Pros
- EC2 instances are fast, responsive, and reliable
- Services are scalable and practical
- Availability was "excellent"
AWS Cons
- S3 has high transaction overhead and lower throughput for files smaller than 16 MB
- AWS may suffer from unexplained system issues (the S3 example from 4/9-4/11)
- SQS's efficacy is limited to a few transactions per second per thread
- Security issues: risk of password compromise through e-mail reset, and a single password to access all resources
- Very broad license agreement
As this paper was written quite some time ago (approximately four years), it would be interesting to see if Amazon has addressed some of the issues mentioned in the paper, such as its weak security model, its API limitations, or the transaction overhead for S3 data transfers. Also, it would be interesting to see if the performance has changed as AWS's popularity has increased.
Understanding Availability

The paper presents availability studies performed on the Overnet peer-to-peer storage system, which is structured on a DHT. They do this using a crawler, which collects snapshots of the IDs of active hosts, and a prober, which periodically checks the availability of a host. They found that IP address aliasing is a significant issue, with almost 40% of probed hosts using multiple IP addresses. This implies that probing using only IP addresses would overestimate the number of hosts and underestimate their availability. Host availability decreases over long periods of time, so file redistributions or reinsertions are necessary. There is very high host turnover: over 20% of the hosts arrive and depart on a daily basis.
Pros
- Performs experiments that challenge the assumptions other papers have used when analyzing P2P systems, like the aliasing effects of IP addresses.
Cons
- Studies performed over only a 15-day period
- The method with which they perform crawling and probing seems to be limited in its granularity because of the traffic it can cause.

From: trowerm SpamElide on behalf of Matt Trower Sent: Tuesday, April 12, 2011 12:06 PM To: indy SpamElide Subject: 525 review 04/12

Understanding Availability

This paper focuses on the availability of peers in the P2P file-sharing service Overnet. The authors argue that current techniques mistake different IPs for different users. By monitoring Overnet, the authors were able to show that the same user (and files) often return to the network with a different IP address, most likely due to DHCP. This improves the availability of users in the system while reducing the overall user count. I think the study's main contribution is showing that availability is perhaps not as bad as originally thought in P2P networks. The authors seem to misplace blame for IP aliasing on NAT and on multiple users on the same machine, which would produce the opposite of the effect they were seeing (fewer IPs per user). I was surprised to see that the diurnal pattern of usage wasn't very distinct. I suspect that many of these users might be on slower connections trying to make use of bandwidth during the night. Finally, I am interested in the churn of users in the network. The authors showed the percentage of users entering and leaving the system during each 4-hour segment. I would like to know what the distribution of time spent in the network looks like.

Measurement, Modeling, and Analysis of a P2P File-Sharing Workload

This paper presents a measurement study of the P2P workload on the University of Washington's network. The trace looks at Kazaa traffic over several months, including the spring, summer, and fall school terms. The authors noticed a decidedly non-Zipf distribution of requests for files, due to the immutability of objects in P2P networks. Later, the authors show that the distribution of requests is similar to that of video rentals. The authors also present a study of user behavior in these networks, showing that most users are patient with downloads and that users tend to "cool off" after some time, which is intuitive given the amount of content someone can consume. In the second part of the paper, the authors analyze how much traffic could be avoided by allowing locality-aware downloads. Due to the size of Washington's network and the short-lived popularity of files, significant bandwidth costs could be reduced by keeping traffic internal.
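The 'ideal proxy' estimate behind that kind of claim is easy to reproduce on any request trace: only the first external fetch of each object costs outside bandwidth, and every repeat is assumed served internally. A toy version on a made-up trace of (client, object, bytes):

    def ideal_proxy_savings(trace):
        """trace: iterable of (client_id, object_id, size_bytes) external requests.
        Assumes an unlimited cache at the network border: only the first fetch of
        each object costs external bandwidth; repeats are served internally."""
        external = 0
        total = 0
        seen_objects = set()
        for _client, obj, size in trace:
            total += size
            if obj not in seen_objects:
                seen_objects.add(obj)
                external += size
        saved = total - external
        return saved / total if total else 0.0

    # Hypothetical trace: three clients, two of them re-downloading object "a".
    trace = [("c1", "a", 700_000_000),
             ("c2", "a", 700_000_000),
             ("c3", "b", 5_000_000),
             ("c3", "a", 700_000_000)]
    print(f"fraction of external bytes avoidable: {ideal_proxy_savings(trace):.0%}")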
I would be interested to see what the traffic distribution numbers would be today in a campus-type network, given the rise of user-published content (YouTube). Furthermore, it seems that P2P networks have cooled off due to legal actions. Has iTunes now replaced Kazaa as the major bandwidth hog on campus networks?

From: harshitha.menon SpamElide on behalf of Harshitha Menon Sent: Tuesday, April 12, 2011 11:32 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

The paper tries to address the question of what it means to be available, critically analyzes previous methodologies for understanding availability, and proposes some characteristics of availability to consider while designing a peer-to-peer system. To identify the hosts in the system and probe them, the authors set up a crawler and a prober. They show some interesting patterns: overall availability in peer-to-peer systems is low, availability shows diurnal patterns, the churn rate is high, there is little correlation between the availability of nodes, and the total number of hosts in the system remains roughly constant despite the churn.
Pros:
- The crawler treats distinct IDs as separate hosts, and the same host keeps the same ID. This accounts for IP aliasing and thus changes the results of the study. Their aliasing-aware study shows that 50% of the hosts have availability below 0.07.
- The study of these characteristics would help in designing a peer-to-peer system.
Cons:
- Their experimental runs cover a period of only 7 days.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

This paper does a performance analysis of Amazon's EC2, S3 and SQS services using a series of end-to-end throughput measurements of S3 from various points on the Internet. The findings were:
- EC2 delivers ready-to-go virtual machines at reasonable cost
- S3 delivers Amazon's claimed level of throughput for larger data transfers
- The system is not able to provide peak throughput for smaller transactions due to transaction overhead
- Availability of the system is really good
- Effective bandwidth varies heavily based on geographical location
- Consecutive requests receive similar performance results
Pros:
- These experimental results reveal a lot of information about the Amazon EC2 cluster. This helps in tuning applications deployed on these systems.

From: Simon Krueger Sent: Tuesday, April 12, 2011 11:27 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding availability, R. Bhagwan et al, IPTPS 2003
Core idea of the paper: The core idea of this paper is to analyze availability in the Overnet peer-to-peer file-sharing network over 7 days. Specifically, they remove misconceptions from previous availability studies of peer-to-peer systems. They specify exactly what availability is in a peer-to-peer system. They show that availability depends on the time of day. And they measure the rates of churn in Overnet.
Pros: They find that churn in the system is high
Cons:
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: Is 7 days enough to really analyze and draw conclusions about the system? What is the behavior of the system over a whole year?

Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, Krishna P. Gummadi et al, SOSP 2003
The core idea of this paper is to measure Kazaa traffic, measure the effect of system parameters, and study how locality affects the system.
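The fetch-at-most-once behavior the paper identifies is what flattens the head of the Zipf popularity curve (one of the findings noted below); a toy simulation with a made-up catalog and client population shows the effect:

    import random
    from itertools import accumulate
    from collections import Counter

    random.seed(1)
    N_OBJECTS, N_CLIENTS, REQS_PER_CLIENT = 5000, 2000, 50

    # Zipf(1)-like popularity over a finite catalog: weight of rank r is 1/r.
    cum_weights = list(accumulate(1.0 / r for r in range(1, N_OBJECTS + 1)))

    def zipf_sample():
        return random.choices(range(N_OBJECTS), cum_weights=cum_weights)[0]

    web_like = Counter()    # Web-style behavior: clients may re-fetch an object
    fetch_once = Counter()  # Kazaa-style: each client fetches an object at most once

    for _ in range(N_CLIENTS):
        fetched = set()
        for _ in range(REQS_PER_CLIENT):
            web_like[zipf_sample()] += 1
            obj = zipf_sample()
            while obj in fetched:       # redraw until an object this client lacks comes up
                obj = zipf_sample()
            fetched.add(obj)
            fetch_once[obj] += 1

    # Under fetch-at-most-once the hottest objects saturate (at most one request
    # per client), so the head of the popularity curve flattens relative to Zipf.
    print("web-like top-5 counts:  ", [n for _, n in web_like.most_common(5)])
    print("fetch-once top-5 counts:", [n for _, n in fetch_once.most_common(5)])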
Pros:
- Found that documents are fetched at most once and that requests do not follow Zipf's law
- Used a real-life Kazaa trace
- They found user characteristics: users are patient, and users slow down as they age
- They found object characteristics: there is no single workload, objects are fetched at most once, the popularity of objects is short-lived, the most popular requests are for new objects, and most requests are for old objects
Cons: Large code base to measure traces (30K lines)
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: Is analyzing this type of traffic legal? Does the behavior on UW's network describe the whole system's behavior? What does the behavior look like over time as more users switch to BitTorrent or legally purchase media?

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007
Core idea of the paper: They measure PPLive's overlay characteristics to help plan resource allocation and design future P2P systems. Additionally, they provide a mathematical model to describe PPLive's population size and session length.
Pros:
- Measures a real-life P2P system
- Found that PPLive overlays have random graph structures
- Found that the average degree of a peer is independent of the channel's population size
- Found that some PPLive peer pairs have highly correlated availability while others have no correlation
- Unlike P2P file-sharing users, PPLive peers are impatient
- Session lengths are geometrically distributed
- Channel population sizes in PPLive are larger than in P2P file-sharing networks and can be fitted with polynomial mathematical models
Cons:
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: Is it really enough to make statements contrasting other P2P systems when only a small number of measurements is performed compared to the scale, activity, and history of the system?

An Evaluation of Amazon's Grid Computing Services: EC2, S3, and SQS, Simson Garfinkel, Harvard TechRep
The core idea of the paper is to measure the throughput and latency of Amazon's Simple Storage Service (S3) from Amazon's EC2 cluster and other places around the world.
Pros:
- They measured from Harvard, MIT, Los Angeles, Pittsburgh, and the Netherlands
- They found an unexplained drop in system performance
- They found that Amazon delivers on its service claims when data transfers are 16 MB or larger
Cons:
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: I wonder what these measurements would show today, since from what I have heard the workload of Amazon's web services has drastically changed over recent years; specifically, they have many more users on their system. It would be interesting if these experiments could be performed yearly and published in some sort of report. Simon

From: Anupam Das Sent: Tuesday, April 12, 2011 11:23 AM To: Gupta, Indranil Subject: 525 review 04/12

i. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

This paper presents a measurement and analysis of a P2P file-sharing system called Kazaa. The authors performed an extensive 200-day trace of over 20 terabytes of Kazaa P2P traffic. They provide some interesting facts about the characteristics of Kazaa objects and users. The most important finding is that the behavior of P2P traffic is significantly different from that of web (WWW) traffic.
The interesting findings regarding user characteristics in Kazaa include: users are more patient (waiting even weeks to download a file); user activity slows down as users age, i.e., they either use the system less often or demand less data when they do use it; and lastly, users remain active for a very short fraction of the time. The paper also highlights some of the object characteristics of Kazaa: objects are fetched at most once (since files are generally large and immutable, files are requested only once 94% of the time), popular objects are short-lived (popular objects like audio/video are frequently superseded by new releases, which means the older ones are no longer popular), popular objects tend to be newly born objects (this is natural for multimedia files, as people tend to download new releases), and most importantly, Kazaa traffic does not follow a Zipf distribution. The authors also highlight that there is significant locality in the Kazaa workload and therefore propose a locality-aware model to reduce bandwidth consumption. They propose two options for exploiting locality: one is caching popular files at the boundary of the network, and the other is redirecting an outgoing request to some in-network node, given that the file resides inside the network.
Pros:
1. The paper analyzes a huge amount of trace data.
2. The paper highlights important user and object characteristics of the Kazaa network.
3. The paper was published in 2003 when multimedia services were just emerging, so it provides insights into some of the important characteristics of multimedia workloads.
Cons:
1. The traces were collected from a university campus, so the workloads were basically generated by faculty members and students and thus do not represent a general workload.
2. Some of the traces could have been biased toward the findings by users generating the desired requests.
Though the findings in this paper were appropriate in 2003, the characteristics of P2P file-sharing systems have since changed. So, it would be interesting to see the corresponding results for current P2P networks.

ii. An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

This technical report evaluates the performance, availability and usability of Amazon's three web-based services, known collectively as AWS (Amazon Web Services). AWS includes the Amazon Elastic Compute Cloud (EC2), which allows users to launch virtual instances to meet their computing demand; the Simple Storage Service (S3), which allows users to store large amounts of data; and the Simple Queue Service (SQS), which provides a reliable messaging service that facilitates coordination during large-scale computations. Amazon provides three forms of interface to AWS: a web-based dashboard, REST (an HTTP-based API) and SOAP (for remote procedure calls). The author looked into the security and licensing policy of Amazon and found many drawbacks, for example that there are no SLAs and thus no guarantees to the clients. The author found the claims made by Amazon regarding availability and ease of use to be accurate. Some interesting facts came out of the experiments conducted by the author. For example, in S3 the data transfer throughput is influenced by the size of the object being accessed.
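Repeating that particular measurement today is straightforward; the sketch below times PUTs of a few object sizes with boto3 (not what Garfinkel used, since he wrote his own REST client; the bucket name is hypothetical and would have to exist, with credentials configured):

    import time
    import boto3

    s3 = boto3.client("s3")                 # assumes AWS credentials are configured
    BUCKET = "my-throughput-test-bucket"    # hypothetical bucket name

    sizes = [1, 1024, 1024**2, 16 * 1024**2]    # 1 B, 1 KB, 1 MB, 16 MB

    for size in sizes:
        body = b"x" * size
        start = time.time()
        s3.put_object(Bucket=BUCKET, Key=f"probe-{size}", Body=body)
        elapsed = time.time() - start
        print(f"{size:>10} bytes: {elapsed:6.3f} s, {size / elapsed / 1e6:8.2f} MB/s")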
The author also tested throughput from different parts of the world and found it to vary, which calls into question Amazon's claim of using multiple data centers to improve performance. Moreover, it was found that the overhead associated with transferring small objects is quite significant. Finally, the author shared his experience conducting his research projects using AWS. He was quite satisfied with the availability and performance of AWS, but was frustrated by the lack of design specification, which prevented him from tuning his experiments.
Pros:
1. The paper provides helpful insights about the three services provided by Amazon.
2. The paper performs extensive experiments on different aspects.
3. AWS provides services with no startup cost.
Cons (of AWS):
1. AWS provides no SLA.
2. Amazon's WSLA allows Amazon to terminate service at any time. This poses a potential threat to customer satisfaction, but gives Amazon legal flexibility.
3. Amazon provides a very weak security policy. It leaves the responsibility to users to secure their account and data.
4. AWS provides no snapshot or versioning service for user data. It is the client's responsibility to do backups.
5. S3 does not provide any rename or move command; it provides only the primitive GET, PUT and DELETE commands.
6. Lack of C/C++ example code. All of Amazon's sample code is in Java, Perl, Python and Ruby.
7. Lack of design specification. Amazon is reluctant to release any information regarding the design internals of EC2, S3 and SQS.
-----Anupam

From: Michael Ford Sent: Tuesday, April 12, 2011 10:59 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

This paper tries to understand availability in the setting of peer-to-peer networks. Traditional models of availability are designed around networks with an ideal node uptime of 100%. Peer-to-peer networks are affected by their users' behavior patterns. The results show that in a network with high churn, accurate measurement depends significantly on correctly identifying unique users despite IP address changes. These problems certainly will not disappear with the growth of mobile computing and the possibility of multiple IP addresses within a single session, as opposed to only switching between sessions. The results also indicate that the number of peers is highly dependent on time. There are certain peak hours when users are online, but during non-peak hours availability can suffer. The authors use a simple crawler to gain knowledge of the peers in the network and a prober to test availability. The crawler runs every four hours and the prober every 20 minutes. I am not convinced that this captures the true turnover rates in a peer-to-peer system. Moreover, their test system had fewer than 600 nodes online at any given time. That network size is small enough that advanced membership protocols are not required, and the complexities of availability in peer-to-peer systems may not be uncovered.

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

Many peer-to-peer systems have similar architectures in terms of structure, connectivity, etc. However, some application-specific systems do not follow these patterns. One notable example is PPLive. The results of measurements on the system show that small channels look like random graphs, fan-out is independent of channel size, and users are impatient.
These attributes are in strong contrast to traditional file-sharing peer-to-peer networks, where fan-out depends on size to provide performance properties and users are willing to wait for large file downloads. The authors do not mention the size of the channels in comparison to traditional, large peer-to-peer systems. The fact is that each channel is semi-independent of other channels and only contains a few thousand nodes. Each channel could be compared to a single peer-to-peer system, but they face none of the same scalability issues due to the differences in scale. The measurements themselves are carried out more frequently than in some papers, and this leads to more interesting results, including being able to fit a geometric curve to the data.

From: Tony Huang Sent: Tuesday, April 12, 2011 10:56 AM To: Gupta, Indranil Subject: 525 review 04/12

Paper: Understanding availability
Core idea: This paper focuses on testing node availability in a P2P file-sharing system, Overnet, over a 7-day period. The authors choose Overnet because clients identify themselves through a unique client ID instead of an IP address, which would otherwise pose the problem of host aliasing due to DHCP. They deploy a crawler and a prober to measure the system. The purpose of the crawler is to collect a snapshot of the IDs of the active hosts at a particular point in time. The prober periodically probes a set of hosts to determine whether they are available. The experiments show the following results:
* Host IP address aliasing is a significant issue in deployed P2P systems. If we do not consider the IP aliasing aspect, we underestimate host availability.
* Host availability changes significantly with the time of day, and, as the measurement interval increases, measured host availability decreases.
* The number of available hosts varies with the time of day.
* Host availabilities display only a limited degree of interdependence.
* Host join and depart activity, i.e. host turnover, is high.
Thoughts:
* The paper generally confirms a lot of common beliefs about P2P networks.
* What is the cause of the mobility, time-of-day attrition, etc.? A user profile or common usage pattern study would be very interesting.
* Do the conclusions and measurements in this paper apply to a more general network or a commercial network?

Paper: An Evaluation of Amazon's Grid Computing Services
Core idea: This paper presents a study of Amazon's various commercial services, including S3, EC2 and SQS. EC2 is a commercial elastic computing service, where people can rent CPU and VM time to run their own applications. It provides a dashboard-based interface, a REST API and an RPC-based API. Security is not strictly enforced across the infrastructure, and users are advised to impose their own security measures. AWS provides one or more external addresses for a data center, which are advertised through Amazon's DNS; DNS resolutions should not be cached for a long period. Amazon's S3 stores data as named objects grouped in named 'buckets'. The API is limited; in particular, it does not support renaming of files. SQS is a queue-like service; however, it does not provide ordering and does not display FIFO behavior. The evaluation of AWS draws the following conclusions:
* The bandwidth is consistent, but it suffers from slowdowns during certain times of day or during internal architecture reconfiguration. The author even describes the Amazon service as a testimony to the impact of a badly tuned TCP stack.
* Amazon uses DNS both for load balancing and for re-routing traffic.
* Moving to larger transactions only improves performance initially; little gain is obtained for transactions larger than 100 MB.
* Writes perform slower than reads. This may be because a write has to reach two machines while a read only needs one.
* Writes have a larger bandwidth than reads. This may come from the fact that a write is acknowledged once the data is written to the disk controller.
* The time taken for each query can have significant variance.
* Concurrent access using multiple threads reduces the performance of each thread, but the overall throughput increases. The decreasing per-thread performance is probably due to processor contention on the virtual machine.
* The observed failure rate is low.
-- Regards -- Tony

From: Shen LI Sent: Tuesday, April 12, 2011 10:51 AM To: Gupta, Indranil Subject: 525 review 04/12

Name: Shen Li

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

This paper presents some measurement results and the corresponding mathematical models the authors derived. They insert a crawler into the PPLive network that periodically takes snapshots of the whole overlay network. Their crawler runs in parallel on multiple machines to increase the coverage of each snapshot. Among their findings, some are really interesting and surprising to me. They mention that the session lengths are typically geometrically distributed, which deviates a little from my intuition. They also provide a simple polynomial mathematical model for the variation in channel population size. Pros: 1. The parallel crawler can not only save bandwidth in the network but also provide a better snapshot for the measurement. According to Comet (the paper we discussed in class previously), one node may not be able to talk with another remote node directly; some P2P protocols simply ban this functionality for safety reasons. Thus, doing the experiment from one single node may lead to unrealistic measurement results; doing it in parallel on multiple nodes is better. Questions: As I understand it, the files hosted on the crawler node may also affect that node's behavior, since they may influence the number of nodes trying to connect to it. Is this right? If so, the paper omits this dimension.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

This paper presents a good number of measurements of Amazon's services, including EC2, S3, and SQS. The author signed up for Amazon's EC2 'beta' program to conduct the experiments. Some of the results are not that interesting to me, such as the conclusions that EC2 delivers on its promise of providing ready-to-go virtual machines and that Amazon services have excellent availability; we can already tell this from the high popularity of these systems. One interesting thing is that between 10% and 20% of all queries suffer decreased performance that is 5 times slower than the mean or worse. This is very useful to know for applications built on top of Amazon's services; as the author says, such applications can simply kill the slow instance and start a new one. Pros: They provide a good number of experiments regarding the performance of Amazon's several web services. Cons: 1. Too little analysis of the results and too few comments. 2. The experiments testing S3 seem weak. Typically, applications built on top of S3 will see many simultaneous accesses and updates.
So one may also care about how S3 performs when a huge number of clients access it at the same time.

From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman Sent: Tuesday, April 12, 2011 10:20 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

Summary: This paper is concerned with the empirical characterization of availability in P2P systems. It shows that host availability behaves quite differently from hardware and disk availability and depends on the time of day. The traditional assumption in distributed systems that transient failures should be ignored and only long-term failures need to be handled does not hold for P2P systems. P2P host availability also depends on user churn. The measurement infrastructure in the paper consists of a crawler that provides a global view of host membership and a prober that returns fine-grained information on host availability. The authors show that probing by IP address overestimates the number of hosts and underestimates availability due to DHCP and NAT boxes. As a result, the measurement study uses random IDs to calculate availability, which avoids IP address aliasing problems. The temporal experiments reveal a diurnal pattern of availability. The paper also uses a simple experiment comparing P(Y=1|X=1) and P(Y=1) to show that the interdependence of the availability of two hosts X and Y is very low. In summary, the authors show that IP address aliasing can bias the results of P2P availability studies, and that availability can no longer be modeled as a single parameter; rather, it is a combination of short-term and long-term churn.
Pros: (1) The paper proposes very simple measurement techniques that give rise to novel and interesting observations about the availability of P2P systems. (2) It shows that P2P host availability is different from disk and hardware availability. (3) It sheds light on the effect of IP address aliasing on availability studies.
Cons: (1) Is this measurement study representative of current P2P systems like BitTorrent? (2) The paper does not seem to justify some of its parameter values. For example, why did they choose to start the crawler with 50 IDs? Maybe using a random number instead of 50 would make the results more convincing.
Future Work: (1) A probabilistic analysis of availability could better complement the proposed empirical analysis.

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

Summary: This paper presents a rigorous measurement and modeling study of the large-scale P2P overlay graphs spanned by the PPLive system. Unlike previous studies that focused on either user-centric or network-centric measurements, this paper focuses on overlay-centric characteristics, which are a complex combination of user-centric and network-centric features. The study reveals a number of interesting features of streaming overlays that are markedly different from traditional P2P file-sharing systems. Specifically, the authors show that (1) small overlays resemble random graphs, (2) average peer degree is independent of channel population size, (3) availability correlation between nodes is bi-modal, (4) PPLive peers are impatient, (5) session lengths are geometrically distributed, and (6) channel population sizes are larger for PPLive than for P2P file-sharing networks and can be fitted to a high-degree polynomial equation.
Pros: (1) The justification for using PPLive is explained well in the paper.
(2) Makes an important observation that PPLive overlays may resemble random graphs. (3) Shows empirical formulas for session length and channel population size. (4) The study reveals that small PPLive overlays are better off choosing neighbors at random. It also shows that protocols that treat all nodes equally can improve performance. (5) It shows that P2P system design and performance depend heavily on the nature of the application.
Future Work: (1) This study is for streaming overlays. Can the measurement techniques be reused for other types of overlays?
-- Muntasir

From: Andrew Harris Sent: Tuesday, April 12, 2011 6:23 AM To: Gupta, Indranil Subject: 525 review 04/12

Review of "Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming", Vu et al, and "Understanding Availability", Bhagwan et al

The first paper offers an analysis of PPLive, a p2p video distribution network. The group shows how PPLive differs from more traditional p2p networks, sharing only the impatience of users between them. Using a crawler framework, the group reveals that PPLive networks are rather randomized in their connectivity, compared to self-assembled networks in traditional p2p. They assert that nodes sharing common interests were more likely to have higher content availabilities between them, perhaps an intuitive conclusion given the context of common media interests. They note that while the population of a given media channel does fluctuate wildly over the day, all channels see their peak populations at noon and in the evening. Finally, on the impatience of users, they note that most users stay on only for short videos (50% of sessions are shorter than 10 minutes), which is consistent with users browsing for leisure or during their lunch break.

In analyzing PPLive, it seems the group here may have overlooked a key conclusion from the correlation of availabilities across snapshot groups. A high availability for content on a node would imply that a user is a) actively engaged with that content, and b) actively able to rebroadcast that content. This becomes the observed usage pattern for nodes within the same snapshot: they share viewing interests and thus are better matched for one another because they can retransmit this shared content. Consider, though, that this has a third implication: that a user is actively allowing content to be retransmitted from their node. PPLive users have no overt legal incentive to throttle themselves from retransmitting content, but they may have a bandwidth usage or quality-of-service incentive. This is especially the case for any user on a slow or shared connection, where upstream bandwidth will be severely limited. The success of PPLive seems to depend on a high number of available nodes, each with some available quantity of bandwidth to rebroadcast data. The authors also stress that usage patterns of p2p networks should be assessed per platform, rather than as a whole, as they can vary depending on the content involved. It would seem, then, that effects of bandwidth throttling should be considered within and across snapshots, and added to this section about in-snapshot usage.

The Overnet study is an attempt to capture the availability of nodes in a more traditional (albeit DHT-based) p2p file sharing network, with the overall research motivation of the group being p2p storage.
This group also developed a crawler, which served to examine the network structure of Overnet at a given time snapshot, as well as a prober, which periodically tested the discovered nodes for their availability. It was seen that host masking and aliasing is a problem worth exploring for p2p systems, in that while only 1468 hosts were seen, 5867 IPs were observed as corresponding to these hosts, a 1-to-4 relation. There was also a clear diurnal pattern observed, as peaks in usage could be seen across multiple days. Finally, churn and node turnover in the network were examined, with completely new hosts comprising 20% of all hosts on any given day.

Similar to the study of PPLive, in studying Overnet the group here only examined the availability of nodes over varying time lengths. This overlooks the varying effects of bandwidth availability for each node. This is a common enough problem in p2p networks, although many attempt to gauge and take into account the measured bandwidth of each client. A more descriptive model of node availability would have associated with it some conversion between bandwidth availability and actual availability. Consider that for small text files a small bandwidth would be acceptable, but for files like Linux ISOs a slow connection may take a user many days to complete a transfer. As such, the impact of bandwidth would also need to be framed relative to the common types of files being transmitted across the p2p network involved. It may seem pointless for large-pipe networks to consider bandwidth in their analyses; however, consumers of p2p technology are increasingly gaining reasons to monitor their bandwidth usage. The most prevalent reasons seem to be provider caps (and overage penalties) and fear of legal reprisal for sharing illegal content. The latter is only applicable to Overnet, but bandwidth caps are relevant to any high-bitrate data source (video, files, etc.). Users then have an incentive to reduce their broadband availability, thus influencing (as above) their rebroadcasting patterns and actual node availability. These seem to be nontrivial influences! It seems strange in both studies that bandwidth effects were relatively overlooked.

From: nicholas.nj.jordan SpamElide on behalf of Nicholas Jordan Sent: Tuesday, April 12, 2011 3:39 AM To: Gupta, Indranil Subject: 525 review 04/12 Attachments: 4_12.docx

Here it is -- Thanks, Nick Jordan

From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng Sent: Tuesday, April 12, 2011 12:08 AM To: indy SpamElide Subject: 525 Review 04/12

CS 525 - Review: 04/12 Measurement Studies

Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, Krishna P. Gummadi et al, SOSP 2003

The paper analyzed the workload of one peer-to-peer (P2P) file-sharing application, Kazaa, by implementing a crawler to collect user data. Most files shared on Kazaa are multimedia files, so there are two main characteristics that differ from the traditional HTML content of WWW traffic: object immutability and "fetch-at-most-once" user behavior. The trace data collected was consistent with these two properties. Based on the data, the paper proposed a model of file-sharing workloads. The first contribution of the paper was to collect data over a long period of time (a 200-day trace), much longer than previous works, which provided more insight into long-term user behavior. This allowed the authors to draw conclusions about metrics correlated with age.
For example, a user's requested bytes decrease over time, a file's popularity dies out quickly, new files tend to be more popular, and more users request old files (defined in the paper as files that have existed for more than 30 days). The second contribution was that the paper identified a discrepancy between Zipf's law and the P2P multimedia workload: Kazaa's trace has a more flattened head. The paper then developed a model for large files (usually video files larger than 100 MB). The model not only took fetch-at-most-once behavior into account, but also considered how new files are assigned a popularity rank according to a Zipf distribution. With a suitable choice of parameters, the paper showed that the model's prediction was very close to the data they collected. The final contribution was a study of caching strategies to better exploit locality and thus decrease the workload. One significant finding was that 86% of the externally downloaded bytes could be avoided by using a perfect proxy. While this is not a practical deployment, the paper also showed that both a locality-aware protocol and a redirector can save significant bandwidth as well. Comments/questions/critiques: I am quite surprised that older clients request less data during each use of the system, since there is always new content being released. Perhaps, back then, the birth rate of new content was not so high, so older clients had already downloaded most of the files they were interested in and were waiting for new files. Otherwise, I do not have an intuitive explanation for this observation. While their model seems convincing, I wonder whether it still provides good predictions today. In particular, I am dubious about the way they assign popularity to newborn files. In the model, popularity is assigned randomly based on a Zipf(1) distribution. However, I claim that there are correlations among newborn files. For example, when a new movie comes to market and generates a great deal of buzz, every user wants to download it. In today's market, the movie usually comes with many side products, such as games, albums, or even TV interviews and shows. The popularity of all such files should be correlated, since users interested in the movie have a good chance of downloading these side products. It is not clear whether this would affect the model's precision. Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007 The paper first observed a gap in previous measurement papers on multimedia streaming: they lacked an intensive study of overlay-based characteristics. In particular, the paper tried to answer what the overlay graph constructed by PPLive, a popular Chinese IPTV application, looks like under different parameters, such as channel population size. The first contribution was a crawler that incurred lower load on PPLive than previous crawlers due to a more flexible termination strategy. Moreover, the crawler estimates channel size more accurately by adopting a more general definition of active nodes. Second, the paper found that, unlike a P2P file-sharing system, the overlay formed by a P2P streaming system is more like a random graph, as indicated by its clustering coefficient. Moreover, peers have correlated availability within the same snapshot (i.e., among the peers polled in the same data-collecting run of the crawler).
This is possibly a result of PPLive's inter-overlay optimization and of common interests among users in the same snapshot. Third, the paper used a discrete mathematical series to model the PDF of session length and a 9th-degree polynomial to model channel population size. Finally, based on the paper's findings, some practical suggestions were presented. In particular, the suggestion that a simple, homogeneous protocol should work well for a P2P streaming service is quite interesting. The paper argued that such a solution would work because of the random-graph-like overlay and each node's memoryless session length. Comments/questions/critiques: The term “patience” was somewhat overloaded in the paper. One usage refers to lack of interest in the content, and the other to intolerance of start-up delay. While reading the paper I was confused at first, because to me patience relates more to the second meaning. Moreover, the conclusions about session length, channel characteristics, and user preference do not convince me. The explanation in the paper is not the only interpretation of the data. For example, a user might simply switch channels frequently at first in order to find the channel she likes most, and this behavior is unrelated to channel characteristics (which I assume the paper takes to mean the delay and quality of the program). The models are not very intuitive, and wouldn't they be prone to overfitting? From: Qingxi Li Sent: Monday, April 11, 2011 11:39 PM To: Gupta, Indranil Subject: 525 review 04/12 Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming This paper measures the overlay-based characteristics of PPLive. Some of these characteristics are caused by PPLive's protocol and others by the nature of the television application. It finds that some PPLive overlay networks are random graphs and that the average degree of a peer in the overlay is independent of the channel's population. I think these two findings follow from the details of PPLive's overlay protocol. Besides this, the authors also find that people are impatient, meaning that some peers leave after only a short time. The same thing happens with television: before finding the channel s/he really wants, a person will continually switch channels. It may also be caused by a start-up delay longer than people expect, so they cannot wait for the channel to start up and switch to another one. Channel population varies more widely than in file-sharing p2p systems, and this variation is a function of time, because PPLive is a network television application. With file-sharing systems, people can leave the client running in the background and keep working or walk away from the computer, so their sessions tend to be long. PPLive, however, can only be used when someone is in front of the computer, so the population depends on the time of day, and the results show that the population at noon and in the evening is much larger than at other times. The availability of PPLive peer pairs is either highly correlated or not correlated at all. High correlation may be caused by users sharing the same schedule, such as two people at the same company: since their break times and working hours are the same, the times they use PPLive will also be the same. It is also possible that the two peers are on the same computer, run by one person who does not want to wait for the stream to load.
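A minimal sketch of one way such pairwise availability correlation could be computed from crawler snapshots follows; this is not the procedure used in either paper, and the snapshot format and peer IDs are hypothetical.

# Minimal sketch (not from either paper): given a list of crawler snapshots,
# each a set of peer IDs seen online at that time, compute the Pearson
# correlation between two peers' binary availability vectors.
from statistics import mean, pstdev

def availability_vector(peer, snapshots):
    """1 if the peer appeared in a snapshot, else 0, for every snapshot."""
    return [1 if peer in snap else 0 for snap in snapshots]

def pairwise_correlation(peer_a, peer_b, snapshots):
    """Pearson correlation of two peers' availability vectors (None if degenerate)."""
    xs = availability_vector(peer_a, snapshots)
    ys = availability_vector(peer_b, snapshots)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:  # a peer that is always (or never) up has no variance
        return None
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)

# Toy example: three snapshots taken by the crawler at different times.
snapshots = [{"peerA", "peerB"}, {"peerA", "peerB", "peerC"}, {"peerC"}]
print(pairwise_correlation("peerA", "peerB", snapshots))  # 1.0: always online together
print(pairwise_correlation("peerA", "peerC", snapshots))  # -0.5: mostly anti-correlated

Pairs with correlation consistently near zero would support an independence assumption, while values near 1 would support the in-snapshot correlation reported for PPLive.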
Just as the authors mention in the paper, the overlay characteristics lie in between the user-centric view and the network-centric view: the user-centric view is shaped by the characteristics of the application, while the network-centric view is shaped by the protocol. Understanding Availability This paper measures peer availability in the Overnet file-sharing network. The authors chose this p2p file-sharing system because it uses session IDs: an ID is given to a peer when it joins the network and discarded when it leaves. With the help of the session ID, availability can be measured easily, whereas previous work based on IP addresses is affected by DHCP, so the same peer may be counted multiple times. To measure availability in the system, the crawler repeatedly requests 50 random IDs, and these requests discover new peers; the crawler then sends the same 50-ID requests to the newly found peers to discover further peers. This process is run once every 4 hours. In addition, every 20 minutes the prober randomly chooses a subset of IDs and tests whether those IDs are still in the network. From the results, DHCP makes the host counts in previous IP-based measurements nearly 4 times too large, and node availabilities show no correlation. I wonder whether 20 minutes is too long an interval to detect whether a peer is still in the system. Also, since the probed subset is chosen randomly, there is a possibility that some nodes that have left the system are never detected as such. From: Tengfei Mu Sent: Monday, April 11, 2011 8:22 PM To: Gupta, Indranil Subject: 525 review 04/12 1. Understanding Availability This paper discusses the concept of availability in the Overnet peer-to-peer file-sharing system. Overnet is a widely deployed DHT-based peer-to-peer network in which users can be tracked by ID instead of by their IP address. The authors used a crawler to get a snapshot of the active peers in the system at a particular time, and a prober to decide whether a particular subset of the crawled nodes was alive at a particular time. They then studied aliasing effects, time-of-day effects, node availability interdependence, and the arrival and departure of nodes. They found that peer-to-peer host availability depends on several of the factors above rather than on a single parameter. Pro: 1. The findings are meaningful for future availability analysis. Con: 1. The paper only studies Overnet's behavior; we do not know whether the same arguments hold for other P2P systems. 2. The authors ran trace experiments for only 7-15 days, which risks missing long-term patterns. 2. An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS This report investigates Amazon Web Services (AWS), including an analysis of Amazon's security model. AWS is a leader in the cloud-computing field. The authors briefly introduce EC2, S3, and SQS, and explain what kind of model Amazon uses to provide these services and the corresponding pricing policy. This includes aspects such as security mechanisms, virtualization-based technologies, and even legal issues. The paper also presents the authors' experience using AWS and their evaluations of it. Pro: AWS is sold with no minimum price and at very fine granularity.
Con: There is no service level agreement. From: david.m.lundgren SpamElide on behalf of David Lundgren Sent: Monday, April 11, 2011 11:54 AM To: Gupta, Indranil Subject: 525 review 04/12 Understanding Availability Understanding Availability is one of those simple yet disruptive papers that come along every now and then and challenge the methodology and assumptions of a field. Bhagwan et al. verify the independence of node a's availability from node b's and that node availability follows a diurnal pattern; the authors show that the parametrization of node availability as a single value in [0,1] is overly simplistic; the rate of host churn in P2P systems is examined; and previous studies are shown to underestimate host availability due to IP aliasing. Global snapshots of Overnet are obtained using a ``crawler'' which recursively samples system state six times a day. A smaller, more exact analysis of node subsets is carried out using a prober that heartbeats nodes three times hourly. Pros, Cons, Comments, and Questions: - The authors' methodology for examining P2P host availability without relying on IP addresses as unique identifiers is rigorous, and the scope (and cost) of such IP aliasing errors is shown to be extremely high. - The generalization of their results to arbitrary peer-to-peer systems is implicitly assumed, but little evidence is given for this assumption. Time-of-day effects, host availability interdependence, and churn generalizability are suspect. - Overnet's size seems smaller than the other, more popular P2P networks discussed in class. Also, the authors only discuss small-window snapshots of Overnet (on the order of days). Future work could examine the paper's conclusions at a larger time scale. ------------------------------------------------------------------------- Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Using a 200-day trace of Kazaa traffic from the University of Washington's network, Gummadi et al. characterize the system workload as an amalgam of discrete workloads based on object size. This multimedia workload is contrasted with traditional web workloads and is shown to exhibit unique characteristics resulting from its distinguishing feature of fetch-at-most-once (FAMO) requests. The authors show that such multimedia workloads do not follow the ubiquitous Zipf distribution due to the FAMO nature of the requests generated (requests of this type result from the immutability of media objects). A model for multimedia workloads is introduced and is shown to provide a good approximation of the Kazaa data. The model's general fit (or perhaps, more appropriately, Zipf's ``disfit'') to other multimedia workloads is shown. The feasibility of locality-aware optimizations for a Kazaa-like network is discussed and upper bounds on system efficiency improvement are calculated. Pros, Cons, Comments, and Questions: - The authors' model of P2P file-sharing workloads using a fetch-at-most-once mechanism accurately captures the non-Zipf distribution of Kazaa's workload. The accuracy of the model's fit is impressive, lending strong support to the claim that multimedia objects' immutability is the driving force behind the differing web and media workloads. - The observation that new object arrival improves performance (and that client arrivals do not counteract the decrease in hit rate resulting from client aging) is non-intuitive.
Because newly introduced objects help to rejuvenate cache hits, an object-birth-sensitive cache could improve a system's hit rate. - While the authors discuss measuring the popularity of objects over time, I think a more detailed analysis is in order. Such an analysis could lead to smarter cache insertions and more efficient data replication. From: Ankit Singla Sent: Monday, April 11, 2011 8:22 AM To: Gupta, Indranil Subject: 525 Review 04/12 1. Understanding Availability -------------------------- Summary: This paper contrasts the notion of availability in a centrally managed system like a cluster against availability in a p2p system in the face of node churn. Availability in the former is fairly well understood because of fairly reliable components with somewhat known failure characteristics (e.g. disk drive failures) and certain assumptions about the independence of component failures. In p2p systems, however, with human players often being involved, availability varies with time as participants enter and leave at will. Moreover, departures of clients from the system can be highly correlated (based on time-of-day correlations, for instance). To study availability, they experiment on the Overnet file-sharing system. Comments: It surprises me how many IP addresses are used by the same hosts (more than 10 different IPs for 12% of hosts). I guess this is a significant result in that measurements based on IPs underestimate availability (a small sketch of this aliasing effect follows at the end of this review). Their measurements of host availability versus time are also interesting, leading to the conclusion that the measurement interval must also be specified with any numbers on availability. 2. Amazon Evaluation -------------------- Summary: The paper evaluates Amazon's cloud services quantitatively, and to some extent for ease of use too. The experiments and analysis are fairly extensive, covering a period of more than half a year. It's interesting that they saw some service characteristics change as Amazon made modifications to their systems (e.g. Fig. 1 - throughput falls significantly on April 1). Their overall conclusion is that Amazon's offerings fairly match up to the advertised level. For short transfers, though, the storage service may deliver much lower throughput (is this just a TCP inefficiency?). Comments: Another interesting question around migrating an application to the cloud is how reliability is affected. For instance, if multiple replicas end up being placed on different VMs on the same physical machine, the reliability of the application definitely suffers. While the paper critiques Amazon's weaknesses on security, privacy and reliability, I think these help them keep costs down for the general customer base. Not everyone requires top-notch guarantees on these things, so making them a fundamental part of the architecture is probably a bad business idea, as appealing as it might be academically. (Maybe an end-to-end argument applies -- end users take care of these functions themselves because security, etc. has to be ensured at the end-points anyway.) There are some interesting legal issues discussed in the paper, including libel and sexual-orientation-based discrimination using Amazon's infrastructure.
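To make the aliasing effect concrete, here is a small sketch under an invented trace format (the host names, IPs, and numbers are made up): the same observations yield a much lower apparent availability when identities are keyed by IP address rather than by a stable host ID, because DHCP splits one host's uptime across several short-lived IP identities.

# Illustrative sketch only (the trace format and values are invented).
from collections import defaultdict

# (probe slot, host ID, IP address observed at that slot)
observations = [
    (0, "host1", "10.0.0.1"),
    (1, "host1", "10.0.0.1"),
    (2, "host1", "10.0.0.7"),  # DHCP lease change: same host, new IP
    (3, "host1", "10.0.0.7"),
]
NUM_SLOTS = 4

def mean_availability(observations, key_index):
    """Average fraction of probe slots in which each identity was seen online."""
    slots_seen = defaultdict(set)
    for obs in observations:
        slots_seen[obs[key_index]].add(obs[0])
    return sum(len(s) for s in slots_seen.values()) / (len(slots_seen) * NUM_SLOTS)

print(mean_availability(observations, key_index=1))  # keyed by host ID: 1.0
print(mean_availability(observations, key_index=2))  # keyed by IP address: 0.5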
Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/ From: Chi-Yao Hong Sent: Sunday, April 10, 2011 9:53 AM To: Gupta, Indranil Subject: 525 review 4/12 ---- Understanding availability, IPTPS’03 ---- This paper studied a fundamental measurement problem concerning the characteristics of host availability. The authors measured Overnet, a peer-to-peer file-sharing system, which allowed them to identify a host by its overlay ID instead of by its IP address. A difference between the availability of an IP address and that of an actual user ID is reported. The major finding is that many users change their IP addresses quickly, so IP-address-based availability measurements might give biased results. Here are some other take-aways. 1. The distribution of host availability changes rapidly with the measurement period. 2. A strong diurnal pattern is observed for host availability. 3. The availabilities of different hosts are largely independent of each other. 4. Overnet has a high host turnover rate, which also has a significant impact on system availability. Critiques: A two-week measurement might be too short. Some periodicities, e.g., a weekly pattern, cannot be seen in such a short-term measurement. Also, I argue that availability is very application dependent, and Overnet is representative only of file-sharing peer-to-peer systems rather than of other p2p systems. ---- Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, SOSP’03 ---- This paper studied the peer-to-peer file-sharing workload. There are three main goals achieved by this paper. First, it analyzed the network characteristics of peer-to-peer file-sharing traffic. This allowed the authors to compare the P2P traffic with web traffic, which shows distinct differences. The authors then showed that the Kazaa popularity function deviates substantially from Zipf (a toy simulation of this effect is sketched below, after this review). Second, the authors showed that the main workload in Kazaa is driven by the creation of new objects and the addition of new clients. This is unlike the Web, whose workload is driven by changes to documents. Third, the authors analyzed the potential performance of location-aware p2p mechanisms. Pros: 1. The presentation is clear, and the measurement is technically sound. The motivation is strong, and the results are interesting. This is textbook-quality content which I would suggest people read if they want to work on network measurement. 2. The inferences are plausible – immutability and “fetch-at-most-once” are the key factors that shape the p2p traffic. Cons: 1. A long-tailed CDF of download latency does not necessarily mean that users are patient, as partial requests are filtered out. 2. Besides web traffic and VoD, it would be interesting to compare the p2p traffic load with other on-demand systems such as IPTV and VoIP/video conferencing. 3. While there is a clear bandwidth benefit to using location-aware schemes, the authors do not consider the control overhead of enabling location-aware services. -- Chi-Yao Hong Computer Science, Illinois http://cyhong.projects.cs.illinois.edu
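On the flattened head of the Kazaa popularity curve mentioned in the reviews above: the following toy simulation uses arbitrary parameters and is not the paper's model, but it illustrates how fetch-at-most-once behavior depresses request counts for the most popular objects while leaving the tail largely unchanged.

# Toy simulation (arbitrary parameters, not the paper's model): compare request
# counts per object when clients re-sample freely from a Zipf(1)-like popularity
# distribution versus when each client fetches any given object at most once.
import random
from collections import Counter

random.seed(0)
NUM_OBJECTS, NUM_CLIENTS, REQUESTS_PER_CLIENT = 1000, 500, 100

# Zipf(1)-like popularity: weight of object i is proportional to 1/(i+1).
weights = [1.0 / (i + 1) for i in range(NUM_OBJECTS)]

def simulate(fetch_at_most_once):
    counts = Counter()
    for _ in range(NUM_CLIENTS):
        fetched = set()
        for _ in range(REQUESTS_PER_CLIENT):
            obj = random.choices(range(NUM_OBJECTS), weights=weights)[0]
            if fetch_at_most_once and obj in fetched:
                continue  # client already holds this immutable object
            fetched.add(obj)
            counts[obj] += 1
    return counts

plain = simulate(fetch_at_most_once=False)
famo = simulate(fetch_at_most_once=True)

# Under fetch-at-most-once the most popular objects receive far fewer requests
# (at most one per client), flattening the head of the curve, while unpopular
# objects far down the ranking are barely affected.
for rank in (0, 1, 10, 100):
    print(rank, plain[rank], famo[rank])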