From: Igor Svecs Sent: Tuesday, April 12, 2011 12:31 PM To: Gupta, Indranil Subject: 525 review 04/12

Igors Svecs CS 525 2011-04-12

Understanding Availability

The authors argue that the traditional notion of availability as a single fraction in [0,1] no longer applies to distributed systems. They claim that a node's availability may in fact be a function of several arguments, such as time and the availability of other nodes. This paper studies Overnet, a DHT-based file-sharing peer-to-peer network that was popular when the paper was published (2003). The measurement was performed by crawlers that discovered node IDs and probers that periodically probed each discovered node to determine its availability. The first finding of the paper is that using IP addresses alone as host identifiers is not accurate, as many hosts (~40% in one day) have varying IP addresses, most likely because of DHCP. The second finding is that host availability varies with the time of day, which is rather intuitive. Finally, nodes are found to be largely independent (in contrast to the streaming multimedia study).

COMMENTS This paper overlooks some other factors that may affect reliability. For example, availability correlation varies significantly with the application: we would likely see correlated availability in a datacenter distributed storage system, as opposed to Overnet. It also surprised me that the ratio of unique hosts to IP addresses is 1:4 and not the other way around, given that there is a shortage of IP addresses and many hosts are hidden behind NATs. Perhaps the situation was different in 2003 when the paper was published. As a general criticism, I would argue that the results are not especially significant; the authors could have studied more factors that affect availability and/or suggested an analytical model.

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

SUMMARY This measurement study examines the PPLive multimedia peer-to-peer network and focuses on its overlay-specific characteristics. PPLive is an IPTV protocol that lets users subscribe to various channels, first by contacting membership servers and then by getting neighbor lists from peers. The authors argue that various characteristics differ depending on the application running on the overlay; for example, the structure of PPLive overlay graphs is close to random, in contrast to file-sharing overlays. They also show that the average node out-degree is independent of channel size, likely because streaming media protocols do not need to maximize bandwidth the way file-sharing networks do. Other findings include: nodes in the same snapshot have correlated availability, while random node pairs have independent availability. This supports the argument that the specific application and its users determine the characteristics of an overlay. They also confirm the intuitive idea that PPLive users are impatient (shorter average sessions on a channel) and that channel population varies widely over a day (due to the live nature of the served content).

COMMENTS This study makes a valuable contribution by examining streaming-multimedia-specific characteristics of overlays and comparing them to other applications (such as file sharing); in particular, the authors attempt to interpret their measurement results. It is unclear why the authors assumed that it is unknown how a node returns a list of its neighbors: is it not specified in the protocol, or do we not trust that implementations follow the specification?
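For concreteness, the kind of breadth-first snapshot crawl both measurement studies rely on can be sketched as follows (Python; get_partners() is a hypothetical stand-in for the actual partner-list query, whose exact semantics the paper leaves open):

    from collections import deque

    def snapshot(seed_peers, get_partners, max_peers=10000):
        """Breadth-first crawl of an overlay: ask each known peer for its
        partner list and add newly discovered peers to the frontier. The
        overlay keeps changing while the loop runs, so the result is only
        an approximate snapshot."""
        seen = set(seed_peers)
        frontier = deque(seed_peers)
        edges = []
        while frontier and len(seen) < max_peers:
            peer = frontier.popleft()
            try:
                partners = get_partners(peer)   # hypothetical RPC; may time out
            except OSError:
                continue                        # peer left or did not respond
            for p in partners:
                edges.append((peer, p))
                if p not in seen:
                    seen.add(p)
                    frontier.append(p)
        return seen, edges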
Their snapshot algorithm is not consistent, but it is probably impossible to ensure consistency without control of the nodes. This paper also raises a general question: can some properties of overlays (for example, node degrees) be studied analytically, or do they actually need to be measured?

From: w SpamElide on behalf of Will Dietz Sent: Tuesday, April 12, 2011 12:29 PM To: Gupta, Indranil Subject: CS525 Review 4/12

Will Dietz 4-12-2011 cs525 -- Availability

"Understanding Availability" Ranjita Bhagwan, et al, UCSD

This paper tackles the problem of defining "availability" in a peer-to-peer system, arguing that existing measurements and availability models are insufficient. They argue that availability shouldn't just be a single metric, a single value in [0,1), but rather a composition of individual host availability, time-of-day effects, and particularly host availability interdependence. One important contribution is pointing out that existing measurements often ignore the effects of host aliasing (same host, different IP), which lead to incorrect conclusions about degraded host availability and system churn. Additionally, they present traces from a 7-14 day analysis, which mostly show what you'd expect (but are still hugely useful to demonstrate). Host availability is a strange metric (measured here as the percentage of hosts the crawler has ever seen that are currently available) in that, due to things like nodes permanently leaving, it goes down as the time frame over which the measurement is conducted grows. Time-of-day is interesting in that, given the trends in arrivals and departures throughout the day, one might want to reconsider models that assume less transient node 'failure'; it also highlights what might look like churn but perhaps isn't. They also noted that the system had fewer hosts over time (I'd imagine that over a much longer period you'd expect it to be more or less stable, if not growing), which has implications for file-storage systems that might not deal well with decreasing host capacity (they point out OceanStore, but Pastry comes to mind as well).

"An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS" Simson L. Garfinkel, Harvard

My first reaction was that it seemed strange for a Harvard professor to be writing about the availability of Amazon's services, and I am curious about the business implications of studies such as this. Would Amazon be aware of the study? Might they bias it somehow, and how would we know? Just a thought... Anyway, this work investigates Amazon's various 'cloud' services, such as EC2, S3, and SQS, and their respective availability. The author points out that Amazon provides no SLA, but makes various claims indicating high availability and suitability for various applications. This, of course, makes one wonder exactly how reliable the services are, and how suitable they might be for your business use. Frankly, I am a tad surprised such studies weren't more readily available; before I used these services for something in my business, I'd want some sense of what I was getting myself into. They summarize their results by saying that while Amazon mostly lives up to its claims, its systems suffer in strange and unpredictable ways.
They also go into good detail analyzing the various aspects one might use to determine the suitability or quality of Amazon's offering with respect to particular needs, from DNS issues to the failure policies (S3 essentially says 'retry until it works!'). One strange note on the study is that the EC2 analysis (which, by the way, largely uses S3 for its storage, as I understand it) was done during the EC2 beta. They report great results, but you can imagine two biases here: during a beta there are fewer users, so it might work better for that reason, but it is also still, well, *in beta* and might have issues that they intend to resolve later. For that reason, while I'm fond of the EC2 services personally, I'm not sure how to take these results. All in all, this is a paper I will keep around and re-read closely if I'm to use any of Amazon's services for something important in the future. Of course, one has to ask how well measurements like this hold up over time (it's been at least 4 years, and that's a fairly long time for such things). ~Will Dietz

From: kevin larson Sent: Tuesday, April 12, 2011 12:25 PM To: Gupta, Indranil Subject: 525 review 04/12

Bhagwan et al set out to explore and understand the deeper nuances behind 'availability'. Design decisions based on inaccurate or incorrect assumptions about failure and churn could impose significant overhead on a peer-to-peer system. They chose the Overnet peer-to-peer file-sharing system for two reasons: its unique user IDs and its size and deployment. Their implementation involved two components, a crawler and a prober. The crawler was used to get a snapshot of the current user IDs in the network. The prober was responsible for probing a random subset of the users gathered by the crawler (at a much higher rate). They crawled and probed Overnet for 15 days, measuring availability and modeling it in a variety of ways. They demonstrated the variance of host availability over time. They also modeled daily patterns, the relations between host pairs, and the arrivals and departures of hosts. The authors not only motivated their decision to use user IDs over IP addresses, but actually analyzed how their results would have differed had they instead used IP addresses. They analyzed the effects of multiple users on a single IP as well as single users on non-static IPs. They were also thorough in the rest of their evaluation and demonstrate the effects these results could have on overhead. Although the evaluation was good, the time frame of only 15 days seemed weakly explained and motivated. The authors wrote it off as short-term results and left longer studies to future work.

Vu et al measured and modeled the PPLive peer-to-peer streaming service. They crawled PPLive channels, took snapshots of the peers in each channel, and traced traffic between PPLive nodes and servers. They then modeled their data in a variety of ways, usually with graphs of user connections. They examined metrics such as node degree and showed that it is independent of channel size. They also showed how increasing channel sizes make the user graphs less random and more clustered. They demonstrated differences between peer-to-peer file-sharing systems and PPLive, showing availability correlations between pairs of users in the same channel. PPLive was similar to other peer-to-peer systems when observing users outside of channels, in that availability was independent. They also showed how channel sizes varied with the time of day.
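To make the time-of-day effect concrete, the diurnal channel-population curve can be fitted with a polynomial much as the paper does; the sketch below runs numpy.polyfit on made-up hourly counts (the paper fits a higher-degree polynomial to its real PPLive traces):

    import numpy as np

    # Hypothetical hourly channel population over one day (made-up numbers,
    # shaped to peak around noon and in the evening as the paper reports).
    hours = np.arange(24)
    population = np.array([210, 180, 150, 130, 120, 140, 200, 320, 450, 560,
                           640, 700, 690, 620, 560, 540, 580, 660, 740, 780,
                           750, 640, 480, 320])

    # Fit a low-degree polynomial to the diurnal curve (the paper uses a
    # 9th-degree fit on much longer traces).
    coeffs = np.polyfit(hours, population, deg=5)
    model = np.poly1d(coeffs)

    for h in (0, 6, 12, 18, 23):
        print(f"hour {h:2d}: observed {population[h]:4d}, fitted {model(h):6.1f}")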
The authors used their results to demonstrate varying behaviors in peer-to-peer systems and make a strong argument that most systems and models will not work for all peer-to-peer systems. Due to the number of potential applications, they propose building general systems that can be adapted to the respective peer-to-peer applications. The authors wrote a particularly interesting conclusion section, which both clearly laid out the important findings and proposed how to apply them to systems and how to further extend the results in future work.

From: Jason Croft Sent: Tuesday, April 12, 2011 12:18 PM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

This paper examines the availability of hosts in a peer-to-peer storage system, arguing that availability in these systems is poorly understood compared to devices like disks. The authors collect data over 15 days from the Overnet P2P file-sharing system, which uses immutable IDs. They implement two components: a crawler that collects snapshots of the IDs of active hosts, and a prober that periodically checks if hosts are available. Over 15 days, 1468 unique hosts and 5867 unique IPs responded to the prober. The authors believe IP address aliasing is responsible for this large difference, for example through DHCP or NATs. Host availability follows a diurnal pattern, and the difference between the maximum and minimum number of hosts is about 100. The authors also observe high churn, with 6.4 joins and leaves per host per day. New nodes (that is, new IDs) join at roughly the same rate that nodes permanently leave, comprising about 20% of the system. There is also significant independence between the availability of host pairs. From the collected data, the authors draw some interesting observations. For example, using IP addresses to identify hosts would underestimate availability by a factor of four and would result in more replicas of files than required, thus wasting storage space. Since host availability decreases over time, periodic file refreshes would be needed in the system. Given the high rate of churn, systems that replicate objects as nodes leave or join would incur high overhead to transfer data between hosts. A significant number of host IDs used multiple IP addresses: 32% used five or more, and 12% used 10 or more. I think this observation could have been studied more, though it may have been outside the scope of the paper. An interesting comparison that the authors did not look into would have been the number of hosts that share the same IP address. Also, as the authors note, IDs are not completely immutable; a node can easily change its ID by deleting a configuration file. Thus, any clients that reinstalled Overnet or manually deleted the file would have slightly affected the statistics.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

Utility computing is a large infrastructure built by an organization and rented out for computation, storage, and bandwidth on an as-needed basis; Amazon Web Services (AWS) is one example. While examining Amazon's different services, the author finds that AWS mostly delivers on availability but suffers in performance. AWS security uses credentials that can be downloaded from the dashboard, so anyone with access to the account can gain access to the resources linked with that account. SSL is used to prevent man-in-the-middle attacks, but there is no guarantee of privacy for information stored in S3.
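Clients therefore end up responsible for protecting their own data; for the integrity side, a check can be built from Python's standard library alone. A minimal sketch (the key is a placeholder, and the encryption step, which would need a separate library, is omitted):

    import hmac
    import hashlib

    SECRET_KEY = b"replace-with-a-real-secret"   # kept by the client, never uploaded

    def seal(data: bytes) -> bytes:
        """Append an HMAC-SHA256 tag so tampering in storage can be detected."""
        tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
        return data + tag

    def unseal(blob: bytes) -> bytes:
        """Verify the tag on data read back from storage; raise if it fails."""
        data, tag = blob[:-32], blob[-32:]
        expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        return data

    blob = seal(b"object contents to store in S3")
    assert unseal(blob) == b"object contents to store in S3"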
Therefore, tenants are advised to encrypt data, and should also verify integrity with an HMAC or a digital signature, as in the sketch above. AWS does not offer any type of snapshot or versioning of user data or backups, and redundancy is left to the client. Moreover, the Web Service License Agreement (WSLA) protects Amazon from tenants attempting to sue the provider if any damage arises out of using the services. When using S3, the client is responsible for retrying failed requests, since S3 is designed to quickly fail requests that encounter problems. Clients must similarly check for write and read errors, though the author did not experience such errors in his experiments. There is no method of copying, renaming, or moving an object to a different bucket without incurring data transfer charges. The experiments show S3 performs better as transaction size grows, but little additional performance is gained for objects beyond 16 MB. Performance is between 10 and 50 transactions per second, with writes more likely than reads to fall below 10 TPS. The lack of a Service Level Agreement for Amazon's services seems a bit odd considering the customers they target. The author discusses the WSLA, but only vaguely describes how it differs from a standard SLA. As this paper is several years old, it does not discuss the S3 outage in 2008, which caused Amazon to provide less than the 99.99% availability it claims for stored data. Many of the points regarding S3 reinforce the need for a system similar to RACS (Redundant Array of Cloud Storage). The author mentions that crashed EC2 instances continue to accrue charges until rebooted, which seems to be a significant design flaw. This requires companies using EC2 to have an effective monitoring system for their instances in place, to prevent additional charges from instances that crashed but were not rebooted.

From: anjalis2006 SpamElide on behalf of Anjali Sridhar Sent: Tuesday, April 12, 2011 12:17 PM To: Gupta, Indranil Subject: 525 review - 4/12

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007

The paper attempts to study the P2P streaming characteristics of the PPLive system. It presents the PPLive overlay characteristics and some models that explain session length and channel size. The authors present a more inclusive definition of active peers: active peers are peers that appear either in the list received from the membership server or in the partner list of another active peer. The paper also focuses only on the out-degree of nodes, called the k-response degree. The authors use a crawler that joins a PPLive channel and queries the peers also viewing that channel. The metrics studied in this paper are 1) overlay node degree, 2) overlay clustering coefficient, 3) availability correlation among nodes in the overlay, 4) overlay population size, and 5) node session length per overlay. These metrics are studied using the snapshot operation and partner discovery performed by the crawler. The snapshot algorithm consists of querying an initially received peer list for partner lists and appending newly discovered peers onto the peer list. The paper is able to provide information about all of the metrics mentioned above. PPLive overlays are random graphs when there are a small number of nodes; otherwise they are highly clustered. While some peer pairs have highly correlated availability, others have none.
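That bimodal pattern can be checked pair by pair with the same conditional-probability test Bhagwan et al. use for Overnet, comparing P(Y up | X up) against P(Y up) over a series of snapshots; a sketch on made-up 0/1 availability traces:

    def availability_correlation(x, y):
        """x and y are 0/1 lists: whether each peer appeared in snapshot t.
        Returns (P(Y=1), P(Y=1 | X=1)); a large gap between the two suggests
        correlated availability, a small gap suggests independence."""
        assert len(x) == len(y) and len(x) > 0
        p_y = sum(y) / len(y)
        x_up = sum(x)
        both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
        p_y_given_x = both / x_up if x_up else float("nan")
        return p_y, p_y_given_x

    # Made-up traces: peer B follows roughly the same diurnal schedule as A,
    # while peer C comes and goes at unrelated times.
    a = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
    b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
    c = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    print(availability_correlation(a, b))   # conditional probability well above the marginal
    print(availability_correlation(a, c))   # conditional probability close to the marginal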
The session lengths and the channel population size are modeled with a geometric distribution and a polynomial model, respectively. Unlike P2P file-sharing users, PPLive users are impatient with the start-up time of the streaming service. By differentiating between P2P file-sharing systems and P2P Internet streaming services, the paper provides a starting point for improving current streaming protocols. Correlation among peers can be used to decrease the start-up time of a channel: if we know the top k peers likely to be up at a point in time, we can query them instead of the membership server. The fact that the average node degree remains constant is explained from the view of a node gaining partners. However, is there a limit on how many partner requests a node will accept? The new nodes that want to view the channel might increase the out-degree of the already present nodes. More explanation by the authors might help clarify this point.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS, Simson Garfinkel, Harvard TechRep

The paper addresses the EC2, S3 and SQS web services provided by Amazon from different directions. It considers ease of use, security, throughput, and latency of operations. AWS does not provide any backup or versioning of the data stored, and the data is stored unencrypted; it leaves these responsibilities to the user. Amazon employs load-balancing techniques in which it advertises different IPs to different hosts at different locations. EC2 was tested by the author using a maximum of 100 instances at a time. S3 was tested using objects of several sizes: 1 byte, 1 KB, 1 MB, 16 MB and 100 MB. Throughput was tested either with a series of successive probes separated by a random delay, or with simultaneous queries from up to 6 client threads with no explicit delay between requests. The paper focused its probe collection on the path between EC2 and S3, but it would have been fairer to have an equal number of results from both data sets; I am not sure if there is another reason the author chose this. The load balancing carried out by Amazon seems to be detrimental for large objects. Amazon also charges for outgoing and incoming bandwidth and for each transaction. Hence Amazon seems to make backing up data costly while not providing backups itself. Some cost calculations might be useful to see what fraction of the total cost of using AWS goes toward backing up data.

From: Long Kai Sent: Tuesday, April 12, 2011 12:11 PM To: Gupta, Indranil Subject: 525 review 04/12

CS 525 Reviews 04/12 Long Kai (longkai1)

Understanding Availability

Summary: This paper shows several characteristics of host availability in the Overnet peer-to-peer file-sharing system. Previous measurements of P2P systems were dramatically biased because of the IP address aliasing problem. The authors empirically characterize the availability of a large peer-to-peer system, Overnet, over a period of 7 days, showing that host availability is not well modeled as a single stationary distribution, but instead is a combination of a number of time-varying functions, ranging from the most transient to the most permanent. Pros: The IP address aliasing problem is demonstrated when measuring the availability of hosts in a P2P system. Also, this paper takes a step toward measuring the complicated availability of P2P systems, and shows that availability is significantly affected by short-term joins and leaves of individual hosts and long-term host arrivals and departures.
Cons: Clients may delete files after a while, and this effect has not been taken into account in the paper. Also, the speed of transferring files is important to measure too: some files are available, but the transfer speed can be very slow.

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Summary: In this paper, the authors analyze a 200-day trace of Kazaa P2P file-sharing traffic collected at the University of Washington. The results reveal dramatic differences between P2P file sharing and Web traffic. Because peers voluntarily provide resources as well as consume them, the system must dynamically adapt to maintain service continuity as individual peers come and go. Also, file sharing is used predominantly to distribute multimedia files, and as a result file-sharing workloads differ substantially from Web workloads. Multimedia files are large and immutable, so the vast majority of objects are fetched at most once per client. The measurements also show that the popularity distribution of Kazaa objects deviates substantially from the Zipf curves commonly seen for the Web. As for the driving forces, the primary ones in Kazaa are the creation of new objects and the addition of new users. The paper further analyzes how to improve the efficiency of file sharing in P2P systems based on the features revealed in the study. The authors state that a large percentage (86%) of externally downloaded bytes in their workload could be avoided by exploiting locality. Pros: The Kazaa trace covers a substantially longer time period than most other peer-to-peer file-sharing studies, which allows it to draw conclusions about long-term behavior. Cons: The measurement ignored the problem of IP address aliasing, which may have made the results inaccurate. -- Larry

From: Curtis Wang Sent: Tuesday, April 12, 2011 12:09 PM To: Gupta, Indranil Subject: 525 review 04/12

Curtis Wang (wang505) 4/12/11 Measurement Studies

Evaluation of Amazon's Grid Computing Services

The paper presents an overview of the features of Amazon's AWS, a suite of Amazon services for utility computing, which includes EC2 (computing), S3 (storage), and SQS (messaging). In addition, the author describes his experience with the services and what he feels are the main limitations.
Paper Pros
- Summarizes the features and capabilities of AWS
- Provides suggestions to improve the shortcomings of the services in AWS
AWS Pros
- EC2 instances are fast, responsive, and reliable
- Services are scalable and practical
- Availability was "excellent"
AWS Cons
- S3 has high transaction overhead and lower throughput for files smaller than 16 MB
- AWS may suffer from unexplained system issues (the S3 example from 4/9-4/11)
- SQS's efficacy is limited to a few transactions per second per thread
- Security issues: risk of password compromise through e-mail reset, and a single password to access all resources
- Very broad license agreement
As this paper was written quite some time ago (approximately four years), it would be interesting to see if Amazon has addressed some of the issues mentioned in the paper, such as its weak security model, its API limitations, or the transaction overhead for S3 data transfers. Also, it would be interesting to see if the performance has changed as AWS's popularity has increased.
Understanding Availability

The paper presents availability studies performed on the Overnet peer-to-peer storage system, which is structured on a DHT. They do this using a crawler, which collects snapshots of the IDs of active hosts, and a prober, which periodically checks the availability of a host. They found that IP address aliasing is a significant issue, with almost 40% of probed hosts using multiple IP addresses. This implies that probing using only IP addresses would overestimate the number of hosts and underestimate their availability. Host availability decreases over long periods of time, so file redistributions or reinsertions are necessary. There is very high host turnover: over 20% of the hosts arrive and depart on a daily basis.
Pros
- Performs experiments that challenge the assumptions other papers have used when analyzing P2P systems, like the aliasing effects of IP addresses.
Cons
- Studies performed over only a 15-day period
- The method with which they perform crawling and probing seems to be limited in its granularity because of the traffic it can cause.

From: trowerm SpamElide on behalf of Matt Trower Sent: Tuesday, April 12, 2011 12:06 PM To: indy SpamElide Subject: 525 review 04/12

Understanding Availability

This paper focuses on the availability of peers in the P2P file-sharing service Overnet. The authors argue that current techniques mistake different IPs for different users. By monitoring Overnet, the authors were able to show that the same user (and files) often return to the network with a different IP address, most likely due to DHCP. This improves the availability of users in the system while reducing the overall user count. I think the study's main contribution is showing that availability is perhaps not as bad as originally thought in P2P networks. The authors seem to misplace blame for IP aliasing on NAT and on multiple users on the same machine, which would produce the opposite of the effect they were seeing (fewer IPs per user). I was surprised to see that the diurnal pattern of usage wasn't very distinct. I suspect that many of these users might be on slower connections trying to make use of bandwidth during the night. Finally, I am interested in the churn of users in the network. The authors showed the percentage of users entering and leaving the system during each 4-hour segment. I would like to know what the distribution of time spent in the network looks like.

Measurement, Modeling, and Analysis of a P2P File-Sharing Workload

This paper presents a measurement study of the P2P workload on the University of Washington's network. The trace looks at Kazaa traffic over several months, including the spring, summer, and fall school terms. The authors noticed a decidedly non-Zipf distribution of requests for files, due to the immutability of objects in P2P networks. Later, the authors show that the distribution of requests is similar to that of video rentals. The authors also present a study of user behavior in these networks, showing that most users are patient with downloads and that users tend to "cool off" after some time, which is intuitive given the amount of content someone can consume. In the second part of the paper, the authors analyze how much traffic could be avoided by allowing locality-aware downloads. Due to the size of Washington's network and the short-lived popularity of files, significant bandwidth costs could be reduced by keeping traffic internal.
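The 'ideal proxy' estimate behind that kind of claim is easy to reproduce on any request trace: only the first external fetch of each object costs outside bandwidth, and every repeat is assumed served internally. A toy version on a made-up trace of (client, object, bytes):

    def ideal_proxy_savings(trace):
        """trace: iterable of (client_id, object_id, size_bytes) external requests.
        Assumes an unlimited cache at the network border: only the first fetch of
        each object costs external bandwidth; repeats are served internally."""
        external = 0
        total = 0
        seen_objects = set()
        for _client, obj, size in trace:
            total += size
            if obj not in seen_objects:
                seen_objects.add(obj)
                external += size
        saved = total - external
        return saved / total if total else 0.0

    # Hypothetical trace: three clients, two of them re-downloading object "a".
    trace = [("c1", "a", 700_000_000),
             ("c2", "a", 700_000_000),
             ("c3", "b", 5_000_000),
             ("c3", "a", 700_000_000)]
    print(f"fraction of external bytes avoidable: {ideal_proxy_savings(trace):.0%}")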
I would be interested to see what the traffic distribution numbers would be today in a campus-type network, given the rise of user-published content (YouTube). Furthermore, it seems that P2P networks have cooled off due to legal actions. Has iTunes now replaced Kazaa as the major bandwidth hog on campus networks?

From: harshitha.menon SpamElide on behalf of Harshitha Menon Sent: Tuesday, April 12, 2011 11:32 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

The paper tries to address the question of what it means to be available, critically analyzes previous methodologies for understanding availability, and proposes some characteristics of availability to consider while designing a peer-to-peer system. To identify the hosts in the system and probe them, the authors set up a crawler and a prober. They show some interesting patterns: overall availability in peer-to-peer systems is low, availability shows diurnal patterns, the churn rate is high, there is little correlation between the availability of nodes, and the total number of hosts in the system remains roughly constant despite the churn.
Pros:
- The crawler treats distinct IDs as separate hosts, and the same host keeps the same ID. This accounts for IP aliasing and thus changes the results of the study. Their aliasing-aware study shows that 50% of the hosts have availability below 0.07.
- The study of these characteristics would help in designing a peer-to-peer system.
Cons:
- Their experimental runs cover a period of only 7 days.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

This paper does a performance analysis of Amazon's EC2, S3 and SQS services using a series of end-to-end throughput measurements of S3 from various points on the Internet. The findings were:
- EC2 delivers ready-to-go virtual machines at reasonable cost
- S3 delivers Amazon's claimed level of throughput for larger data transfers
- The system is not able to provide peak throughput for smaller transactions due to transaction overhead
- Availability of the system is really good
- Effective bandwidth varies heavily based on geographical location
- Consecutive requests receive similar performance results
Pros:
- These experimental results reveal a lot of information about the Amazon EC2 cluster. This helps in tuning applications deployed on these systems.

From: Simon Krueger Sent: Tuesday, April 12, 2011 11:27 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding availability, R. Bhagwan et al, IPTPS 2003
Core idea of the paper: The core idea of this paper is to analyze availability in the Overnet peer-to-peer file-sharing network over 7 days. Specifically, they remove misconceptions from previous availability studies of peer-to-peer systems. They specify exactly what availability is in a peer-to-peer system. They show that availability depends on the time of day. And they measure the rates of churn in Overnet.
Pros: They find that churn in the system is high
Cons:
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: Is 7 days enough to really analyze and draw conclusions about the system? What is the behavior of the system over a whole year?

Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, Krishna P. Gummadi et al, SOSP 2003
The core idea of this paper is to measure Kazaa traffic, measure the effect of system parameters, and study how locality affects the system.
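The fetch-at-most-once behavior the paper identifies is what flattens the head of the Zipf popularity curve (one of the findings noted below); a toy simulation with a made-up catalog and client population shows the effect:

    import random
    from itertools import accumulate
    from collections import Counter

    random.seed(1)
    N_OBJECTS, N_CLIENTS, REQS_PER_CLIENT = 5000, 2000, 50

    # Zipf(1)-like popularity over a finite catalog: weight of rank r is 1/r.
    cum_weights = list(accumulate(1.0 / r for r in range(1, N_OBJECTS + 1)))

    def zipf_sample():
        return random.choices(range(N_OBJECTS), cum_weights=cum_weights)[0]

    web_like = Counter()    # Web-style behavior: clients may re-fetch an object
    fetch_once = Counter()  # Kazaa-style: each client fetches an object at most once

    for _ in range(N_CLIENTS):
        fetched = set()
        for _ in range(REQS_PER_CLIENT):
            web_like[zipf_sample()] += 1
            obj = zipf_sample()
            while obj in fetched:       # redraw until an object this client lacks comes up
                obj = zipf_sample()
            fetched.add(obj)
            fetch_once[obj] += 1

    # Under fetch-at-most-once the hottest objects saturate (at most one request
    # per client), so the head of the popularity curve flattens relative to Zipf.
    print("web-like top-5 counts:  ", [n for _, n in web_like.most_common(5)])
    print("fetch-once top-5 counts:", [n for _, n in fetch_once.most_common(5)])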
Pros:
- Found that documents are fetched at most once and that requests do not follow Zipf's law
- Used a real-life Kazaa trace
- They found user characteristics: users are patient, and users slow down as they age
- They found object characteristics: there is no single workload, objects are fetched at most once, the popularity of objects is short-lived, the most popular requests are for new objects, and most requests are for old objects
Cons: Large code base to measure traces (30K lines)
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: Is analyzing this type of traffic legal? Does the behavior on UW's network describe the whole system's behavior? What does the behavior look like over time as more users switch to BitTorrent or legally purchase media?

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007
Core idea of the paper: They measure PPLive's overlay characteristics to help plan resource allocation and design future P2P systems. Additionally, they provide a mathematical model to describe PPLive's population size and session length.
Pros:
- Measures a real-life P2P system
- Found that PPLive overlays have random graph structures
- Found that the average degree of a peer is independent of the channel's population size
- Found that some PPLive peer pairs have highly correlated availability while others have no correlation
- Unlike P2P file-sharing users, PPLive peers are impatient
- Session lengths are geometrically distributed
- Channel population sizes in PPLive are larger than in P2P file-sharing networks and can be fitted with polynomial mathematical models
Cons:
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: Is it really enough to make statements contrasting other P2P systems when only a small number of measurements is performed compared to the scale, activity, and history of the system?

An Evaluation of Amazon's Grid Computing Services: EC2, S3, and SQS, Simson Garfinkel, Harvard TechRep
The core idea of the paper is to measure the throughput and latency of Amazon's Simple Storage Service (S3) from Amazon's EC2 cluster and other places around the world.
Pros:
- They measured from Harvard, MIT, Los Angeles, Pittsburgh, and the Netherlands
- They found an unexplained drop in system performance
- They found that Amazon delivers on its service claims when data transfers are 16 MB or larger
Cons:
Thoughts on how the paper can be further developed:
Any questions, criticisms, thoughts, doubts, or wanderings: I wonder what these measurements would show today, since from what I have heard the workload of Amazon's web services has drastically changed over recent years; specifically, they have many more users on their system. It would be interesting if these experiments could be performed yearly and published in some sort of report. Simon

From: Anupam Das Sent: Tuesday, April 12, 2011 11:23 AM To: Gupta, Indranil Subject: 525 review 04/12

i. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

This paper presents a measurement and analysis of a P2P file-sharing system called Kazaa. The authors performed an extensive 200-day trace of over 20 terabytes of Kazaa P2P traffic. They provide some interesting facts about the characteristics of Kazaa objects and users. The most important finding is that the behavior of P2P traffic is significantly different from that of web (WWW) traffic.
The interesting findings regarding user characteristics in Kazaa include: users are more patient (waiting even weeks to download a file); user activity slows down as users age, i.e., they either use the system less often or demand less data when they do use it; and lastly, users remain active for a very short fraction of the time. The paper also highlights some of the object characteristics of Kazaa: objects are fetched at most once (since files are generally large and immutable, files are requested only once 94% of the time), popular objects are short-lived (popular objects like audio/video are frequently superseded by new releases, which means the older ones are no longer popular), popular objects tend to be newly born objects (this is natural for multimedia files, as people tend to download new releases), and most importantly, Kazaa traffic does not follow a Zipf distribution. The authors also highlight that there is significant locality in the Kazaa workload and therefore propose a locality-aware model to reduce bandwidth consumption. They propose two options for exploiting locality: one is caching popular files at the boundary of the network, and the other is redirecting an outgoing request to some in-network node, given that the file resides inside the network.
Pros:
1. The paper analyzes a huge amount of trace data.
2. The paper highlights important user and object characteristics of the Kazaa network.
3. The paper was published in 2003 when multimedia services were just emerging, so it provides insights into some of the important characteristics of multimedia workloads.
Cons:
1. The traces were collected from a university campus, so the workloads were basically generated by faculty members and students and thus do not represent a general workload.
2. Some of the traces could have been biased toward the findings by users generating the desired requests.
Though the findings in this paper were appropriate in 2003, the characteristics of P2P file-sharing systems have since changed. So, it would be interesting to see the corresponding results for current P2P networks.

ii. An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

This technical report evaluates the performance, availability and usability of Amazon's three web-based services, known collectively as AWS (Amazon Web Services). AWS includes the Amazon Elastic Compute Cloud (EC2), which allows users to launch virtual instances to meet their computing demand; the Simple Storage Service (S3), which allows users to store large amounts of data; and the Simple Queue Service (SQS), which provides a reliable messaging service that facilitates coordination during large-scale computations. Amazon provides three forms of interface to AWS: a web-based dashboard, REST (an HTTP-based API) and SOAP (for remote procedure calls). The author looked into the security and licensing policy of Amazon and found many drawbacks, for example that there are no SLAs and thus no guarantees to the clients. The author found the claims made by Amazon regarding availability and ease of use to be accurate. Some interesting facts came out of the experiments conducted by the author. For example, in S3 the data transfer throughput is influenced by the size of the object being accessed.
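Repeating that particular measurement today is straightforward; the sketch below times PUTs of a few object sizes with boto3 (not what Garfinkel used, since he wrote his own REST client; the bucket name is hypothetical and would have to exist, with credentials configured):

    import time
    import boto3

    s3 = boto3.client("s3")                 # assumes AWS credentials are configured
    BUCKET = "my-throughput-test-bucket"    # hypothetical bucket name

    sizes = [1, 1024, 1024**2, 16 * 1024**2]    # 1 B, 1 KB, 1 MB, 16 MB

    for size in sizes:
        body = b"x" * size
        start = time.time()
        s3.put_object(Bucket=BUCKET, Key=f"probe-{size}", Body=body)
        elapsed = time.time() - start
        print(f"{size:>10} bytes: {elapsed:6.3f} s, {size / elapsed / 1e6:8.2f} MB/s")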
The author also tested throughput from different parts of the world and found it to vary, which calls into question Amazon's claim of using multiple data centers to improve performance. Moreover, it was found that the overhead associated with transferring small objects is quite significant. Finally, the author shared his experience conducting his research projects using AWS. He was quite satisfied with the availability and performance of AWS, but was frustrated by the lack of design specification, which prevented him from tuning his experiments.
Pros:
1. The paper provides helpful insights about the three services provided by Amazon.
2. The paper performs extensive experiments on different aspects.
3. AWS provides services with no startup cost.
Cons (of AWS):
1. AWS provides no SLA.
2. Amazon's WSLA allows Amazon to terminate service at any time. This poses a potential threat to customer satisfaction, but gives Amazon legal flexibility.
3. Amazon provides a very weak security policy. It leaves the responsibility to users to secure their account and data.
4. AWS provides no snapshot or versioning service for user data. It is the client's responsibility to do backups.
5. S3 does not provide any rename or move command; it provides only the primitive GET, PUT and DELETE commands.
6. Lack of C/C++ example code. All of Amazon's sample code is in Java, Perl, Python and Ruby.
7. Lack of design specification. Amazon is reluctant to release any information regarding the design internals of EC2, S3 and SQS.
-----Anupam

From: Michael Ford Sent: Tuesday, April 12, 2011 10:59 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

This paper tries to understand availability in the setting of peer-to-peer networks. Traditional models of availability are designed around networks with an ideal node uptime of 100%. Peer-to-peer networks are affected by their users' behavior patterns. The results show that in a network with high churn, accurate measurement depends significantly on correctly identifying unique users despite IP address changes. These problems certainly will not disappear with the growth of mobile computing and the possibility of multiple IP addresses within a single session, as opposed to only switching between sessions. The results also indicate that the number of peers is highly dependent on time. There are certain peak hours when users are online, but during non-peak hours availability can suffer. The authors use a simple crawler to gain knowledge of the peers in the network and a prober to test availability. The crawler runs every four hours and the prober every 20 minutes. I am not convinced that this captures the true turnover rates in a peer-to-peer system. Moreover, their test system had fewer than 600 nodes online at any given time. That network size is small enough that advanced membership protocols are not required, and the complexities of availability in peer-to-peer systems may not be uncovered.

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

Many peer-to-peer systems have similar architectures in terms of structure, connectivity, etc. However, some application-specific systems do not follow these patterns. One notable example is PPLive. The results of measurements on the system show that small channels look like random graphs, fan-out is independent of channel size, and users are impatient.
These attributes are in strong contrast to traditional file-sharing peer-to-peer networks, where fan-out depends on size to provide performance properties and users are willing to wait for large file downloads. The authors do not mention the size of the channels in comparison to traditional, large peer-to-peer systems. The fact is that each channel is semi-independent of other channels and only contains a few thousand nodes. Each channel could be compared to a single peer-to-peer system, but they face none of the same scalability issues due to the differences in scale. The measurements themselves are carried out more frequently than in some papers, and this leads to more interesting results, including being able to fit a geometric curve to the data.

From: Tony Huang Sent: Tuesday, April 12, 2011 10:56 AM To: Gupta, Indranil Subject: 525 review 04/12

Paper: Understanding availability
Core idea: This paper focuses on testing node availability in a P2P file-sharing system, Overnet, over a 7-day period. The authors choose Overnet because clients identify themselves through a unique client ID instead of an IP address, which would otherwise pose the problem of host aliasing due to DHCP. They deploy a crawler and a prober to measure the system. The purpose of the crawler is to collect a snapshot of the IDs of the active hosts at a particular point in time. The prober periodically probes a set of hosts to determine whether they are available. The experiments show the following results:
* Host IP address aliasing is a significant issue in deployed P2P systems. If we do not consider the IP aliasing aspect, we underestimate host availability.
* Host availability changes significantly with the time of day, and, as the measurement interval increases, measured host availability decreases.
* The number of available hosts varies with the time of day.
* Host availabilities display only a limited degree of interdependence.
* Host join and depart activity, i.e. host turnover, is high.
Thoughts:
* The paper generally confirms a lot of common beliefs about P2P networks.
* What is the cause of the mobility, time-of-day attrition, etc.? A user profile or common usage pattern study would be very interesting.
* Do the conclusions and measurements in this paper apply to a more general network or a commercial network?

Paper: An Evaluation of Amazon's Grid Computing Services
Core idea: This paper presents a study of Amazon's various commercial services, including S3, EC2 and SQS. EC2 is a commercial elastic computing service, where people can rent CPU and VM time to run their own applications. It provides a dashboard-based interface, a REST API and an RPC-based API. Security is not strictly enforced across the infrastructure, and users are advised to impose their own security measures. AWS provides one or more external addresses for a data center, which are advertised through Amazon's DNS; DNS resolutions should not be cached for a long period. Amazon's S3 stores data as named objects grouped in named 'buckets'. The API is limited; in particular, it does not support renaming of files. SQS is a queue-like service; however, it does not provide ordering and does not display FIFO behavior. The evaluation of AWS draws the following conclusions:
* The bandwidth is consistent, but it suffers from slowdowns during certain times of day or during internal architecture reconfiguration. The author even describes the Amazon service as a testimony to the impact of a badly tuned TCP stack.
* Amazon uses DNS both for load balancing and for re-routing traffic.
* Moving to larger transactions only improves performance initially; little gain is obtained for transactions larger than 100 MB.
* Writes perform slower than reads. This may be because a write has to reach two machines while a read only needs one.
* Writes have a larger bandwidth than reads. This may come from the fact that a write is acknowledged once the data is written to the disk controller.
* The time taken for each query can have significant variance.
* Concurrent access using multiple threads reduces the performance of each thread, but the overall throughput increases. The decreasing per-thread performance is probably due to processor contention on the virtual machine.
* The observed failure rate is low.
-- Regards -- Tony

From: Shen LI Sent: Tuesday, April 12, 2011 10:51 AM To: Gupta, Indranil Subject: 525 review 04/12

Name: Shen Li

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

This paper presents some measurement results and the corresponding mathematical models the authors derived. They insert a crawler into the PPLive network that periodically takes snapshots of the whole overlay network. Their crawler runs in parallel on multiple machines to increase the coverage of each snapshot. Among their findings, some are really interesting and surprising to me. They mention that the session lengths are typically geometrically distributed, which deviates a little from my intuition. They also provide a simple polynomial mathematical model for the variation in channel population size. Pros: 1. The parallel crawler can not only save bandwidth in the network but also provide a better snapshot for the measurement. According to Comet (the paper we discussed in class previously), one node may not be able to talk with another remote node directly; some P2P protocols simply ban this functionality for safety reasons. Thus, doing the experiment from one single node may lead to unrealistic measurement results; doing it in parallel on multiple nodes is better. Questions: As I understand it, the files hosted on the crawler node may also affect that node's behavior, since they may influence the number of nodes trying to connect to it. Is this right? If so, the paper omits this dimension.

An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS

This paper presents a good number of measurements of Amazon's services, including EC2, S3, and SQS. The author signed up for Amazon's EC2 'beta' program to conduct the experiments. Some of the results are not that interesting to me, such as the conclusions that EC2 delivers on its promise of providing ready-to-go virtual machines and that Amazon services have excellent availability; we can already tell this from the high popularity of these systems. One interesting thing is that between 10% and 20% of all queries suffer decreased performance that is 5 times slower than the mean or worse. This is very useful to know for applications built on top of Amazon's services; as the author says, such applications can simply kill the slow instance and start a new one. Pros: They provide a good number of experiments regarding the performance of Amazon's several web services. Cons: 1. Too little analysis of the results and too few comments. 2. The experiments testing S3 seem weak. Typically, applications built on top of S3 will see many simultaneous accesses and updates.
So one may also care about how S3 performs when a huge number of clients access it at the same time.

From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman Sent: Tuesday, April 12, 2011 10:20 AM To: Gupta, Indranil Subject: 525 review 04/12

Understanding Availability

Summary: This paper is concerned with the empirical characterization of availability in P2P systems. It shows that host availability behaves quite differently from hardware and disk availability and depends on the time of day. The traditional assumption in distributed systems that transient failures should be ignored and only long-term failures need to be handled does not hold for P2P systems. P2P host availability also depends on user churn. The measurement infrastructure in the paper consists of a crawler that provides a global view of host membership and a prober that returns fine-grained information on host availability. The authors show that probing by IP address overestimates the number of hosts and underestimates availability due to DHCP and NAT boxes. As a result, the measurement study uses random IDs to calculate availability, which avoids IP address aliasing problems. The temporal experiments reveal a diurnal pattern of availability. The paper also uses a simple experiment comparing P(Y=1|X=1) and P(Y=1) to show that the interdependence of the availability of two hosts X and Y is very low. In summary, the authors show that IP address aliasing can bias the results of P2P availability studies, and that availability can no longer be modeled as a single parameter; rather, it is a combination of short-term and long-term churn.
Pros: (1) The paper proposes very simple measurement techniques that give rise to novel and interesting observations about the availability of P2P systems. (2) It shows that P2P host availability is different from disk and hardware availability. (3) It sheds light on the effect of IP address aliasing on availability studies.
Cons: (1) Is this measurement study representative of current P2P systems like BitTorrent? (2) The paper does not seem to justify some of its parameter values. For example, why did they choose to start the crawler with 50 IDs? Maybe using a random number instead of 50 would make the results more convincing.
Future Work: (1) A probabilistic analysis of availability could better complement the proposed empirical analysis.

Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming

Summary: This paper presents a rigorous measurement and modeling study of the large-scale P2P overlay graphs spanned by the PPLive system. Unlike previous studies that focused on either user-centric or network-centric measurements, this paper focuses on overlay-centric characteristics, which are a complex combination of user-centric and network-centric features. The study reveals a number of interesting features of streaming overlays that are markedly different from traditional P2P file-sharing systems. Specifically, the authors show that (1) small overlays resemble random graphs, (2) average peer degree is independent of channel population size, (3) availability correlation between nodes is bi-modal, (4) PPLive peers are impatient, (5) session lengths are geometrically distributed, and (6) channel population sizes are larger for PPLive than for P2P file-sharing networks and can be fitted to a high-degree polynomial equation.
Pros: (1) The justification for using PPLive is explained well in the paper.
(2) Makes an important observation that PPLive overlays may resemble random graphs. (3) Shows empirical formulas for session length and channel population size. (4) The study reveals that small PPLive overlays are better off choosing neighbors at random. It also shows that protocols that treat all nodes equally can improve performance. (5) It shows that P2P system design and performance depend heavily on the nature of the application.
Future Work: (1) This study is for streaming overlays. Can the measurement techniques be reused for other types of overlays?
-- Muntasir

From: Andrew Harris Sent: Tuesday, April 12, 2011 6:23 AM To: Gupta, Indranil Subject: 525 review 04/12

Review of "Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming", Vu et al, and "Understanding Availability", Bhagwan et al

The first paper offers an analysis of PPLive, a p2p video distribution network. The group shows how PPLive differs from more traditional p2p networks, sharing only the impatience of users between them. Using a crawler framework, the group reveals that PPLive networks are rather randomized in their connectivity, compared to self-assembled networks in traditional p2p. They assert that nodes sharing common interests were more likely to have higher content availabilities between them, perhaps an intuitive conclusion given the context of common media interests. They note that while the population of a given media channel does fluctuate wildly over the day, all channels see their peak populations at noon and in the evening. Finally, on the impatience of users, they note that most users stay on only for short videos (50% of sessions are shorter than 10 minutes), which is consistent with users browsing for leisure or during their lunch break.

In analyzing PPLive, it seems the group here may have overlooked a key conclusion from the correlation of availabilities across snapshot groups. A high availability for content on a node would imply that a user is a) actively engaged with that content, and b) actively able to rebroadcast that content. This becomes the observed usage pattern for nodes within the same snapshot: they share viewing interests and thus are better matched for one another because they can retransmit this shared content. Consider, though, that this has a third implication: that a user is actively allowing content to be retransmitted from their node. PPLive users have no overt legal incentive to throttle themselves from retransmitting content, but they may have a bandwidth usage or quality-of-service incentive. This is especially the case for any user on a slow or shared connection, where upstream bandwidth will be severely limited. The success of PPLive seems to depend on a high number of available nodes, each with some available quantity of bandwidth to rebroadcast data. The authors also stress that usage patterns of p2p networks should be assessed per platform, rather than as a whole, as they can vary depending on the content involved. It would seem, then, that effects of bandwidth throttling should be considered within and across snapshots, and added to this section about in-snapshot usage.

The Overnet study is an attempt to capture the availability of nodes in a more traditional (albeit DHT-based) p2p file sharing network, with the overall research motivation of the group being p2p storage.
This group also developed a crawler, which served to examine the network structure of Overnet at a given time snapshot, as well as a prober, which periodically tested the discovered nodes for their availability. It was seen that host masking and aliasing is a problem worth exploring for p2p systems, in that while only 1468 hosts were seen, 5867 IPs were observed as corresponding to these hosts, a 1-to-4 relation. There was also a clear diurnal pattern observed, as peaks in usage could be seen across multiple days. Finally, churn and node turnover in the network were examined, with completely new hosts comprising 20% of all hosts on any given day.

Similar to the study of PPLive, in studying Overnet the group here only examined the availability of nodes over varying time lengths. This overlooks the varying effects of bandwidth availability for each node. This is a common enough problem in p2p networks, although many attempt to gauge and take into account the measured bandwidth of each client. A more descriptive model of node availability would have associated with it some conversion between bandwidth availability and actual availability. Consider that for small text files a small bandwidth would be acceptable, but for files like Linux ISOs a slow connection may take a user many days to complete a transfer. As such, the impact of bandwidth would also need to be framed relative to the common types of files being transmitted across the p2p network involved. It may seem pointless for large-pipe networks to consider bandwidth in their analyses; however, consumers of p2p technology are increasingly gaining reasons to monitor their bandwidth usage. The most prevalent reasons seem to be provider caps (and overage penalties) and fear of legal reprisal for sharing illegal content. The latter is only applicable to Overnet, but bandwidth caps are relevant to any high-bitrate data source (video, files, etc.). Users then have an incentive to reduce their broadband availability, thus influencing (as above) their rebroadcasting patterns and actual node availability. These seem to be nontrivial influences! It seems strange in both studies that bandwidth effects were relatively overlooked.

From: nicholas.nj.jordan SpamElide on behalf of Nicholas Jordan Sent: Tuesday, April 12, 2011 3:39 AM To: Gupta, Indranil Subject: 525 review 04/12 Attachments: 4_12.docx

Here it is -- Thanks, Nick Jordan

From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng Sent: Tuesday, April 12, 2011 12:08 AM To: indy SpamElide Subject: 525 Review 04/12

CS 525 - Review: 04/12 Measurement Studies

Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, Krishna P. Gummadi et al, SOSP 2003

The paper analyzed the workload of one peer-to-peer (P2P) file-sharing application, Kazaa, by implementing a crawler to collect user data. Most files shared on Kazaa are multimedia files, so there are two main characteristics that differ from the traditional HTML content of WWW traffic: object immutability and "fetch-at-most-once" user behavior. The trace data collected was consistent with these two properties. Based on the data, the paper proposed a model of file-sharing workloads. The first contribution of the paper was to collect data over a long period of time (a 200-day trace), much longer than previous works, which provided more insight into long-term user behavior. This allowed the authors to draw conclusions about metrics correlated with age.
For example, a user's requested bytes decrease over time, a file's popularity dies out quickly, new files tend to be more popular, and more users request old files (defined in the paper as files that have existed for more than 30 days). The second contribution was that the paper identified a discrepancy between Zipf's law and the P2P multimedia workload: Kazaa's trace has a more flattened head. The paper then developed a model for large files (usually video files larger than 100 MB). The model not only took fetch-at-most-once behavior into account, but also considered how new files are assigned a popularity rank according to a Zipf distribution. With a suitable choice of parameters, the paper showed that the model's prediction was very close to the data they collected. The final contribution was a study of caching strategies to better exploit locality and thus decrease the workload. One significant finding was that 86% of the externally downloaded bytes could be avoided by using a perfect proxy. While this is not a practical deployment, the paper also showed that both a locality-aware protocol and a redirector can save significant bandwidth as well. Comments/questions/critiques: I am quite surprised that older clients request less data during each use of the system, since there is always new content being released. Perhaps, back then, the birth rate of new content was not so high, so older clients had already downloaded most of the files they were interested in and were waiting for new files. Otherwise, I do not have an intuitive explanation for this observation. While their model seems convincing, I wonder whether it still provides good predictions today. In particular, I am dubious about the way they assign popularity to newborn files. In the model, popularity is assigned randomly based on a Zipf(1) distribution. However, I claim that there are correlations among newborn files. For example, when a new movie comes to market and generates a great deal of buzz, every user wants to download it. In today's market, the movie usually comes with many side products, such as games, albums, or even TV interviews and shows. The popularity of all such files should be correlated, since users interested in the movie have a good chance of downloading these side products. It is not clear whether this would affect the model's precision. Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming, L. Vu, I. Gupta, J. Liang, K. Nahrstedt, QShine 2007 The paper first observed a gap in previous measurement papers on multimedia streaming: they lacked an intensive study of overlay-based characteristics. In particular, the paper tried to answer what the overlay graph constructed by PPLive, a popular Chinese IPTV application, looks like under different parameters, such as channel population size. The first contribution was a crawler that incurred lower load on PPLive than previous crawlers due to a more flexible termination strategy. Moreover, the crawler estimates channel size more accurately by adopting a more general definition of active nodes. Second, the paper found that, unlike a P2P file-sharing system, the overlay formed by a P2P streaming system is more like a random graph, as indicated by its clustering coefficient. Moreover, peers have correlated availability within the same snapshot (i.e., among the peers polled in the same data-collecting run of the crawler).
This is possibly a result of PPLive's inter-overlay optimization and of common interests among users in the same snapshot. Third, the paper used a discrete mathematical series to model the PDF of session length and a 9th-degree polynomial to model channel population size. Finally, based on the paper's findings, some practical suggestions were presented. In particular, the suggestion that a simple, homogeneous protocol should work well for a P2P streaming service is quite interesting. The paper argued that such a solution would work because of the random-graph-like overlay and each node's memoryless session length. Comments/questions/critiques: The term “patience” was somewhat overloaded in the paper. One usage refers to lack of interest in the content, and the other to intolerance of start-up delay. While reading the paper I was confused at first, because to me patience relates more to the second meaning. Moreover, the conclusions about session length, channel characteristics, and user preference do not convince me. The explanation in the paper is not the only interpretation of the data. For example, a user might simply switch channels frequently at first in order to find the channel she likes most, and this behavior is unrelated to channel characteristics (which I assume the paper takes to mean the delay and quality of the program). The models are not very intuitive, and wouldn't they be prone to overfitting? From: Qingxi Li Sent: Monday, April 11, 2011 11:39 PM To: Gupta, Indranil Subject: 525 review 04/12 Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming This paper measures the overlay-based characteristics of PPLive. Some of these characteristics are caused by PPLive's protocol and others by the nature of the television application. It finds that some PPLive overlay networks are random graphs and that the average degree of a peer in the overlay is independent of the channel's population. I think these two findings follow from the details of PPLive's overlay protocol. Besides this, the authors also find that people are impatient, meaning that some peers leave after only a short time. The same thing happens with television: before finding the channel s/he really wants, a person will continually switch channels. It may also be caused by a start-up delay longer than people expect, so they cannot wait for the channel to start up and switch to another one. Channel population varies more widely than in file-sharing p2p systems, and this variation is a function of time, because PPLive is a network television application. With file-sharing systems, people can leave the client running in the background and keep working or walk away from the computer, so their sessions tend to be long. PPLive, however, can only be used when someone is in front of the computer, so the population depends on the time of day, and the results show that the population at noon and in the evening is much larger than at other times. The availability of PPLive peer pairs is either highly correlated or not correlated at all. High correlation may be caused by users sharing the same schedule, such as two people at the same company: since their break times and working hours are the same, the times they use PPLive will also be the same. It is also possible that the two peers are on the same computer, run by one person who does not want to wait for the stream to load.
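A minimal sketch of one way such pairwise availability correlation could be computed from crawler snapshots follows; this is not the procedure used in either paper, and the snapshot format and peer IDs are hypothetical.

# Minimal sketch (not from either paper): given a list of crawler snapshots,
# each a set of peer IDs seen online at that time, compute the Pearson
# correlation between two peers' binary availability vectors.
from statistics import mean, pstdev

def availability_vector(peer, snapshots):
    """1 if the peer appeared in a snapshot, else 0, for every snapshot."""
    return [1 if peer in snap else 0 for snap in snapshots]

def pairwise_correlation(peer_a, peer_b, snapshots):
    """Pearson correlation of two peers' availability vectors (None if degenerate)."""
    xs = availability_vector(peer_a, snapshots)
    ys = availability_vector(peer_b, snapshots)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:  # a peer that is always (or never) up has no variance
        return None
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)

# Toy example: three snapshots taken by the crawler at different times.
snapshots = [{"peerA", "peerB"}, {"peerA", "peerB", "peerC"}, {"peerC"}]
print(pairwise_correlation("peerA", "peerB", snapshots))  # 1.0: always online together
print(pairwise_correlation("peerA", "peerC", snapshots))  # -0.5: mostly anti-correlated

Pairs with correlation consistently near zero would support an independence assumption, while values near 1 would support the in-snapshot correlation reported for PPLive.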
Just as the authors mention in the paper, the overlay characteristics lie in between the user-centric view and the network-centric view: the user-centric view is shaped by the characteristics of the application, while the network-centric view is shaped by the protocol. Understanding Availability This paper measures peer availability in the Overnet file-sharing network. The authors chose this p2p file-sharing system because it uses session IDs: an ID is given to a peer when it joins the network and discarded when it leaves. With the help of the session ID, availability can be measured easily, whereas previous work based on IP addresses is affected by DHCP, so the same peer may be counted multiple times. To measure availability in the system, the crawler repeatedly requests 50 random IDs, and these requests discover new peers; the crawler then sends the same 50-ID requests to the newly found peers to discover further peers. This process is run once every 4 hours. In addition, every 20 minutes the prober randomly chooses a subset of IDs and tests whether those IDs are still in the network. From the results, DHCP makes the host counts in previous IP-based measurements nearly 4 times too large, and node availabilities show no correlation. I wonder whether 20 minutes is too long an interval to detect whether a peer is still in the system. Also, since the probed subset is chosen randomly, there is a possibility that some nodes that have left the system are never detected as such. From: Tengfei Mu Sent: Monday, April 11, 2011 8:22 PM To: Gupta, Indranil Subject: 525 review 04/12 1. Understanding Availability This paper discusses the concept of availability in the Overnet peer-to-peer file-sharing system. Overnet is a widely deployed DHT-based peer-to-peer network in which users can be tracked by ID instead of by their IP address. The authors used a crawler to get a snapshot of the active peers in the system at a particular time, and a prober to decide whether a particular subset of the crawled nodes was alive at a particular time. They then studied aliasing effects, time-of-day effects, node availability interdependence, and the arrival and departure of nodes. They found that peer-to-peer host availability depends on several of the factors above rather than on a single parameter. Pro: 1. The findings are meaningful for future availability analysis. Con: 1. The paper only studies Overnet's behavior; we do not know whether the same arguments hold for other P2P systems. 2. The authors ran trace experiments for only 7-15 days, which risks missing long-term patterns. 2. An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS This report investigates Amazon Web Services (AWS), including an analysis of Amazon's security model. AWS is a leader in the cloud-computing field. The authors briefly introduce EC2, S3, and SQS, and explain what kind of model Amazon uses to provide these services and the corresponding pricing policy. This includes aspects such as security mechanisms, virtualization-based technologies, and even legal issues. The paper also presents the authors' experience using AWS and their evaluations of it. Pro: AWS is sold with no minimum price and at very fine granularity.
Con: There is no service level agreement. From: david.m.lundgren SpamElide on behalf of David Lundgren Sent: Monday, April 11, 2011 11:54 AM To: Gupta, Indranil Subject: 525 review 04/12 Understanding Availability Understanding Availability is one of those simple yet disruptive papers that come along every now and then and challenge the methodology and assumptions of a field. Bhagwan et al. verify the independence of node a's availability from node b's and that node availability follows a diurnal pattern; the authors show that the parametrization of node availability as a single value in [0,1] is overly simplistic; the rate of host churn in P2P systems is examined; and previous studies are shown to underestimate host availability due to IP aliasing. Global snapshots of Overnet are obtained using a ``crawler'' which recursively samples system state six times a day. A smaller, more exact analysis of node subsets is carried out using a prober that heartbeats nodes three times hourly. Pros, Cons, Comments, and Questions: - The authors' methodology for examining P2P host availability without relying on IP addresses as unique identifiers is rigorous, and the scope (and cost) of such IP aliasing errors is shown to be extremely high. - The generalization of their results to arbitrary peer-to-peer systems is implicitly assumed, but little evidence is given for this assumption. Time-of-day effects, host availability interdependence, and churn generalizability are suspect. - Overnet's size seems smaller than the other, more popular P2P networks discussed in class. Also, the authors only discuss small-window snapshots of Overnet (on the order of days). Future work could examine the paper's conclusions at a larger time scale. ------------------------------------------------------------------------- Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Using a 200-day trace of Kazaa traffic from the University of Washington's network, Gummadi et al. characterize the system workload as an amalgam of discrete workloads based on object size. This multimedia workload is contrasted with traditional web workloads and is shown to exhibit unique characteristics resulting from its distinguishing feature of fetch-at-most-once (FAMO) requests. The authors show that such multimedia workloads do not follow the ubiquitous Zipf distribution due to the FAMO nature of the requests generated (requests of this type result from the immutability of media objects). A model for multimedia workloads is introduced and is shown to provide a good approximation of the Kazaa data. The model's general fit (or perhaps, more appropriately, Zipf's ``disfit'') to other multimedia workloads is shown. The feasibility of locality-aware optimizations for a Kazaa-like network is discussed and upper bounds on system efficiency improvement are calculated. Pros, Cons, Comments, and Questions: - The authors' model of P2P file-sharing workloads using a fetch-at-most-once mechanism accurately captures the non-Zipf distribution of Kazaa's workload. The accuracy of the model's fit is impressive, lending strong support to the claim that multimedia objects' immutability is the driving force behind the differing web and media workloads. - The observation that new object arrival improves performance (and that client arrivals do not counteract the decrease in hit rate resulting from client aging) is non-intuitive.
Because newly introduced objects help to rejuvenate cache hits, an object-birth-sensitive cache could improve a system's hit rate. - While the authors discuss measuring the popularity of objects over time, I think a more detailed analysis is in order. Such an analysis could lead to smarter cache insertions and more efficient data replication. From: Ankit Singla Sent: Monday, April 11, 2011 8:22 AM To: Gupta, Indranil Subject: 525 Review 04/12 1. Understanding Availability -------------------------- Summary: This paper contrasts the notion of availability in a centrally managed system like a cluster against availability in a p2p system in the face of node churn. Availability in the former is fairly well understood because of fairly reliable components with somewhat known failure characteristics (e.g. disk drive failures) and certain assumptions about the independence of component failures. In p2p systems, however, with human players often being involved, availability varies with time as participants enter and leave at will. Moreover, departures of clients from the system can be highly correlated (based on time-of-day correlations, for instance). To study availability, they experiment on the Overnet file-sharing system. Comments: It surprises me how many IP addresses are used by the same hosts (more than 10 different IPs for 12% of hosts). I guess this is a significant result in that measurements based on IPs underestimate availability (a small sketch of this aliasing effect follows at the end of this review). Their measurements of host availability versus time are also interesting, leading to the conclusion that the measurement interval must also be specified with any numbers on availability. 2. Amazon Evaluation -------------------- Summary: The paper evaluates Amazon's cloud services quantitatively, and to some extent for ease of use too. The experiments and analysis are fairly extensive, covering a period of more than half a year. It's interesting that they saw some service characteristics change as Amazon made modifications to their systems (e.g. Fig. 1 - throughput falls significantly on April 1). Their overall conclusion is that Amazon's offerings fairly match up to the advertised level. For short transfers, though, the storage service may deliver much lower throughput (is this just a TCP inefficiency?). Comments: Another interesting question around migrating an application to the cloud is how reliability is affected. For instance, if multiple replicas end up being placed on different VMs on the same physical machine, the reliability of the application definitely suffers. While the paper critiques Amazon's weaknesses on security, privacy and reliability, I think these help them keep costs down for the general customer base. Not everyone requires top-notch guarantees on these things, so making them a fundamental part of the architecture is probably a bad business idea, as appealing as it might be academically. (Maybe an end-to-end argument applies -- end users take care of these functions themselves because security, etc. has to be ensured at the end-points anyway.) There are some interesting legal issues discussed in the paper, including libel and sexual-orientation-based discrimination using Amazon's infrastructure.
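To make the aliasing effect concrete, here is a small sketch under an invented trace format (the host names, IPs, and numbers are made up): the same observations yield a much lower apparent availability when identities are keyed by IP address rather than by a stable host ID, because DHCP splits one host's uptime across several short-lived IP identities.

# Illustrative sketch only (the trace format and values are invented).
from collections import defaultdict

# (probe slot, host ID, IP address observed at that slot)
observations = [
    (0, "host1", "10.0.0.1"),
    (1, "host1", "10.0.0.1"),
    (2, "host1", "10.0.0.7"),  # DHCP lease change: same host, new IP
    (3, "host1", "10.0.0.7"),
]
NUM_SLOTS = 4

def mean_availability(observations, key_index):
    """Average fraction of probe slots in which each identity was seen online."""
    slots_seen = defaultdict(set)
    for obs in observations:
        slots_seen[obs[key_index]].add(obs[0])
    return sum(len(s) for s in slots_seen.values()) / (len(slots_seen) * NUM_SLOTS)

print(mean_availability(observations, key_index=1))  # keyed by host ID: 1.0
print(mean_availability(observations, key_index=2))  # keyed by IP address: 0.5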
Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/ From: Chi-Yao Hong Sent: Sunday, April 10, 2011 9:53 AM To: Gupta, Indranil Subject: 525 review 4/12 ---- Understanding availability, IPTPS’03 ---- This paper studied a fundamental measurement problem concerning the characteristics of host availability. The authors measured Overnet, a peer-to-peer file-sharing system, which allowed them to identify a host by its overlay ID instead of by its IP address. A difference between the availability of an IP address and that of an actual user ID is reported. The major finding is that many users change their IP addresses quickly, so IP-address-based availability measurements might give biased results. Here are some other take-aways. 1. The distribution of host availability changes rapidly with the measurement period. 2. A strong diurnal pattern is observed for host availability. 3. The availabilities of different hosts are largely independent of each other. 4. Overnet has a high host turnover rate, which also has a significant impact on system availability. Critiques: A two-week measurement might be too short. Some periodicities, e.g., a weekly pattern, cannot be seen in such a short-term measurement. Also, I argue that availability is very application dependent, and Overnet is representative only of file-sharing peer-to-peer systems rather than of other p2p systems. ---- Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, SOSP’03 ---- This paper studied the peer-to-peer file-sharing workload. There are three main goals achieved by this paper. First, it analyzed the network characteristics of peer-to-peer file-sharing traffic. This allowed the authors to compare the P2P traffic with web traffic, which shows distinct differences. The authors then showed that the Kazaa popularity function deviates substantially from Zipf (a toy simulation of this effect is sketched below, after this review). Second, the authors showed that the main workload in Kazaa is driven by the creation of new objects and the addition of new clients. This is unlike the Web, whose workload is driven by changes to documents. Third, the authors analyzed the potential performance of location-aware p2p mechanisms. Pros: 1. The presentation is clear, and the measurement is technically sound. The motivation is strong, and the results are interesting. This is textbook-quality content which I would suggest people read if they want to work on network measurement. 2. The inferences are plausible – immutability and “fetch-at-most-once” are the key factors that shape the p2p traffic. Cons: 1. A long-tailed CDF of download latency does not necessarily mean that users are patient, as partial requests are filtered out. 2. Besides web traffic and VoD, it would be interesting to compare the p2p traffic load with other on-demand systems such as IPTV and VoIP/video conferencing. 3. While there is a clear bandwidth benefit to using location-aware schemes, the authors do not consider the control overhead of enabling location-aware services. -- Chi-Yao Hong Computer Science, Illinois http://cyhong.projects.cs.illinois.edu
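On the flattened head of the Kazaa popularity curve mentioned in the reviews above: the following toy simulation uses arbitrary parameters and is not the paper's model, but it illustrates how fetch-at-most-once behavior depresses request counts for the most popular objects while leaving the tail largely unchanged.

# Toy simulation (arbitrary parameters, not the paper's model): compare request
# counts per object when clients re-sample freely from a Zipf(1)-like popularity
# distribution versus when each client fetches any given object at most once.
import random
from collections import Counter

random.seed(0)
NUM_OBJECTS, NUM_CLIENTS, REQUESTS_PER_CLIENT = 1000, 500, 100

# Zipf(1)-like popularity: weight of object i is proportional to 1/(i+1).
weights = [1.0 / (i + 1) for i in range(NUM_OBJECTS)]

def simulate(fetch_at_most_once):
    counts = Counter()
    for _ in range(NUM_CLIENTS):
        fetched = set()
        for _ in range(REQUESTS_PER_CLIENT):
            obj = random.choices(range(NUM_OBJECTS), weights=weights)[0]
            if fetch_at_most_once and obj in fetched:
                continue  # client already holds this immutable object
            fetched.add(obj)
            counts[obj] += 1
    return counts

plain = simulate(fetch_at_most_once=False)
famo = simulate(fetch_at_most_once=True)

# Under fetch-at-most-once the most popular objects receive far fewer requests
# (at most one per client), flattening the head of the curve, while unpopular
# objects far down the ranking are barely affected.
for rank in (0, 1, 10, 100):
    print(rank, plain[rank], famo[rank])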