From: gh.hosseinabadi@gmail.com on behalf of Ghazale Hosseinabadi [ghossei2@illinois.edu]
Sent: Tuesday, March 16, 2010 12:07 PM
To: Gupta, Indranil
Subject: 525 review 03/16

Paper 1: AnySee: Peer-to-Peer Live Streaming

This paper addresses the problem of efficient and scalable live-streaming overlay construction. The main performance metrics in the design of a live-streaming overlay are startup delay, source-to-end delay, and playback continuity. The authors argue that intra-overlay optimization is not the best possible solution when multiple streaming overlays are present. Instead, they show that higher efficiency is achieved if inter-overlay optimization is used. They design a scheme, called AnySee, that optimizes resources in an inter-overlay manner. The main design issues they considered are as follows:
1. find paths with low delays (source-to-end delay, startup delay)
2. maintain service continuity and stability
3. find the best frequency of optimizations
4. reduce the control overhead caused by the algorithm

AnySee works through the following steps:
1. An efficient mesh-based overlay is constructed.
2. A location-detector-based algorithm is employed to match the overlay with the underlying physical topology.
3. An entity called the "single overlay manager" resolves the issues caused by joins/leaves of peers.
4. An entity called the "inter-overlay optimization manager" finds the best paths, builds backup links, and removes paths with low QoS, in an inter-overlay manner.
5. An entity called the "key node manager" allocates the resources.
6. The buffer manager manages and schedules the transmission of the media data.

The detailed description of the operation of each entity is presented in the paper. Finally, the performance of AnySee is evaluated through simulations.

Pros:
1. The paper recognizes that intra-overlay optimization is not always the optimal solution for live-streaming overlay construction when multiple streams are present in the network.
2. The authors designed the whole protocol for live-streaming overlay construction. The construction is composed of different entities. The functionality of each entity and the interactions between entities are well designed.
3. An optimization framework for resource utilization is developed.

Cons:
1. It is not clear why the overlay is constructed as a mesh. No argument for choosing a mesh is presented. Other topologies might improve the performance.
2. No theoretical analysis of the performance of the reverse tracing algorithm is presented. Since it is performed on the whole topology, it generates high overhead (for example, in terms of message complexity).
3. The trade-off between the performance of AnySee and its message complexity is not presented. In some network topologies, it might be better not to perform the inter-overlay optimization.

Paper 2: Corona: A High Performance Publish-Subscribe System for the World Wide Web

This paper considers a publish-subscribe mechanism for delivering updates to users quickly and efficiently. The updates happen in an asynchronous manner. Some existing works propose solutions based on naïve polling, but this method wastes network resources, e.g. bandwidth. The proposed solution in this paper, called Corona, is a decentralized scheme. The authors formulate the tradeoff between bandwidth and update latency as an optimization problem.
The solution to this optimization problem is then found in a distributed manner. Corona builds an infrastructure by setting up an overlay composed of geographically distributed nodes. Corona uses cooperative polling to deliver updates quickly and to reduce the overhead. The number of nodes performing polls is determined considering the tradeoff between update performance and network load. Different performance goals in Corona are formulated as optimization problems. In other words, different versions of Corona (Corona-Lite, Corona-Fast, Corona-Fair, Corona-Fair-Sqrt, Corona-Fair-Log) are presented, each achieving a different performance objective. Corona uses the Honeycomb optimization toolkit for solving these optimization problems in a distributed manner. The performance of the different versions of Corona is evaluated through simulations.

Pros:
1. Cooperative polling is used to increase efficiency and decrease unnecessary overhead.
2. An optimization framework for the performance-overhead tradeoff is formulated.
3. Different objectives can be achieved using the same optimization method and tools.

Cons:
1. No analysis of the overhead introduced by constructing the Corona cloud or performing the various system management operations is presented. In some scenarios, all of these might add high overhead to the task of delivering updates.
2. The time needed for the distributed solution of the optimization problem to converge to the optimal point might be long in some cases, and this time adds delay to informing users about the updates.
3. Distributed methods for solving optimization problems do not always converge to the global optimum. It is not clear whether this can happen in Corona.

From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu]
Sent: Tuesday, March 16, 2010 11:26 AM
To: Gupta, Indranil
Subject: 525 review 03/16

525 review 03/16
Hyun Duk Kim (hkim277)

* Anysee: p2p live streaming, X. Liao et al, Infocom 2006.

This paper introduces a new peer-to-peer live streaming system, AnySee. Efficient and scalable live-streaming overlay construction is a hot issue. While existing solutions focused on intra-overlay optimization, AnySee proposes an inter-overlay optimization scheme which allows resources to join multiple overlays and constructs efficient paths using peers in different overlays. The authors tested the proposed system with simulation and with an actual release to users. According to the simulation and usage analysis, AnySee improves global resource utilization and decreases delays.

This paper includes an analysis of actual large-scale system usage. While most papers evaluate their systems with simulations or small lab experiments, AnySee was released to end users, and the authors analyzed how it was used. AnySee was released very successfully and was adopted by 60,000 people. There is no better way to prove how good the system is in practice. The large number of users also demonstrates its scalability. Through the usage analysis, in addition to the basic performance analysis, they could find interesting user behaviors such as the tolerable delay time. It would be better to suggest more future directions obtained from the user usage.

This paper mainly presents positive aspects of AnySee. Are there no shortcomings of AnySee? Especially since they released their system to end users and it was adopted by a large number of users, there should be some feedback from users.
It would be better if they could explain some interesting feedback from users.

* Corona: a high-performance publish-subscribe for the World Wide Web, V. Ramasubramanian et al, Usenix NSDI 2006.

This paper presents Corona, a new publish-subscribe system for the World Wide Web. The Web lacks a good publish-subscribe interface. Existing naive polling-based updates have poor performance, limited scalability, and high server loads. The authors propose Corona, which provides high performance, scalability, and general/backwards compatibility using optimal resource allocation. In Corona, nodes in the Corona cloud divide the work, check their assigned servers for updates, and share information with each other. The authors formalize the trade-offs between performance and resource usage as an optimization problem and solve it with several approximation solutions. They show evaluation results from simulation and an actual deployment on PlanetLab.

The authors suggest several variations of the algorithm optimized for different goals. Because the problem formalized in the optimization framework is NP-hard, the authors propose various approximation algorithms: Corona-Lite, Corona-Fair, Corona-Fast, Corona-Fair-Sqrt, and Corona-Fair-Log. Users can use one of the first three algorithms depending on their performance goals and improve the performance of Corona-Fair with the last two variations.

There is no explanation of the initial performance variation of the algorithms. Most of the evaluation graphs have suspicious movements near the y-axis. For example, the network load on the content server (Fig 3) is very low at the beginning, peaks at 2 hours, and stabilizes after that. Is this because the algorithm needs time to analyze environment information and adjust its parameters? A clear explanation of this would have been better.

------
Best Regards,
Hyun Duk Kim
Ph.D. Candidate
Computer Science
University of Illinois at Urbana-Champaign
http://gildong2.com

From: Vivek [vivek112@gmail.com]
Sent: Tuesday, March 16, 2010 11:08 AM
To: indy@cs.uiuc.edu
Subject: 525 review 3/16

Corona:

Core Idea: Corona addresses the issue of content dissemination from various sources. The particular use case is micronews feeds, where there may be many subscribers to a feed and frequent updates. As the paper states, the main goal they are after is the optimal balance between getting benefits out of aggregate bandwidth and maintaining low update latencies. Their approach is to model the tradeoff as a mathematical optimization problem. The result is a polling schedule for different objects that will achieve global performance goals, subject to resource constraints. The Corona system is comprised of a "cloud of nodes" which is used to monitor a set of webpages or feeds. Experimentation is done on PlanetLab nodes, using 150,000 subscriptions and 7,500 different feeds.

Pros:
- Aside from the fact that I felt that the content in this paper was very interesting to read, the paper really goes into detail on incorporating a huge number of different factors involved in a publish-subscribe system. Mathematical modeling is often criticized (somewhat unfairly, in my opinion) because it does not consider the real-world factors involved. However, the authors in this paper take care to incorporate into their optimization methodology object popularity, update rate, content size, and internal system overhead -- rather than just aggregate bandwidth and update latency.
- A simple front-end is used: it is simply an IM service. Because of this, it is easy to see how one might use Corona.
- Different performance goals are considered: Corona-Lite, Corona-Fast, Corona-Fair, Corona-Fair-Sqrt, Corona-Fair-Log. This strengthens the paper significantly, because the authors are comprehensive in considering many different application needs in a real-world system -- thus making Corona much more applicable.

Cons:
- The experimentation is done only on PlanetLab nodes. This is indeed a reputable system for experimentation. However, it seems that the studies are not as strong when the experimentation is done on just one testbed. Does it make sense to consider another testbed such as Emulab?
- Is it appropriate to use a Zipf distribution as an approximation of feed/channel popularity? Specifically, there doesn't seem to be much analysis, proof, or perhaps citation showing that this is reasonable. Are there other choices of probability distributions? Why Zipf? And why an exponent of 0.5? (Indeed, it's possible I might have missed something in the paper, but I guess it's not clear whether such a choice is sound.)

SplitStream:

Core Idea: SplitStream addresses the issue of tree-based multicast, particularly in the realm of decentralized peer-to-peer systems. The main idea behind SplitStream is to divide content into k "stripes" and then multicast each stripe using an independent tree. Peers specify an upper bound on the number of stripes they are willing to forward.

Pros:
- The design considers balancing forwarding across nodes. Some of this may be attributed to link stress, as discussed early in the paper.
- Faults are carefully analyzed in the latter half of the paper; the theoretical proofs correspond well to the experimental results.

Cons:
- The paper lays out many data structures, graph algorithms, and proofs that these are correct. The work seems to be discussed in a theoretical manner. One issue, though, is that after a very algorithmic approach, the authors still use experimental simulations, and don't vary as many parameters as the paper on Corona does. If this were done, a clearer understanding of the technique could be captured.

From: Giang Nguyen [nguyen59@illinois.edu]
Sent: Tuesday, March 16, 2010 10:52 AM
To: Gupta, Indranil
Subject: 525 review 03/16

CS525 review nguyen59 03/16/10

Corona: A High Performance Publish-Subscribe System for the World Wide Web

Frequently updated websites provide a mechanism (e.g. RSS feeds) that clients can use to stay up to date with the sites' changes. Existing techniques involve clients naively and independently polling the website. Popular websites have to handle high load and thus enforce rate limiting. This paper presents a novel decentralized system for detecting and disseminating updates.

Each target URL is called a channel, which is hashed to compute its overlay key. A node in the DHT is responsible for polling the target channel if its nodeId has L prefix bits matching the channel key's L prefix bits, where L is the desired configurable (polling) "level." Intuitively, the smaller L is, the higher the number of DHT nodes that match it, and thus these nodes as a group will get updates quicker. These nodes, called "owners" of the channel, are responsible for maintaining clients' subscriptions to the channel. This is implemented using an Instant Messaging interface, where the owner nodes are similar to the IM bots of popular IM services. Users send subscribe/unsubscribe instant messages to these owners/IM bots, specifying the target channels. Users then get updates for the target channels asynchronously as IM messages, sent by the channel owners.
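To make the level-based ownership rule concrete, here is a minimal sketch (my own illustration, using hex digits rather than bits, and invented names such as channel_key and polls_channel):

import hashlib

def channel_key(url):
    # Hash the channel URL into the overlay's key space (hex string here).
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

def polls_channel(node_id, url, level):
    # A node "owns" (and polls) the channel if its id matches the channel key
    # in the first `level` digits; a smaller level means more owners, hence
    # more aggregate polling capacity for that channel.
    return node_id[:level] == channel_key(url)[:level]

node_id = hashlib.sha1(b"node-42").hexdigest()
print(polls_channel(node_id, "http://example.com/feed.rss", 1))

With one hex digit per level, level 1 would make roughly 1/16 of the nodes owners of a channel, level 2 roughly 1/256, and so on.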
The paper presents several distributed optimization algorithms to meet global system goals.

Pros:
- Able to support better update latencies for clients while not increasing load on the content publishers.
- Does not require changes to content publishers or network infrastructure.

Cons:
- Reliance on an IM service for control/dissemination of updates between subscribers and channel owners.
- IM is not the most natural way to deal with web content. The browser is. It would be good if there were a small usability study of the IM service as the control/dissemination mechanism.

--------------------------------------------------------------

AnySee: Peer-to-Peer Live Streaming

Internet live streaming (e.g. video) is popular. To do it efficiently, the consumers of the stream are organized into a multicast tree overlay, to reduce load on the producer as well as to reduce network traffic. This paper presents a case for inter-overlay optimization, because separately optimizing individual streaming overlays misses opportunities to optimize the whole system. The approach here is that a node can join multiple overlays, and the system will use an inter-overlay optimization scheme to improve global resource utilization and distribute traffic evenly to all physical links, assign nodes based on their locality and delay, guarantee streaming service quality by using the nearest peers, and balance the load among group members.

The nodes first form a mesh-based overlay, and a locality algorithm runs to match the overlay with the underlying physical topology. Within individual overlays, there is a "single overlay manager", as in traditional intra-overlay optimization, that deals with node churn. Then, the "inter-overlay optimization manager" explores appropriate paths, builds backup links, and cuts off paths with low QoS for each end peer. Finally, the "key node manager" allocates the limited resources, and the "buffer manager" manages and schedules the transmission of media data.

The locality algorithm uses ping-like message latencies to infer locality among logical neighbours; a node then selects, among logical neighbours with the same hop count, the one with the lowest ping latency as its new neighbour. Because each node can be part of multiple overlays, it can receive media content from multiple providers, allowing the inter-overlay optimization to select the best path to receive content and use the remaining ones as backup paths. When there are multiple providers, because of the inter-overlay optimization, nodes can use smaller buffers, thus reducing startup delay.

Cons:
- Requires the nodes in the overlay to have synchronized clocks.

From: Kurchi Subhra Hazra [hazra1@illinois.edu]
Sent: Tuesday, March 16, 2010 9:26 AM
To: Gupta, Indranil
Subject: 525 review 03/16

AnySee : Peer to Peer Live Streaming
-----------------------------------------------
Summary
-------------
This paper presents AnySee, a live peer-to-peer streaming system. It exploits inter-overlay optimization in order to obtain a globally optimal solution and high resource utilization. A mesh-based overlay manager is responsible for optimizing the overlay, finding the latest neighbours, and eliminating slow connections. A single-overlay manager manages the peers' leaving and joining operations. Another inter-overlay manager maintains backup links and eliminates paths with low QoS for each end peer; it uses probing to do this. A key-node manager is responsible for allocating resources and connections to a new request.
Lastly, a buffer manager manages and schedules the transmission of media data to keep the media playback continuous. They also optimize the system in terms of user behaviour: they give importance not only to the average delay but also to the percentage of leaving peers. When too many users leave the overlay, the system interprets this as the system being heavily burdened and in need of optimization.

Pros
------
-- The overlay construction takes into account important metrics such as delay and user patience.
-- The system has been used over the real Internet and has been tried and tested. The success of these transmissions speaks in favour of the ideas used here.

Cons
--------
-- In using inter-overlay optimization, the system uses other source nodes as relay nodes. Source nodes, however, will already be burdened with the responsibility of routing their own streams to other nodes. In the presence of other non-source nodes, it is not clear whether this approach is the best one to follow.
-- The authors assume that all nodes will be able to provide the required incoming and outgoing bandwidth and do not consider this as a constraint. However, depending on the stream size, bandwidth can become a severe hindrance.
-- From the statistical data collected, the authors reach the conclusion that users have great patience for live streaming services with large delay if they have enough interest in the programs. However, if another system is developed that gives better response time, it will be difficult to convince users to use AnySee without changing important policies of the system such as the use of ADL.

SplitStream: High-Bandwidth Multicast in Co-operative Environments
----------------------------------------------------------------------------------------
Summary
------------
This paper presents SplitStream, a multicast protocol that is built on top of Pastry and Scribe. The basic assumption of the method is that in a conventional tree-based multicast system, the interior nodes of the tree are overloaded, whereas the bandwidth and CPU cycles of the leaves in the multicast tree are not fully utilized. As such, SplitStream strives to build a forest of interior-node-disjoint trees while satisfying the bandwidth constraints of the nodes. A set of trees is said to be interior-node disjoint if each node is an interior node in at most one tree and a leaf node in the other trees. Pastry normally forwards a message towards nodes whose nodeIds share progressively longer prefixes with the message's key. Since a Scribe tree is formed by the routes from all members to the groupId, the nodeIds of all interior nodes share some number of digits with the tree's groupId. Hence, the authors ensure that the Scribe trees have disjoint sets of interior nodes simply by choosing groupIds for the trees that all differ in the most significant digit. SplitStream exploits properties of Pastry routing to achieve this. In addition, it stripes each stream across k stripes, each of which has its own multicast tree. Through the use of data encodings like erasure coding, SplitStream ensures that even if a small subset of nodes or links fails, a node can reconstruct the stream from the other stripes that it receives.

Pros
-------
-- Many streams in multimedia are redundant. SplitStream exploits such redundancy to make the system robust.
-- They try to build an efficient forest while staying within specified bandwidth constraints.

Cons
-------
-- While constructing trees, they only consider bandwidth as a constraint, neglecting delay altogether.
However, in multimedia streaming, delay and jitter are important factors in ensuring QoS guarantees. Although they show that around 90% of their packets experience a delay of less than a second, the delay spikes introduced by the other 10% of the packets may inhibit the use of SplitStream in interactive media applications.
-- They recommend the use of encodings such as erasure coding. However, encoding and decoding such streams adds processing delay to the streams.
-- A stream can be reconstructed only when a sufficiently large subset of the k stripes is received. The delay between receiving such stripes will also be an important factor in determining the quality of the content that the user receives.

Thanks,
Kurchi Subhra Hazra
Graduate Student
Department of Computer Science
University of Illinois at Urbana-Champaign

From: Fatemeh Saremi [samaneh.saremi@gmail.com]
Sent: Tuesday, March 16, 2010 8:51 AM
To: Gupta, Indranil
Subject: 525 review 03/16

Paper 1: Corona: A High Performance Publish-Subscribe System for the World Wide Web

Traditional publish-subscribe methods for reporting updates to interested subscribers, like micronews syndication tools, are based on the naïve approach of repeated polling. These approaches suffer from poor performance and scalability: due to the limit posed by the polling period, subscribers do not receive updates quickly and thus submit more requests, which results in a large number of polls for the same content. Corona is a fully decentralized system for detecting and disseminating web page updates, and it conducts coordinated polling. Corona enables users to subscribe for updates to any existing web page, and delivers updates effectively in an asynchronous way without requiring changes to the current web architecture. It benefits from a distributed, P2P, cooperative resource management framework that determines the optimal number of resources to devote to polling data channels in order to meet system-wide goals. In addition, it does not impose a higher load on content servers than the subscribers would without Corona, and it respects the limits of the servers. What Corona does is maximize the effective benefit of the aggregate bandwidth available to the system while remaining within server-imposed bandwidth limits. It expresses the problem as a mathematical optimization problem and solves it in a decentralized manner. Corona works on top of Pastry, which is a distributed P2P overlay. In Corona, each node dedicated to monitoring a channel has a copy of the latest version of the channel contents and, using a feed-specific difference engine, determines whether detected changes are germane by filtering out superficial differences. It then reports the differences to all internal nodes assigned to monitor the channel.

Compared to preceding publish-subscribe systems that are not compatible with the current web architecture, Corona does not require changes to the existing infrastructure, such as web servers. The way it benefits from solving the optimization problem in a decentralized manner is interesting. Their extensive simulations and live deployment demonstrate the practicality of the system. By utilizing structured overlay systems, they provide decentralization, good failure resilience, and high scalability. Corona uses instant messaging as its user interface, which makes it easily accessible to a large user population. One question, though, is what happens if an earlier update arrives at the primary owner after a fresher update.
It seems that the primary owner would assign a higher timestamp to the stale one, and it would get propagated if there is no assumption on time skew.

Paper 2: AnySee: Peer-to-Peer Live Streaming

Traditionally, live media streaming has been performed using IP multicast; however, due to practical issues with routers, it was not widely deployed. Recent efforts in this direction have concentrated on building efficient streaming overlay multicast schemes based on P2P networks. These schemes focus on improving startup delay, source-to-end delay, and playback continuity. To this end, most works have focused on intra-overlay optimization, in which each node can join at most one overlay structure. This way, these approaches do not benefit from globally near-optimal features and therefore suffer from long delays and unplanned interruptions, especially when a large number of peers join the network simultaneously. AnySee tries to improve the overlay construction and resource utilization of traditional approaches with regard to global optimality. The authors propose an inter-overlay optimization in which peers can join multiple overlays simultaneously.

The building components of AnySee are as follows: the Mesh-based Overlay Manager, which constructs a mesh-based overlay and uses a location-detector-based algorithm to match the constructed overlay with the underlying physical topology; the Single Overlay Manager, which is based on traditional intra-overlay optimization and deals with peers' join/leave operations; the Inter-overlay Optimization Manager, which explores appropriate paths, builds backup links, and cuts off low-quality paths; the Key Node Manager, which allocates the limited resources; and the Buffer Manager, which manages and schedules the transmission of data.

AnySee improves the global resource utilization of a P2P live streaming network and distributes traffic to all physical links evenly. It benefits from a locality- and delay-aware resource allocation scheme. No matter which overlay peers belong to, AnySee utilizes the nearest peers to improve QoS. Compared to most previous approaches, which are tree-based, AnySee utilizes meshes and hence balances the load among the group members. AnySee has also been implemented, which yields more accurate results under real loads compared to simulation; the experiments demonstrate the scalability and robustness of the scheme, with acceptable source-to-end delay, resource utilization, and startup delay characteristics.

From: Virajith Jalaparti [jalapar1@illinois.edu]
Sent: Tuesday, March 16, 2010 6:28 AM
To: Gupta, Indranil
Subject: 525 review 03/16

Review of "Anysee: p2p live streaming":

The paper deals with the construction of Anysee, a P2P overlay system for live streaming which deals with various problems such as inter-overlay optimization, global resource usage optimization, and load balance across nodes. The Anysee system essentially has 5 important components. Every node initially joins via the mesh-based overlay manager, with the help of which it can determine its neighbors while trying to optimize the latency to each of them. A node then joins a single overlay which is managed by the single overlay manager. The inter-overlay optimization manager is used by a node for finding backup paths in the event of failure of its primary path: a fixed threshold number of backup paths is maintained by each node. This is done by the use of a reverse trace mechanism in which a node probes sources within a fixed threshold latency from it and uses them to find the best path to a source.
It uses this heuristic to incrementally build better paths for various nodes. The system further has a key node manager which takes care of admission control at each node: it determines which requests are to be granted based on a local queuing-model optimization at each node. The paper presents several trace-driven simulation results which measure the increase in resource utilization when using Anysee as compared to CoolStreaming.

Pros:
- Anysee particularly focuses on choosing low-latency neighbors and tries to choose paths using inter-overlay optimization with path delay as the metric. This ensures that the start-up time is low and optimizes the system from the perspective of latency.
- The admission control at each node ensures that the resource usage at a particular node is optimal. Further, each node uses a buffer management mechanism which ensures that the locality of requests is taken into account.

Comments:
- The paper promises that Anysee would optimize resource management. However, the mechanism is never evaluated for the optimality it achieves. The simulations just compare Anysee to an earlier protocol but do not compare it with an optimal mechanism. While the mechanisms presented in the paper try to perform local optimizations, this does not necessarily ensure that a global optimum is achieved.
- Anysee uses delay/latency as the metric of performance. These are measured by using timestamps on the packets sent. However, an accurate measurement of this kind requires clocks at different nodes to be synchronized (not necessarily perfectly). If the error in clock synchronization is of the order of the latency values, then this type of measurement of the delay metric might indeed lead to inefficient operation.
- The paper does not evaluate the message overhead of joins/churn in the system, which can be significant. Further, the paper considers only latency as the metric for the performance of the streaming system. However, latency is not a very big factor once the system has been set up: various other factors like the bandwidth of the overlay path being used, congestion, etc. play an important role in the performance of the system, and these are completely ignored in this paper.

Review of "Corona: a high-performance publish-subscribe for the World Wide Web":

The paper proposes Corona, a publish-subscribe architecture for the current Web which interoperates with existing servers, providing backward compatibility. The main goals it achieves are decreased latency of detecting updates, high performance, optimal resource allocation, and scalability, all of which have been traditionally ignored in systems such as micronews syndication. Corona operates as an intermediary between the users, who subscribe to various websites for updates and services, and the actual servers which provide these services. Users subscribe to the system through IM, and Corona operates as an overlay network which receives such subscriptions and probes the servers on the users' behalf. Different nodes in the Corona overlay are responsible for different channels to be polled, and each channel is polled by multiple overlay nodes at a low frequency, which allows the overlay system to detect changes/updates at a greater rate than a traditional reader polling at the same rate.
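As a back-of-the-envelope check of this cooperative-polling claim (my own toy model, not taken from the paper): if k owners each poll a channel every T minutes at independent random phase offsets, the expected time until the first of them sees a new update is roughly T/(k+1), compared to T/2 for a single poller at the same per-node rate. A quick Monte Carlo sketch:

import random

def expected_detection_delay(num_pollers, interval, trials=100000):
    # An update appears at time 0; each poller's next poll happens at a
    # uniformly random point in its polling interval. Detection delay is the
    # time until the earliest such poll.
    total = 0.0
    for _ in range(trials):
        total += min(random.uniform(0, interval) for _ in range(num_pollers))
    return total / trials

print(expected_detection_delay(1, 30))   # roughly 15 minutes
print(expected_detection_delay(10, 30))  # roughly 2.7 minutes

This ignores the time to disseminate a detected update among the owners and on to subscribers, which Corona's optimization also has to account for.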
The paper proposes various optimization problems, each of which tries to optimize a different quantity (e.g. detection time) while ensuring that certain constraints (e.g. load on servers) are satisfied. The paper gives results of trace-based simulations which show that Corona can reduce the detection time by a large factor.

Pros:
- The paper proposes an overlay system which can act as an intermediary between users who subscribe to various websites and websites which publish information that is of interest to various users. Such a mechanism helps increase the scalability of the system, as each user need not directly contact each server independently: this helps decrease the overall web traffic and also decreases the amount of bandwidth used at each user and at each web server.
- The paper proposes several different optimization problems, each aiming to optimize a different aspect of the system as a whole. This allows the operator to choose a different optimization mechanism in an application/system-specific manner.
- Experimental results suggest that the detection time can be reduced by an order of magnitude as compared to traditional mechanisms while imposing the same load.

Comments:
- The experiments suggest that the convergence time of the distributed optimization mechanism used by Corona is large (several tens of minutes to a few hours). In realistic scenarios where there is high churn in the system, it might happen that the optimization mechanism never converges because of changes in the channels subscribed to, etc.
- Corona does not take the resource constraints at the overlay nodes into account. These nodes are typically shared machines (PlanetLab nodes as in the paper) which would encounter resource limitations in practice. The optimization framework presented does not take this into account. It should also take into account the number of nodes available to be used as overlay nodes, since this would affect the distribution of the channels to be monitored and depends on the number of users/channels that are subscribed to (which matters because these nodes are resource constrained).
- The paper does not deal with failures of the overlay nodes: what happens if one or more overlay nodes fail? The subscriptions to several channels might be disrupted and updates would be propagated slowly.

--
Virajith Jalaparti
PhD Student, Computer Science
University of Illinois at Urbana-Champaign
Web: http://www.cs.illinois.edu/homes/jalapar1/

From: Ashish Vulimiri [vulimir1@illinois.edu]
Sent: Tuesday, March 16, 2010 1:49 AM
To: Gupta, Indranil
Subject: 525 review 03/16

Anysee: p2p live streaming, X. Liao et al, Infocom 2006.

The authors describe the operation of a commercially deployed P2P media streaming system called Anysee. The distinguishing feature of Anysee is that it implements cross-overlay optimization -- when multiple overlays share common nodes, traffic intended for one overlay is sometimes routed through another if the alternate overlay is not fully loaded. The authors describe the overlay update and maintenance algorithms in detail and present results from an experimental evaluation, as well as a simulation-based comparison with the Coolstreaming system.

Comments:
* Instead of the inter-overlay optimization scheme, can you just merge all the overlays together into one giant network using network coding at the common nodes?
* Some of the design choices are unjustified.
Example: why push in inter-overlay paths only indirectly, through the set of backup paths, instead of using a more direct/active approach?
* The evaluation seems unsatisfactory. I'm not sure what other choices apart from Coolstreaming were available when this paper was written, but they could have also tried to show a comparison with the theoretically perfect algorithm (a powerful centralized controller that has access to information about the entire system and can control all the nodes' actions). This would indicate how far away they were from the maximum achievable performance level.

Corona: a high-performance publish-subscribe for the World Wide Web, V. Ramasubramanian et al, Usenix NSDI 2006.

The authors present Corona, a publish-subscribe system for the web. As opposed to the existing WWW, in which each client polls each server/resource independently, in the Corona system the update polling is carried out by a cluster of nodes which operate collaboratively. Any client requiring update notifications subscribes to this cluster, which sends out the notifications, as and when they are seen, using Instant Messaging. The authors describe the behaviour of the polling cluster in terms of an optimization program whose constraints can be adjusted depending on the desired outcome (e.g. minimum latency vs minimum server load). They also describe and evaluate an experimental deployment which uses Pastry to maintain the polling cluster substrate.

Comments:
* The Corona system is not exactly a shining example of simplicity. Considering how important the problem they try to solve here is, would it be better to just start an effort to develop a push-based system (with a pull-based backup operating for nodes behind firewalls or NATs)? This is in essence what they're doing here: interfacing the push-based IM frontend with a pull-based backend layer.
* The effect of a Corona system would be significant only with a large deployment size. That is, at the scales they're looking at, the Corona servers would have to be a service provided by an ISP (to the client) or a CDN (to the server). But both these entities already handle data caching and replication for efficient data transfer (CDNs: by definition; ISPs: using proxies). Do we really need a separate pub-sub layer, or can e.g. a CDN handle RSS feed accesses by itself?

From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu]
Sent: Tuesday, March 16, 2010 12:07 AM
To: Gupta, Indranil
Subject: 525 review 03/16

CS525 review
Liangliang Cao (cao4@illinois.edu)
March 16, 2010

Paper 1: M. Castro et al, Splitstream, SOSP 2003.

In tree-based multicast systems, a relatively small number of interior nodes carry the load of forwarding multicast messages, which poses a problem for application-level multicast in peer-to-peer systems. To address this problem, SplitStream proposes to stripe the content across a forest of interior-node-disjoint multicast trees. The challenge is to construct this forest of multicast trees such that an interior node in one tree is a leaf node in all the remaining trees and the bandwidth constraints specified by the nodes are satisfied. SplitStream relies on Scribe and Pastry to construct and maintain these trees. The results show that SplitStream distributes the forwarding load among all peers and can accommodate peers with different bandwidth capacities while imposing low overhead for forest construction and maintenance.
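To make the interior-node-disjoint construction concrete, here is a minimal sketch of the idea (my own simplification, assuming base-16 digits and 16 stripes whose stripeIds each start with a different digit): with Pastry/Scribe routing, a node forwards only the stripe whose id shares its own most significant digit, so it is an interior node in at most one tree.

def can_be_interior(node_id, stripe_id):
    # Scribe tree edges follow Pastry routes toward the stripeId, and Pastry
    # routes only through nodes whose ids share a growing prefix with that id.
    # So a node can be interior only in the tree whose stripeId starts with
    # the same digit as the node's own id.
    return node_id[0] == stripe_id[0]

stripe_ids = [format(d, "x") + "0" * 7 for d in range(16)]   # ids 00000000 .. f0000000
node_id = "a83f12c9"
print([s for s in stripe_ids if can_be_interior(node_id, s)])  # only the 'a...' stripe

In all the other stripe trees such a node can appear only as a leaf, which is how the forwarding load gets spread across all peers.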
Pros:
• SplitStream generalizes conventional tree-based multicast by enabling efficient cooperative distribution of high-bandwidth content. This improvement is important for peer-to-peer scenarios in which all peers are expected to share the forwarding load.
• SplitStream is highly robust to failures because a node failure causes the loss of a single stripe on average.
• The experimental results on an Internet testbed and via large-scale network simulation look convincing.

Cons:
• Although SplitStream enables all the peers to share the forwarding cost, it is not an optimal way of utilizing all the resources. Some spare bandwidth might be wasted in the scheme.
• It is possible to improve SplitStream further by optimizing the global overlay scheme. We could design a better multicasting method by distributing the traffic evenly to all physical links.

Paper 2: X. Liao et al, Anysee: p2p live streaming, INFOCOM 2006.

This paper develops an efficient system for peer-to-peer live streaming. The proposed system combines the strengths of both tree-based and mesh-based overlays, and designs a P2P paradigm with five management blocks: the mesh-based overlay, single overlay, inter-overlay optimization, buffer, and key node managers. The proposed approach works well in a real deployment, in which 60,000 users benefited from the AnySee system.

Pros:
• The proposed inter-overlay optimization, together with the buffer and key node managers, alleviates the unplanned interruptions caused by a large number of peers joining the network simultaneously.
• The combined system is convincing in conducting global optimization of resources while keeping the overhead within a reasonable range.

Cons:
• The experimental results are not very convincing. 60K users is not a large number, especially considering the large population in China and the increasing popularity of TV, movies, and other live-streaming data. It would be interesting to see how AnySee works at a larger scale.
• Although AnySee incorporates many previous works into the system, the design of the system is not very new. Considering the history of P2P systems, the first stage was to develop brute-force methods (Napster, Gnutella), and then researchers presented beautiful mathematical models (Chord, Pastry, SCAN). I wonder whether the next stage is to develop sophisticated systems which combine the previous algorithms as building blocks.

From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu]
Sent: Monday, March 15, 2010 11:49 PM
To: Gupta, Indranil
Subject: 525 review 03/16

=================================================
Paper 1: AnySee: Peer-to-peer Live Streaming
=================================================
The paper titled "AnySee: Peer-to-Peer Live Streaming" makes a good effort at using an inter-overlay optimization scheme, in which resources can join multiple overlays, to design and implement AnySee, a P2P live streaming system. AnySee outperforms previous schemes in several ways: it improves global resource utilization by evenly distributing traffic to all links; it allocates resources based on their locality and delay; it guarantees streaming service quality by using the nearest peers; and it achieves load balancing among the group members. The authors also present experimental results from comprehensive trace-driven simulations conducted on topologies from Gnutella P2P networks.
More interestingly, AnySee was implemented as an Internet-based live streaming system and was successfully deployed for several months. The authors claim that over 60,000 users enjoyed massive entertainment programs through AnySee.

AnySee addresses several challenges of P2P live streaming, such as finding paths with low delays, maintaining service continuity and stability, determining the frequency of optimization operations, and reducing control overhead. To deal with these challenges, AnySee incorporates several components: the Mesh-based Overlay Manager, Single Overlay Manager, Inter-overlay Optimization Manager, Key Node Manager, and Buffer Manager. The mesh-based overlay manager uses the LTM (Location-aware Topology Matching) technique to optimize the overlay, find the latest neighbors, and purge slow connections. The single overlay manager is nothing but a typical intra-overlay optimization; it handles the join/leave operations of peers. The inter-overlay optimization manager explores appropriate paths, builds backup links, and eliminates paths with low QoS for each end peer. The key node manager allocates the limited resources. The buffer manager manages and schedules the transmission of media data.

Pros:
1. AnySee effectively utilizes inter-overlay optimization in parallel with intra-overlay optimization.
2. AnySee is deployed in the real world.

Cons:
1. According to Figure 8, the maximum resource utilization is 60%. This means that either the proposed inter-overlay optimization scheme is not optimal or there are other issues which are not yet considered in AnySee.
2. Besides locality, can we choose any other metric for service quality, such as the number of overlays that a peer is part of?
3. How are nodes with selfish behavior dealt with?
4. What is the update procedure for the mesh-based overlay?
5. Do all overlays stream the same video? If not, why would users be interested in joining other overlays in which they have no interest? Joining other overlays might consume more bandwidth.

Discussion points:
1. Is AnySee applicable to on-demand streaming?
2. The authors described how to use backup paths from source to destination but didn't describe how to make a smooth transition from one streaming path to another when a path is down. Is that the reason for not having a 100% continuity index?
3. Perhaps the definition of the continuity index is not comprehensive. Currently, it is defined as the ratio of used connections to all connections. To me, it should also include how much link capacity is used as opposed to the total link capacity.
4. I have a concern about the CoolStreaming result shown in Figure 8. To me, CoolStreaming should be agnostic to the number of streaming overlays since it only follows intra-overlay optimization. Why does the result vary with the number of streaming overlays?

======================================================================================
Paper 2: Corona: A High Performance Publish-Subscribe System for the World Wide Web
======================================================================================
RSS feeds are an important application nowadays for getting regular updates from different web-related services such as blogs, weather reports, news, etc. However, current RSS approaches suffer from a major drawback: a high number of polls for detecting updates, which consumes unnecessary bandwidth and creates scalability issues.
In the paper titled "Corona: A High Performance Publish-Subscribe System for the World Wide Web", the authors propose a publish-subscribe system which serves the same purpose as RSS feeds but in an efficient and scalable way. Prior research proposed various publish-subscribe systems which, unfortunately, were not directly applicable to the existing WWW architecture. Corona addresses that issue. Corona, a cloud of several nodes forming an overlay network through Pastry, works as a middle layer between publishers (web servers) and subscribers (clients) to assist communication between publishers and subscribers. Corona exploits an IM service, where each client first needs to register its interests via IM.

This paper makes three major contributions: it sketches the design of Corona, a publish-subscribe system that does not require any changes to content sources; it formalizes the tradeoff between bandwidth allocation and polling frequency as an optimization problem and provides a distributed numerical solution; and it presents results from large-scale simulations and measurements from a PlanetLab deployment.

Pros:
1. As opposed to traditional RSS, Corona reduces both the update detection time (often by orders of magnitude) and the polling load on the servers.
2. Corona is evaluated through a real-world deployment on PlanetLab.

Cons/Discussion Points:
1. Privacy might be an issue in Corona since each user needs to expose his/her IM id to get updates from the web.
2. Is there any significant impact from a high volume of update messages?
3. How does an actual deployment of Corona work in the WWW?
4. How could the current Corona system be modified to make it applicable to content-based pub/sub systems?

- Shameem Ahmed

From: Shehla Saleem [shehla.saleem@gmail.com]
Sent: Monday, March 15, 2010 1:54 PM
To: Gupta, Indranil
Subject: 525 review 03/16

SplitStream: High-Bandwidth Multicast in Cooperative Environments

This paper presents the design and implementation of SplitStream: a solution to the problems faced by multicast systems that use single tree structures. There are many issues with the use of single trees for multicast. Most importantly, nodes high up in the tree, termed internal nodes, become more like single points of failure. Also, depending on the application, e.g. streaming media, voice and video, etc., the internal nodes can quickly run out of resources and become bottlenecks, whereas the leaves may not be using many of their resources. The objective of SplitStream is to overcome this imbalance of resource usage across different nodes in the tree. The idea is that a peer should not be an internal node in more than one tree. The content is split into multiple stripes and each stripe traverses a different tree. The application is responsible for striping the data such that each stripe consumes approximately the same bandwidth and the data can be reconstructed from any subset (of a sufficient size) of the stripes. A failed node can therefore cause the loss of no more than one stripe of content on average. Through simulations as well as experimental results, the authors show that SplitStream achieves its forwarding load-balancing objective, is highly robust to node failure, can accommodate peers with different available resources, and achieves all this with very little maintenance overhead. The paper considers SplitStream built upon Scribe and Pastry but claims that it can also work with other overlay protocols.
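As a toy illustration of the striping idea (my own sketch, using a single XOR parity stripe rather than the MDC or erasure codes the paper actually relies on), round-robin striping plus one parity stripe lets a receiver rebuild the stream even if any one stripe is lost:

from functools import reduce

def make_stripes(data, k):
    # Split data round-robin into k stripes plus one XOR parity stripe, so the
    # original can be rebuilt from any k of the k+1 stripes.
    stripes = [bytearray(data[i::k]) for i in range(k)]
    width = len(stripes[0])
    for s in stripes:
        s.extend(b"\x00" * (width - len(s)))            # pad shorter stripes
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    return stripes + [bytearray(reduce(xor, stripes))]  # last stripe is parity

def rebuild(stripes, lost, k, length):
    # Recover a lost data stripe by XOR-ing the k survivors, then re-interleave.
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    if lost < k:
        survivors = [s for i, s in enumerate(stripes) if i != lost and s is not None]
        stripes[lost] = bytearray(reduce(xor, survivors))
    out = bytearray()
    for j in range(len(stripes[0])):
        for i in range(k):
            out.append(stripes[i][j])
    return bytes(out[:length])

msg = b"SplitStream toy example"
stripes = make_stripes(msg, 4)
stripes[2] = None                        # pretend the tree carrying stripe 2 failed
print(rebuild(stripes, 2, 4, len(msg)))  # b'SplitStream toy example'

A real deployment would use proper erasure coding (or MDC for video) so that more than one simultaneous stripe loss can be tolerated, at the cost of extra redundancy.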
The authors present an algorithm for the case when a node receives a join request from a prospective child but can't adopt the child because it has already reached its outdegree limit. They say that the node should adopt this child and then reject or orphan one of its existing children. The orphaned child then has to look for a new parent. Can this be dealt with in some other way? It seems like each join may cause a lot of perturbation if the parent has already reached its outdegree limit. They talk about the spare capacity group, but even in that explanation it seems like the orphan may replace another leaf and in turn create another orphan, and I am not convinced that this ping-pong kind of solution is the right way to address the problem. Also, with all of these solutions, failure to locate a parent is still not impossible, although they mention that it will not be a frequent occurrence. Also, the security of the system has not been looked at.

Corona: A High Performance Publish-Subscribe System for the World Wide Web

This paper presents Corona, which is a publish-subscribe system for the World Wide Web. Earlier solutions for providing updates to clients suffer from many problems. First off, requiring the clients to look for updates motivates the clients to send many polling requests if they want to find the updates as soon as possible. This may overload the servers and can be very suboptimal in the case that large numbers of clients start polling for the same updates by sending independent requests. Corona does not suffer from these problems. The main design feature is that Corona uses cooperative polling, where nodes periodically poll for the same information, and if any of them receives an update, it shares the update with all the other nodes. Corona finds the optimal number (using Honeycomb) of nodes that should be polling for a specific topic or channel such that an acceptable tradeoff is achieved between the volume of polling messages and the update performance obtained. It is built on top of Pastry on the grounds that Pastry is scalable and resilient to failures. Corona assigns ids to both Pastry nodes and channels from the same id space. Nodes poll channels whose ids match their own id in a number of digits determined by the channel's polling level.

To further improve the update detection time, the authors propose Corona-Lite, which also ensures that the content servers are only lightly loaded. But since Corona-Lite's performance varies according to the workload under consideration, the authors next present Corona-Fast, which provides stable update performance regardless of the workload and its variations. Finally, Corona-Fair tries to achieve a fair distribution of update performance between different channels.

Is the strategy of nodes polling channels that match their ids a good one? Can it be made topology-aware for better performance in terms of routes taken and observed response latencies? Also, I am not sure how a polling level is selected for a channel. As for Corona-Lite, the paper uses the average update detection time as the performance evaluation metric. Is the average a good metric here? It masks many other important quantities, like the maximum or worst-case update detection time, or the standard deviation, which might give some idea of fairness to different clients rather than different content. Moreover, I am not sure how exactly the update performance and update detection time are calculated.
What about the time that the update takes to propagate to each of the interested users? It cannot reach all the users at the same time. Can different guarantees be provided to different users, for example based on some form of payment/tokens, etc.? As for system management, the owners have many major responsibilities, like managing subscriptions, polling, and also dealing with the received updates. Is there some incentive for nodes to do all this on behalf of other nodes? Also, this makes the system very sensitive to the failure of an owner node, and the whole system can fail if more than a certain number of adjacent nodes fail simultaneously or are compromised by an attacker. Finally, they have used a polling interval of 30 minutes and a maintenance interval of one hour, but they have not provided any intuition for using these numbers, nor have they motivated these numbers through any experiments. The same is the case for using Honeycomb: no particular reason is given for why it was chosen or for how important it is to obtaining the results as they are.

From: Nathan Dautenhahn [dautenh1@illinois.edu]
Sent: Monday, March 15, 2010 10:08 AM
To: Gupta, Indranil
Subject: 525 review 03/16

**************************************
* Nathan Dautenhahn *
* Paper Reviews March 16, 2010 *
**************************************

###############################################################################
Title: AnySee: Peer-to-Peer Live Streaming
Authors: Xiaofei Liao, Hai Jin, Yunhao Liu, Lionel M. Ni, Dafu Deng
###############################################################################

I. Summary and Overview

AnySee is a P2P live streaming system that uses inter-overlay optimization of overlay multicast to disseminate a myriad of multimedia services. The primary problem with current systems is that although they provide improvement over classical systems, they suffer from non-globally-optimized overlay construction and poor resource utilization. These issues manifest to the end user as startup delays, source-to-end delays, and playback discontinuity. AnySee attempts to offer:
- Improved global resource utilization
- Assignment of resources based on location and delay
- Upgraded quality using nearest peers for routing
- Balancing the load among peers equally

II. Questions, Comments, and Concerns

This paper was well written and flowed well. They did a lot of really interesting things. I really like that they provided both an in-depth simulation as well as a full implementation in a real-world scenario. This is huge when it comes to the applicability of their design. My considerations are as follows:
- It appears as though they are attempting to solve a MinPath problem, which if I remember correctly is NP-hard, or maybe even NP-complete. They are using simple heuristics to solve the optimization, but would it benefit them to do some advanced algorithmic analysis to find out the theoretical bounds of their solution?

###############################################################################
Title: Corona: A High Performance Publish-Subscribe System for the World Wide Web
Authors: Venugopalan Ramasubramanian, Ryan Peterson, and Emin Gun Sirer
###############################################################################

I. Summary and Overview

This paper discusses Corona, which is a publish-subscribe system for web updates, such as blogs, web page updates, etc. The problem with the current system lies in the asynchronous nature of web updates and the current polling architecture.
The response time to an update can be 15 minutes on average, while consuming a lot of bandwidth to perform the polling. The bandwidth load is often borne by the publishing services. Corona proposes to solve this problem by using a distributed P2P resource management framework. Essentially, they distribute the polling amongst a set of P2P nodes, and then push updates around internally among the nodes. The main contributions of Corona are:
- A general publish-subscribe system that does not require any change to existing Internet architecture or protocols.
- A formal framework to manage the trade-offs of bandwidth and update latency as an optimization problem.
- Extensive simulation and a live deployment.

I really like the full formalization of the problem. It is a really huge feat, which allows for future use of the model for new optimization schemes.

II. Questions, Comments, and Concerns

My considerations are as follows:
- Security vulnerabilities are quite extensive.
- I wonder why they didn't create their own client, choosing instead to use IM messenger technology. It is extremely easy to create your own client that can do message passing and such. I assume that they are targeting an easy adoption platform (e.g., lots of people use AIM, gchat, etc., so why not target that platform so it is easy to adopt). The limitations introduced by this are somewhat of an interesting trade-off.
- They do a really good job, but it seems like at the end of the work they improve the update time, yet the bandwidth costs stay relatively the same. Is it worth all the extra work to go from a 15 minute average to < 1 min?
- I really bought the need/motivation for bandwidth utilization, but they haven't completely gotten rid of this. Do people really need updates quicker than 15 minutes? If they needed them this fast, couldn't they just visit the content provider and check for the update themselves? Auto refresh on the browser?

III. Common Threads

All of the papers for today discuss issues related to information dissemination via overlay P2P networks. Corona is much different in terms of application than the others, but similar issues arise. It is interesting to note the affinity for advanced math in these topics as compared to other systems-type problems.

::nathan::

From: arod99@gmail.com on behalf of Wucherl Yoo [wyoo5@illinois.edu]
Sent: Sunday, March 14, 2010 11:33 PM
To: indy@cs.uiuc.edu
Subject: 525 review 3/16

Publish-Subscribe/CDNs, Wucherl Yoo (wyoo5)

Splitstream, M. Castro et al, SOSP 2003.

Summary: Splitstream is an application-level multicast system using a P2P network built on Pastry and Scribe. It provides high-bandwidth data dissemination in a cooperative manner. It splits the content into multiple stripes and multicasts each stripe using independent multicast trees. Thus it is more resilient to node failures, because the striping exploits path diversity. Some applications can get the most benefit from SplitStream: multicasting bulk data with erasure coding, or video streaming with multiple description coding (MDC). Pastry routing can construct interior-node-disjoint trees by grouping the interior nodes by disjoint most significant digits. These interior-node-disjoint trees can guarantee the constraints on inbound bandwidth. To guarantee the constraints on outbound bandwidth, parent nodes try to keep their children within the outdegree limit. When a prospective child tries to join, the parent node adopts it and then checks whether it exceeds the outdegree limit.
If so, the parent evicts a child (which becomes an orphan) with the shortest prefix match with the stripeId, and then a new parent is found for this node by a DFS search in the spare capacity group. This search ensures that the new parent is near the orphan in the physical network.

Pros:
1. Load balancing of multicast, especially for bandwidth guarantees, by ensuring that the majority of nodes are interior nodes in only one multicast tree; previous tree-based approaches put pressure on interior nodes for forwarding messages.

Cons:
1. May increase the latency of the Pastry ring, since SplitStream reshapes the topology to construct interior-node-disjoint trees under bandwidth constraints and the outdegree limit without considering proximity metrics.
2. Because orphans are found in a recursive manner, SplitStream cannot bound the latency of a node join. A worse problem is that an existing child node can be orphaned by a newly joined node, so its multicast service can be disrupted; the normal expectation is that a node that has been joined for a while receives steady service.

Corona: a high-performance publish-subscribe for the World Wide Web, V. Ramasubramanian et al, Usenix NSDI 2006.

Summary: Corona is a peer-to-peer publish-subscribe architecture layered on Pastry. It monitors subscribed web pages using a channel abstraction and cooperative polling, and disseminates the updates in a decentralized manner. Corona assigns a primary owner to each channel, and the polling nodes reside in a partitioned wedge of the Pastry ring. The primary owner receives cooperative polling results from the assigned nodes and disseminates the updates if any exist. There is a tradeoff between update latency and bandwidth usage depending on the polling level (which is decided by the polling interval and the number of polling nodes). Optimizing the assignment of polling levels is an NP-hard problem, and it is solved by a decentralized numerical algorithm. Thus each channel can independently adjust its polling level for better optimization.

Pros:
1. Scalable decentralized publish-subscribe architecture; good use of the Pastry ring.
2. Reasonable application of a numerical algorithm (the Honeycomb tool) for optimization, which enables decentralized local optimization across independent channels.

Cons:
1. The overhead on the primary owner is greater than on normal nodes; a supernode concept could help (try to elect as primary a node with more computation power and bandwidth available).
2. The fixed polling interval eases optimization but cannot adapt to the heterogeneous characteristics of nodes and contents.

-Wucherl