From: yanenli SpamElide on behalf of Yanen Li [yanenli2 SpamElide] Sent: Tuesday, April 05, 2011 12:32 PM To: Gupta, Indranil Subject: cs 525 review 04/05/2011 Yanen Li paper 1: Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining summary: This paper presented astrolabe, a distributed, self-organized information management system, which collects large scale system state, provides on-the-fly attribute aggregation and performs rapid updates. The Astrolabe system is essentially a relational database built using a peer-to-peer protocol running between the applications or computers on which Astrolabe is installed. Astrolabe monitors the dynamically changing state of a collection of distributed resources, reporting summaries of this information to its users. This information can be retrieved by queries written in SQL syntax. methods in Astrolabe: 1. Mobile code: Astrolabe uses SQL-like query language to enable different applications to monitor different kind of information or one application to query different data at different times. 2. Hierarchy: by using zone hierarchy, Astrolabe achieves scalability. It first summarizes information in zones and then exchanges information between zones. 3. Random peer to peer: Astrolabe runs a process in each host and let them communicate in a epidemically, peer-to-peer fashion. This avoids vulnerable and heavily loaded central monitor servers. 4. Certification: Astrolabe has its own public key infrastructure over the zone tree hierarchy. It prevents and controls propagation of bad data and costly operation. It also digitally signs the information exchanged between nodes and provided to users. Pros: 1. Astrolabe is largely scalable. It is reported that it can scale to thousands of nodes. 2. SQL-like query language provides an easy to use interface to monitor and manage information in the system 3. peer-to-peer Gossip communication protocol makes Astrolabe low latency compared to other DNS-like management systems. Questions: 1. Timestamps Astrolabe are assumed to be global – is this reasonable assumption? Paper 2: Moara: Flexible and Scalable Group-Based Querying System summary This paper presents Moara, a distributed querying system addressing the group-based one-time query problem over a large-scale distributed infrastructure. It presents a SQL-like query interface, and models queries as a tuple of (query-attribute, aggregation-function, predicate), where the predicate is expressed as a series conjunct/disjunct terms of the form (attribute op value). Moara knows how to deliver the query and aggregate the data through the member nodes. Through this design, bandwidth could be saved and resource with nodes could be saved also. This paper is very focused on the query optimization in Moara. Two important optimization techniques include using per-simple-predicate-based prune variables to help pruning subtrees and using a path-compression-like technique to jump over unnecessary nodes. pros: 1. The SQL-like query interface is simple and useful, which can satisfy most of the needs of administrative queries. 2. Focusing on the most important issues of query optimization. 3. Dynamic maintenance of groups. cons: 1. Moara is using pooling strategy to monitor the system states. However, notification strategy might be more suitable in the case for administrative purposes where continuous monitoring is more common than one time query. 2. The authors try to find the lowest cost simple group for composite predicates, but this solution is not theoretically optimal. -- Yanen Li Ph.D Candidate Department of  Computer Science University of Illinois at Urbana-Champaign Tel: (217)-766-0686 Email: yanenli2 SpamElide From: Igor Svecs [isvecs SpamElide] Sent: Tuesday, April 05, 2011 12:27 PM To: Gupta, Indranil Subject: 525 review 04/05 Igors Svecs CS 525 2011-04-05 Reliable On-Demand Management Operations for Large-scale Distributed Applications SUMMARY The paper introduces MON, a gossip-based protocol for building on-demand overlays for managing distributed applications. Each node maintains a list of log(N) members. When a query (such as "select where ") is issued, an overlay tree or a DAG is constructed by forwarding Session messages between the nodes. Experiments show that construction takes ~1.5 seconds on average on a 330-node PlanetLab slice. For their implementation, on-demand overlay construction is more feasible than persistent overlays if a query is sent at least every 40 seconds. Medium-term overlays are considered to increase query throughput; DAGs are used to add some redundancy in the overlay to survive failures. PROS - Does not require to have global membership information. - On-demand overlay construction makes RON simpler compared to persistent overlays. - With medium-term overlays, MON is useful for infrequent manual management tasks. CONS - Since MON is designed for a modest scale (several 100's to 1000's nodes) and assumes infrequent queries (on-demand tree construction), there appears to be no reason why each instance of the distributed application could not maintain a connection to a well-known management server (botnet-style). This should have been discussed in the paper. - Partial membership lists have to be maintained by each instance, and they must include at least log(N) entries. There is also an assumtion that these lists should have uniform random distribution, which may not work for each application, and may require additional overhead. - 1.5-2 s additional latency (tree construction) for each query may limit query rates. 40 second feasibility threshold seems too high for automated application monitoring, and is more suitable for manual administrative tasks. Moara: Flexible and Scalable Group-Based Querying System SUMMARY Moira is a system that builds and maintains group trees for subsequent querying. Each Moira node mainstains a list of (attribute, value) pairs. This information can be dynamic; e.g., (Current-CPU-Usage, 55). A Moira query is formatted as (query-attribute, aggregation function, group-predicate), e.g., (Current-CPU-Usage, Average, Service = Apache and Application = AppX). Group predicate is a boolean expression of the type (group-attribute op value). Moira uses Pastry DHT as an overlay, and maintains a DHT tree for routing. When a query is generated, group-attribute is hashed to a value, and the query is forwarded to a node that is the root of the corresponding group-attribute tree. This root then propagates the query to its children, does in-network aggregation, and returns the result to the client. To avoid flooding the query to all nodes, subtrees that do not satisfy the predicate are pruned - child nodes proactively notify their parent with their (attribute, value) pairs, and the parent can use this information to determine whether to forward the query down the subtree. PROS - Design of this system is driven by existing need. - The system considers both scalability and efficiency. Once the pruning mechanism is fine-tuned, it can be very efficient. CONS - Aggregation function is required to be partially aggregatable. - Group-attributes need to be more-or-less static. - Only eventual completeness is guaranteed. - Relatively complex system that deals with lots of dynamic structures. - A lot of manual fine-tuning and optimization is required for the pruning to work effectively. From: w SpamElide on behalf of Will Dietz [wdietz2 SpamElide] Sent: Tuesday, April 05, 2011 12:27 PM To: Gupta, Indranil Subject: cs525 Review 04/05 "Reliable On-Demand Management Operations for Large-Scale Distributed Applications" This paper presents "MON" (Management Overlay Network) system, that introduces a number of key ideas. MONS introduces the idea of 'on-demand' overlay networks,which the authors use to faciliate commands over a distributed system. On-demand overlay networks are probabilistic graph constructions (trees or DAGS, depending on what makes more sense for a particular query) that are generated for each query (idea being that there are relatively few queries, and not needing to preserve the overlay makes it much cheaper) and used to issue and gather the results of a particular control command. The commands they support over this system are avg/top/histo/grep/run and a more general 'select' to find resources/nodes that match certain criteria. One thought is that it doesn't seem clear why the limited command set--seems you could support quite a few more operations relatively easily. My best guess is that the main contribution is the membership and overlay construction ideas (and investigation of how effective/reliable/etc) they are, and that the authors leave command language construction to other works. They added MONs to PlanetLab, claiming it's "available". Does this mean it's actively in use on the PlanetLab cluster? From: harshitha.menon SpamElide on behalf of Harshitha Menon [gplkrsh2 SpamElide] Sent: Tuesday, April 05, 2011 12:17 PM To: Gupta, Indranil Subject: 525 review 04/05 MON: This paper proposes a mechanism for instant monitoring and management of tasks for large-scale distributed applications running on hundreds of hosts. The approach uses MON (Management Overlay Networks) , on-demand overlays, to support instant queries to be executed on hosts in the system. On-demand overlays are built on-the-fly by using gossip-style membership protocol. Cluster management tools allow querying of resources associated with the infrastructure but there is a scarcity of tools that application developers can use for managing their applications on a cluster. This system built on top of weakly consistent membership information can be used to execute monitoring and software push commands quickly, scalably and reliably. The system is built on two kinds of overlay structures: trees and DAGs. Pros: -Uses gossip-based membership management protocol which aims at eventual consistency. This means that even in case of failures, this system would perform reliably. -This system is scalable and latency is low for query execution. -This system is light-weight in terms of memory, computation and bandwidth -Provides tuning parameters for application-specific reliability Cons and Suggestions: -Construction of overlay tree and DAG structure would incur costs of construction and rebuilding when nodes join and leave -Experiments showing the performance during high churn rate is not shown. -max_drop and max_missing specified at overlay creation time doesnt seem to be the right metric as it might have different consequences based on the overlay structure. For eg: in tree failure of some node can lead to a branch being inaccessible but instead for DAG, thats not the case and might trigger false failure alarms. Moara: Flexible and Scalable Group-Based Querying system Moara is a querying system which builds aggregation trees for different groups and adaptively maintains the trees to optimize total message cost. Moara supports a query language allowing groups to be specified implicitly. A query in Moara has three components, query attribute, aggregation function and group predicate. Moara uses peer to peer in network aggregation approach and uses a DHT trees to compute results. These trees are used for spreading queries and aggregating answers back towards source node. Moara uses query optimization techniques. It uses adaptive pruning of branches of the DHT trees and only querying nodes that can reply. This causes reduction in bandwidth and reduces response time. Pros: -The query language is simple and handles composite query expressions -Dynamically maintains group trees based on query rates and churn rates which reduces the bandwidth consumption and response time -The simulations indicate that the churn doesnt affect the system Cons and Suggestions: -Moara uses DHT trees for group predicate and this would incur maintenance cost. So how about using DAG From: Jason Croft [croft1 SpamElide] Sent: Tuesday, April 05, 2011 12:17 PM To: Gupta, Indranil Subject: 525 review 04/05 Reliable On-Demand Management Operations for Large-scale Distributed Applications Management Overlay Networks (MON) arise out of a need for low cost management of end-user applications in distributed infrastructures. The design allows queries to be executed for large-scale distributed applications using weakly-consistent membership information and building one overlay per query. On-demand overlays are used in place of persistent overlays to reduce bandwidth overhead incurred by membership maintenance. Instead, a gossip-style protocol is used to distributed this information so nodes can build partial views, or partial membership lists. MON supports two different overlay structures: trees and DAGs, for instant status queries (trees) and software push (DAGs). Trees can be constructed using a random tree construction algorithm or a locality-aware two-stage construction. The two-stage method improves on the random tree, which is oblivious to the underlying network. DAGs are constructed using levels, where each node receives a level one more than that of its first parent. However, this design is limited by an upper bound on the rate of injected queries. Above a rate of one query every 40 seconds, MON is less effective than a persistent overlay in bandwidth usage. While this rate may be acceptable for operators/administrators, it would not work well for automated or continuous monitoring. Therefore, the authors also propose the use of medium-term overlays that last for hundreds of seconds. This can then be optimized for session reliability (that at any given time, no more than a specified number of nodes are disconnected) or task reliability (that at most a specified number of nodes will be missing from the aggregate result). For task reliability, experiments show perfect query completion 72% of the time, which may not be optimal for continuous monitoring of some metrics, but if this is constraint is relaxed to allow 3% missing results then queries can be completed 96% of the time. The latency of queries is only several seconds, with 97% completing in under 2s. However, this only considers the random tree and two stage overlays. It would have been interesting to see a similar evaluation with the medium-term overlay shell DAG to determine if there is a large change in latency. Moara: Flexible and Scalable Group-Based Querying System Moara provides a flexible, efficient, and scalable method to query groups of machines by preferring high availability and scalability over consistency. Moara uses aggregation trees to inject queries to a minimal number of nodes and reduce message cost and query response time. The query processor can support composite queries that target multiple groups. A query contains an attribute to be aggregated, an aggregation function to be used on the data, and a group predicate that specifies the machines on which the aggregation is performed. An agent running on each machine populates (attribute, value) pairs, such as (CPU-MHz, 3000). To spread queries and aggregate answers back to the source, a DHT tree is used. Moara attempts to prune this tree before spreading the query to nodes that satisfy the group predicate. Additional optimizations are also made, such as short-circuiting the trees to reduce the number of internal nodes not satisfying the predicate and rewriting predicates targeting multiple groups. Moara guarantees eventual completeness, that is, when the nodes satisfying the predicate and the DHT do not change after some period after the query, the group eventually returns answers from all the nodes. To bypass internal nodes that forward responses up or down a tree but to not satisfy the predicate, a separate query plane is introduced. This optimizes the tree by maintaining two sets at each node: a list of nodes that it forwards to its parent (updateSet) and a list of children where it forwards received queries (called the qset). However, this design only allows for one-shot queries rather than automatic and continuous monitoring like the medium-term overlay in MON. Continuous monitoring seems more applicable to some of the use cases mentioned in section 2, especially data centers that may have thousand of nodes that are continuously monitored for failures. Also, latency is not as low as MON, median response time for queries is 1- 2 seconds, and 90% complete in less than 5 seconds. From: anjalis2006 SpamElide on behalf of Anjali Sridhar [sridhar3 SpamElide] Sent: Tuesday, April 05, 2011 12:16 PM To: Gupta, Indranil Subject: 525 review 4/5 MON: On-demand overlays for distributed system management, J. Liang et al, SIGOPS OSR 2007. MON attempts to provide an on-demand overlay network for query distribution and software pushes over distributed applications running over hundreds of hosts. It aims to minimize memory, computation and bandwidth requirements for these functions. MON provides a system where system properties like RAM, CPU utilization etc can be queried by the user. Monitoring the running of an application on large scale distributed applications has been described as being the motivation. MON users attempt to query the status of the system as if the status of each node is a row in the database. MON is divided into three components – the Management Commands, the Gossip system and the On-demand overlay construction. The management commands provide metrics like avg, top k , grep and run. The gossip system enables each node to maintain a partial list of the other nodes in the network. Each gossip message is associated with an age counter in order to remove failed nodes and update the partial view. There are two types of overlay networks that are created – trees and DAG’s. The trees are constructed either randomly or by a two stage locality aware method. The trees are generally used for queries and the DAG’s for software pushes. The paper also shows the point at which a persistent overlay network might be more efficient than an on demand overlay network. Reliable on demand overlay networks are further discussed along the lines of either session reliability or task reliability. Session reliability talks about the number of nodes that are disconnected from the on demand overlay network at that time. Task reliability is about the number of nodes missing from the aggregate results. The advantage here is that the user can specify the threshold for acceptable session and task reliability. MON is able to execute queries within seconds by constructing the best overlay network for each query. It is able to give the user some control regarding the tradeoffs between query reliability and completion of the query by deciding the threshold for missing nodes. The status change of hosts in the middle of a query is not addressed clearly. For example, the closer the node is to the root; greater is the chance for disconnecting a larger part of the network. The paper talks about query results with respect to number of nodes that are currently connected in the overlay. The proposed suggestions at the end of the paper are some of the improvements that came up during the paper reading. One of the aspects about the paper is the lack of comparison graphs with existing tree aggregation methods. Moara: Flexible and Scalable Group-Based Querying System, S. Ko, Middleware 2008 This paper introduces a system to query large distributed infrastructures by building aggregation trees for different groups and maintaining them. It focuses on achieving flexibility in querying the groups, efficiency in the message overhead and scalability of the number of machines. It provides high availability and scalability of the system and eventual consistency on aggregation results. Information is stored at each node in the form of tuples (attribute, value). Each query consists of three parts – query attribute, aggregation function and the group that is being queried. The group field specifies the nodes that will be taking part in the query based on a condition. Moara uses a structured overlay routing for constructing DHT trees. In order to reduce bandwidth cost, Moara employs certain techniques that remove nodes that do not satisfy the group predicate. Each node maintains its state of being pruned or not based on if it satisfies the group predicate. Using certain binary variables, Moara proposes a state machine which allows the queries to obtain values only from nodes that satisfy the group predicate. An optimization to the tree construction is having a separate query path for some child nodes which have to traverse many internal nodes that do not contribute to the query. Moara also talks about how to deal with composite predicates by building multiple trees. The query is routed to the tree that incurs the least bandwidth cost. Moara does propose optimized tree construction and routing for each query. The experimental results comparing it with centralized aggregators show a definite reduction in latency. By employing existing routing algorithms, Moara is able to concentrate more on the optimization of the tree that is constructed based on each query requirement. The paper does not talk about if the user is able to see the number of nodes involved in the query. Does the query fail if some of the nodes that were initially in the tree do not respond? Since Chord or similar DHT routing algorithms are used, each node needs to know about every node in the system. The routing carried out may need to be location based when constructing the tree. Some of these aspects might add to the latency of query requests. From: kevin larson [kevinlarson1 SpamElide] Sent: Tuesday, April 05, 2011 12:07 PM To: Gupta, Indranil Subject: 525 review 04/05 Liang et. al present their Management Overlay Networks (MON) approach, which uses gossip-style on-demand overlays to issue commands such as queries and software pushes. On demand overlays are build per command, and in systems with infrequent queries, MON is very efficient. Where as persistent overlays require maintenance and membership handling, on demand overlays require no persistent state. Even including the setup times on clusters with thousands of nodes, on demand overlays complete their queries within a few seconds. MON is built around gossip based memberships, with each node maintaining a fixed size “partial view” of close by nodes. At a periodic interval, a node picks a random target and exchanges random membership entries. Age associations help remove failed nodes. To execute commands, two types of structures are used, DAGs and trees. Trees are useful for static queries, and DAGs are useful for software pushes. The idea of constructing on-demand overlays is very interesting. However, the ability to ignore most of the management overheads doesn’t come without a cost. It would be interesting to see how frequent queries need to become in order for non-on-demand overlays to have comparable overhead with MON. Ko et al present Moara, a querying system which builds aggregation trees for different groups and adaptively maintains these trees in order to optimize the total message cost. The authors demonstrated the usage patterns for Planetlab and shows the inefficiencies of the current monitoring system. Moira stores data in tuples, and uses three part queries with query-attributes, aggregation functions, and group-predicates. The attribute is the item of interest, and the aggregation function is the operation, such as enumeration, max, min, etc. The group predicate is the set of machines for the aggregation to be applied over, defaulting to the entire set or machines. Moira uses a DHT to maintain the nodes in the system, and depending on the churn, uses varying levels of group management to prune the branches. Moara utilizes interesting optimizations for composite queries, selecting a small “cover”, which is a set of groups which contain all nodes that satisfy the composite predicate. Evaluation of bandwidth shows that Moara performs very well in comparison to global DHT trees and an aggressively maintained tree as well, basically performing as well or better in all cases. Latency shows that although Moara is slower in achieving the first responses, the relevant latency is to the slowest responses, where Moara also outperforms a centralized approach. . From: Anupam Das [anupam009 SpamElide] Sent: Tuesday, April 05, 2011 11:53 AM To: Gupta, Indranil Subject: 525 review 04/05 i. MON: ON-demand overlays for distributed system management This paper talks about developing a management level overlay which allows application level queries to be executed very quickly in a large scale distributed system. MON (Management Overlay Network) uses a gossip based management protocol as it is lightweight and it imposes less overhead on network bandwidth. So, every node only maintains a partial view of its available neighbors. MON uses this membership list to create the actual overlay network. MON proposes two ways to create this overlay network namely- random trees or DAGs. The paper then analyzes the upper bound of query rate at which MON becomes more feasible than persistent overlay. By default MON builds one on-demand overlay per query, but the authors also describe a way to reuse the on-demand overlays to perform multiple queries. However, MON wants to achieve this without spending any explicit maintenance bandwidth. For this purpose MON introduces two types of reliability (session and task reliability) . Pros: 1. MON provides a faster way to perform instant querying/monitoring in large-scale distributed systems. 2. MON imposes very low overhead on network bandwidth. 3. Overlays can be reused provided they meet some application level constraints. Cons: 1. Comparison between on-demand and persistent overlay is missing. It would have been nice to see exactly how much faster MON really performs over persistent overlay. 2. MON supports only a fixed list of commands. Could it support more complex set of commands/queries? If so, would that impose any restriction on the durability of the overlay? ----------------------------------------------------------------------------------------------------------------------------------------------- ii. Moara: Flexible and Scalable Group-Based Querying System In large distributed systems users and administrators often desire to monitor the overall performance of a particular application or gather statistics about a particular performance metric. To do so one could query the whole system but that would lead to high bandwidth cost and high latency. The authors of this paper therefore, present a group based querying system called Moara where information is aggregated in a group fashion (here groups contain only those nodes that are relevant to the specified queries). Moara maintains aggregation trees (DHTs) for different groups. It supports a query language of the form- (query-attribute, aggregation function, group-predicate). Here query attribute is the feature (e.g. cpu utilization) that we are trying to gather, aggregate function refers to the type of aggregation we are aiming for (e.g. max) and this must be partially aggregatable, and finally group-predicate defines the groups of machines (e.g. machines running MYSQL) on which this aggregation is performed. Moara also supports composite queries that target multiple groups using unions and intersections. Moara uses a few optimizations techniques to reduce bandwidth usage and response time. These optimization techniques include adaptively pruning branches of the DHT tree, bypassing intermediate nodes that do not satisfy a given query and querying only those nodes that are able to answer quickly without affecting the result of the query. The authors finally performed experiments on FreePastry simulator, Emulab and Planetlab to show the effectiveness of their proposed system. Pros: 1. Reduces bandwidth overhead and latency by directing queries to the desired target group. 2. Allows a wide range of queries. Queries can be combined to create more complex composite queries. 3. It is scalable as it sends queries to only the target group (i.e., independent of the total number of nodes in the system and grows linearly only with group size). 4. Dynamic maintenance allows Moara to perform well even in the presence of high group churn. Cons: 1. Maintaining a DHT for each query could be computationally expensive as a single machine could belong to multiple query-groups. Could DAGs or random trees be used (as used in MON paper)? If so it could reduce the maintenance cost of tree. 2. Moara sacrifices consistency for availability, so some of the machines might be left out of the group. This could impact the aggregated output. 3. The paper does not discuss much about the state maintenance overhead. -----------------Anupam