From: Shivaram V [shivaram.smtp@gmail.com] on behalf of Shivaram Venkataraman [venkata4@illinois.edu]
Sent: Thursday, April 08, 2010 12:43 PM
To: Gupta, Indranil
Subject: 525 review 04/08

Shivaram Venkataraman - 8 April 2010

D3S - Debugging Deployed Distributed Systems

This paper presents the design and implementation of D3S, a debugging system that can be used to dynamically check for errors in large-scale distributed systems. Debugging large-scale distributed systems is a hard problem because the sequence of events occurring at many different machines needs to be analyzed to reason about errors. D3S presents a novel solution in which developers write predicates that are monitored at runtime by 'verifiers', which notify the developer if any of the predicates fail. The major design components of D3S are:

1. Predicates: Predicates are serial programs written in C++. D3S allows developers to organize predicates as a directed acyclic graph (similar to Dryad), and each vertex gets activated when its preceding vertices are ready. A predicate file has two parts: the first is the list of vertices in the predicate, which represents the number of stages in the verification process; the second is the implementation of an abstract method, 'Execute', which contains the code that needs to be run to verify the state over a global snapshot.

2. Inserting predicates: The D3S compiler generates shared libraries for the 'state exposers' and the 'verifiers' from the predicate code. These modules can be attached to a running process dynamically, so the developer can change the predicate without affecting the job. The state exposer re-writes the binary module to output tuples as required, and this data is then transmitted to the verifiers.

3. Global snapshot: The verifiers receive tuples from different machines and construct a global snapshot of the system using logical (Lamport) timestamps. The predicate is then verified on the global snapshot in the final verifier vertex. D3S can report the set of states that led to an error if a predicate evaluation fails.

4. Partitions and stream processing: Since there can be multiple stages in the verification process, the tuples generated by the state exposers are hash partitioned (similar to MapReduce) and then sent to the verifiers. Also, in some cases it may be more efficient to disseminate the change in state rather than the entire state. D3S supports such requirements found in stream processing systems.

Pros:
- Developers can reuse type declarations from their code in the predicates.
- Detailed discussion of the bugs found using D3S in a wide array of production services.

Cons:
- The scalability of checking depends on the predicate. The developer needs to model the correct set of stages in the verifier to make sure that a single verifier node does not become a bottleneck.
- It may be costly to construct a global snapshot for some cases, like deadlock detection in a huge system.

Interesting points:
- Uses techniques from Dryad and MapReduce for important design decisions.

From: Rini Kaushik [rinikaushik@yahoo.com]
Sent: Thursday, April 08, 2010 12:24 PM
To: indy@cs.uiuc.edu
Subject: 525 review 04/08

Rini Kaushik
CS525

D3S is an online, run-time checker that allows developers to specify predicates on distributed properties of a deployed system. When D3S finds a problem, it produces the sequence of state changes that led to the problem, allowing developers to quickly find the root cause. Using D3S, a developer writes functions that check distributed predicates.
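As an illustrative aside, here is a minimal sketch of the kind of check such a predicate function might perform (in Python for brevity, whereas D3S predicates are actually C++ code with an Execute method): given one global snapshot of exposed (client, lock, mode) tuples, flag any lock that is held exclusively by more than one client. The tuple layout and function names are assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of a D3S-style predicate (the real system uses C++
# predicates with an Execute() method over a global snapshot of tuples).

EXCLUSIVE, SHARED = "X", "S"

def check_lock_conflicts(snapshot):
    """snapshot: iterable of (client_id, lock_id, mode) tuples exposed
    by the state exposers at one logical timestamp."""
    holders = {}  # lock_id -> list of (client_id, mode)
    for client, lock, mode in snapshot:
        holders.setdefault(lock, []).append((client, mode))

    violations = []
    for lock, owners in holders.items():
        exclusive = [c for c, m in owners if m == EXCLUSIVE]
        # Conflict: an exclusive holder coexists with any other holder.
        if exclusive and len(owners) > 1:
            violations.append((lock, owners))
    return violations

# Example: clients 1 and 2 both think they hold lock "L7" exclusively.
snap = [(1, "L7", EXCLUSIVE), (2, "L7", EXCLUSIVE), (3, "L9", SHARED)]
assert check_lock_conflicts(snap) == [("L7", [(1, "X"), (2, "X")])]
```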
A predicate is a distributed invariant that must hold for a component of the system or the system as a whole (e.g., "no two machines should hold the same lock exclusively"). D3S compiles the predicates and dynamically injects the compiled libraries into the running processes of the system and into additional verifier processes that check the system. After injection, the processes of the system expose their states as tuples (e.g., the locks a process holds) and stream the tuples to the verifier processes for checking.

Pros:
-----
1) Typical tracing-based debugging solutions have limitations. One limitation is that, because of the high performance and space overhead of tracing, tracing is typically kept turned off or kept at a minimal trace level during the production run. As a result, the traces may end up not having the right amount of information necessary for reproducing/diagnosing a bug. If the programmers can come up with predicates for the class of most prevalent bugs in their system, D3S may allow for an online, always-on, lightweight bug diagnosis and detection.
2) Hindsight is always 20-20. Programmers may not be aware of all the relevant tracing points upfront. Hence, a trace-based debugging solution would require a software upgrade to add missing traces in the code base. D3S allows verifiers to be added dynamically during runtime. This is a major advantage as it doesn't require any upgrade or interruption of the production run and can be done online.
3) D3S allows scalable checking by distributing the checking across multiple machines.
4) D3S makes sure that the verification processes are fault-tolerant themselves.
5) D3S reduces false positives by keeping track of machine failures.
6) The authors deployed D3S on 5 real-world systems to show D3S's potential and applicability.

Cons:
-----
1) The paper makes an argument that tracing either monitors too much information or too little information. However, tracing has the potential for diagnosing a variety of bugs such as configuration errors, memory corruption, buffer overruns, etc. I am not sure that predicates can work for all kinds of bugs. The invariants may not be sufficient to capture all kinds of bugs that may happen in the system. For example, configuration bugs will still rely on some form of tracing for diagnosis.
2) I have worked with Pin, which is a dynamic instrumentation tool. Pin had substantial performance overhead even in the baseline case with no instrumentation. Since D3S uses dynamic instrumentation, I am concerned about its associated overhead and its viability in a production environment. The simple example shown in the paper will translate into multiple dynamic instrumentations in all parts of the code where lock acquire and release are invoked. Even though the evaluation hints at a low 8% overhead, I am not sure that the evaluation has covered the high-overhead situations. For example, if a program has a large amount of synchronization code, the overhead of dynamically instrumenting locks would be much higher than in a program with less synchronization code.
3) Concurrency bugs need more state-ordering information for problem diagnosis given their non-determinism in a multi-core environment. The predicate checker may not be able to accurately identify the root cause of the problem on a multi-core processor. Also, sometimes the root cause happens much earlier than when the actual symptoms first appear. A perfect example is a memory corruption bug.
Such a bug would require a larger snapshot window than is feasible at runtime.
4) There is an implicit assumption in the example shown in the paper that all the shared variables are protected by locks. However, a study has shown in the past that the majority of concurrency bugs happen because programmers miss protecting all instances of shared memory. Thus, D3S won't be able to instrument such code accurately and will miss detecting such concurrency bugs.

From: Vivek [vivek112@gmail.com]
Sent: Thursday, April 08, 2010 12:23 PM
To: indy@cs.uiuc.edu
Subject: 525 review 04/08

WiDS Checker: Combating Bugs in Distributed Systems

This paper describes some issues with debugging of "distributed systems". By distributed systems, the authors seem to mean the protocols that support these distributed systems. They bring to light the standard "printf"-based debugging that all too often becomes overly complex. In this paper, they discuss their WiDS Checker framework and show how predicate checking is used in a simulator for deterministic replay. They apply their framework to two real-world systems, and one system built by themselves (BitVault, a content-addressable storage system).

Pros:
- The paper discusses bugs that occur even at very large scale. This will be important for today's distributed systems.
- The paper provides some very good general implementation lessons learned in Section 4.6. Through their experiences with predicate-based checking, they show its power on large-scale distributed systems via a thorough analysis of their case studies (Chord, Paxos).
- The paper makes a clear argument for their tool, showing how it had found "non-trivial bugs" in Chord, Paxos, and Boxwood. For Boxwood, they discovered bugs related to livelock and deadlock. These are surely important bugs, and the systems are also clearly very highly used. Addressing these bugs in such well-known systems clearly shows the strength of WiDS.

Cons:
- Deterministic replay is a very important and relevant piece of related work. However, what about probabilistic replay? Is this not relevant in a distributed system as well?
- While they do mention that there are no well-known benchmarks to "measure" the effectiveness of a debugger, it seems that their analysis of their systems could have addressed scalability more. Furthermore, their experiments didn't seem to vary parameters and try different values (less experimental, more of a use-case scenario). While it may seem difficult to do, a systematic method of evaluation across all the different applications would have helped.
- In the study of effectiveness, what standards must a debugger meet to be effective? The qualitative description and comparison to other systems (end of Section 4.2) helps, but a clearer definition of effectiveness could be useful. Indeed, there is no standard agreed-upon method for evaluating effectiveness, but that shouldn't stop them from clearly defining their own.

From: Giang Nguyen [nguyen59@illinois.edu]
Sent: Thursday, April 08, 2010 12:23 PM
To: Gupta, Indranil
Subject: 525 review 04/08

CS525 review nguyen59 04/08/10

WiDS Checker: Combating Bugs in Distributed Systems

Distributed systems are hard to debug. Some bugs only manifest at very large scales. Due to Heisenbugs, adding a print statement to debug might change the timing enough that the bug doesn't manifest. This paper proposes a checker for distributed systems called WiDS, which is replay-based predicate checking.
The programmer uses the WiDS toolkit, which provides a set of APIs. The program can be run in real deployment or in simulation without code changes, thanks to the WiDS API toolkit. WiDS models the system at "event" granularity. Events can be the expiration of a timer, the receipt of a message from another node, or scheduling and synchronization events. WiDS interprets the execution of the entire system as a sequence of events. As the program executes, WiDS logs all non-deterministic events. When replaying, WiDS is fed the event log and uses Lamport's logical clock to decide the replay order of events from different nodes so that the "happens-before" relationship is preserved. WiDS provides a scripting (Python) environment in which the programmer can define predicates that WiDS checks during the replay. The predicates are checked at event boundaries. WiDS also includes a visualization tool to generate a message flow graph based on event traces. This helps the programmer perform root-cause analyses.

Pros:
- Able to detect valid new bugs in distributed systems.

Cons:
- Replay-based means having to rely on the completeness of the logs.
- Have to modify application code to use the WiDS toolkit.
- Logging of random number generation is a potential security/privacy violation.

-----------------------------------------------------------

X-Trace: A Pervasive Network Tracing Framework

Networks are complex, consisting of different layers of software, different network devices, services, protocols, etc. There are diagnostic tools, but they are targeted at one particular protocol. In order to better diagnose network issues, the paper proposes an integrated tracing framework called X-Trace. Basically, each X-Trace-enabled network request involves picking a unique identifier and including that identifier in the network packets, at various layers of protocols. This metadata is propagated down to lower layers ("pushDown") and also along requests caused by this request ("pushNext"). Consider an HTTP proxy environment: the HTTP client inserts the X-Trace metadata into the HTTP headers and also pushes down the metadata to the TCP layer, which will include it in the TCP headers and push it down to the IP layer, etc. The HTTP proxy will create its own connections to the server to serve the client's request, so the proxy uses pushNext to copy the (accordingly modified) X-Trace HTTP headers in the request to the outgoing request that it will create. The TCP layer at the proxy does similar things for the TCP headers. These traces will be logged by the devices. These logs provide a complete view of the requests.

Pros:
- Different layers/devices in the network can initiate X-Trace as desired. It doesn't require full cooperation from everyone to get benefits.

Cons:
- Requires modification of the protocols.
- The paper includes unnecessary implementation details (the threads in Section 3.2) and too much on the usage scenarios, probably to fill up the space.

From: Fatemeh Saremi [samaneh.saremi@gmail.com]
Sent: Thursday, April 08, 2010 12:19 PM
To: Gupta, Indranil
Subject: 525 review 04/08

Paper 1: D3S: Debugging Deployed Distributed Systems

The authors are motivated by the fact that many system bugs manifest themselves after the development phase, when the system is deployed and being utilized. This is especially true in distributed systems, where some faults may only get revealed after a sequence of interactions and even failures in different components of the system.
D3S is a checker that gives developers the ability to specify and embed predicates in the distributed components so that bugs, and the sequence of events that produced them, can be revealed and checked. D3S compiles the predicates and dynamically injects the compiled libraries into the running processes of the system and into additional verifier processes that check the system. Thereafter the processes of the system expose their state via tuples, which are streamed (through a directed acyclic graph) towards the verifier processes for checking. When a problem is identified, reports containing the problem and the sequence of the events causing the problem are sent to developers. The developers can change the predicates at runtime and so uncover the bugs. D3S has been applied to different deployed systems and the experimental results are presented.

The capability of reporting, checking, and changing the predicates at runtime and recording the exact sequence of events producing the problems, while imposing low overhead on the system (on average 1%), is valuable. The system, without creating unnecessary false positives or false negatives, continues working even when a few checkers fail. The load balancing improves availability of the checking system. By using binary instrumentation, D3S can work with legacy systems. While simple, the design decisions make D3S expressive and lightweight. However, it is not very efficient for checking liveness properties. In addition, its worst-case overhead is significant (about 8%), which may not let D3S scale very well.

Paper 2: X-Trace: A Pervasive Network Tracing Framework

The paper introduces X-Trace, a tracing framework which provides a comprehensive view of service behavior for Internet systems. The main characteristic of X-Trace is that it is not limited to a particular protocol and it can be adopted in different stack layers and protocols. It works as follows: X-Trace is invoked by a user when he initiates an application task. The user inserts X-Trace metadata as well, which is propagated through the system. The metadata is propagated both in depth and in breadth; that is, it goes down through the different layers of the protocol stack and across to the next peer layers. X-Trace tags all initiated subtasks and network operations resulting from the original task submitted to the system with a unique task identifier. The tree constructed from the network operations, called the task tree, captures the causal paths in the network protocol. Then the trace information is delivered to (interested) users. The trace requests are sent in-band and the trace reports are sent out-of-band.

The authors have presented the positive and negative sides of their approach very well. It is a good feature that X-Trace delivers each piece of trace information to the associated party and does not give all the information to all entities, which would (probably) destroy the privacy of different administrative domains. This decoupling of the entity that requests tracing from the one receiving trace reports makes it more interesting. The point that X-Trace needs some changes in the network devices decreases its applicability. It would be more appropriate to have an approach that does not need any change in the existing devices, for example one in which the behavior of the network at different levels is monitored in a passive manner and the required information is extracted and analyzed.
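A rough Python sketch of the metadata bookkeeping described in the two X-Trace summaries above (pushDown to a lower layer, pushNext to a follow-on request) may help; the class, field, and method names are hypothetical, and the paper's real metadata format carries more fields than shown here.

```python
# Hypothetical sketch of X-Trace-style metadata propagation. Each operation
# records (task_id, op_id, parent_id); push_down copies the metadata to a
# lower layer and push_next to the next operation the current one causes.

import itertools

_next_op = itertools.count(1)

class XTraceMetadata:
    def __init__(self, task_id, parent_id=None):
        self.task_id = task_id
        self.op_id = next(_next_op)
        self.parent_id = parent_id

    def push_down(self):
        """Propagate to a lower layer (e.g., HTTP -> TCP -> IP)."""
        return XTraceMetadata(self.task_id, parent_id=self.op_id)

    def push_next(self):
        """Propagate to a follow-on request (e.g., proxy -> server)."""
        return XTraceMetadata(self.task_id, parent_id=self.op_id)

    def report(self, where):
        # In X-Trace, devices send such records out-of-band to a reporting
        # infrastructure; the task tree is rebuilt from parent_id edges.
        print(f"task={self.task_id} op={self.op_id} "
              f"parent={self.parent_id} at={where}")

# Client HTTP request -> its TCP segment -> the proxy's outgoing request.
http = XTraceMetadata(task_id="T42"); http.report("client HTTP")
tcp = http.push_down();               tcp.report("client TCP")
proxy = http.push_next();             proxy.report("proxy HTTP")
```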
The probability of false positives might not be low, since the reports might get lost, and this is not a rare phenomenon. Though the out-of-band delivery of trace information results in a solution of good reliability in the case of network failure, it would not be the only choice. For example, it might be better to partially store the trace data in the network to mitigate the effect of failure, rather than using out-of-band communication channels. Presenting more results of their implementation could be helpful.

From: Gupta, Indranil [indy@illinois.edu]
Sent: Thursday, April 08, 2010 12:14 PM
To: indy@cs.uiuc.edu
Subject: FW: 525 review 04/06

Indranil Gupta
Associate Professor
Department of Computer Science
University of Illinois, Urbana-Champaign
http://www.cs.illinois.edu/homes/indy/

-----Original Message-----
From: Jayanta Mukherjee [mailto:mukherj4@illinois.edu]
Sent: Tuesday, April 06, 2010 10:22 AM
To: indy@ad.illinois.edu
Subject: 525 review 04/06

In Byzantium
Jayanta Mukherjee
NetID: mukherj4

The Byzantine Generals Problem
LESLIE LAMPORT, ROBERT SHOSTAK, and MARSHALL PEASE, SRI International

In this paper the authors first describe what the Byzantine Generals Problem is. The problem is expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement. The authors show that, using only oral messages, this problem is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals. The same analogy can be used to detect a failed component exhibiting a kind of malfunction that is often overlooked, namely sending conflicting information to different parts of the system. The authors made the following assumptions in order to define oral messages:
1. Every message that is sent is delivered correctly.
2. The receiver of a message knows who sent it.
3. The absence of a message can be detected.

Pros:
1. This paper is well-written and the problem is like a game-theory problem being expressed in the form of a networking problem to detect a faulty component.
2. The algorithm will work when more than two-thirds of the system is reliable, which is true in many scenarios.
3. The approach to identifying the traitor or reliable entity works well for a fixed set of rules or protocols which are defined precisely.
4. The implementation (or analogy) of oral messages in the context of fault detection is quite impressive.
5. If more than two-thirds of the population is loyal, they will come to the same (or similar) conclusion based on the protocol, which helps in making the decision.

Cons:
1. This approach works well for a fixed set of predefined protocols, but whether it works for other kinds of decision making is not very clear.
2. The reliability of the algorithms depends on how well the oral messaging is implemented. It will not perform well if the oral messaging is not reliable.

Comments: It is a nice work, very well-written and very precise about defining and proving the concepts. This approach can be exploited in sensor networks for spying applications or even to remove noise by detecting and replacing faulty components.
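To make the "more than two-thirds loyal" bound concrete, here is a small Python sketch of the oral-messages algorithm OM(1) for four generals (a commander plus three lieutenants) and at most one traitor, the smallest setting the bound n > 3m allows. The traitor's lying strategy and the names are illustrative only.

```python
# Minimal sketch of OM(1): one traitorous lieutenant cannot break agreement
# among the loyal lieutenants when n = 4 and m = 1.

from collections import Counter

def majority(values):
    """Most common value (ties broken deterministically), standing in for
    the paper's majority function."""
    return Counter(values).most_common(1)[0][0]

def om1(commander_value, lieutenants, traitor=None):
    """Returns each loyal lieutenant's final decision."""
    # Round 1: commander sends its value to every lieutenant.
    received = {lt: commander_value for lt in lieutenants}
    # Round 2: every lieutenant relays what it received to the others.
    relayed = {}
    for sender in lieutenants:
        for receiver in lieutenants:
            if sender == receiver:
                continue
            value = received[sender]
            if sender == traitor:           # a traitor may lie arbitrarily
                value = "retreat" if value == "attack" else "attack"
            relayed.setdefault(receiver, []).append(value)
    # Each loyal lieutenant decides on the majority of all values it saw.
    return {lt: majority([received[lt]] + relayed[lt])
            for lt in lieutenants if lt != traitor}

decisions = om1("attack", ["L1", "L2", "L3"], traitor="L2")
assert decisions == {"L1": "attack", "L3": "attack"}
```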
UpRight Cluster Services, by Allen Clement et al.

The authors make Byzantine fault tolerance (BFT) something one can easily adopt, both to safeguard availability (keeping systems up) and to safeguard correctness (keeping systems right). They built UpRight, a new library for fault-tolerant replication, and used it to build BFT versions of two widely deployed open-source crash fault tolerant (CFT) systems, the ZooKeeper coordination service and the Hadoop Distributed File System (HDFS). The UpRight library relies on Byzantine fault tolerance (BFT) as a simple and viable alternative to crash fault tolerance for a range of cluster services. Liveness of the system is guaranteed only during synchronous intervals in which messages sent between correct nodes are delivered in a timely manner. UpRight is designed to provide the following properties:
* An UpRight system is safe despite commission failures and any number of omission failures.
* An UpRight system is safe and eventually up during sufficiently long synchronous intervals.

Pros:
1. The system's safety properties hold in any asynchronous distributed system where nodes are connected by a network that may fail to deliver, corrupt, delay, or reorder messages.
2. The system can be configured with separate fault tolerance thresholds for omission and commission failures, which is beneficial for three reasons: (a) omission failures are likely to be more common than commission failures; (b) it allows them to build systems that match typical commercial deployment goals; and (c) tolerating a small number of commission failures may capture the sweet spot for Byzantine fault tolerance in cluster environments.
3. The UpRight approach is stochastic.
4. It is multithreaded with proper OS support, so its performance will be good.
5. The hybrid checkpoint/delta approach seeks to minimize intrusiveness to legacy code.
6. The helper-process approach produces checkpoints asynchronously to avoid pausing execution of new requests and seeks to minimize intrusiveness to legacy code.
7. UpRight provides the added option of activating Byzantine fault tolerance at some future date on top of CFT.

Cons:
1. The approach taken by UpRight is not as reliable as Byzantine fault tolerance in many scenarios.
2. Serial read for ZooKeeper is costly. Although it has multithreaded support, I/O is always a bottleneck, which UpRight does not seem to improve.
3. From the results provided in the paper it is clear that this system has lower latency, but what about reliability? How well it can recover from and detect faults is not very clear.

Comment: Not much data has been provided to support how good the proposed fault-tolerance system is. Results and studies should have been more rigorous.

From: arod99@gmail.com on behalf of Wucherl Yoo [wyoo5@illinois.edu]
Sent: Thursday, April 08, 2010 12:06 PM
To: Gupta, Indranil
Subject: 525 Review 4/8

Distributed Debugging, Wucherl Yoo (wyoo5)

D3S: Debugging Deployed Distributed Systems, Xuezheng Liu et al, OSDI 2008 (Microsoft Research)

Summary: This paper presents a framework for scalable distributed debugging with small manual annotation effort. With predicates written by the programmer, state exposers (SEs) are generated and dynamically injected into the running processes. The SEs produce tuples about the current state of interest. The verifying processes, using the checking logic (CL) generated from the predicate, order the tuples globally and evaluate the predicate on gathered consistent snapshots of tuples.
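As an aside, the "order the tuples globally and evaluate the predicate on consistent snapshots" step could look roughly like the Python sketch below, under assumed details: tuples arrive tagged with a logical timestamp, a timestamp's snapshot is treated as complete once every expected process has reported, and a failure detector can drop a crashed process from the expected set. The class and field names are hypothetical, not the paper's API.

```python
# Hypothetical sketch of a verifier assembling consistent snapshots from
# timestamped tuples and running a predicate on each complete snapshot.

from collections import defaultdict

class Verifier:
    def __init__(self, predicate, expected_processes):
        self.predicate = predicate               # callable(snapshot) -> violations
        self.expected = set(expected_processes)  # processes that must report
        self.pending = defaultdict(dict)         # timestamp -> {process: state}

    def on_tuple(self, timestamp, process, state):
        self.pending[timestamp][process] = state
        if set(self.pending[timestamp]) >= self.expected:   # snapshot complete
            snapshot = self.pending.pop(timestamp)
            return self.predicate(snapshot)
        return None

    def on_process_failure(self, process):
        # A failure detector removes the process so incomplete snapshots
        # neither block checking nor raise false alarms.
        self.expected.discard(process)

# Example predicate: no key may be owned by two processes at once.
def unique_owner(snapshot):
    owners = defaultdict(list)
    for proc, keys in snapshot.items():
        for k in keys:
            owners[k].append(proc)
    return [(k, ps) for k, ps in owners.items() if len(ps) > 1]

v = Verifier(unique_owner, ["A", "B"])
v.on_tuple(1, "A", {"x"})
print(v.on_tuple(1, "B", {"x", "y"}))   # -> [('x', ['A', 'B'])]
```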
At runtime of the distributed predicate execution, invariants about the exposed states can be checked and violations of those invariants are reported. The three main challenges the authors point out are expressiveness for the developers' checks, handling failures of the checking machines, and handling failures of the processes being checked. For the first challenge, the developers can write sequential C++ code that checks distributed properties in a centralized manner. For the second challenge, when one verifier fails a new one is started, since re-execution with the same input is possible due to the determinism of the checker. For the third challenge, the failed process can be removed from the consistent snapshots with help from the failure detector.

Pros:
1. Good intuitions and observations for reducing checking overhead for scalability – checking only the parameters of exposed functions is sufficient to monitor state changes in most cases, and most useful properties can be checked with only a recent time window of snapshots.
2. Good design decisions for scalability – tuples that describe the exposed states are partitioned, and multiple verifier processes can run the checkers in parallel. Transmission of snapshots is reduced by sending only the difference between the last one and the new one.
3. Practical integration of a failure detector with the consistent snapshot algorithm (originally based on the assumption of no failures), under an assumption about the message delay of their machine-room environment.
4. Finding non-trivial bugs in distributed systems with a semi-automatic framework driven by sequentially expressed predicates.

Cons:
1. Effort to write predicates – they may find nothing and the effort can be wasted.
2. False positives and false negatives when snapshots are incomplete.
3. Injected code can deviate the execution by changing the timing of execution or by adding communication delay for sending exposed states to generate snapshots.
4. For more coverage of the bug search space, more fuzzing techniques may be necessary, such as intentionally changing the execution order between machines (if they are not dependent) or intentionally selecting the execution path with separate inputs for branch operations.

-Wucherl

From: Kurchi Subhra Hazra [hazra1@illinois.edu]
Sent: Thursday, April 08, 2010 10:45 AM
To: Gupta, Indranil
Subject: 525 review 04/08

WiDS Checker: Combating Bugs in Distributed Systems
--------------------------------------------------------------------------

Summary
------------
In this paper, the authors present WiDS Checker, a framework to facilitate debugging in distributed systems. The authors identify that distributed systems are non-deterministic across runs, making it impossible to follow the same code path to reach a bug. Besides, distributed properties reside on multiple machines. They therefore propose a replay-based predicate checking process. The predicates are user-defined. When the system runs, the runtime logs all non-deterministic events. These can then be replayed by a simulator. In order to know the ordering of events across nodes, Lamport's logical clock is used. Checkpoints, consisting of a WiDS process snapshot as well as user-level process context, are used to reduce the storage overhead of logging by enabling partial replay. During the simulation, a checker evaluates, at all events, user-defined predicates that model system properties and reports violations, if any.
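The replay-and-check loop described in the summary above can be sketched roughly as follows; the event format, state model, and predicate signature are assumptions for illustration, not the WiDS API. Events are replayed in an order consistent with their Lamport timestamps and every user-defined predicate is evaluated at each event boundary.

```python
# Rough sketch of a replay-based predicate checker over a Lamport-ordered log.

import heapq

def replay_and_check(event_log, apply_event, predicates, initial_state):
    """event_log: iterable of (lamport_ts, node_id, event) tuples.
    apply_event: function(state, node_id, event) -> new state.
    predicates: dict name -> function(state) -> bool (True means 'holds')."""
    state = initial_state
    violations = []
    # Order events by (Lamport timestamp, node id) so that the replay is
    # deterministic and respects happens-before.
    heap = [(ts, node, i, ev) for i, (ts, node, ev) in enumerate(event_log)]
    heapq.heapify(heap)
    while heap:
        ts, node, _, ev = heapq.heappop(heap)
        state = apply_event(state, node, ev)
        for name, holds in predicates.items():
            if not holds(state):
                violations.append((ts, node, name))
    return violations

# Toy example: the invariant "at most one node is primary" is checked
# after every replayed event.
def apply_event(state, node, ev):
    s = dict(state)
    s[node] = ev          # ev is "primary" or "backup" in this toy model
    return s

preds = {"single_primary":
         lambda s: sum(1 for r in s.values() if r == "primary") <= 1}
log = [(1, "n1", "primary"), (2, "n2", "primary"), (3, "n1", "backup")]
print(replay_and_check(log, apply_event, preds, {}))
# -> [(2, 'n2', 'single_primary')]
```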
Since WiDS is used for C++, it logs the memory allocation and de-allocation of each object in order to have a true picture of the application memory state. Besides, the authors also provide a message flow graph based on the message trace to facilitate tracing back in time from a violation. The authors also show how WiDS could detect subtle bugs in well-known distributed systems and protocols such as Paxos and the BitVault storage system.

Pros
------
1. They consider event granularity, which should be enough to model a distributed system. All actions in a local node can be debugged at that node itself.
2. The system uses predicate-based checking, which is simple and user-defined.

Cons
-------
1. The tool is targeted at the scenario in which the system is debugged by those who developed it, and thus assumes that the bugs are hunted by those who are intimately familiar with the system.
2. We know there are some disadvantages of Lamport's clocks. For example, one cannot know the ordering of two events that are not related by the happens-before relationship. Such issues are not addressed by the authors.
3. Evaluation shows that runtime increases significantly. There might be operations where such an increase in runtime may not be acceptable.
4. The tool is limited to applications written using the WiDS API or Macedon.

Thanks,
Kurchi Subhra Hazra
Graduate Student
Department of Computer Science
University of Illinois at Urbana-Champaign

From: Ashish Vulimiri [vulimir1@illinois.edu]
Sent: Thursday, April 08, 2010 9:46 AM
To: Gupta, Indranil
Subject: 525 review 04/08

WiDS Checker: Combating Bugs in Distributed Systems, X. Liu et al, NSDI 07

This paper presents the WiDS Checker, a deterministic log-and-replay based debugging mechanism for distributed applications written using the WiDS API. Lamport timestamps are stored along with each event in the log to enable a consistent replay. The user is required to specify properties that the application must satisfy, and the system detects violations of these properties and stores checkpoints to enable the user to replay the events leading to a violation.

Comments:
1. The WiDS Checker only tries to identify what /does/ go wrong -- not what /can/ go wrong. A simple extension would be to check what happens when the ordering of causally unrelated events is modified, instead of doing a deterministic replay (causally independent events can still be semantically related -- such as when they try to access the same variables).
2. How does the system handle hardware failures? That is, what happens if the program is otherwise correct but a particular execution behaves erroneously because of, for example, memory corruption at one of the nodes? Would a false positive be reported in this case?
3. A practical limitation is that the tool can only be used to debug programs written using the WiDS API. This means, among other things, that it could not be used with low-level system libraries.

From: Jayanta Mukherjee [mukherj4@illinois.edu]
Sent: Thursday, April 08, 2010 9:12 AM
To: Gupta, Indranil
Subject: 525 review 04/08

Distributed Debugging
Jayanta Mukherjee
NetID: mukherj4

X-Trace: A Pervasive Network Tracing Framework, R. Fonseca et al.

The authors proposed X-Trace, a cross-layer, cross-application tracing framework designed to reconstruct the user's task tree. This framework gives a comprehensive view of the service behaviour. X-Trace has been implemented in several protocols and software systems in three basic scenarios: (1) DNS resolution, (2)
a three-tiered photo-hosting website, and (3) a service accessed through an overlay network. X-Trace provides an integrated tracing framework where a user can invoke X-Trace when initiating an application task (e.g., a web request) by inserting X-Trace metadata with a task identifier in the resulting request. X-Trace provides tracing for multiple layers of the networking protocols. The trace data generated by X-Trace is published to a reporting infrastructure, ensuring that different parties can access it in a way that respects the visibility requirements of network and service operators.

Pros:
1. X-Trace is comprehensive; it tags all network operations resulting from a particular task with the same task identifier.
2. This framework enables X-Trace-enabled nodes to encode the causal connections necessary for rebuilding the user's task tree.
3. X-Trace-enabled devices log the relevant information connected with each tagged network operation, which can then be reported back. The trace information associated with a task tree gives the user or operator a comprehensive view of what network operations were executed as part of a task.
4. Since the authors are honest about the limitations of X-Trace, this paper can motivate developers to work on the systems in the right direction to resolve those issues.
5. The trace request is not a probe message but is sent in-band; the metadata is added on the same datapath that one wants to trace.
6. Reconstruction of the tree is quite simple, and it serves as a foundation for more complex visualizations.

Cons:
1. X-Trace requires that network protocols be modified to propagate the X-Trace metadata into all actions causally related to the original task.
2. To implement X-Trace, modifications to clients, servers, and network devices are needed.
3. When X-Trace is only partially deployed, the ability to trace those parts of the network is impaired, sometimes entirely.
4. Lost trace reports can limit reconstruction of the request tree and can lead to false positives in diagnosing faults (i.e., the lack of trace data may be interpreted as a failure).
5. X-Trace can only figure out the effect; it cannot detect the reason for the failure. So other tools are necessary, even with X-Trace, to find the root cause of a failure.
6. The fact that a tree structure is enforced on the set of network operations related to a particular task means that there are some request topologies that cannot be captured.
7. X-Trace is a multilayer tracing service, so it may incur significant overhead on the system, and it may raise some security issues.

Comments: The best thing I found about this paper is that the authors clearly mention the limitations of X-Trace, which will help other developers fix those issues.

D3S: Debugging Deployed Distributed Systems, by Xuezheng Liu et al.

The authors proposed D3S, a tool for debugging deployed distributed systems, which automates many aspects of the manual approach, allows runtime checking to scale to large systems, and makes the checking fault tolerant. D3S is a checker that allows developers to specify predicates on distributed properties of a deployed system, and that checks these predicates while the system is running. The framework uses Chord. When D3S finds a problem it produces the sequence of state changes that led to the problem, allowing developers to quickly find the root cause. Developers write predicates in a simple and sequential programming style.

Pros:
1. D3S checks these predicates in a distributed and parallel manner to allow checking to be scalable to large systems and fault tolerant.
2. D3S works transparently with legacy systems by using binary instrumentation and can change the predicates to be checked at runtime.
3. D3S can detect non-trivial correctness and performance bugs at runtime and with low performance overhead.
4. Using D3S, a developer writes functions that check distributed predicates.
5. D3S monitors verifier processes. When one fails, D3S starts a new verifier process and feeds it the input of the failed process.
6. D3S uses a logical clock to order the exposed state tuples, and has a well-defined notion of which processes are in the snapshot at each timestamp.

Cons:
1. D3S is not fully automated; the user has to write code to record the state of each machine and order these states into a globally consistent snapshot.
2. The developer has to distribute the checking across several machines, so more expert programmers are needed. In D3S, the developers have to organize checkers in a directed acyclic graph.

Comments: The paper is missing some technical details about how the system has been implemented. That may be intentional.

-With regards,
Jayanta Mukherjee
Department of Computer Science
University of Illinois, Urbana-Champaign
Urbana-61801, IL, USA
Mobile:+1-217-778-6650

From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu]
Sent: Thursday, April 08, 2010 8:39 AM
To: Gupta, Indranil
Subject: 525 review 04/08

525 review 04/08
Hyun Duk Kim (hkim277)

* D3S: Debugging Deployed Distributed Systems. Xuezheng Liu et al, OSDI 2008 (Microsoft Research)

This paper proposes D3S (Debugging Deployed Distributed Systems), a tool for debugging distributed systems. D3S allows developers to write predicates to check. The implemented predicates can be dynamically injected into executing processes for verification at runtime. Predicates are organized as a directed acyclic graph whose vertices represent computation stages. For handling failures, D3S uses global snapshots. For evaluation, the authors implemented 5 systems using D3S, and they showed that D3S could find intricate bugs. They also showed that the overhead of D3S was pretty small.

Debugging a distributed system is very painful, and this is a very good tool for solving the debugging problem in distributed systems. The authors actually deployed 5 systems using D3S. Practical implementations are the best way to show that the proposed methods can be applied. The implemented systems even showed good performance (finding complex bugs) with small overhead (less than 8 percent).

D3S can detect bugs only through real execution. If we have to execute and wait for the program to finish, it may be time-consuming. D3S does not support simulation-style methods to detect bugs. Users may test with small data sets, fix bugs, and then use the entire data for the real execution.

This paper does not show what the outputs look like. The authors explain how D3S works, how it can be applied to systems, and the fact that it can find bugs, but they do not show what the debug output looks like. Does the system just tell users whether there is a bug violating predicates or not? The purpose of finding bugs is to fix them. Notification is not enough, and we need more information to fix them. It would be better if the system could provide useful information for debugging (for example, where it happened, what data triggered the error, and whether it was a machine/network/data failure, etc.).

* WiDS Checker: Combating Bugs in Distributed Systems, X.
Liu et al, NSDI 07

This paper presents WiDS Checker, a unified framework that checks for bugs in distributed systems. Most existing distributed debugging depends on printf-based log mining, which is expensive and makes it difficult to find complex bugs. WiDS lets developers check predicates that are declared in a scripting language. WiDS uses a simulator to rerun the executable binary and check for violations of the defined predicates. Non-deterministic events are taken from runtime logs, which are gathered during the system run and ordered by Lamport clock. According to the experiments, WiDS Checker could find complex bugs in real systems.

Simulation-based checking has small overhead. WiDS analyzes the logs in order and finds bugs during replay. Because it does not involve real execution, the overhead is not big.

WiDS Checker provides good supporting tools to analyze bugs. Finding where bugs come from is more important than just finding whether there are bugs or not. WiDS Checker provides visualization tools which show a message flow graph. Also, because all facilities are integrated into the Visual Studio integrated development environment, developers can 'time-travel' to violation points.

Although the authors criticize existing log mining methods, WiDS basically also uses log mining for ordering the non-deterministic run.

------
Best Regards,
Hyun Duk Kim
Ph.D. Candidate
Computer Science
University of Illinois at Urbana-Champaign
http://gildong2.com

From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu]
Sent: Thursday, April 08, 2010 1:49 AM
To: Gupta, Indranil
Subject: 525 review 04/08

CS525 review on Distributed Debugging
Liangliang Cao (cao4@illinois.edu)
April 8, 2010

Paper 1: X. Liu et al, WiDS Checker: Combating Bugs in Distributed Systems, NSDI 07.

This paper develops WiDS Checker, which can check distributed systems through both simulation and reproduced runs from real deployment. Based on the replay APIs in the WiDS toolkit, WiDS Checker is a replay-based predicate checking approach that allows the execution of the entire system to be replayed afterwards within a single machine, and at the same time checks node states during the replayed execution against user-defined predicates.

Pros:
• The idea of combining a simulator and online checking is novel and attractive.
• The extensive applications on Paxos, Boxwood, BitVault, and Macedon-Chord are impressive and convincing.
• The on-the-fly scripting makes the debugging process much easier in practice.

Cons:
• The visualization tool in Section 3.4 seems very useful; however, it might not scale well for a large system. The authors might generalize this idea for both individual nodes and groups of nodes.
• The replay-based method might be too expensive for large-scale debugging.
• WiDS Checker relies heavily on the replay function in WiDS, which might limit its adoption by other large-scale distributed system frameworks.

Paper 2: X. Liu et al, D3S: Debugging Deployed Distributed Systems, OSDI 2008

This paper developed D3S, which is another debugging project based on WiDS. D3S allows developers to specify predicates on distributed properties of a deployed system and checks these predicates while the system is running.

Pros:
• The debug overhead of D3S is as low as 8%. This is quite impressive in terms of efficiency, especially when checking large-scale systems.
• D3S provides a very flexible debugging interface: the compiler splits the module into two parts, a state exposer and a checking logic module.
In the run time, the predicate computation can be partitioned into multiple machines. • By applying D3S to web searching task (in Sec 5.3), D3S has good potentials and I wish the author would find very cool applications by debugging searching engines. Cons: • It seems that D3S is limited to Windows. Although it is understandable since the authors worked in Microsoft, it will be better if the system can also be provided in other platforms. • Although D3S is efficient for component-level predicates, it is still not efficient enough for large scale debugging. For web scale applications where the data center is involved, it might be interesting to combine D3S with other powerful log analyzer techniques, for example, some OLAP log mining schemes. From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu] Sent: Wednesday, April 07, 2010 10:58 PM To: Gupta, Indranil Subject: 525 review 04/08 ===================================================================== Paper 1: D3S: Debugging Deployed Distributed Systems ====================================================================== Distributed systems are becoming very complex and large-scale day by day; hence testing and debugging of such system is a daunting task. In practice, the developer follows printf-based debugging approach where the developer needs to insert the printf message at proper places inside the code, needs to write a script to parse those messages and finally tries to find the error, if there is any, manually. It is very hard to anticipate what to log; too many log message becomes overhead for the system and too little log message might miss some important faults. Although, several approaches such as multi-threaded debugging, model checking, runtime-checking, etc. have been exploited to date to address this issue, unfortunately, those approaches are still suffering from the problems mentioned earlier. In the paper titled "D3S: Debugging Deployed Distributed Systems", the authors proposed D3S, a tool for debugging the distributed systems, to remove those problems. D3S removes the need for manual debugging, allows run time checking for large distributed system, and makes the checking fault tolerant. The developer, while using D3S, needs to write predicates on distributed properties. D3S compiles those predicates and inserts those to the running process of the system and verifier detects violation if there is any. D3S design has three main challenges: (1) how to allow developers to express easily the properties they should check? D3S permits developers to organize checker in a DAG to address this challenge. (2) What will happen if checking machines fail? To address this challenge, D3S observes verifier process. As soon as one verifies process fails, D3S establishes a new verifier process. (3) D3S should handle failures of processes being checked. For this challenge, the verifier processes remove failed processes from globally consistent snapshot before checking the snapshots. The authors evaluated D3S with five real-world applications. Through their evaluation, the authors showed that, D3S is capable to identify non-trivial correctness and performance bugs at runtime and with low performance overhead. Pros: 1. D3S is automatic and it removes the manual effort of the developers to debug a distributed system. 2. The authors showed that D3S is applicable to existing large scale distributed systems. 3. D3S provides a simple language for writing distributed predicates. 4. 
Developers can modify what is being checked at run time. 5. The checking overhead is small. Cons/Discussion Points: 1. I am wondering whether any racing condition can be introduced through the injected codes. 2. Do predicate can detect all the bugs? How about memory leakage? 3. How to combine online predicate checking with off-line replay? 4. Can D3S find the bugs introduced due to the interaction between the systems? ============================================================= Paper 2: WiDS Checker: Combating Bugs in Distributed Systems ============================================================= While I started reading the paper titled "WiDS Checker: Combating Bugs in Distributed Systems", I found that, the main author is the same of previous D3S paper. Both of these papers are trying to solve the same problem (debug a distributed system). However, D3S is published one year later than WiDS paper in the same conference (NSDI). In this paper, the authors presented WiDS checker, a checker that finds bugs in a distributed system. WiDS checker fulfills the following requirements which a distributed debugging tool must have: (1) WiDS checker is able to verify application properties, especially distributed ones in an efficient manner (2) WiDS checker provides complete information about an execution so that developers, rather than monitoring pre-defined logs, can observer arbitrary application states (3) WiDS checker is able to regenerate buggy runs in a deterministic manner to enable the cyclic debugging process. WiDS checker logs the actual execution of a distributed system. It then applies predicate checking in a centralized simulator driven by testing scripts or replayed by logs and then finally generates output violation report along with message traces. WiDS checker consists of two major components namely "a scripting language" and "a checker". The first component allows a developer to refine system properties into straightforward assertions, while the second component inspects for violations. The authors evaluated WiDS checker by applying it to some complex real time systems (e.g. Paxos, Lock Server, BitVault, Macedon-chord) and they found some non-trivial bugs from those systems. Pros: 1. WiDS checker permits replaying an execution of a deployed distributed protocol. 2. Besides providing violation report, WiDS checker provides visualization for tracing out the root causes of errors. 3. The evaluation with real systems shows the usefulness of WiDS checker. Cons and Discussion Points: 1. WiDS checker is not applicable directly to the legacy systems. The developer needs to use WiDS API, which was developed by the authors earlier. 2. Can a user modify the state in the middle of replay? 3. According to table 3, the checker, in worst case (Paxos), increase running time by 20x. ======================================================= Paper 3: X-Trace: A Pervasive Network Tracing Framework ======================================================= Internet is simply a huge and extremely complex distributed system. Hence, it is very challenging to diagnose the source of a problem in Internet. The existing diagnostic tools only focus on one specific protocol and unable to reconstruct the comprehensive view of service behavior. For instance, traceroute can locate IP connectivity problems but can't determine the proxy or DNS failures. 
In the paper titled "X-Trace: A Pervasive Network Tracing Framework", the authors proposed the design and implementation details of a tracing framework named X-Trace that provides a comprehensive view for systems. A user calls X-Trace when initiating an application task. To do that, the user inserts X-Trace metadata with a task ID in the request. This metadata is then propagated down to lower layers through protocol interfaces. X-Trace tags all network operations resulting from a particular task with the same task ID. Task tree is the set of network operations connected with an initial task. Task tree could be reconstructed after collecting trace data with reports. X-Trace consists of three components: X-Trace metadata, Network path (Task Tree), Task Tree reconstruction. The authors discussed several usage scenarios to show the usefulness of X-Trace to identify faults: a web request and an accompanying recursive DNS queries, a web hosting site, an I3 overlay network. The authors also mentioned some other scenarios where X-Trace could be used: VPNs and IPv6 tunneling, ISP connectivity troubleshooting, link layer tracing and development of distributed applications. Pros: 1. X-Trace provides a comprehensive view of the systems. 2. X-Trace is not bounded to specific layers and applications. It works for cross-layer and cross-applications. 3. X-Trace makes little overhead to the network. Cons/Discussion Points: 1. Application and protocols need to be modified to use X-Trace. 2. While report loss occurs, reconstruction of entire task tree is not possible. 3. X-Trace is only partially deployable. 4. Some request topologies can't be captured. 5. What will happen if X-Trace is deployed in a distributed file system? 6. Perhaps, report generators and report recipients are vulnerable to DoS attack. From: Shehla Saleem [shehla.saleem@gmail.com] Sent: Wednesday, April 07, 2010 8:31 PM To: Gupta, Indranil Subject: 525 review 04/08 D3S: Debugging Deployed Distributed Systems This paper presents the design, implementation and evaluation of D3S: An online framework for Debugging Deployed Distributed Systems. As distributed systems become larger and more popular, programmers find it even harder to keep the code bug-free. Furthermore, the classic debugging paradigm using ‘printf’ fails to keep up because many bugs are dependent upon a certain sequence of events and on a certain interaction between operations. Race conditions are an example. Moreover, printf-based debugging assumes that the same sequence of events will happen over and over again and this might not be untrue in non-deterministic systems. Therefore as bugs become more and intractable because of parallelization, increasing size and complexity of systems, printf-based bug-hunting is no longer practical. Also, the traditional approach of taking state snapshots and using a central machine to check for incorrect behavior scales poorly. Such an approach is cumbersome for the programmer. This paper proposes D3S allows programmers to write predicates which are then compiled and injected into already running code where the verifier processes then keep track of state tuples and when something problematic is detected, the verifier reports the whole sequence of events which led to the problem. They also incorporate the parallelization aspect by allowing the verifiers to be distributed across different machines to provide load balancing. 
One of the biggest strengths of the design of D3S is its ability to work transparently with already deployed existing systems and applications. Also, it makes life comparatively much easier than before but the developers still have to carefully think about what kind of predicates are needed. Their results from five existing distributed systems show that D3S can indeed perform well as well as scale well because of an overhead as low as 8%. However, D3S is understandably not able to detect all kinds of bugs e.g. memory leaks etc. It could therefore not completely replace other rigorous testing and verification techniques but can be used to augment them and work with them. And then there is the concern that reaching the right predicates would still require serious work on part of the developer. Also, since D3S returns the sequence of events that led to a problem, analyzing them and locating the problem still remains to be done by the developer. From: Ghazale Hosseinabadi [gh.hosseinabadi@gmail.com] Sent: Wednesday, April 07, 2010 7:41 PM To: Gupta, Indranil Subject: 525 review 04/08 Paper 1 X-Trace: A Pervasive Network Tracing Framework In this paper, a new tracing framework called X-Trace is presented. X-Trace is mainly designed to be able to perform tracing in different settings where tunnels, VPNs, NATs and overlays, multiple administrative domains, … are present. X-Trace is a cross-layer, cross-application tracing framework in which a user's task tree is constructed for diagnosing faults in distributed and complex Internet applications. In X-Trace, the trace request is sent in-band in order to add metadata to the data path that its trace should be collected. Moreover, the collected trace is decoupled from the data path by sending it out-of-band. In this way, X-Trace is capable of tolerating network failures. The requester is decoupled from the receiver of the trace results in order to have flexible policy on the trace information received by different administrative domains. The authors discussed how in different scenarios, X-Trace can detect location of the faults efficiently. They also implemented X-Trace in three different settings to investigate its performance under different circumstances. Pros: 1. X-Trace is capable of collecting traces in wide range of different settings with different applications. 2. It can detect faults on any layer of the protocol. 3. A complete view of the network is obtained only by small amount of metadata. Cons: 1. It requires modifying all layers of the existing networks and protocols in order to carry metadata in the network. 2. It is not clear how a partially reconstructed task tree can be used to detect some kind of faults. 3. It has privacy problems since the trace result is sent out-of-band. Paper 2: WiDS Checker: Combating Bugs in Distributed Systems This paper presents a method, called WiDS checker, for reproducing distributed system runs and simulating systems for finding bugs before they happen in a real scenario. It is well known that it is difficult to debug distributed systems, since it is hard to know the exact global state. WiDS operates in three steps: logging, replaying and checking for predicates. The log of different events of the system are organized according to the Lamport's logical clock ordering and replayed to the overall system running inside the simulator. As a result of replaying, the exact application memory is determined. 
Replaying is done in an efficient way by emulating all replay instances inside one process to reduce inter-instances. In WiDS, checker is composed of the following functionalities: Reproducing real runs, checking user-defined predicates, removing false alarms and tracing root causes. The performance of WiDS is evaluated through finding bugs in a real environment. Pros: 1. It designs a method for checking distributed systems in which the task of checking is difficult because of lack of a central authority. 2. It is implemented in real scenarios and it performs well in finding bugs. 3. WiDS can be simply added to the existing debuggers of distributed systems. Cons: 1. Why the replaying should be necessarily deterministic. In many real systems, a lot of random parameters are presented and random replaying might improve the performance of checking method. 2. It is interesting to analyze how much overhead is needed for checking. It seems that high overhead is introduced by WiDS. 3. In a large distributed system, having more than one replay process might be better to reduce communication overhead. Comparison of single versus multiple replay process is not presented.
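For reference, the Lamport logical clock ordering that the WiDS reviews above rely on follows the standard textbook rule, sketched here in a small self-contained Python form: each node increments its counter on a local event, stamps outgoing messages, and on receipt advances to max(local, received) + 1, which preserves happens-before across nodes.

```python
# Standard Lamport logical clock rules, for reference.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time              # timestamp carried on the message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.send()            # a's clock: 1
t_recv = b.receive(t_send)   # b's clock: 2, so send happens-before receive
assert t_send < t_recv
```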