From: Rini Kaushik [rinikaushik@yahoo.com]
Sent: Tuesday, April 06, 2010 12:17 PM
To: Gupta, Indranil
Subject: 525 review 04/06

Rini Kaushik
CS 525

The paper proposes PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are causally observed by a correct node are eventually detected. PeerReview requires that every node maintain a non-tamperable record of all the messages it sends and receives. Each node consists of a state machine and a detector module, which are used to detect anomalous behavior.

Pros:
1) PeerReview makes an attempt to handle Byzantine faults, as opposed to only fail-stop faults.

Cons:
1) Requiring the correct node to be deterministic may not be feasible. The replay by the detection module is also assumed to be deterministic, which is not a realistic assumption given the advent of multi-cores in today's computers. Moreover, to obtain a consistent replay, the recording should include the results of system calls, signals received, etc.; simply recording the input and output of the application may not be sufficient.
2) PeerReview requires that a correct node is eventually causally affected by the error. This assumption limits the applicability of PeerReview. In addition, multiple errors may occur in the meantime and the original error may even be masked, so there is a chance of false negatives.
3) What is the logging overhead, in terms of space and performance, of maintaining a tamper-evident record that provides non-repudiable evidence of all nodes' actions? The authors mention that the network traffic is high; it would be good to see the performance overhead as well.
4) My understanding is that Byzantine fault tolerance should be able to handle arbitrary faults. If a node has intermittent disk errors, returns wrong error codes, or doesn't return an error code at all, how can a correct node identify the issue? Correctly identifying the error would require checksums of the disk blocks, etc.
5) The system makes too many simplifying assumptions, which makes me question its applicability in a real-world deployment.
6) Scalability of the system is a concern, as the log will grow linearly with the number of nodes.

From: Vivek [vivek112@gmail.com]
Sent: Tuesday, April 06, 2010 12:01 PM
To: Gupta, Indranil
Subject: 525 review 04/06

Byzantine Generals Problem

Core Idea: This paper presents the Byzantine Generals Problem, which helps to define the general problems and issues of fault tolerance for computer systems. The Byzantine Generals Problem is posed as follows: an army must collectively decide whether or not to attack a city that it has surrounded. The army is broken into divisions, and each division is led by a general. The generals can communicate with each other through messengers as they observe the enemy. The key issue is that some of the Byzantine generals can be traitors and can disobey the orders sent by a general. Even in the face of this, the army must still carry out the correct plan and be able to tolerate the traitors. The paper shows that the army will reach consensus and "do the right thing" if no more than 1/3 of the generals are traitors. The paper provides an intuitive argument for the three-generals case, and then proves the result for any number of generals. These results tell us how many faults a system can handle before it delivers the wrong result; resiliency is a metric of how fault-tolerant a system is.
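To make the 1/3 bound concrete, here is a minimal sketch (my own illustration, not the paper's pseudocode) of one round of the oral-message algorithm OM(1) with a loyal commander and a single traitorous lieutenant: with three generals the loyal lieutenant is left with a tie and falls back to the default order, but with four generals majority voting masks the traitor.

from collections import Counter

RETREAT, ATTACK = "retreat", "attack"

def om1_loyal_commander(n_lieutenants, traitor, order=ATTACK):
    # Step 1: the loyal commander sends the same order to every lieutenant.
    received = [order] * n_lieutenants
    # Step 2: every lieutenant relays what it claims to have received;
    # the single traitor lies and relays the opposite order.
    relayed = {j: {} for j in range(n_lieutenants)}
    for i in range(n_lieutenants):
        for j in range(n_lieutenants):
            if i != j:
                relayed[j][i] = RETREAT if i == traitor else received[i]
    # Step 3: each loyal lieutenant takes the majority of everything it has seen,
    # falling back to RETREAT when there is no majority (the paper's default).
    decisions = {}
    for j in range(n_lieutenants):
        if j == traitor:
            continue
        counts = Counter([received[j]] + list(relayed[j].values())).most_common()
        tie = len(counts) > 1 and counts[0][1] == counts[1][1]
        decisions[j] = RETREAT if tie else counts[0][0]
    return decisions

# 3 generals (m = 1 traitor): the loyal lieutenant sees a tie and defaults to
# RETREAT, disobeying the loyal commander's ATTACK order.
print(om1_loyal_commander(2, traitor=1))   # {0: 'retreat'}
# 4 generals (more than 3m): majority voting masks the traitor and agreement holds.
print(om1_loyal_commander(3, traitor=2))   # {0: 'attack', 1: 'attack'}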
Pros:
- The results obtained are no doubt seminal, and make a very clear statement about the reliability guarantees of a computer system.
- The theoretical bounds are very useful for systems today, and apply both to shared-memory multi-cores and to large-scale distributed systems.
- The problem statement is very clear, and explains how an entire system becomes unreliable given the unreliability of just a few of its components. The system can be an army or a computer system.

Cons:
- The paper does not seem to consider the algorithms/applications running on these systems. Certain algorithms can break down completely if just one process fails (studies in resilience examine this).
- The proof and theory are not well connected with practical systems. There is one section on the applicability to real systems, but it could definitely be clearer.
- What is the broader impact? Surely the work is motivated by the need for reliability of computer systems, but why is this important?
- Some note is made of the expense of achieving such fault tolerance; the last paragraph of the paper addresses this issue. However, the authors say that "when extremely high reliability is required...the Byzantine generals solution is needed". What systems need extremely high reliability? How do you know when high reliability is required and when it is not? The paper should address some of these questions more thoroughly.

From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu]
Sent: Tuesday, April 06, 2010 11:11 AM
To: Gupta, Indranil
Subject: 525 review 04/06

CS525 review on Byzantine fault tolerance
Liangliang Cao (cao4@illinois.edu)
April 6, 2010

Paper 1: A. Clement et al, UpRight Cluster Services, SOSP 2009.

This paper builds a library called UpRight, which aims to minimize the engineering effort of converting a crash fault tolerant (CFT) system into a Byzantine fault tolerant (BFT) one, while maintaining competitive properties such as performance, hardware overheads, and availability. The paper provides BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS).

Pros:
• The design choices in UpRight favor simplifying adoption by existing applications; performance is a secondary concern. Despite these priorities, the BFT Zookeeper and BFT HDFS implementations have performance comparable to the originals while providing additional robustness.
• The implementation of UpRight provides an elegant replication abstraction with an agreement protocol, which makes it easy to convert CFT systems to BFT ones.
• The implementation of UpRight Zookeeper shows that it is no longer tedious work to change CFT systems into BFT systems.
• The implementation of UpRight HDFS is exciting, since the original design of HDFS provides fault tolerance for the DataNodes but not for the NameNode. I believe this work has great potential.

Cons:
• Despite the robustness, BFT systems do consume more CPU cycles than CFT systems due to the overhead of computing cryptographic digests on messages. It might be interesting to provide a more flexible interface where the user can choose between BFT and CFT at any time. For example, when the Byzantine failure assumption is not valid, the user could simply use the CFT system for better efficiency.
• The case study on HDFS is carried out on 50 data nodes and might not be convincing for a large-scale system. It would be interesting to see how the system works in general cloud computing services.
Paper 2: Haeberlen, Kuznetsov, and Druschel, PeerReview: Practical Accountability for Distributed Systems, SOSP 2007.

This paper develops the PeerReview system to provide accountability in distributed systems. PeerReview handles Byzantine faults whose effects are detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. The key idea of PeerReview is to maintain a secure record of the messages sent and received by each node, which is used to automatically detect when a node's behavior deviates from that of a given reference implementation, thus exposing faulty nodes.

Pros:
• The analysis of faulty behavior in Section 3 is very insightful.
• The requirements of PeerReview (nodes' actions are deterministic, nodes can sign messages, each node is periodically checked by a correct node) are easy to satisfy. The applicability of PeerReview is verified on three example distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.

Cons:
• The design of a secure record compared against a reference implementation might not be optimal. There are other ways of detecting faults besides comparison, and it would be interesting to explore which detection scheme works best.
• PeerReview consumes much more CPU and disk space than the original system. Although the authors argue that the computational burden can be alleviated by multi-core systems, such extra overhead will still be a problem preventing PeerReview from popular use.

From: Giang Nguyen [nguyen59@illinois.edu]
Sent: Tuesday, April 06, 2010 10:47 AM
To: Gupta, Indranil
Subject: 525 review 04/06

CS525 review
nguyen59
04/06/10

PeerReview: practical accountability for distributed systems

Nodes in distributed systems can fail for many reasons. This paper considers the use of accountability to detect and expose faulty/misbehaving nodes in the system. Each node is modeled as a state machine S, a detector module D, and an application A. The state machine represents all the functionality that should be checked by PeerReview, whereas the application represents other functions that need not be checked, e.g. a GUI. The detector module D implements PeerReview; it can observe all inputs and outputs of the state machine S, and it can communicate with the detector modules on other nodes. The assumptions are that the state machines S are deterministic; that standard security assumptions hold (each node has a public/private key pair and a secure hash function); that each node has a reference implementation of all state machines, which can take a snapshot of its state and can be initialized from a given snapshot; and that each node has a set of witness nodes, at least one of which is correct. Each node maintains a secure, tamper-evident log of its inputs and outputs. This log can be requested by other nodes. With a commitment protocol, either end of a message obtains verifiable evidence that the other end has logged the transmission. The audit protocol ensures that, for each node i, either i's actions are consistent with the reference implementation of i's state machine, or i is exposed by at least one correct witness. The challenge/response protocol ensures that, if a node i fails to respond to a challenge or does not acknowledge a message, it is eventually suspected by at least one correct witness. The suspicion persists until the node answers the challenge or acknowledges the message. The evidence transfer protocol ensures that all correct nodes eventually output a failure indication for each faulty node.
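To make the tamper-evident log just described concrete, here is a minimal sketch (the entry layout and hash encoding are my own simplifications, not PeerReview's wire format): each entry's hash commits to its predecessor, so rewriting history no longer matches the log head that has already been handed out to other nodes.

import hashlib

def entry_hash(prev_hash, seq, kind, content):
    data = f"{prev_hash}|{seq}|{kind}|{content}".encode()
    return hashlib.sha256(data).hexdigest()

class TamperEvidentLog:
    def __init__(self):
        self.entries = []                 # (seq, kind, content, hash)
        self.top = "0" * 64               # hash of the empty log

    def append(self, kind, content):
        seq = len(self.entries)
        h = entry_hash(self.top, seq, kind, content)
        self.entries.append((seq, kind, content, h))
        self.top = h
        return h                          # would be signed and sent out as an authenticator

    def verify(self):
        prev = "0" * 64
        for seq, kind, content, h in self.entries:
            if h != entry_hash(prev, seq, kind, content):
                return False
            prev = h
        return prev == self.top

log = TamperEvidentLog()
log.append("SEND", "msg-1 to node B")
auth = log.append("RECV", "ack-1 from node B")   # head hash given to the peer
log.entries[0] = (0, "SEND", "something else", log.entries[0][3])  # tamper with history
print(log.verify())                              # False: the chain no longer matches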
Pros:
- For the multicast application, it is able to discourage freeloaders, because they actually get worse performance.
- General, and applicable to many different applications.

Cons:
- Signing of messages significantly increases message latency.
- Lower throughput, due to transfers of additional protocol messages to witnesses, etc.

From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu]
Sent: Tuesday, April 06, 2010 10:29 AM
To: Gupta, Indranil
Subject: 525 review 04/06

525 review 04/06
Hyun Duk Kim (hkim277)

* The Byzantine Generals problem, L. Lamport et al, TOPLAS 1982

This paper is about the Byzantine Generals Problem. For reliable computing, we need to handle malfunctioning components that give conflicting information. This situation can be expressed in terms of a Byzantine army and its messenger-based communication. The goal is to reach agreement among the loyal generals. Basically, this problem is solvable if and only if more than two-thirds of the generals are loyal. The authors introduce definitions, assumptions, and solutions for the problem, and further explore it under different settings and assumptions. With an unforgeable written message system, the problem is solvable for any number of generals and possible traitors. The authors also show how the solvability conditions change when not all the nodes are directly connected. At the end, the authors show how this problem can be applied to reliable computing systems.

This paper introduces an important concept with an interesting analogy. It is the classic paper that presented the concept of the Byzantine Generals Problem. Computer malfunction and security problems are well captured by the relationship among the Byzantine generals and their messengers. This analogy attracts readers' attention and helps understanding. The paper gives formal proofs under various assumptions. It starts with strict assumptions, and later the authors explore the problem under different conditions, such as signed messages or missing communication paths. This varied exploration increases the value of the paper. However, the Byzantine Generals Problem as posed is fairly fixed, and there are many situations it cannot cover. For example, there may be messenger interception, where a message is simply changed. Or we may consider something like traitor infection: like a virus, a traitor may bribe loyal generals and make them traitors too. In that case, the problem should be solved within a limited time, before the number of traitors exceeds the limit under which the problem remains solvable. Although this may be out of the scope of the paper, it would be interesting to connect these kinds of problems to the Byzantine setting.

* UpRight Cluster Services, A. Clement et al, SOSP 2009

This paper presents the UpRight library for implementing Byzantine fault tolerant (BFT) systems. The UpRight library sits between the application client and server. By ensuring the ordering and non-loss of requests and responses, UpRight provides fault tolerance. To show the usefulness of the proposed library, the authors implemented an UpRight-based BFT Zookeeper and a BFT Hadoop Distributed File System (HDFS). According to the experimental results, both implementations showed comparable performance while adding fault tolerance. The authors showed an actual implementation using UpRight and demonstrated its practical usefulness.
UpRight addresses HDFS's single point of failure. HDFS is a very popular tool for distributed computing. Although HDFS supports fault tolerance for DataNodes, it cannot tolerate the failure of the NameNode. Knowing how HDFS works, one can easily imagine keeping a replica of the NameNode; UpRight realizes this idea by replicating the NameNode. Experimental results show that BFT HDFS was not interrupted by a NameNode failure. This makes HDFS a more reliable framework.

------
Best Regards,
Hyun Duk Kim
Ph.D. Candidate
Computer Science
University of Illinois at Urbana-Champaign
http://gildong2.com

From: Kurchi Subhra Hazra [hazra1@illinois.edu]
Sent: Tuesday, April 06, 2010 10:05 AM
To: Gupta, Indranil
Subject: 525 review 04/06

PeerReview: Practical Accountability for Distributed Systems
------------------------------------------------------------

Summary
---------------
In this paper, the authors present PeerReview, which provides accountability for distributed nodes. PeerReview enables a faulty node to be eventually detected and allows other, non-faulty nodes to defend their correctness. For this purpose, PeerReview creates a per-node log that records all messages sent and received by a node, as well as the inputs and outputs of the application. Each node also has a set of witnesses. In addition, it is assumed that the system can be viewed as a deterministic state machine, such that, given a state, the machine's next and past states can be rebuilt. PeerReview makes sure that the logs are tamper-evident. The commitment protocol of PeerReview ensures that the sender of a message obtains evidence that the receivers of the message have logged it, and vice versa. The protocol also guarantees that every node maintains a single linear log. Witnesses periodically ask a node for its log entries and check, via a replay from a particular snapshot of its log, whether the node's actions are consistent with the reference implementation of the node's state machine. If not, the node is exposed. If one node finds another node misbehaving, the latter is declared suspected and a challenge is created for it, which is forwarded to its witnesses. The witnesses also declare a node suspected if they do not get a reply from it. The suspicion persists until the node answers the challenge. In the evidence transfer protocol, a node finally decides whether another node can be trusted or should be exposed, by periodically fetching the challenges collected by that node's witnesses. By relaxing the guarantees of the protocol to probabilistic ones, scalability on the order of log N can be achieved. The authors also demonstrate how the protocols can be used in various distributed systems and evaluate their system in various scenarios.

Pros
-----
-- The authors have given due consideration to all causes of concern in the protocol being implemented. They consider the pros and cons of their system and point out possible workarounds for any disadvantageous features.
-- The fact that a node takes the help of several witness nodes to determine the correctness of another node does away with the problem of network and processing delays that any one link may be suffering from in an asynchronous system.

Cons
----
-- The system is scalable only with relaxed, probabilistic guarantees.
-- PeerReview only provides limited fault detectability. It is an accountability system that helps identify the nodes that are misbehaving in a system. However, as the evaluations show, it is quite costly in terms of system resources.
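As a back-of-the-envelope illustration of the probabilistic, O(log N)-witness guarantee mentioned in the summary above (the system size, faulty fraction, and the constant in front of log N are my own assumptions, not numbers from the paper), detection only requires that at least one of a node's randomly chosen witnesses is correct:

import math

def miss_probability(f, num_witnesses):
    """Probability that every witness of one node is faulty, with fraction f faulty."""
    return f ** num_witnesses

N = 10_000                         # assumed system size
f = 0.2                            # assumed fraction of faulty nodes
w = math.ceil(2 * math.log(N))     # c * log N witnesses per node, with c = 2 assumed

per_node_miss = miss_probability(f, w)
print(f"witnesses per node: {w}")
print(f"P(all witnesses of one node are faulty) = {per_node_miss:.2e}")
# Union bound over all N nodes: still tiny, while w grows only logarithmically in N.
print(f"P(some node has no correct witness) <= {N * per_node_miss:.2e}")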
Thanks,
Kurchi Subhra Hazra
Graduate Student
Department of Computer Science
University of Illinois at Urbana-Champaign

From: Virajith Jalaparti [jalapar1@illinois.edu]
Sent: Tuesday, April 06, 2010 8:51 AM
To: Gupta, Indranil
Subject: 525 review 04/06

Review for "PeerReview: Practical Accountability for Distributed Systems"

PeerReview presents a system architecture that makes nodes accountable for their actions and allows correct nodes in the system to detect bad nodes whose misbehavior can be observed through the messages they receive. The application to be made accountable runs over a detection module that keeps track of all the messages received and sent by the application running above it and logs them securely in a non-repudiable, append-only log maintained by each node. This log is presented by the detection module when another node needs to verify the actions taken by this node, and it serves as the proof of the node's actions. PeerReview assumes that the state machine of the application is deterministic; thus, replaying the messages in the log allows it to deduce whether the application is behaving correctly or not. A reference implementation of the application is maintained at each node and is used to check for misbehavior. To ensure that every message received by a node is logged, PeerReview uses a commitment protocol that requires the receiver to send the sender an authenticator containing the log entry for the message. Each log entry contains the message, its sequence number, and its type, together with a recursively defined hash value and authenticator, making the log tamper-evident. Thus, PeerReview guarantees that every bad node is eventually detected and that no node is falsely accused.

Pros:
- It provides guarantees for eventual detection of faulty nodes and ensures that no correct node is falsely accused.
- While the deterministic version of PeerReview does not scale to larger systems (message complexity of O(n^2)), it provides probabilistic guarantees that ensure a bad node is detected with high probability, with message complexity that scales as O(log N).
- It requires no difficult modeling of the protocol in question and can thus be applied in an application-independent manner.
- It provides a heuristic to decrease log sizes, which can become quite large if the system is used naively.

Cons/Comments:
- It assumes that the applications are deterministic. This assumption is not true in practice, especially if the algorithm/protocol itself is randomized. Although the authors say random values can be regenerated by recording the seed, this is not a convincing argument that all sources of randomness in a program are captured.
- Although PeerReview can be used as a mechanism to detect malfunctioning/bad nodes, it cannot do anything to detect the bugs in the system. It only provides a kind of "syntactic" detection, not the "semantic" detection that could presumably be obtained from a model of the protocol.
- PeerReview does not address the case in which nodes perform their own functions properly but intercept or tamper with messages not addressed to them but routed through them.
- It requires every node to have a unique identifier, but does not discuss any mechanism for assigning one. Using simple mechanisms like IP or MAC addresses might not work, since these can be reconfigured easily. This might be hard to do in software and may require out-of-band or hardware techniques.
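To make the authenticator idea from the summary above concrete, here is a small sketch of the commitment handshake: the receiver returns a signed statement over (sequence number, top-of-log hash), so the sender holds non-repudiable evidence that the message was logged. For brevity an HMAC with a per-node key stands in for the public-key signatures the paper assumes, and the field layout is my own.

import hashlib, hmac

NODE_KEYS = {"A": b"key-A", "B": b"key-B"}        # assumed key registry; real nodes keep keys private

def sign(node, payload: bytes) -> str:
    return hmac.new(NODE_KEYS[node], payload, hashlib.sha256).hexdigest()

def make_authenticator(node, seq, top_hash):
    payload = f"{node}|{seq}|{top_hash}".encode()
    return {"node": node, "seq": seq, "top": top_hash, "sig": sign(node, payload)}

def check_authenticator(auth):
    payload = f"{auth['node']}|{auth['seq']}|{auth['top']}".encode()
    return hmac.compare_digest(auth["sig"], sign(auth["node"], payload))

# Node B logs the received message, then commits to it toward the sender A.
top_hash = hashlib.sha256(b"head of B's hash-chained log").hexdigest()
auth = make_authenticator("B", seq=42, top_hash=top_hash)
print(check_authenticator(auth))   # True: A can later hand this evidence to B's witnesses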
From: Fatemeh Saremi [samaneh.saremi@gmail.com]
Sent: Tuesday, April 06, 2010 1:32 AM
To: Gupta, Indranil
Subject: 525 review 04/06

Paper 1: The Byzantine Generals Problem

This paper presents an impossibility result about reaching agreement in a distributed system with non-crash failures. The problem is expressed in terms of a group of generals of the Byzantine army, one or more of whom are traitors, who must agree upon a common battle plan. The goal is to propose an algorithm that guarantees (i) all loyal generals decide upon the same plan of action, and (ii) a small number of traitors cannot cause the loyal generals to adopt a bad plan. The problem is investigated under various hypotheses and several solutions are proposed. A brief summary of the results and discussion points follows.

There is no solution that achieves agreement among 3m generals communicating through oral messages if m or more of the generals are traitors. The proof reduces the problem of achieving agreement among three generals, one of which is disloyal, to this problem by having each of the three generals simulate m generals of the original problem; by way of contradiction, and based on the previously established impossibility result for the three-general case, the result for the general case follows. They also design an algorithm that achieves agreement whenever fewer than one third of the generals are traitors. When generals communicate with signed messages, the impossibility result no longer holds; for this case, they propose a solution that, for any m, solves the Byzantine Generals Problem if there are at most m traitors. The situation where communication paths are incomplete, so that not every general can send messages directly to every other general, is also considered. They present an algorithm that works for p-regular communication graphs and solves the problem for any m > 0 and any p >= 3m, if there are at most m traitors.

The paper is very well explained, different aspects of the problem are investigated, and it serves as one of the fundamental impossibility results in computer science. It would be worth extending the work to derive a (provable) exact impossibility result regarding the cost of algorithms that solve Byzantine consensus, with respect to the trade-off between time and message complexity. It has been shown that algorithms that cope with Byzantine failures are expensive: one improves the time but loses on message complexity, while another prefers message complexity and ignores time. The question is whether or not a formal proof of this sort of trade-off can be provided.

Paper 2: PeerReview: Practical Accountability for Distributed Systems

The authors propose PeerReview to provide accountability and fault detection in distributed systems. PeerReview creates a secure record of all nodes' actions, then inspects the recorded information and detects faulty nodes. It guarantees to eventually detect all Byzantine faults whose misbehavior is detectable by a non-faulty node.
More specifically, the faulty nodes that are discovered are either detectably faulty, meaning they break the protocol in a way that causally affects a correct node, or detectably ignorant, meaning they never acknowledge a message sent by a correct node. A node i is exposed by another node j if j has proof of i's misbehavior, and node i is suspected by node j if, according to j, i has not acknowledged a certain message sent to it. When node j learns that i has accepted the message, node j withdraws the suspected indication. First, the authors propose a simplified version, called FullView, which satisfies completeness and accuracy. However, it assumes the existence of a centralized trusted entity in the distributed system; even so, its message, storage, and computation complexities are quadratic in the number of nodes. They relax these assumptions in PeerReview; however, it still relies on tamper-proof hardware in each node. They apply their approach to three sample applications and present the results; this practical applicability is valuable.

The correctness of PeerReview depends on tamper-evident hardware, whose presence in all nodes is neither guaranteed nor realistic in practice. I wonder whether the storage required for logging makes it expensive. In addition, the traffic it imposes on the network is not negligible.

From: arod99@gmail.com on behalf of Wucherl Yoo [wyoo5@illinois.edu]
Sent: Monday, April 05, 2010 10:42 PM
To: Gupta, Indranil
Subject: 525 Review 4/6

In Byzantium
Wucherl Yoo (wyoo5)

The Byzantine Generals problem, L. Lamport et al, TOPLAS 1982

Summary: This paper presents the Byzantine Generals Problem and gives several intuitive solutions under various environments. The Byzantine failure model differs from the fail-stop model: the faulty components (generals) can send conflicting information to the non-faulty (loyal) components, so consensus is harder to reach. With m Byzantine failures, the proposed oral-message algorithm OM(m) solves the problem when there are at least 3m + 1 entities. The signed-message algorithm SM(m) solves the problem using cascaded signed messages from the relaying entities; this authentication property allows SM(m) to tolerate any number m of Byzantine failures. OM(m) and SM(m) require full connectivity among the entities. For missing communication paths, the authors also extend these solutions. When the graph is p-regular (every node has a set of p distinct neighbors), OM(m,p) solves the problem for a general sending a message to p selected lieutenants (for m >= 1 and p >= 3m). SM(m+d-1) also solves the problem when the subgraph of loyal generals has diameter d (for any m and d). The diameter of a graph is the smallest number d such that any two nodes are connected by a path of at most d arcs.

Pros:
1. The authors provide a clear problem definition (a novel failure model) and solutions for various situations.
2. The theoretically proven solutions in this paper are crucial for implementing distributed systems with Byzantine tolerance.

Cons:
1. Practicality is questionable, since the messaging overhead of Byzantine tolerance is huge; most distributed systems simply ignore Byzantine failures.
2. Since the absence of a message needs to be detected, it would be difficult to apply the algorithms to asynchronous systems.
3. Finding the appropriate number of Byzantine failures to tolerate can be difficult. What would be the expected number of Byzantine failures for a given number of entities in a distributed system?

-Wucherl
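As a small illustration of the signed-message idea summarized above (the node names are illustrative and an HMAC per node stands in for real signatures; the model assumes a node can only sign under its own name), an order is accepted only with a valid chain of distinct signatures rooted at the commander, which a traitor can relay or drop but cannot forge or alter:

import hashlib, hmac

KEYS = {"commander": b"k0", "L1": b"k1", "L2": b"k2", "L3": b"k3"}

def sign(node, text):
    # HMAC stand-in for a digital signature by `node` over `text`.
    return hmac.new(KEYS[node], text.encode(), hashlib.sha256).hexdigest()

def extend(order, chain, node):
    """`node` countersigns the order plus the chain so far, then relays it."""
    return chain + [(node, sign(node, order + "".join(s for _, s in chain)))]

def valid(order, chain):
    if not chain or chain[0][0] != "commander":
        return False                      # must originate with the commander's signature
    seen, prefix = set(), ""
    for node, sig in chain:
        if node in seen or sig != sign(node, order + prefix):
            return False                  # forged, altered, or duplicated signature
        seen.add(node)
        prefix += sig
    return True

good = extend("attack", [], "commander")          # commander signs the order
good = extend("attack", good, "L1")               # L1 countersigns and relays
print(valid("attack", good))                      # True
print(valid("retreat", good))                     # False: the signatures cover "attack"
forged = extend("retreat", [], "L2")              # a traitor invents its own order
print(valid("retreat", forged))                   # False: no commander signature at the root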
From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu]
Sent: Monday, April 05, 2010 10:09 PM
To: Gupta, Indranil
Subject: 525 review 04/06

=====================================================================
Paper 1: PeerReview: Practical Accountability for Distributed Systems
=====================================================================
Finding a fault in a large distributed system is always a daunting task. There are many faults beyond mere 'fail-stop'. Those faults may arise for different reasons, such as hardware malfunctions, misconfigurations, software modifications by users, hacker attacks, and so on. If there are thousands of nodes administered by numerous administrators in a large distributed system (e.g. an international banking system), it is really difficult to detect a fault, to identify the faulty nodes, and to convince others that a node is faulty. In the paper titled "PeerReview: Practical Accountability for Distributed Systems", the authors propose a solution termed "PeerReview", motivated by the offline world, to answer these questions. PeerReview is a general and practical approach that provides accountability in a distributed system. PeerReview makes several assumptions: the system can be modeled as a collection of deterministic state machines, nodes have reference implementations of the state machines, correct nodes can eventually communicate, and nodes can sign messages. The basic idea of PeerReview is very simple. All nodes maintain logs of their inputs and outputs along with all message exchanges. Each node also has a set of witness nodes which audit its log periodically. If the witness nodes find any misbehavior during an audit, they generate evidence and make it available to other nodes. The other nodes then check the evidence and report the fault. By using a hash chain, PeerReview can detect any tampering with the log entries of a particular node. PeerReview guarantees that faults, if any exist, will be detected and that no good node will be accused. The authors finally evaluate PeerReview on three different applications, namely an NFS server in the Linux kernel, overlay multicast, and P2P email. Their evaluation mainly shows the applicability of PeerReview to different real-life distributed systems.

Pros:
(1) The idea is very clever and intuitive.
(2) It guarantees completeness and accuracy.
(3) PeerReview doesn't require any formal specification of the system to detect Byzantine faults.

Cons/Discussion Points:
(1) I don't think PeerReview is scalable.
(2) Is it possible to guarantee that the witness nodes behave correctly?
(3) How will PeerReview behave when the distributed system is characterized by a high churn rate?
(4) Is PeerReview applicable to sensor networks?
(5) PeerReview can detect a fault only if that fault causes a problem for a correct node. This means that PeerReview is not a comprehensive fault detection solution.
(6) PeerReview is applicable only to systems whose actions are deterministic.
(7) What is the best way to determine how many witnesses are appropriate for a typical large distributed system? Is there any good way to select the witness nodes?
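To illustrate how the witness audit described above could expose a lying node, here is a toy sketch of replaying a log against a deterministic reference implementation (the counter state machine and the log layout are my own, purely illustrative):

def reference_state_machine(state, inp):
    """Deterministic reference implementation: accumulate a running total."""
    new_state = state + inp
    output = f"total={new_state}"
    return new_state, output

def audit(log, initial_state=0):
    """log: list of (input, claimed_output) pairs. Returns evidence of the first
    divergence, or None if the node's behavior matches the reference."""
    state = initial_state
    for i, (inp, claimed) in enumerate(log):
        state, expected = reference_state_machine(state, inp)
        if expected != claimed:
            return {"entry": i, "expected": expected, "claimed": claimed}
    return None

honest_log = [(1, "total=1"), (2, "total=3"), (4, "total=7")]
faulty_log = [(1, "total=1"), (2, "total=5"), (4, "total=9")]   # node lied from entry 1 on
print(audit(honest_log))   # None -> consistent with the reference implementation
print(audit(faulty_log))   # {'entry': 1, 'expected': 'total=3', 'claimed': 'total=5'}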
======================================================================================
Paper 2: UpRight Cluster Services
======================================================================================
Byzantine faults, in which nodes behave arbitrarily, have been well known in research and industry for a long time. Many researchers have put major effort into this problem and proposed several solutions. Surprisingly, however, these solutions are not widely deployed in industry, even though such faults are not uncommon. One of the major reasons why Byzantine Fault Tolerant (BFT) systems are not widely deployed is their lack of viability in terms of engineering effort: at present, if a system needs to be BFT, it has to be rewritten from scratch, a major deterrent to deployment. In the paper titled "UpRight Cluster Services", the authors propose the UpRight library to make BFT a viable addition to Crash Fault Tolerance (CFT) for a range of cluster services. The authors cleverly compose prior ideas to build the UpRight system, and evaluate it by constructing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Although the primary goal of UpRight is viable adoption by existing applications, with performance a secondary concern, the authors show that their case studies with Zookeeper and HDFS provide performance competitive with the originals while adding robustness.

To use UpRight, an application developer only needs to know the interface provided by the UpRight library, without knowing the details of fault tolerance or replica coordination. UpRight follows a basic client-server architecture (a client issues a request and expects a response from the server). UpRight applications must account for: nondeterminism, multithreading, and read-only replies (a client shim sends read-only requests directly to the server shims, which execute them without ordering them in the global sequence of requests). The UpRight server shim periodically tells the server application to checkpoint its state to stable storage. The authors mention several checkpoint strategies, such as a hybrid checkpoint/delta approach, stop and copy, a helper process, and copy-on-write. The core of the UpRight system is a Byzantine agreement protocol. In UpRight agreement, a client request involves three modules: the request quorum (RQ), the order module, and the execution module. The request quorum stores the client's request, forwards a digest of the request to the order module, and supplies the full request to the execution module. The order module produces an ordered batch sequence of request digests. The execution module represents the application server, which executes requests from the ordered batches and produces replies.

Pros:
(1) This paper nicely shows how to make BFT a simple and viable alternative to CFT.
(2) The authors cleverly exploit previous research ideas to solve their problem.
(3) Their evaluation is done with real-life systems (Zookeeper and HDFS) rather than a simulator.

Cons/Discussion Points:
(1) Scalability is a major issue for UpRight cluster services; this is especially prominent in Figure 6.
(2) In Section 2, the authors mention that the existing Byzantine fault model tolerates t Byzantine failures, while UpRight can tolerate a larger number (u). How large is u as opposed to t? The authors don't give any comparison.
(3) The authors mention in one place that their goal is to make BFT a viable "addition" to CFT, while in another place they say a viable "alternative".
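Referring back to the request-quorum / order / execution flow described in the summary above, here is a single-process sketch of the division of labour. Real UpRight replicates each stage and runs a Byzantine agreement protocol in the order stage; the class and request names here are my own illustrative stand-ins, not the library's API.

import hashlib

def digest(request: str) -> str:
    return hashlib.sha256(request.encode()).hexdigest()[:12]

class RequestQuorum:
    def __init__(self):
        self.full_requests = {}            # digest -> full request body
    def deposit(self, request):
        d = digest(request)
        self.full_requests[d] = request    # full request kept for the execution stage
        return d                           # only the digest goes to the order stage

class OrderModule:
    def __init__(self):
        self.sequence = []
    def order(self, d):
        self.sequence.append(d)            # stands in for the agreement protocol
        return len(self.sequence) - 1      # global sequence number

class ExecutionModule:
    def __init__(self, rq):
        self.rq, self.state = rq, []
    def execute(self, d):
        request = self.rq.full_requests[d]
        self.state.append(request)         # apply the request to application state
        return f"ok: {request}"

rq, order, execu = RequestQuorum(), OrderModule(), ExecutionModule(rq := RequestQuorum()) if False else (RequestQuorum(), OrderModule(), None)
rq, order = RequestQuorum(), OrderModule()
execu = ExecutionModule(rq)
for req in ["create /lock", "write /lock owner=client-7"]:
    d = rq.deposit(req)                    # client shim deposits the full request
    seq = order.order(d)                   # order module assigns a global slot
    print(seq, execu.execute(d))           # execution applies requests in sequence order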
======================================================================================
Paper 3: The Byzantine Generals problem
======================================================================================
The Byzantine Generals Problem is one of the most celebrated and classical problems in distributed systems, now around three decades old. Interestingly, looking at Lamport's (the lead author's) webpage about this paper, the story of how they came up with the name of the problem is quite funny. The Byzantine Generals Problem can be abstracted through a battlefield analogy as follows: a group of generals (one of whom is the commander) is gathered around the enemy's city, and one or more of these generals are traitors. The goal of the Byzantine Generals Problem is for the loyal generals to reach a consensus on whether to attack or retreat, irrespective of what the traitorous generals say. In the paper, the authors propose solutions for reaching consensus in such a situation; such solutions can easily be extended to any distributed system with similar characteristics. The authors propose a couple of solutions. In the first case, it is assumed that the generals communicate with each other through oral messages. The authors show that, in this case, the Byzantine Generals Problem is solvable if and only if more than two thirds of the generals are loyal. The second approach assumes written, unforgeable messages. The authors show that, in this case, the problem is solvable for any number of generals and possible traitors.

Pros:
(1) The authors nicely show the theoretical lower bound on the number of generals.
(2) The signed-message algorithm is insensitive to message communication failures.

Cons / Discussion Points:
(1) Are the assumptions made in this paper realistic?
(2) Many distributed systems experience a very high churn rate, where it is hard to know the number of nodes a priori. In that case, it cannot be guaranteed whether a system is robust to Byzantine faults or not.
(3) What will be the impact of a man-in-the-middle attack?
(4) The proposed solutions suffer from high time and message complexity.
(5) Is it possible to distinguish between a failure of communication and a failure of a node?

From: gh.hosseinabadi@gmail.com on behalf of Ghazale Hosseinabadi [ghossei2@illinois.edu]
Sent: Monday, April 05, 2010 7:38 PM
To: Gupta, Indranil
Subject: 525 review 04/06

The Byzantine Generals Problem

In this paper, the classical problem of agreement in the presence of Byzantine failures is considered. A Byzantine failure is a failure model in which the faulty node might misbehave by sending conflicting messages to different nodes of the system. The problem to be solved is as follows. We consider a set of generals, each of whom is either loyal or a traitor. In the agreement problem, the goal is for all loyal generals to decide on a common plan. In this setting, the traitors can be viewed as the Byzantine faulty nodes of the system, while the loyal generals are the fault-free nodes. Out of a total of n generals, one is the commanding general and the other n-1 are lieutenant generals. The commanding general sends an order to the lieutenant generals. In the Byzantine Generals Problem, we require that:
1. All loyal lieutenant generals obey the same order.
2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.

This paper proves that the above agreement is impossible if 1/3 or more of the generals are traitors. In other words, it proves that if m traitors are present, agreement is possible only if the total number of generals is 3m+1 or more. This is a necessary condition for achieving agreement. An algorithm that achieves agreement whenever this condition holds is also presented, which makes the condition sufficient as well. The exchanged messages are assumed to satisfy:
1. Every sent message is delivered correctly (no communication error).
2. The receiver of a message knows who sent it.
3. The absence of a message can be detected.

The proposed algorithm is as follows:
1. The commander sends his value to every lieutenant.
2. Each lieutenant sends the value it received from the commander to the n-2 other lieutenants.
3. Each lieutenant takes the majority of the value it received from the commander and the values relayed by the other lieutenants.

The proof of correctness of this algorithm is presented in the paper. Another algorithm for achieving agreement is also presented, under the assumption that traitors cannot forge the commander's order, because the commander signs the order before sending it; its proof of correctness is also described. The paper then considers the case where not all communication paths exist, so that the commander's order is not necessarily delivered directly to all lieutenants. An algorithm that achieves agreement in this situation is also presented; the proof relies on properties of regular graphs.

Pros: This paper considers the classical problem of agreement in the presence of Byzantine faults and presents necessary and sufficient conditions for achieving agreement.

Cons: For the case of an incompletely connected graph of generals, a necessary condition for agreement is not presented. The presented algorithm depends on the structure of the graph, and this might not be the optimal solution.

UpRight Cluster Services

This paper presents a library for Byzantine fault tolerant replication, called UpRight. More precisely, it considers two previously designed crash fault tolerant systems (the Zookeeper coordination service and the Hadoop Distributed File System (HDFS)) and presents solutions that are not only crash tolerant but Byzantine fault tolerant (BFT). The presented solutions are called UpRight-Zookeeper and UpRight-HDFS. UpRight is designed such that:
1) An UpRight system is safe despite r commission failures and any number of omission failures.
2) An UpRight system is safe and eventually live during sufficiently long synchronous intervals when there are at most u failures, of which at most r are commission failures and the rest are omission failures.

In UpRight, the application client communicates with three other modules: the RQ (request quorum), the order module, and the execution module. An UpRight client deposits its request at a request quorum (RQ). The RQ stores the request, forwards a digest of the request to the order module, and forwards the full request to the execution module. Two main functionalities of the RQ that help optimize the performance of UpRight are validating requests and separating the data path from the control path. The order module produces a totally ordered list of request digests. The execution module executes requests from the ordered lists and produces replies. Since the order and execution modules are separated in the design of UpRight, messages are exchanged between the two modules to ensure correct coordination. UpRight needs 2u+r+1 nodes for the RQ module, 2u+r+1 nodes for the order module, and u+r+1 nodes for the execution module. If no failure is present in the system, UpRight requires only u+r+1 RQ nodes, 2u+r+1 order nodes, and u+1 execution nodes.
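As a quick plug-in of the replica-count formulas just quoted (the u and r values below are my own example, not numbers from the paper):

def upright_replicas(u, r):
    # Formulas as stated in the review: tolerate u total failures,
    # of which at most r may be commission (Byzantine) failures.
    return {
        "request_quorum": 2 * u + r + 1,
        "order":          2 * u + r + 1,
        "execution":      u + r + 1,
    }

print(upright_replicas(u=2, r=1))
# {'request_quorum': 6, 'order': 6, 'execution': 4}
# Setting r = u recovers the classic BFT sizing (3u + 1 for ordering),
# while r = 0 gives the cheaper crash-only configuration (2u + 1).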
The performance of UpRight-Zookeeper and UpRight-HDFS is evaluated experimentally.

Pros:
1. This paper designs a Byzantine fault tolerant protocol for cluster services. Previous solutions considered only the crash failure model, while the framework designed in this paper works for the general category of Byzantine failures. The solution is modular, and optimization methods for improving the functionality of each module are presented.
2. UpRight uses fewer nodes when no fault is present; more replicas are used only if the presence of a fault is observed. This improves the efficiency of UpRight, so in the failure-free case there is no extra overhead.

Cons:
1. It is not clear how a Byzantine fault might happen in a cluster, or how important this issue is. Data centers are usually managed by an authenticated entity, and attacking them is not a common event. On the other hand, crash failure is a more important issue in data centers, since it is quite likely that a node stops operating, whereas malicious behavior is not usually present.
2. In a system with r commission failures and u-r omission failures, the optimal number of nodes needed to tolerate all faults is 3(u-r)+r+1 = 3u-2r+1. UpRight uses many more nodes than this optimum, mainly because of its modular design.
3. UpRight is designed to work with existing solutions for crash failures. It might achieve much better performance if designed from scratch; why it should be compatible with previous solutions, which themselves handle only crash failures, is not clear.

From: ntkach2@illinois.edu
Sent: Monday, April 05, 2010 7:19 PM
To: Gupta, Indranil
Subject: 525 review 04/06

Nadia Tkach – ntkach2
CS525 – paper review 4/6
In Byzantium

Paper 1: UpRight Cluster Services

The paper proposes a new UpRight library for fault-tolerant replication that is then used to develop a Byzantine Fault Tolerance (BFT) model as a more robust substitute for Crash Fault Tolerance (CFT). The authors investigate the performance of the new model on the open-source systems Zookeeper and Hadoop Distributed File System (HDFS), creating UpRight-Zookeeper and UpRight-HDFS.

Pros:
• Improved robustness of the systems
• Minimal intrusiveness to existing applications

Cons:
• While the overall performance of the new systems is comparable to the originals, they still consume more CPU cycles for the same workload
• Use of MAC authenticators instead of PKI keys/signatures

Paper 2: PeerReview: Practical Accountability for Distributed Systems

The paper describes a new system for detecting node faults in a network. PeerReview is an accountability system designed for distributed networks; as the experimental evaluation shows, it can be used widely in network filesystems, peer-to-peer email, overlay multicast, and a number of other distributed systems. It works by storing records of the messages sent and received, and of the inputs and outputs of applications. Nodes regularly request the log files of other nodes and compare the records; when deviations are found, the other node is considered faulty.
Similarly, if a node doesn't respond to requests over time, it is initially suspected and eventually exposed as faulty as well.

Pros:
• System accountability for all nodes on the network
• Satisfies completeness and accuracy
• Identifies race conditions and bugs in the system
• Protocol conformance validation

Cons:
• Processing and message overhead, depending on the fault assumptions and the established probabilistic detection guarantee
• As a consequence, the system has scalability limitations
• Memory requirements for storing the history of messages and application processing, and network traffic due to log transmission