From: Tengfei Mu [tengfeimu SpamElide] Sent: Thursday, March 03, 2011 12:31 PM To: Gupta, Indranil Cc: indy SpamElide Subject: 525 review 03/03 1. Cumulus: filesystem backup to the cloud This paper proposes Cumulus, a cloud-based backup system designed around a minimal-interface thin cloud. It assumes that all the application logic is implemented solely by the client. The core algorithm is based on packing smaller files into larger data objects called segments and building snapshots out of those segments. Every snapshot descriptor is composed of a timestamp and pointers to the segments it includes. As for implementation, it supports only write-once operations, allowing no in-place modifications, but permitting deletion and re-use of storage space. The authors show that thin-cloud solutions can be a cost-effective alternative to thick clouds that are designed for a specific application. Pros: 1. Cumulus is a cost-effective backup on cloud storage 2. Cumulus is efficient because it packs smaller files into segments as data blocks or units of storage 3. Cumulus simplifies garbage collection due to its usage of snapshot descriptors Cons: 1. Cumulus has increasing overhead for frequently written data. 2. Cumulus does not consider how to handle failure of the backup site. 2. Towards robust distributed systems During this keynote speech, Prof. Brewer discusses some problems of current distributed systems, especially their robustness. He argues that classic distributed systems are fragile, although they are widely adopted and popular. Specifically, he lists three reasons why Inktomi does not rely on classic distributed system research. (1) The problem of state: most classical distributed system research concentrates on the computation instead of the data, but he argues that distributed computing is data-intensive in nature. (2) Consistency vs availability: the tradeoff between consistency, availability, and tolerance to network partitions is introduced. (3) The boundary problem: how the interface between computational entities should be defined. Pro: 1. The talk explains the main problems in the design of traditional distributed systems, and had a great impact on subsequent distributed system design. 2. The author points out that random failures should be reasoned about probabilistically. Con: As for the availability problem, duplication should be the most straightforward approach. From: Igor Svecs [isvecs SpamElide] Sent: Thursday, March 03, 2011 12:27 PM To: Gupta, Indranil Subject: 525 review 03/03 Igors Svecs CS 525 2011-03-03 Cumulus: Filesystem Backup to the Cloud SUMMARY In this paper, the authors explore tradeoffs in a thin-cloud (simple interface, more portable) vs. thick-cloud (more integration with the application, more efficient) model by building a filesystem backup application on top of Amazon's S3. Small files are grouped into segments to reduce overhead. Cumulus assumes that the backup client will keep significant local state for optimization purposes, and that the storage cloud will be completely content-agnostic. COMMENTS Not being optimized to track the history of individual files presents a significant disadvantage in many scenarios where a single file is accidentally deleted and needs to be restored. It would also be interesting to see backup software that can assume both thin-cloud and thick-cloud models. It would check if the cloud supports "thick" operations, and possibly fall back to the thin mode.
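As a rough sketch of what such a dual-mode backup client could look like, the following Python fragment models the four thin-cloud operations the paper relies on and probes for a richer server-side operation before falling back to the thin model; the class names and the write_range probe are illustrative assumptions, not part of Cumulus:

    import abc

    class ThinCloudStore(abc.ABC):
        # The four whole-file operations the paper relies on.
        @abc.abstractmethod
        def put(self, name, data): ...
        @abc.abstractmethod
        def get(self, name): ...
        @abc.abstractmethod
        def delete(self, name): ...
        @abc.abstractmethod
        def list(self, prefix=""): ...

    def apply_delta(store, name, offset, patch):
        # Hypothetical "thick" path: if the provider offers an in-place range
        # write (write_range is an invented name), use it; otherwise fall back
        # to the thin model: fetch the whole object, patch it locally, and
        # upload the result as a new write-once object.
        if hasattr(store, "write_range"):
            store.write_range(name, offset, patch)
        else:
            data = bytearray(store.get(name))
            data[offset:offset + len(patch)] = patch
            store.put(name + ".v2", bytes(data))
            store.delete(name)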
The authors could have explored the concept of storing state on the cloud for redundancy - how large the state needs to be; how often it is accessed, etc. The thin cloud / thick client approach is also less useful in enterprise scenarios, where multiple machines need to be backed up. This is typically done by having a backup server and multiple backup agents on the clients. In this case, the backup server would need to keep track of the state of all the clients. Using pricing for evaluation is reasonable, but not a particularly innovative idea. Moreover, it will depend on the service provider. Towards Robust Distributed Systems SUMMARY / COMMENTS This set of slides first talks about distributed systems in a business/datacenter setting. They mention that modern DS should focus on data rather than computation, which agrees with current cloud computing research (and these slides are from 2000, long before the concept of cloud computing). It includes a discussion of the CAP theorem that talks about tradeoffs between consistency, availability, and tolerance to network partitions. It appears from recent developments (such as Dynamo) that BASE is indeed taking over ACID, but consistency will always be important in reliable database transactions. They argue that the root cause of RPC problems is that they emulate local procedure calls. However, this greatly simplifies development and generality of code, especially in distributed systems. Perhaps we need a better exception mechanism to overcome this problem. After reading the entire set, I can say that the speaker predicted many properties of emerging cloud systems – problems are I/O bound, not CPU bound; high reliability (zero downtime allowed); built-in fault tolerance (software will be buggy, hardware will fail, so we have to deal with it). From: Chi-Yao Hong [cyhong1128 SpamElide] Sent: Thursday, March 03, 2011 12:27 PM To: Gupta, Indranil Subject: 525 review 03/03 ---- Cumulus: Filesystem Backup to the Cloud, FAST’09 ---- This paper proposed a backup storage system that aggregates the data from user machines over the Internet. Backup is an increasingly interesting application, especially for thin-cloud services. For end users, Cumulus is a simple solution that could provide reliable backup services. To minimize the capital cost of data center storage, Cumulus aggregates the data to improve storage utilization. Pros: 1. The proposed user interface is succinct and has good portability. 2. Unlike previous papers which leverage special storage architectures, this paper does show that an efficient network storage system could be built using a generic storage architecture. Cons: 1. This paper failed to convince me that cleaning is a novel way to reduce storage overhead. There are many other ways to “compact” the disk. One could simply remove the segment limitation they defined and apply a local allocation heuristic each time the client’s data is backed up. 2. What is the right time granularity for network storage backup? Could we do even more fine-grained backups? It seems the proposed architecture might be promising for smaller backup intervals, but they did not discuss this issue. 3. Cumulus did not address storage failures. A simple data replication scheme could help to increase its reliability. ---- SPORC: Group Collaboration using Untrusted Cloud Resources, OSDI’10 ---- This paper proposed SPORC, a scheme to build cloud applications with an untrusted server. To protect data confidentiality, all the data seen by the server is encrypted.
However, the server could tamper with the execution order. To avoid this, SPORC adopts the idea of fork* consistency, appending a hash of the operation history to every update. This guarantees that clients will detect a malicious server within two exchanged messages. SPORC also supports dynamic access control, where users can modify the access control list even when they are disconnected from the servers. They build a SPORC prototype with two applications: a key-value store and a browser-based collaborative text editor. The experimental evaluation shows that SPORC provides very small response times for data writing, and the system throughput is acceptable. Pros: 1. For a systems paper, their prototype is solid. The two applications they built both have a wide range of applicability. Therefore, it is very convincing that their system would work well in practice. 2. The evaluation results look very good. The request-serving latency is small enough for practical use. Cons: 1. To detect a malicious server, they use a sophisticated algorithm with non-trivial state on clients. Although they described how to use checkpoints to reduce the client's state, they did not fully study the overhead imposed on the client. To avoid storing a considerable amount of state on the client side, one simple solution is to let clients communicate with each other directly; in that case, the server cannot hide or discard data by preventing inter-client communication. 2. Their scheme only guarantees that the server cannot corrupt the shared state. However, the server could introduce other undesired noise to degrade system performance. For example, the server could simply block the communication between clients. Also, the server could inject random delays to slow down the communication. It is hard to detect a malicious server that simply injects random delays. 3. Their scheme assumes the clients are all honest. However, as all the communications have to be redirected through the server, the server could launch replay attacks. For example, suppose A wants to do collaborative writing with another user, B. Then the server could block B and arbitrarily create a "virtual" client C to pretend to be B in communication with A. -- Chi-Yao Hong PhD Student, Computer Science University of Illinois at Urbana-Champaign http://hong78.projects.cs.illinois.edu/ From: anjalis2006 SpamElide on behalf of Anjali Sridhar [sridhar3 SpamElide] Sent: Thursday, March 03, 2011 12:10 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus: Filesystem Backup to the Cloud, M. Vrable et al, FAST 2009 Cumulus is designed to run on top of a limited storage service. It aims to rely on a simple service interface, which enables flexibility and reduces cost for the clients. The design of Cumulus requires only a limited interface of put, get, delete and list. These operations refer to entire files as opposed to arbitrary ranges of bytes in files. Cumulus implements a write-once storage model. Hence once the put command is issued, the file is never modified in place; it can only be deleted. Cumulus groups data from smaller files into larger units called segments. The hierarchy of Cumulus starts with the snapshot (the metadata log and the file data); this is broken up into blocks called objects, which are aggregated into larger blocks called segments. The segment is the unit that is stored on the server. Each snapshot has a snapshot descriptor, which contains a pointer to the root object. If the traversal is started from the root object, all other segments required by the snapshot are found.
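A minimal sketch of that descriptor-to-segments traversal might look like the following (Python; the type and field names are illustrative, not taken from the Cumulus implementation):

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class ObjectRef:
        segment: str      # name of the segment that stores the object
        object_id: str    # name of the object within that segment

    @dataclass
    class StoredObject:
        ref: ObjectRef
        children: list = field(default_factory=list)  # ObjectRefs this object points to

    @dataclass
    class SnapshotDescriptor:
        timestamp: str
        root: ObjectRef   # pointer to the root object of the snapshot

    def segments_for_snapshot(descriptor, lookup):
        # Walk the object graph from the root and collect every segment the
        # snapshot needs; `lookup` maps an ObjectRef to its parsed StoredObject.
        needed, stack = set(), [descriptor.root]
        while stack:
            ref = stack.pop()
            needed.add(ref.segment)
            stack.extend(lookup(ref).children)
        return needed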
The write-once model ensures that if a put fails, an old backup cannot be corrupted. Using segments to aggregate smaller files results in lower network overhead and metadata storage. It allows compression to take advantage of inter-file redundancy. Aggregation of smaller files also helps in encrypting the data that is being sent across the network. Cumulus also supports incremental backups of large files by storing only the changed portion of the file again. An advantage of the thin-cloud backup system Cumulus is that, due to the very basic storage service required, clients can choose among a number of storage providers to host their data, minimizing business risk. The write-once model results in keeping snapshots of a file at multiple points, leading to redundant data. This increases the storage required by the client and leads to increased costs. Cumulus does not provide coordination between multiple backup clients. Partial restoration might be very inefficient since Cumulus downloads multiple segments from the server. It is also possible that clients require additional services like backup data cleaning, encryption, or de-duplication when using multiple backup clients. The flexibility and cost efficiency of Cumulus might outweigh these considerations. Towards robust distributed systems, Eric A. Brewer, Keynote, ACM PODC 2000 Eric Brewer talks about his work at Inktomi. They work on global search engines and distributed web caches. He talks about how the classical definitions of distributed principles cannot be implemented in a practical setting. The three topics addressed are maintenance of state, consistency vs. availability, and finally the boundaries separating client/server in remote procedure calls. We are able to see the implementation of classical distributed system principles, or lack thereof, in a real-world setting. In the CAP theorem we see that we can only maintain two of the three properties – consistency, availability and tolerance to network partitions. Classical distributed systems focused on a computation-intensive workload while current systems are looking at data-intensive workloads. Classic distributed system principles also did not take into account location-specific information. There is a definite tradeoff involved between availability and consistency, so the presence of one does not entirely exclude the other. Since the speaker mentions that all systems are probabilistic, in real systems it is usually a question of what the system should be optimized more for. From: Simon Krueger [skruege2 SpamElide] Sent: Thursday, March 03, 2011 12:05 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus: Filesystem Backup to the Cloud, Michael Vrable, Stefan Savage, and Geoffrey M. Voelker The core idea of this paper is to present a backup program that runs on top of abstract cloud storage services like Amazon S3. They want to compare the overhead and price of this strategy to more specific services. To do this they implemented Cumulus, which has a simple put/get/delete/list API on whole files. They design the backup to store segments of files, which allows them to easily store incremental backups of the data. They were able to show that the storage costs Cumulus incurred were lower than both general backup solutions and specific backup services. Their design does not allow multiple client backups. I think this paper was a good investigation of how common cloud services today can be used. In particular, it was interesting to see that specific backup services (i.e.
they ONLY provide backups) are more costly than using their backup service that can run on top of any arbitrary cloud data service. SPORC: Group Collaboration using Untrusted Cloud Resources, Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten The core idea behind this paper is to provide a collaborative text editing program and key-value service that runs on top of an untrusted server. The key idea behind their implementation of SPORC is to combine Operational Transformation (OT), which allows concurrency without locking, and fork* consistency, which can detect when the states of two clients have been forked. Basically, by combining the two, they get efficient real-time collaboration from OT and can notice, via fork*, if the untrusted server has tampered with the service. The server could split the clients, and fork* is unable to detect this. However, if the server ever tries to rejoin the clients, fork* will be able to detect the tampering. I think this is a novel application of the two algorithms that could be beneficial in public cloud environments, since services could be more easily tampered with (since they are exposed to the public), but I am not certain if this is needed in private cloud environments where services are mostly secured by other mechanisms (firewalls, passwords, internal networks, etc...). Simon From: mdford2 SpamElide Sent: Thursday, March 03, 2011 12:05 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus: Filesystem Backup to the Cloud Cumulus is a storage backup system that attaches to any cloud service with only basic storage services. The system combines smaller files in order to reduce the overhead associated with multiple files. This reduces overhead in transmission, storage, and the costs associated with both in cloud services. Their system outperforms several other systems with regard to operating cost. Cumulus can also efficiently store small changes to large files by using the same techniques they use to clump small files. They use a write-once storage model, and scrub the backup when old snapshots are no longer needed. The paper's main contribution is the concept of aggregating small files to reduce transaction overhead. The remainder of their system is competitive, but not novel. The paper does not address multiple source storage. SPORC: Group Collaboration using Untrusted Cloud Resources SPORC is a client-side system for supporting consistency by sending encrypted data to an untrusted server. The system uses fork* consistency on the clients to enable detection of deviating servers and recovery by switching servers. Clients encrypt their data updates, send them to the server, and verify incoming updates both via public/private key matching and a hash chain over the committed history. In this scheme, clients are left with the majority of the work. Servers do store data, and enable access when other clients are all offline, but there was no comparison to a peer-to-peer based group coordination approach. If the server is relegated to a file store, there may be a way to use it in the same way from a lower-latency peer-to-peer system. From: Anupam Das [anupam009 SpamElide] Sent: Thursday, March 03, 2011 12:05 PM To: Gupta, Indranil Subject: 525 review 03/03 I. SPORC: Group Collaboration using Untrusted Cloud Resources The SPORC paper addresses the security and privacy issues of cloud-based collaborative services.
Cloud based collaborative services are popular due to their global accessibility, high availability and fault tolerance, but they lack security and privacy guarantees. If the servers are compromised they can access all the personal data available in them. To address this issue the paper moves application logic from servers to clients and uses servers for just storage and message ordering. Each client also maintains a local copy of the shared data and optimistically performs updates with the hope of having eventual consistency.  SPORC uses operational transformation (OT) and fork* consistency (hashing on historical actions) for this purpose. Since clients encrypt their data when sending it to the server and since servers perform no other computation on the data, it cannot access the real data. So, the basic philosophy of this paper is that it pushes responsibility towards the clients while making server functionality simpler (more like the thick client and thin server architecture.) However, there are some issues that have not been considered in the paper- 1. The threat model assumes that only servers are malicious. But clients too can be malicious and they can launch more complex attacks like collusion. 2. Since SPORC relies on eventual consistency there is always the possibility that the client may crash in which case it would lose its most recent non-committed (pending) operations. So, availability can be hampered. 3. In the experimental part of evaluating latency the number of clients considered is relatively small. It would have been interesting if they scaled the number of clients to a higher degree. 4. The checking of forks through out-of –band communication is totally dependent on clients. So, when does a client check for such forking? Should clients do this in a periodic fashion or just on the fly? Is there any indication as to when forking exists? 5. The extensions proposed like- recovering from fork have not been evaluated in the paper. It would have been nice to see how long it takes to identify a malicious server. ------------------------------------------------------------------------------------------------------------------------------------------- II. Cumulus: Filesystem Backup to the Cloud Cumulus talks about a backup system built on top of a thin cloud infrastructure. As cumulus is built on top of a thin cloud it provides only the very basic operations like- get, put, list and delete. As update/modify operation is not provided, any update is done through a delete operation followed by a put operation. Thus there is some cost associated with cleaning. Cumulus uses incremental update to lower this cost by taking snapshots at different point of time and then using them to determine the difference in the updates. For storage efficiency cumulus uses LFS (Log-structured File System) segment cleaning. It also uses aggregation to exploit inter-file redundancy. All-in-all the authors provide a nice system for backing up files for personal usage. Pros: 1. The system in simple and easy to use. It is not dependent on any vendor specific service and thus allows easy migration. 2. One of the strong points of this paper is that the system is fully implemented and is available for usage. They also provide experimental result based on their real deployment. 3. The authors have tuned various thresholds to obtain an optimal output. 4. More cost effective than other system (like Jungle Disk, Brackup). 5. Achieves low storage overhead through aggregation and incremental update. Cons: 1. 
Cumulus is more suited to WORM (write once, read many) workloads as it does not support a direct update operation. The overhead associated with storing snapshots will also increase if updates are very frequent. 2. There is a lot of metadata that needs to be maintained. This could become a potential bottleneck. 3. It does not support simultaneous updates from multiple nodes. It is basically meant for single-user updates (unlike Dropbox, which allows multiple parties to update at the same time). 4. Cumulus does not support file versioning. It would have been nice if they maintained different versions of a file so that a user could roll back to previous versions if required. ------------Anupam From: kevin larson [kevinlarson1 SpamElide] Sent: Thursday, March 03, 2011 12:01 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus allows users to back up their local storage media to a “thin cloud”, a cloud which provides only basic storage interfaces. Cumulus only needs put, get, delete, and list in order to operate. The current application space lacked a network storage back-up solution which incorporated multiple snapshots, simple (thin) servers, continuous increments, aggregated storage, and encryption, all of which are important to the performance, cost, and security of a backup system. In evaluation, it was shown that Cumulus was only marginally worse than an optimal solution (one optimized for the task, without many of the restrictions imposed on Cumulus), coming within 2-5% of its storage and upload sizes (and therefore costs). It was interesting to see how a system running on such a simple interface managed to perform so comparably to an optimized system. In general, the evaluation was very solid, comparing Cumulus not just to either the optimized solution or the common competitors (comparing against only one of these seems too common in many evaluations), but to both. As admitted in the conclusion, the specific problems addressed by Cumulus are not terribly interesting. However, the general concept of using clouds for conventional tasks, as well as optimizations based on the specific systems and clouds, is probably going to continue to see increased attention. Brewer presents a strong case for availability in his presentation. Inktomi is a company which focuses on availability, and managed to keep their systems operational through a variety of problems, ranging from disk failures, to power/network outages, to physical location changes. Most classic distributed systems make trade-offs in order to maximize consistency. This comes at the expense of either availability or tolerance to network partitions. Basically Available Soft-state Eventual consistency (BASE) is proposed as an alternative to Atomicity, Consistency, Isolation, Durability (ACID), compromising consistency and isolation in order to maximize availability. Harvest, the fraction of the complete result, and yield, the fraction of queries answered, are another interesting pair of metrics for distributed systems. Conventionally, 100% harvest was chosen at the expense of yield. However, newer systems, including the Internet, prioritize 100% yield at the expense of harvest. It was interesting to see a fundamentally different set of priorities and how they affected the operation of a company. Highly available systems are an interesting shift from the convention of consistency. The uses for highly available systems were well presented and discussed.
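As a toy illustration of the harvest and yield metrics mentioned above (Python; a rough sketch, not code from the talk), a fault can be absorbed either as lost yield, by replicating data, or as lost harvest, by partitioning it:

    def degraded_harvest_yield(nodes_total, nodes_failed, strategy):
        # harvest = fraction of the full data reflected in each answer
        # yield_  = fraction of offered queries that get answered
        alive = (nodes_total - nodes_failed) / nodes_total
        if strategy == "replicate":    # full copies: answers stay complete,
            return 1.0, alive          # but overall capacity (yield) shrinks
        if strategy == "partition":    # disjoint shards: every query answered,
            return alive, 1.0          # but answers miss the failed shards
        raise ValueError(strategy)

    print(degraded_harvest_yield(100, 10, "replicate"))  # (1.0, 0.9)
    print(degraded_harvest_yield(100, 10, "partition"))  # (0.9, 1.0)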
There are still a variety of areas to research in order to further improve highly available systems, but little attention was given to this. It would have been interesting to see more about the prospective research opportunities. From: Shen LI [geminialex007 SpamElide] Sent: Thursday, March 03, 2011 11:49 AM To: Gupta, Indranil Subject: 525 review 03/03 Name: Shen Li SPORC: Group Collaboration using Untrusted Cloud Resources This paper proposes an idea for building a wide variety of collaborative applications with untrusted servers. The two basic contributions of SPORC are: 1) preventing malicious servers from revealing the content of communications to unauthorized entities, 2) preventing malicious servers from reordering the real sequence of events. To fulfill the first requirement, every piece of data sent to a client is encrypted under that client's own public key, so the server cannot understand the data it is processing. But the server still has a chance to modify the order of events. To prevent this, SPORC employs two sequence numbers: a client sequence number (clntSeqNo) and a global sequence number (seqNo). On receiving an operation, a client verifies that the clntSeqNo is greater than the previous clntSeqNo submitted by the same client, and applies the same check to the seqNo. When the order is violated, the client will know that the server is doing something bad, so it can switch to another server. Pro: 1. As they discussed in the paper, it is better not to remove the server role from this kind of application. Otherwise, there will be no arbiter who can give deterministic feedback to users' actions. Under this circumstance, this paper guarantees the security of applications even if the server itself is not trusted. Con: 1. It is a little bit weird that SPORC proposes a concurrent editing model while it cannot guarantee consistency. The operational transformation cannot handle all updates properly. In some cases it still calls for manual negotiation. 2. The meta-operations are sent to the server in the clear. So, it can be a potential threat that the server can reveal this data to some interested entities. 3. If one client wants to make some updates to the file, it should encrypt the operation under all other users' public keys. This can be costly. Cumulus: Filesystem Backup to the Cloud This paper proposes a system with minimal interfaces (put, get, delete, and list) to provide cloud file backup functionality. The authors point out that current cloud-based backup services implement integrated solutions that call for specific software installed on both the client side and the server side. This kind of approach is able to achieve greater storage and better bandwidth efficiency. However, it also impedes the portability of cloud backup services, which can be one main concern in future Internet-based applications. Under this motivation, they propose the thin-cloud concept in which the cloud side exposes only minimal operations, so that it can be easily ported to any online storage service. Pros: 1. This idea is somewhat similar to the design of the IP layer, which connects both higher-layer protocols and lower-layer implementations. Their thin-cloud idea can be easily supported by lower-level service providers as well as higher-level applications. Cons: 1. This architecture is not versatile enough to support all applications, at least not with the APIs they currently expose. For example, in some applications, users need to continuously update a very large file.
And for safety reasons, they want to the file stored in cloud side always be up-to-date. It is be very costly to delete and re-upload the whole file frequently, since they do not support ability to read and write arbitrary byte ranges within a file. 2. The segment cleaning policies should involve human efforts since the server alone is not sure about whether a file is OK to be deleted. From: trowerm SpamElide on behalf of Matt Trower [mtrower2 SpamElide] Sent: Thursday, March 03, 2011 11:46 AM To: indy SpamElide Subject: 525 review 03/03 Cumulus Cumulus is a cloud based storage system for filesystem backups. The paper describes cloud services as either thin or thick based on the number of services they offer. The authors combined features of several previous implementations and deployed the service on top of Amazon's S3 storage service. Cumulus identifies five features which it possesses but no previous work completely possesses. Most modern backup systems have the ability to take multiple snapshots (except rsync). About half of the surveyed works support simple server interfaces and encryption of data. The remaining two features are common although never found with the other features. Cumulus offers the ability to take incremental snapshots forever after an initial snapshot and the ability to store sub-file deltas. These combined features make cumulus an efficient, safe, and reliable backup system. The authors took an interesting twist to previous work in that they optimized the backup service for cost on modern storage services such as Amazon's S3. This is certainly of concern to most end-users and something that has been overlooked in the past. SPORC SPORC is a secure framework for building cloud-based applications. Most cloud applications require the end user to trust the server. SPORC offers operational transformations and fork* consistency to allow it to be resilient to malicious server operations. In order to ensure data is secure the framework uses cryptographic keys on the clients similar to many existing protocols. This alone does not guarantee your data is safe though. The server could rearrange PUT requests to make the data reach an inconsistent state. The first feature operational transformations allows for operations to be reordered in a commutative manner while maintaining causal consistency. The second property fork* consistency is used to detect when two copies of the data deviate. If this deviation happens, then the clients must either not see changes from the other client or realize that the server is malicious. Based on these ideas the authors created a KV store and a simple collaborative text editor. The one clear disadvantage to this model is that the client now carries the burden of the processing load rather than the cloud. This runs against the notion of dumb devices connected to a smart cloud that many envision. From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman [mrahman2 SpamElide] Sent: Thursday, March 03, 2011 11:33 AM To: Gupta, Indranil Subject: 525 review 03/03 Towards Robust Distributed Systems Summary: The talk discusses three classical problems in distributed systems: state, consistency vs availability, and understanding boundaries. It proposes to use BASE (Basically Available Soft-state Eventual Consistency) instead of ACID by forfeiting C and I for Availability. The CAP theorem says that for a distributed web-service, you can only guarantee at-most two of Consistency, Availability, and Partition tolerance. 
The author also points out that the boundary of a system could be ill-defined due to false transparency guarantees. Overall, classical distributed systems are fragile due to an exclusive focus on computation without thinking about data, ignoring location distinctions, loosely defined consistency/availability trade-offs, and poor understanding of boundaries. The author proposes the DQ principle which states that Data/query multiplied by Queries/sec is a constant (or, capacity * completeness = constant). A good take-home message of the talk is that all systems are probabilistic in nature. As a result, to build better systems, we should think probabilistically, maximize symmetry, try to make faults independent, and use randomness in system design. We need to stop dreaming about 100% working systems and be prepared to accept partial results. Pros: (1) A first look at the CAP theorem, which postulates fundamental trade-offs for distributed systems. As a result, researchers won’t run after the holy grail of 100% available systems. (2) Encourages probabilistic thinking about systems. Cons: (1) Is BASE good enough for an internet service that deals with sensitive data, e.g. an HPC cluster that deals with financial transactions, or high-frequency trading? Future Work: (1) Does the CAP theorem also hold for cloud data centers? SPORC: Group Collaboration using Untrusted Cloud Resources Summary: Cloud deployment is attractive for distributed systems that collaborate on shared state, e.g. Google Docs, Google Calendar and so on, mainly due to scalability, availability, and the possibility of real-time collaboration. However, the trade-off is that clients must trust the third-party cloud with potentially sensitive data. SPORC overcomes this by generalizing to the case where cloud servers are untrusted. It assumes that servers have very limited functionality. In this case they are only required to store encrypted data and order client updates. As a result the clients are forced to store local state. So the first major problem that arises is keeping local copies consistent even with off-line access. This problem is solved with Operational Transformation (OT), which can synchronize arbitrarily divergent clients. The second issue is dealing with a malicious server. This is addressed using fork* consistency, which ensures serializability for honest servers, and tamper detection within 2 message exchanges in case of a malicious server. Clients maintain fork* consistency by computing a hash chain over their view of committed operations and sending it with any new updates. Another client can check the hash chain to detect a malicious server. The authors evaluated SPORC by building a key-value store and a browser-based collaborative text editor with minimal code requirements. Pros: (1) SPORC allows users to work off-line and still maintain synchronized states. (2) It can deal with untrusted servers even when the server isolates clients into distinct partitions and equivocates. Cons: (1) The overhead of dealing with untrusted servers might be too much. Is it possible to ignore the possibility of malicious servers and still have good performance? Future Work: (1) The paper argues that OT can ensure commutative operations. Does it also work for associative operations? (2) SPORC servers are lightweight; in that case, is it possible to eliminate servers and work with a P2P system? The peers could use reputation mechanisms to deal with malicious peers and eliminate the overhead of fork* consistency.
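The hash-chain check described in this summary can be sketched roughly as follows (Python; a simplified illustration of the fork*-style check with invented class and method names, not SPORC's actual implementation):

    import hashlib

    def chain_hash(prev_hash, op_bytes):
        # Extend the hash chain with one committed operation.
        return hashlib.sha256(prev_hash + op_bytes).digest()

    class ClientView:
        def __init__(self):
            self.history_hash = b"\x00" * 32  # hash of the empty history
            self.committed = 0                # committed operations seen so far

        def apply_committed(self, op_bytes):
            self.history_hash = chain_hash(self.history_hash, op_bytes)
            self.committed += 1

        def check_peer(self, peer_committed, peer_history_hash):
            # If a peer reports the same history length but a different hash,
            # the server must have shown the two clients diverging histories.
            if peer_committed == self.committed and peer_history_hash != self.history_hash:
                raise RuntimeError("fork detected: the server has equivocated")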
-- Muntasir From: nicholas.nj.jordan SpamElide on behalf of Nicholas Jordan [njordan3 SpamElide] Sent: Thursday, March 03, 2011 11:12 AM To: Gupta, Indranil Subject: 525 review 03/03 Nicholas Jordan njordan3 03/03 Cumulus: Filesystem Backup to the Cloud Cumulus is a way to organize your data such that users can group the data and store backups on any cloud provider. This scheme allows people not to be locked into a certain provider, since you're not relying on the provider to maintain the metadata of your backups. All cloud providers allow you to put/get/list/delete files. The scheme is to store the data and the metadata separately. A cluster of files/data is contained in a segment. And there is metadata, called snapshot descriptors, which records which segments a snapshot uses and where data in other segments is referenced. Pros: - data is portable - incremental backups don’t incur a lot of overhead - can encrypt the data Cons: - puts a lot of the management of the packaging of data into the hands of the client; they have to do all the packing - it's not really a service, more like a best practice Additional thoughts: I like the fact that incremental changes don’t make you have to create entirely new snapshots. You can just reference previous data from previous snapshots. The separation of the data and metadata allows this. Towards Robust Distributed Systems: CAP Theorem I looked at the slides of the speech; however, I couldn’t get much content from the slides. This speech is very famous and brought about the CAP theorem for distributed systems. CAP Theorem In a distributed system, there are three desired qualities: (C)onsistency, (A)vailability and tolerance to (P)artitions. A distributed system can only have 2 of the 3. To give an example, if your system is CA, then you can’t handle network partitions, because during a partition the whole system can’t agree on a change and stay consistent. If your system is AP, like DNS, when a new entry is added not all servers see the change right away. Even during a network partition, the system still works and is available. Finally, for CP, if you are going to be totally consistent and handle network partitions, you give up availability. When a network partition happens, the servers can’t come to a total system consensus, so there’s no availability. Counter Example By Daniel Abadi – Yale Professor http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html This blog post is a critique of the CAP theorem. 1) If a system is CP, it just means that availability is not a top priority, but in the presence of a network partition there is still some degree of availability. 2) CA ~ CP. As noted above, CP systems give up availability only when there is a network partition. CA systems are “not tolerant of network partitions”. But what if there is a network partition? What does “not tolerant” mean? In practice, it means that they lose availability if there is a partition. But there is still some degree of availability. There are really CA/CP and AP systems. The fundamental difference is: when there is a network partition, which do you value more, consistency or availability? The first two points are critiques of the theorem. But Abadi also gives an example of a system that doesn’t fit at all into the CAP theorem. Known as PNUTS in the academic world (Sherpa inside Yahoo!), the system gives up consistency not for improving availability, but to improve the latency parameter (or minimize latency).
Sending a message over a WAN increases the latency of a transaction on the order of hundreds of milliseconds. To avoid this latency hit in performance, they move to a concept of “timeline consistency”. There’s no real bound on how long it takes for the system to become consistent. Whenever machines receive updates, they will receive the updates in the same order. My final thoughts: Both sources are valuable. Brewer’s talk gives you a good foundation for understanding the barriers in a distributed system. Abadi gives scenarios where systems don’t quite fit the rigid CAP framework. I feel that his PNUTS example is still covered under Brewer’s CAP theorem. Lowering latency is still closely related to availability; it might be QoS of availability instead of percentage of machine availability. -- Thanks, Nick Jordan From: Jason Croft [croft1 SpamElide] Sent: Thursday, March 03, 2011 11:00 AM To: Gupta, Indranil Subject: 525 review 03/03 SPORC: Group Collaboration using Untrusted Cloud Resources SPORC is a framework that provides collaborative editing of documents, automated merging, and off-line updates, stored on an untrusted cloud server. To do so, it leverages operational transformation (OT) and fork* consistency, which allow for lock-free concurrent operations that converge to a consistent state and ensure the untrusted server cannot modify the data without detection. Specifically, OT allows each client to edit the document, and any updates are mapped to a set of operations that can be performed on it. Since clients' updates may result in their respective documents diverging, a transformation function can be used to return both documents to a consistent state. All operations are signed with a client's private key to prevent the server from attempting to modify any operations. With fork* consistency, clients can detect if a server misbehaves since clients share their individual views of the history of the document by including it with all operations. This prevents a server from notifying only some clients about operations and not others without being detected. With this design, clients can perform updates to a document while offline, and later merge changes when online. Another strong point of SPORC's design was support for version control where conflicts must be resolved manually. This was a small addition, but covers an important potential usage of SPORC. Companies wanting to store source code in the cloud or online (e.g., GitHub) could do so safely, without worrying about malicious tenants or cloud providers trying to steal their code. I also liked the brief section explaining the benefits of having an untrusted server, justifying why a design like SPORC is needed. However, the authors assume that when a server begins misbehaving, there will be a properly behaving server to switch to or use. Doesn't this imply there is some trust in at least one of the servers not to misbehave? Though it may be outside of the scope of their work, SPORC should probably support the case where there is no server that will behave. Maybe there could be some way in which clients could still edit the document with the server misbehaving? In addition, meta-operations are sent to the server in the clear, which can include changes to the access control list. Couldn't this allow a malicious server to edit the metadata to add itself as an editor? In their implementation, the authors also note performance issues with JavaScript and encryption.
This may be an appropriate fit for Google's NativeClient, if an implementation of their web-based client in native code was possible. Towards Robust Distributed Systems This talk by Eric Brewer explains three issues why distributed systems do not work: state, consistency vs. availability, and boundaries. With state, persistence is difficult (more difficult than computations) and data centers exist so we can have consistency and availability. In the trade-off between consistency and availability, Brewer notes that there is a fundamental trade-off between consistency and isolation and we trade these two for availability, graceful degradation, and performance. He also describes the CAP theorem, in which one can have at most two of three properties: consistency, availability, and tolerance to network partitions. For example, single site data centers forfeit tolerance to partitions, distributed databases forfeit availability, and DNS forfeits consistency. For boundaries, Brewer describes problems in trust between the two sides, partial failures, and multiplexing clients. Kernels, for example, solve this issue of trust by using opaque references and copying arguments, but many systems like TCP and browser do not properly solve this problem. In addition, Brewer argues that due to sloppy boundaries, systems become fragile, and the root cause of many of the problems is false transparency. He expands on his ideas, and further claims that classic distributed systems are fragile because of lack of focus on data, ignoring location distinctions, and poor definitions of goals for consistency and availability. Even though his talk is over a decade old, I think it is just as relevant today as it was in 2000 with so much focus now on cloud computing. With collaborative and online editing becoming popular (e.g., SPORC, Google Docs, Microsoft Office Live), there is becoming an important need and focus on data, consistency, and availability (and not as much on computation). From: iitb.ankit SpamElide on behalf of Ankit Singla [singla2 SpamElide] Sent: Thursday, March 03, 2011 10:50 AM To: Gupta, Indranil Subject: 525 review 03/03 1. Cumulus ----------------- Summary: Cumulus is a system designed and optimized to take advantage of cheap cloud storage for data backup. This is a cheap way to get off-site data storage, thus guarding against catastrophic failure at one site. Simplicity of design seems to be key element: a) Cumulus requires only four simple operations from the cloud server - put/get/list/delete. It doesn't use any 'modify' functions or read/write functions on parts of a file. Their write-once model has the advantage of reducing the possibility of corrupting old backups while attempting to do a new backup run. The key idea is to group files into segments and maintain appropriate metadata so that data that is duplicated across snapshots can be reused instead of creating multiple copies. There are some other interesting ideas in Cumulus to reduce costs; for instance, their use of segment cleaning (like defragmentation) doesn't require downloading the segment and putting back an updated copy; instead, they just write new segments with only useful data to the remote storage, garbage collecting the old expired segments later. Comments: The idea of building a tool like Cumulus on top of a thin API brings about the trade-off of provider lock-in versus efficiency. Exploiting a more refined API might enable more efficient function, but limits portability. 
For a backup utility, it seems the latter concern is more important and Cumulus' choice seems justified in that regard. 2. SPORC ----------------- Summary: SPORC is targeted at making possible real-time collaboration between multiple clients while making sure that a) client-data is unreadable by the server b) the server can't tamper with the data without risking detection c) clients can fix such tampering after detection. To fulfill these goals, SPORC addresses the problems of keeping client copies of data consistent and handling malicious servers. The key idea of SPORC is to combine two paradigms: a) operational transformation, which mutates operations in such a way that their independent application by different clients reaches a consistent state; b) fork*-consistency, which ensures that clients which see each other's updates after a malicious 'forking' of their views by the server will be able to detect such activity. SPORC essentially applies OT to recover after detection of said forks. Comments: Given the concerns about confidentiality and integrity of cloud-hosted applications, this seems a very timely effort. It would also be interesting to see what happens when a few of the clients collude with the malicious server against other clients. But I think this concern might be largely separable as a Byzantine consensus kind of problem. Also, how does one really distinguish between natural/genuine multiple client failures and a malicious server forking a set of clients from the rest? I'm unfamiliar with much work in this direction, but this seems like a pretty nice paper. Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/ From: Agarwal, Rachit [rachit.ee SpamElide] on behalf of Rachit Agarwal [agarwa16 SpamElide] Sent: Thursday, March 03, 2011 10:46 AM To: Gupta, Indranil Subject: 525 review 03/03 ----- Cumulus: Filesystem Backup to the Cloud The paper presents a file system backup solution. The most interesting feature of Cumulus is its simplicity and "flexibility". It allows the users to migrate from one storage provider to another without significant overhead. The interface is simple, which in turn provides flexibility in terms of implementing a more optimized storage service on top of Cumulus. 1. Cumulus is restricted to write-once storage model. It may not be efficient for applications that require frequent modifications and back-up of the same files. Incremental back-up is mostly offloaded to the client end, which seems like a tricky design decision. 2. Cumulus made the design decision of storing many small files as a single segment. It seems to me that reading a particular small file will require reading the whole segment, which can be pretty inefficient. In particular, how does one extract a particular file efficiently? If compression is used, does one need to decompress the whole segment to extract a single file? 3. Segment cleaning seems to bring the real overhead for Cumulus. It writes a new copy of the data every time a snapshot is updated. This could be very space-inefficient (but definitely quick and dirty fix to the problems with LFS). In general, leaving segment cleaning to the client-side can lead to inefficiencies at the server end. ----- Towards Robust Distributed Systems The talk outlines various fundamental issues in distributed systems. 
The main idea is to highlight poorly defined concepts like consistency and availability (in terms of achievable performance). In general, the insights provided on building robust distributed systems (especially from the perspective of an industrial implementation) are really interesting. Finally, it is interesting that the author puts forward the idea of focusing on data rather than computation, a paradigm which seems to have gained more focus recently. In general, the CAP theorem (law?) is very thought-provoking. In today's systems, it may be possible to avoid giving up any of consistency, availability, and partition tolerance by giving up on latency. I wonder if latency falls into "availability"? If not, why did the author not consider talking about latency? If yes, does availability mean zero tolerance for latency? ----- Best regards, Rachit -- Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Web: http://agarwa16.wikidot.com ---- Sing as if No one is Listening, Dance as if No one is Watching, Dream as if You'll Live Forever, Live as if You'll Die Today !! From: yanenli SpamElide on behalf of Yanen Li [yanenli2 SpamElide] Sent: Thursday, March 03, 2011 10:37 AM To: Gupta, Indranil Subject: 525 review 03/03 paper 1. Towards Robust Distributed Systems Summary: This talk summarizes several fundamental limitations of previous distributed systems, and proposes several novel ideas for future DS research. It is surprising that many of the ideas in this talk have actually been implemented successfully in recent systems. The fundamental limitations of classic DS include a focus on computation but not data, poor definitions of ACID vs BASE goals, ignoring location distinctions, and poor understanding of boundaries. One of the most important tradeoffs discussed in this talk is consistency vs availability. Database research is focused on higher consistency guarantees (ACID), but the distributed systems community focuses on availability. This trade-off leads to the Consistency-Availability-Network Partitions (CAP) Theorem, which states that an application can only choose any two out of the above three characteristics. The CAP theorem doesn't mean that only two configurations in the CAP space are usable. In fact, one can design systems across the whole CAP space with a clear ACID/BASE goal in mind. However, this area was largely unexplored at the time; one reason is that the database community and the DS community are separate. Pros - Summarizes many interesting limitations in the distributed systems field. - Highlights some important ideas, such as the consistency vs availability trade-off. - Emphasizes the notion of thinking probabilistically to account for random failures Cons - The author claims that message-passing will be a winning strategy; however, we have only observed limited success of this strategy in this decade. paper 2: Cumulus: Filesystem Backup to the Cloud summary: This paper presents Cumulus, a cloud-based filesystem backup service. The primary motivation for designing Cumulus is that current backup systems have specific formats for data storage and protocols for performing backups. The lack of generic formats requires a heavy cloud layer for the user. Cumulus has a simple storage server interface with a thin server layer to provide portability. The storage server interface has four operations: get, put, list, delete. Cumulus backs up the file system via snapshots of the storage system at various time points.
Both metadata and files are broken into objects and stored on server. Cumulus is optimized for a complete restore. When a system failure happens, the stored files can be rebuilt using the snapshots. Cumulus is built on the Amazons S3 ( Simple Storage Service). Pros - A cost effective solution for backup. - building a generic service on top of a thin cloud computing resources layer. This design decouples service from infrastructure, having more flexibility for choosing infrastructure providers. cons - only optimizing for a complete restore. Partial restores might be inefficient. - no inbuilt compression or encryption module in the system, heavy reliance on external tools, which might not be efficient and effective. -- Yanen Li Ph.D Candidate Department of Computer Science University of Illinois at Urbana-Champaign Tel: (217)-766-0686 Email: yanenli2 SpamElide From: Andrew Harris [harris78 SpamElide] Sent: Thursday, March 03, 2011 10:27 AM To: Gupta, Indranil Subject: 525 review 03/03 Review of “Cumulus: Filesystem Backup to the Cloud”, Vrable et al, and “SPORC: Group Collaboration using Untrusted Cloud Resources”, Feldman et al Cumulus is a system for so called “thin cloud” backups; that is, backups with only a minimal set of operations exposed to clients (get, put, delete, list) that are not reliant on vendor-supplied client software to operate. Underneath this client layer, a number of technologies common to thick-client backup services are also implemented: delta snapshots, sub-file deltas, encryption, multiple snapshot points, automated garbage collection for deleted snapshots. They build a prototype system on top of Amazon’s S3, and demonstrate a relatively low cost of operation based on S3’s pricing model, due to the low backup overhead required for their system. They also show that their system is technically superior to a handful of existing backup systems, in that their system requires overall lower storage, lower upload bandwidth use, and a lower total number of operations. Though decried by some in the computing community for reasons of privacy, data control, and data freedom, cloud backups seem a natural use of distributed computing systems. Large companies have been doing this for years, with nightly backups being sent between locations to ensure data fastness in the event of natural disasters or otherwise. With storage prices continuing to fall, and with services such as S3 exposed to the public, it is a perfectly reasonable progression to see consumer backup taking place in massive storage systems. A curious conundrum exists, however, in encrypting data onto massive distributed storage shares. While it is the case that data stored on such machines is technically secure from even a very large crypto brute forcing, it is also the case that people are still terrible at choosing strong passwords. Presumably, a backup service would like to offer users a password recovery mechanism, in the event of a lost decryption key - that way, their old backups would still be accessible. The host of such a distributed storage system would also have access to relatively large computing power (as would be the case with Amazon’s EC2 here). As such, the user may as well store their information as cleartext, as it becomes relatively trivial for this hosting provider to break a user’s encryption using their recovery password. 
This is a limitation for any backup provider that I have yet to see solved; so far, the only answers seem to be “use a password and risk being broken” or “don’t use a password and risk losing backups”. I would like to see more work that attempts to rectify this issue. SPORC is a system designed with security as a primary concern, rather than what seems to be a secondary concern in Cumulus. SPORC allows a client to collaborate with other clients on arbitrary computation tasks, the privacy and security of which are guaranteed to a degree by the use of crypto keys. It provides a shared file space for users in collaboration, it has mechanisms by which to detect and recover from tampering or other malicious behavior, and it tolerates version conflicts and unstable network connections. Rather than casually assume some degree of trust with the remote system and links between it, the team here takes a typically crypto approach and assumes that everything is malicious apart from the users themselves. By this, they apply crypto schemes to all data moving in and out of the shared space, as well as to the metadata of what is currently stored in the shared space. This of course implies a greater crypto overhead to use of this system, which the implementers leave to the client side (because the server cannot be trusted). Their micro-benchmarks show a promising trend of resource overhead stability for small numbers of users, with only a text editing task showing relatively higher overhead. This approach is suitable for live collaboration, as locally stored copies on each user’s machine would suffice in the event of a network outage; the system is designed specifically to handle this, even. Where this system has issues, however, is in scalability: propagating changes to a small number of users (their experiments used 16 clients) is simple and incurs a low overhead relative to file size. By contrast, propagating those same changes to, say, tens of thousands of users becomes much more costly, as each user must expend resources to decrypt what comes out of the shared server space. A use case with such a number of users is not clear at this moment - perhaps a corporate system that outsources its digital collaboration spaces across multiple physical offices. However if this scheme were to be transplanted to a larger user base, the suitability of this system becomes unclear.From: w SpamElide on behalf of Will Dietz [wdietz2 SpamElide] Sent: Thursday, March 03, 2011 4:15 AM To: Gupta, Indranil Subject: CS525 Review 03/03 Will Dietz cs525 3-3-2011 "Cumulus: Filesystem Backup to the Cloud" This paper presents 'Cumulus' which is both an idea and an implementation of using a 'thin-cloud' for filesystem backup. They built a system that uses a very minimal storage API (put, get, delete, list) to implement a filesystem backup without requiring much server-side help. The key benefit to this 'thin-cloud' approach (even if using a provider that supports other features) is that it doesn't tie you to particular provider, which when combined with the cost analysis the authors did makes a compelling case for an open market of competing providers. Paying someone else to manage the storage (which is managed for you, comparably cheap, and located remotely, all good things for backups) makes sense when using their model, but also the API they choose ensures that what you pay for (dumb storage, mostly) is what gives you the benefit, not the logic involved in the backup itself. 
Of course a custom-built end-to-end system has benefits (as they note in the paper), but in terms of what normal users can make use of in today's offerings, Cumulus seems rather compelling. They use this simple abstraction to back up files, including features such as combining small files and incremental updates. It also supports encryption and compression through the use of external tools such as gzip and gpg. My favorite part of the paper was the question they pose at the end: "Can one build a competitive product economy around a cloud of abstract commodity resources, or do underlying technical reasons ultimately favor an integrated service-oriented infrastructure?"

Pros:
+ Thin cloud reduces lock-in
+ Cheap storage for many use cases
+ Simple, straightforward design

Cons:
- A custom-built solution might fare better
- Long-term data retention seems to scale poorly

-----------------------------------

"Towards Robust Distributed Systems"

This set of slides takes a more applied perspective on distributed systems than we generally find in the literature. It starts by observing that many key distributed systems we all rely on are hardly 'classic' distributed systems, and launches from this to explore what makes a 'working' (read: robust) distributed system. The most interesting parts were the discussion of consistency/availability/partitions (pick two) and the discussion of 'boundaries'. The C/A/P description drives home the trade-offs one makes when building a system. The boundary discussion highlights what works and what doesn't--that while APIs seem useful for programmers (given their procedure-like semantics), it's really the protocols that are more effective (as an example). Regardless, the big takeaway from the boundary section is the observation that we often define boundaries poorly and end up with 'fragile' systems--trying too hard to provide transparency, for example, causes potentially more trouble than it's worth. ~Will

From: david.m.lundgren SpamElide on behalf of David Lundgren [lundgre4 SpamElide] Sent: Thursday, March 03, 2011 2:49 AM To: Gupta, Indranil Subject: 525 review 03/03

SPORC: Group Collaboration using Untrusted Cloud Resources

Feldman et al. present their cloud framework, SPORC, for collaborative applications and cloud-based services using untrusted servers. SPORC's guiding principle is the relation between operational transformation and fork* consistency. Operational transformation defines both operators and their respective transforms. Local data can be modified by a series of operators and then brought to a consistent state by the repeated application of transformed remote operations. The untrusted server is delegated only the task of properly ordering such operations. To prevent a malicious server from reordering data, the authors stipulate fork* consistency, such that a client's view of the operation history is encoded within each operation sent to the server. This allows clients to detect a malicious server if it ever presents them with conflicting views of another client's operations. The remainder of the paper details SPORC's threat model, system design, membership management, conflict resolution, and fork recovery algorithms.

Pros:
1. The paper presents an inspired amalgam of public-key cryptography and distributed-systems concurrency techniques to address their threat model.

Cons:
1. The authors did not evaluate the time taken to recover from a fork. The burden of detecting malicious server actions is shifted to out-of-band client communication.
This assumes that clients are indeed able to communicate securely. It would be interesting to attempt to shift this burden back onto the server by incorporating multiple untrusted servers into the system.

----------------------------------------------------------------------

Cumulus: Filesystem Backup to the Cloud

Cumulus is a "thin cloud" system for data backup to the cloud. The authors propose a minimalist, RESTful-like interface with the atomic operations put, get, delete, and list. Filesystem backup logic is shifted entirely to the clients' machines. Cumulus uses a write-once storage model in which system snapshots are composed of segments (which are themselves aggregates of groups of smaller files). This is done to reduce bandwidth costs, enhance the security afforded by encryption, and allow greater file compression. System snapshots are made up of metadata headers and the data payload (both of which are bundled into the aforementioned segments). Snapshots are stored with pointers to their respective segments. To reduce excessive data usage, new snapshots include new or modified segments plus pointers to extant, unmodified segments. Garbage (segment) cleaning is introduced to reclaim space when snapshots are retired. Vrable et al. differentiate between two segment cleaning schemes: in-place cleaning and no-segment-modification cleaning. The system is empirically evaluated and shown to perform well.

Pros:
1. Time to recover is empirically evaluated, and is (reasonably) rapid.
2. The cost to consumers based on Amazon's S3 pricing model seems extremely competitive compared to Dropbox and other consumer storage services.

Cons:
1. The authors mention designing heuristics to "group data by expected lifetime when a backup is first written in an attempt to optimize segment data for later cleaning" as part of their future work. I believe any heuristics for such predictive data analysis would have to be performed on the client side and would unnecessarily increase the client's overhead.
2. By shifting all application logic to the client side, system overhead is shifted to the client, which could result in dramatic variance among client devices (e.g., how does Cumulus perform on a mobile device vs. my PC?).
3. Designed more for full system backups than for tracking individual files.

From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng [ltseng3 SpamElide] Sent: Thursday, March 03, 2011 12:52 AM To: indy SpamElide Subject: 525 Review 03/03

CS 525 - Review: 03/03 Storage II

Cumulus: Filesystem Backup to the Cloud, M. Vrable et al, FAST 2009

This paper first identifies the trade-off between thin-cloud and thick-cloud designs. Most current backup services provide integrated, application-specific solutions in order to optimize storage capacity and bandwidth efficiency. The paper suggests a design choice at the other end of the spectrum – a generic, lightweight, and highly portable solution that can run on any cloud-based storage service (a so-called thin cloud). Their system, Cumulus, sacrifices performance for portability and provides less functionality than thick-cloud solutions. However, the paper argues that, as long as performance remains good enough, portability is more desirable, because Cumulus (the prototype runs on Amazon S3) is much cheaper and backing up to multiple providers reduces business risk. The first and major contribution of the paper is identifying the differing preferences over cloud computing.
For the general public or small businesses, performance (bandwidth and storage capacity, in the case of storage) is not always the most important concern. Since the cloud is a fairly young business, no one knows the single right answer for the service and business model. As the paper argues at the end, their portable solution might be a desirable alternative to an integrated, service-oriented infrastructure. As for the architecture, it seems to me that the paper does not have any novel elements; Cumulus just picks up and puts together many well-developed concepts and techniques. However, this might be one of its advantages, since a simple and effective implementation of the storage server is always a plus. The final contribution is the experiment and the prototype evaluation.

Comments/questions/critiques: To provide very high portability, the paper adopts a write-once storage model. As a result, Cumulus provides a very limited interface that supports only four operations. Even file attributes cannot be changed or fetched, which seems a very frustrating design from the user's point of view, since people might want to share backup files or check versions in order to do some sort of synchronization of the data storage. In this sense, would there be many consumers who want to use this portable service? Metadata is stored in a text format to keep it transparent, the importance of which I do not understand. Moreover, on the downside, this plain-text scheme compromises privacy and security: since the metadata contains cryptographic hashes, a malicious party may be able to pollute the files via a dictionary attack. The paper identifies segment cleaning as an important parameter (actually, there are two: the segment size and the cleaning threshold). However, what about dynamically scheduling the snapshots themselves? In the paper, it seems only periodic snapshots are taken (weekly or daily). Would an algorithm that generates snapshots dynamically, based on the number of files changed, the time elapsed since the last snapshot, or other signals (like user preferences), help increase efficiency?

SPORC: Group Collaboration using Untrusted Cloud Resources, A. J. Feldman et al, OSDI 2010

The paper deals with the issue of a malicious server and proposes a system, SPORC, to support generic collaboration services in which fork* consistency is guaranteed among clients even in the presence of a malicious server. Moreover, clients can dynamically control access permissions (dynamic membership), and the data is kept confidential from the server and from unauthorized clients at all times. Finally, if the server's misbehavior is detected, clients can automatically switch to a new server and recover the previous good data. The idea is similar to practical Byzantine fault-tolerant (PBFT) systems in the sense that the server is never trusted, and its only functions are to assign an execution order and redistribute operations to the clients. However, SPORC does not need the usual threshold of non-faulty replicas that PBFT requires, thanks to its two main elements: operational transformation (OT) and fork* consistency. The paper's most important contribution is to integrate these two techniques seamlessly. OT is used to resolve conflicts. Fork* consistency and the use of a chain of hashed history ensure that SPORC can avoid PBFT's threshold (since, in general, PBFT requires strong consistency) and be much more efficient.
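To make the hashed-history idea concrete, here is a small sketch of my own (not SPORC's actual message format): each client folds every operation it sees into a running hash, so two clients that compare digests out of band will notice if the server ever showed them diverging histories.

import hashlib

def extend_history(prev_digest: bytes, operation: bytes) -> bytes:
    # Chain the new operation onto the hash of everything seen so far.
    return hashlib.sha256(prev_digest + operation).digest()

# Two clients fed the same operations in the same order end up with equal digests...
a = b = hashlib.sha256(b"genesis").digest()
for op in [b"insert(0, 'h')", b"insert(1, 'i')"]:
    a = extend_history(a, op)
    b = extend_history(b, op)
assert a == b

# ...but if the server reorders (or drops) an operation for one client,
# that client's digest diverges and the fork is exposed on comparison.
c = hashlib.sha256(b"genesis").digest()
for op in [b"insert(1, 'i')", b"insert(0, 'h')"]:
    c = extend_history(c, op)
assert c != a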
Comments/questions/critiques: The system relies too much on the client, which might be a big disadvantage. First, it might be tedious to authorize a large number of clients. Second, the client side might be much easier to compromise, especially for users without much knowledge of computer security, and when a huge number of clients want to access the same application (this is the reason the paper argues against a peer-to-peer design). Since SPORC cannot tolerate any faulty behavior by clients, I am not convinced that this is a very practical system. If the number of clients is moderate (in which case all of them are more likely to be non-faulty), would SPORC necessarily outperform a peer-to-peer design or the naive design (flooding everything)? It would be interesting to see the comparison, which is not performed in the paper. In the evaluation of the key-value store, only 16 clients are involved. SPORC might not be the best solution here, since there is considerable overhead (hashed history). Why bother tolerating an untrusted server? Just let clients that always trust each other (an assumption of the paper) flood everything.
From: Qingxi Li [cs.qingxi.li SpamElide] Sent: Wednesday, March 02, 2011 11:34 PM To: Gupta, Indranil Subject: 525 review 03/03

Cumulus: Filesystem Backup to the Cloud

This paper introduces Cumulus, which is designed to back up a system to the cloud. Cumulus minimizes the storage cost on the cloud and the bandwidth cost over the Internet. Cumulus requires the cloud to provide only four whole-file operations: get, put, list, and delete. The backup software is based on segments, each made up of many small files. The reason Cumulus uses segments is that small files are inefficient: without pipelining, transmitting them to the server costs much more than transmitting segments. The other reasons are that compressing segments is more efficient and that grouping files can hide individual file sizes. However, one disadvantage of segments is that when restoring from backup, we have to download every segment that includes a wanted file, which may be much larger than the original size of the data. The other disadvantage is that it makes it harder for a segment to be cleaned as garbage: with per-file storage, a file can be cleaned as soon as no snapshot references it, but with segments we have to wait until none of the files inside the segment is referenced by any snapshot. Cumulus snapshots consist of a metadata log and the file data. Both are separated into blocks, and these blocks are packed together into segments. The metadata log lists all files, and for each file it includes properties such as modification time, ownership, and a cryptographic hash to verify the file's integrity. Besides this, it also includes pointers to the blocks of the file. Metadata is stored as text. The metadata is pointed to by a snapshot descriptor, which lists the segments referenced by the snapshot. For garbage collection, when Cumulus deletes a backup, it only deletes the snapshot descriptor and starts the garbage collection process. This process checks all segments to find those with no remaining references and deletes them. For segments in which only a few files are still referenced by some snapshot, it collects those files into a new segment. The new snapshot then references the new segment, and once all references to the old segment are gone, the old segment can be deleted. This wastes some space and increases the number of segments that need to be checked during garbage collection. The other problem is that the authors argue that current backup software locks customers into a particular provider while Cumulus does not. However, Cumulus also requires that the cloud provide only the four whole-file operations: get, put, list, and delete. This requirement also limits the software to particular clouds, since some clouds additionally provide methods to change the content of a file. For Cumulus, if the cloud provides methods to change files, there may be security problems: for example, since the metadata is stored as text, a malicious node that hacks into the cloud could use such a method to change it.

Towards Robust Distributed Systems

This slide deck is interesting. It mentions some problems we should care about when we want to design a new distributed system. It also gives some data from real data centers. I was impressed that crashes and disk failures happen weekly, which is very frequent.
There were also several power outages, which I would have expected to happen less often. The other thing the presenter mentions is the ACID vs. BASE tradeoff, which also appears in Amazon's distributed systems. Besides this, the author gives us some suggestions about interface design. I find some of them very useful, such as considering whether the other side can be trusted and handling partial failures. The system should also support partial evolution.

From: mark overholt [overholt.mark SpamElide] Sent: Wednesday, March 02, 2011 6:11 PM To: Gupta, Indranil Subject: 525 review 03/03

Mark Overholt CS525 03/03/2011

Cumulus: Filesystem Backup to the Cloud

Summary: Cumulus is a cloud-based backup storage system. It relies on a thin-cloud assumption; that is to say, the cloud needs only to provide put, get, list, and delete operations on its files. Also, Cumulus works at the granularity of whole files at the storage interface: it does not modify stored files in pieces, but rather maintains entire files. Since the interface doesn't provide modify functionality, the way to modify a file is to delete it and store the new version. Cumulus revolves around the idea of snapshots. Each snapshot logically contains two parts: the metadata log that lists all the files backed up, and the file data itself. Both parts are broken into blocks (objects), which are numbered sequentially and packed together into segments with unique names. The stored files can be rebuilt using such a snapshot in case of a failure. Cumulus is optimized for a complete restore; a partial restore is inefficient and requires fetching a number of segments. When segments from old snapshots are no longer needed, they are cleaned once no newer snapshots point to or depend on them. In order to facilitate such cleaning, Cumulus may have old and new snapshots refer to different copies of the same data, thus storing redundant information. A working prototype is built on Amazon's S3 (Simple Storage Service). In implementing the prototype, the authors make a number of design decisions to balance the tradeoffs between efficiency and resulting storage costs.

Discussion:

Pros:
- The backup service supports modification of backed-up files. Although this is done by storing redundant information, supporting such a feature enables the use of the service in a range of applications.
- Cumulus does not modify files in place and hence provides failure guarantees against snapshot corruption.
- It requires only a minimal interface from the cloud, so it can easily use any cloud storage provider.

Cons:
- There is no built-in compression or encryption in the system. There is a heavy reliance on external tools, which evidently will not adapt to the requirements of a particular system. Also, a vulnerability in an external tool becomes a vulnerability of the backup system too.
- Segment maintenance is disorganized, and new snapshots may point to parts of old segments that could otherwise be cleaned up. This introduces a lot of redundancy and wastes storage, a disadvantage that is hard to tolerate in a storage system where clients pay on the basis of their usage.

SPORC: Group Collaboration using Untrusted Cloud Resources

Summary: SPORC is a cloud-based system for collaborative applications using untrusted servers. SPORC challenges the belief that applications must sacrifice strong security and privacy of data to enjoy the benefits of cloud deployment.
SPORC's cloud servers see only encrypted data, and clients will detect any deviation from correct operation. The users of SPORC put their trust in their own cryptographic keys, not in the cloud provider's good intentions. SPORC is a flexible framework that could be used for a broad range of collaborative services. It claims it can propagate modifications quickly amongst many users, that it can tolerate slow or disconnected networks during updates, and that it can keep data confidential from the server and from unauthorized users. If a server misbehaves, clients can detect that server's actions and recover from the attack. At a high level, a SPORC application is synchronized between multiple clients, using a server to collect updates from clients, order them, and then redistribute them to the others.

Discussion:

Pros:
- I like how they thought about many different types of security threats, instead of focusing on just one, building a system around it, and leaving other potential holes.
- Encrypting the data transferred in and out of the server did not seem to have much effect on latency.

Cons:
- SPORC still has a slightly centralized portion to it. While the central server is assumed to be untrusted and the system takes action against that, it could still be a bottleneck as well as a target of DDoS attacks.

From: Tony Huang [tonyh1986 SpamElide] Sent: Wednesday, March 02, 2011 2:39 AM To: Gupta, Indranil Subject: 525 review 03/03

Zu C Huang zuhuang1

* Cumulus: Filesystem Backup to the Cloud

Core Idea: In this paper, the authors introduce a thin-cloud system to back up files in a cluster. The system interface is minimal, consisting of only four basic operations: get, put, list, and delete. Business logic is implemented wholly on the client side. The system uses a write-once storage model, which means a file is never modified once stored, until its deletion. Internally, Cumulus aggregates smaller files into large units called segments. Files are broken into blocks (objects, in the paper) and packed into segments. The metadata of a snapshot contains pointers to the latest objects. Retrieving a snapshot can be achieved by walking the DAG of objects rooted at the snapshot metadata node. The system backs up only the small portion of a file that was modified. Unneeded blocks are automatically garbage collected. The system exposes a parameter that lets the user decide the percentage of unused space at which the system will try to clean a segment and free up space (a rough sketch of this decision appears after the pros and cons below).

Pros:
* A simple system interface.
* Efficient use of the system by breaking files into blocks and backing up only changed blocks.
* Easy complete restoration of large files.
* The optional encryption scheme is effective and practically useful.

Cons and points missing from the paper:
* The system is single-purpose.
* The paper does not provide a throughput test.
* The paper starts with a discussion of the thin-cloud vs. thick-cloud trade-off, but it never returns to fully discuss it. What exactly the authors mean by thin-cloud and thick-cloud is never made clear in the paper.
* I think the architecture described in the paper is general, not just for cloud systems. It does not provide normal cloud services such as replication or fault tolerance. I would like to see a more in-depth discussion of what functionality of the cloud is being leveraged in this work.
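As a rough illustration of that cleaning decision (my own sketch with made-up names, not code from the paper): a segment becomes a candidate for cleaning once the fraction of its bytes still referenced by live snapshots falls below the chosen threshold, after which its live objects are repacked into a new segment and the old one is deleted.

# Hypothetical illustration of threshold-based segment cleaning.
def segments_to_clean(segment_sizes, live_bytes, threshold=0.6):
    """Return names of segments whose live-data fraction fell below `threshold`.

    segment_sizes -- dict: segment name -> total size in bytes
    live_bytes    -- dict: segment name -> bytes still referenced by any snapshot
    """
    stale = []
    for name, total in segment_sizes.items():
        utilization = live_bytes.get(name, 0) / total
        if utilization < threshold:
            stale.append(name)  # repack its live objects into a new segment, then delete it
    return stale

# Example: segment "s01" is only 40% live, so it is selected for cleaning.
print(segments_to_clean({"s01": 1000, "s02": 1000}, {"s01": 400, "s02": 950}))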
* SPORC: Group Collaboration using Untrusted Cloud Resources

Core Idea: In this paper, the authors present a system that uses untrusted cloud-based servers to implement a collaboration service. It uses techniques from operational transformation and fork* consistency protocols. In short, operational transformation models changes to the shared objects as a series of transformations; transformations made by different nodes in the system can be merged by predefined transformation functions. Fork* consistency means that if the order of two operations is swapped, this is detected by constructing a hash chain and comparing its value to the correct hash value.

Pros:
* Uses mature techniques to implement what the authors want.

Cons:
* Operational transformation, while clean in theory, can get very messy for any meaningful application beyond the most trivial ones, because of the complexities of most real-world applications.
* The authors do not address network connection issues and the random-ordering problems they entail.
* The authors assume that when a client detects a malicious server, it can switch to another one. This assumption may be violated in real life and introduces complexities that the authors do not address. If we are using more than one server to serve a file, how do we reconcile differences between servers? How do we handle Byzantine servers? If one user identifies a malicious server, how should it contact other users to switch? None of these real-world complexities is addressed in the paper.
* The design places too much computational burden on end users, since they have to do all the encoding, decoding, and transformation. It may be very power-hungry for the clients described in the paper, such as mobile phones.

This paper presents some interesting ideas on how to design a distributed collaboration system, but there are a lot of real-world considerations that should be addressed before it can be a useful infrastructure.

-- Regards -- Tony