From: Tengfei Mu [tengfeimu SpamElide] Sent: Thursday, March 03, 2011 12:31 PM To: Gupta, Indranil Cc: indy SpamElide Subject: 525 review 03/03 1. Cumulus: filesystem backup to the cloud This paper proposes Cumulus, a cloud-based backup system designed around a minimal-interface thin cloud. It assumes that all the application logic is implemented solely by the client. The core algorithm is based on packing smaller files into larger data objects called segments and building snapshots out of those segments. Every snapshot descriptor is composed of a timestamp and pointers to the segments it includes. As for implementation, it supports only write-once operations, allowing no in-place modifications, but permitting deletion and re-use of storage space. The authors show that thin-cloud solutions can be a cost-effective alternative to thick clouds that are designed for a specific application. Pros: 1. Cumulus is a cost-effective backup on cloud storage 2. Cumulus is efficient because it packs smaller files into segments as data blocks or units of storage 3. Cumulus simplifies garbage collection due to its usage of snapshot descriptors Cons: 1. Cumulus has increasing overhead for frequently written data. 2. Cumulus does not consider how to handle failure of the backup site. 2. Towards robust distributed systems During this keynote speech, Prof. Brewer discusses some problems of current distributed systems, especially their robustness. He argues that classic distributed systems are fragile, although they are widely adopted and popular. Specifically, he lists three reasons why Inktomi does not rely on classic distributed system research. (1) The problem of state: most classical distributed system research concentrates on the computation instead of the data, but he argues that distributed computing is data-intensive in nature. (2) Consistency vs availability: the tradeoff between consistency, availability, and tolerance to network partitions is introduced. (3) The boundary problem: how the interface between computational entities should be defined. Pro: 1. The talk explains the main problems in the design of traditional distributed systems, and had a great impact on subsequent distributed system design. 2. The author points out that random failures should be reasoned about probabilistically. Con: As for the availability problem, duplication should be the most straightforward approach. From: Igor Svecs [isvecs SpamElide] Sent: Thursday, March 03, 2011 12:27 PM To: Gupta, Indranil Subject: 525 review 03/03 Igors Svecs CS 525 2011-03-03 Cumulus: Filesystem Backup to the Cloud SUMMARY In this paper, the authors explore tradeoffs in a thin-cloud (simple interface, more portable) vs. thick-cloud (more integration with the application, more efficient) model by building a filesystem backup application on top of Amazon's S3. Small files are grouped into segments to reduce overhead. Cumulus assumes that the backup client will keep significant local state for optimization purposes, and that the storage cloud will be completely content-agnostic. COMMENTS Not being optimized to track the history of individual files presents a significant disadvantage in many scenarios where a single file is accidentally deleted and needs to be restored. It would also be interesting to see backup software that can assume both thin-cloud and thick-cloud models. It would check if the cloud supports "thick" operations, and possibly fall back to the thin mode.
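As a rough sketch of what such a dual-mode backup client could look like, the following Python fragment models the four thin-cloud operations the paper relies on and probes for a richer server-side operation before falling back to the thin model; the class names and the write_range probe are illustrative assumptions, not part of Cumulus:

    import abc

    class ThinCloudStore(abc.ABC):
        # The four whole-file operations the paper relies on.
        @abc.abstractmethod
        def put(self, name, data): ...
        @abc.abstractmethod
        def get(self, name): ...
        @abc.abstractmethod
        def delete(self, name): ...
        @abc.abstractmethod
        def list(self, prefix=""): ...

    def apply_delta(store, name, offset, patch):
        # Hypothetical "thick" path: if the provider offers an in-place range
        # write (write_range is an invented name), use it; otherwise fall back
        # to the thin model: fetch the whole object, patch it locally, and
        # upload the result as a new write-once object.
        if hasattr(store, "write_range"):
            store.write_range(name, offset, patch)
        else:
            data = bytearray(store.get(name))
            data[offset:offset + len(patch)] = patch
            store.put(name + ".v2", bytes(data))
            store.delete(name)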
The authors could have explored the concept of storing state on the cloud for redundancy - how large the state needs to be; how often it is accessed, etc. The thin cloud / thick client approach is also less useful in enterprise scenarios, where multiple machines need to be backed up. This is typically done by having a backup server and multiple backup agents on the clients. In this case, the backup server would need to keep track of the state of all the clients. Using pricing for evaluation is reasonable, but not a particularly innovative idea. Moreover, it will depend on the service provider. Towards Robust Distributed Systems SUMMARY / COMMENTS This set of slides first talks about distributed systems in a business/datacenter setting. They mention that modern DS should focus on data rather than computation, which agrees with current cloud computing research (and these slides are from 2000, long before the concept of cloud computing). It includes a discussion of the CAP theorem that talks about tradeoffs between consistency, availability, and tolerance to network partitions. It appears from recent developments (such as Dynamo) that BASE is indeed taking over ACID, but consistency will always be important in reliable database transactions. They argue that the root cause of RPC problems is that they emulate local procedure calls. However, this greatly simplifies development and generality of code, especially in distributed systems. Perhaps we need a better exception mechanism to overcome this problem. After reading the entire set, I can say that the speaker predicted many properties of emerging cloud systems – problems are I/O bound, not CPU bound; high reliability (zero downtime allowed); built-in fault tolerance (software will be buggy, hardware will fail, so we have to deal with it). From: Chi-Yao Hong [cyhong1128 SpamElide] Sent: Thursday, March 03, 2011 12:27 PM To: Gupta, Indranil Subject: 525 review 03/03 ---- Cumulus: Filesystem Backup to the Cloud, FAST’09 ---- This paper proposed a backup storage system that aggregates the data from user machines over the Internet. Backup is an increasingly interesting application, especially for thin-cloud services. For end users, Cumulus is a simple solution that could provide reliable backup services. To minimize the capital cost of data center storage, Cumulus aggregates the data to improve storage utilization. Pros: 1. The proposed user interface is succinct and has good portability. 2. Unlike previous papers which leverage special storage architectures, this paper does show that an efficient network storage system could be built using a generic storage architecture. Cons: 1. This paper failed to convince me that cleaning is a novel way to reduce storage overhead. There are many other ways to “compact” the disk. One could simply remove the segment limitation they defined and apply a local allocation heuristic each time the client’s data is backed up. 2. What is the right time granularity for network storage backup? Could we do even more fine-grained backups? It seems the proposed architecture might be promising for smaller backup intervals, but they did not discuss this issue. 3. Cumulus did not address storage failures. A simple data replication scheme could help to increase its reliability. ---- SPORC: Group Collaboration using Untrusted Cloud Resources, OSDI’10 ---- This paper proposed SPORC, a scheme to build cloud applications with an untrusted server. To protect data confidentiality, all the data seen by the server is encrypted.
However, the server could tamper with the execution order. To avoid this, SPORC adopts the idea of fork* consistency, appending a hash of the operation history to every update. This guarantees that clients will detect a malicious server within two exchanged messages. SPORC also supports dynamic access control, where users can modify the access control list even when they are disconnected from the servers. They build a SPORC prototype with two applications: a key-value store and a browser-based collaborative text editor. The experimental evaluation shows that SPORC provides very small response times for data writing, and the system throughput is acceptable. Pros: 1. For a systems paper, their prototype is solid. The two applications they built both have a wide range of applicability. Therefore, it is very convincing that their system would work well in practice. 2. The evaluation results look very good. The request-serving latency is small enough for practical use. Cons: 1. To detect a malicious server, they use a sophisticated algorithm with non-trivial state on clients. Although they described how to use checkpoints to reduce the client's state, they did not fully study the overhead imposed on the client. To avoid storing a considerable amount of state on the client side, one simple solution is to let clients communicate with each other directly; in that case, the server cannot hide or discard data by preventing inter-client communication. 2. Their scheme only guarantees that the server cannot corrupt the shared state. However, the server could introduce other undesired noise to degrade system performance. For example, the server could simply block the communication between clients. Also, the server could inject random delays to slow down the communication. It is hard to detect a malicious server that simply injects random delays. 3. Their scheme assumes the clients are all honest. However, as all the communications have to be redirected through the server, the server could launch replay attacks. For example, suppose A wants to do collaborative writing with another user, B. Then the server could block B and arbitrarily create a "virtual" client C to pretend to be B in communication with A. -- Chi-Yao Hong PhD Student, Computer Science University of Illinois at Urbana-Champaign http://hong78.projects.cs.illinois.edu/ From: anjalis2006 SpamElide on behalf of Anjali Sridhar [sridhar3 SpamElide] Sent: Thursday, March 03, 2011 12:10 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus: Filesystem Backup to the Cloud, M. Vrable et al, FAST 2009 Cumulus is designed to run on top of a limited storage service. It aims to rely on a simple service interface, which enables flexibility and reduces cost for the clients. The design of Cumulus requires only a limited interface of put, get, delete and list. These operations refer to entire files as opposed to arbitrary ranges of bytes in files. Cumulus implements a write-once storage model. Hence once the put command is issued, the file is never modified in place; it can only be deleted. Cumulus groups data from smaller files into larger units called segments. The hierarchy of Cumulus starts with the snapshot (the metadata log and the file data); this is broken up into blocks called objects, which are aggregated into larger blocks called segments. The segment is the unit that is stored on the server. Each snapshot has a snapshot descriptor, which contains a pointer to the root object. If the traversal is started from the root object, all other segments required by the snapshot are found.
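A minimal sketch of that descriptor-to-segments traversal might look like the following (Python; the type and field names are illustrative, not taken from the Cumulus implementation):

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class ObjectRef:
        segment: str      # name of the segment that stores the object
        object_id: str    # name of the object within that segment

    @dataclass
    class StoredObject:
        ref: ObjectRef
        children: list = field(default_factory=list)  # ObjectRefs this object points to

    @dataclass
    class SnapshotDescriptor:
        timestamp: str
        root: ObjectRef   # pointer to the root object of the snapshot

    def segments_for_snapshot(descriptor, lookup):
        # Walk the object graph from the root and collect every segment the
        # snapshot needs; `lookup` maps an ObjectRef to its parsed StoredObject.
        needed, stack = set(), [descriptor.root]
        while stack:
            ref = stack.pop()
            needed.add(ref.segment)
            stack.extend(lookup(ref).children)
        return needed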
The write-once model ensures that if a put fails, an old backup cannot be corrupted. Using segments to aggregate smaller files results in lower network overhead and metadata storage. It allows compression to take advantage of inter-file redundancy. Aggregation of smaller files also helps in encrypting the data that is being sent across the network. Cumulus also supports incremental backups of large files by storing only the changed portion of the file again. An advantage of the thin-cloud backup system Cumulus is that, due to the very basic storage service required, clients can choose among a number of storage providers to host their data, minimizing business risk. The write-once model results in keeping snapshots of a file at multiple points, leading to redundant data. This increases the storage required by the client and leads to increased costs. Cumulus does not provide coordination between multiple backup clients. Partial restoration might be very inefficient since Cumulus downloads multiple segments from the server. It is also possible that clients require additional services like backup data cleaning, encryption, or de-duplication when using multiple backup clients. The flexibility and cost efficiency of Cumulus might outweigh these considerations. Towards robust distributed systems, Eric A. Brewer, Keynote, ACM PODC 2000 Eric Brewer talks about his work at Inktomi. They work on global search engines and distributed web caches. He talks about how the classical definitions of distributed principles cannot be implemented in a practical setting. The three topics addressed are maintenance of state, consistency vs. availability, and finally the boundaries separating client/server in remote procedure calls. We are able to see the implementation of classical distributed system principles, or lack thereof, in a real-world setting. In the CAP theorem we see that we can only maintain two of the three properties – consistency, availability and tolerance to network partitions. Classical distributed systems focused on a computation-intensive workload while current systems are looking at data-intensive workloads. Classic distributed system principles also did not take into account location-specific information. There is a definite tradeoff involved between availability and consistency, so the presence of one does not entirely exclude the other. Since the speaker mentions that all systems are probabilistic, in real systems it is usually a question of what the system should be optimized more for. From: Simon Krueger [skruege2 SpamElide] Sent: Thursday, March 03, 2011 12:05 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus: Filesystem Backup to the Cloud, Michael Vrable, Stefan Savage, and Geoffrey M. Voelker The core idea of this paper is to present a backup program that runs on top of abstract cloud storage services like Amazon S3. They want to compare the overhead and price of this strategy to more specific services. To do this they implemented Cumulus, which has a simple put/get/delete/list API on whole files. They design the backup to store segments of files, which allows them to easily store incremental backups of the data. They were able to show that the storage costs Cumulus incurred were lower than both general backup solutions and specific backup services. Their design does not allow multiple client backups. I think this paper was a good investigation of how common cloud services today can be used. In particular, it was interesting to see that specific backup services (i.e.
they ONLY provide backups) are more costly than using their backup service that can run on top of any arbitrary cloud data service. SPORC: Group Collaboration using Untrusted Cloud Resources, Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten The core idea behind this paper is to provide a collaborative text editing program and key-value service that runs on top of an untrusted server. The key idea behind their implementation of SPORC is to combine Operational Transformation (OT), which allows concurrency without locking, and fork* consistency, which can detect when the states of two clients have been forked. Basically, by combining the two, they get efficient real-time collaboration from OT and can notice, via fork*, if the untrusted server has tampered with the service. The server could split the clients, and fork* is unable to detect this. However, if the server ever tries to rejoin the clients, fork* will be able to detect the tampering. I think this is a novel application of the two algorithms that could be beneficial in public cloud environments, since services could be more easily tampered with (since they are exposed to the public), but I am not certain if this is needed in private cloud environments where services are mostly secured by other mechanisms (firewalls, passwords, internal networks, etc...). Simon From: mdford2 SpamElide Sent: Thursday, March 03, 2011 12:05 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus: Filesystem Backup to the Cloud Cumulus is a storage backup system that attaches to any cloud service with only basic storage services. The system combines smaller files in order to reduce the overhead associated with multiple files. This reduces overhead in transmission, storage, and the costs associated with both in cloud services. Their system outperforms several other systems with regard to operating cost. Cumulus can also efficiently store small changes to large files by using the same techniques they use to clump small files. They use a write-once storage model, and scrub the backup when old snapshots are no longer needed. The paper's main contribution is the concept of aggregating small files to reduce transaction overhead. The remainder of their system is competitive, but not novel. The paper does not address multiple source storage. SPORC: Group Collaboration using Untrusted Cloud Resources SPORC is a client-side system for supporting consistency by sending encrypted data to an untrusted server. The system uses fork* consistency on the clients to enable detection of deviating servers and recovery by switching servers. Clients encrypt their data updates, send them to the server, and verify incoming updates both via public/private key matching and a hash chain over the committed history. In this scheme, clients are left with the majority of the work. Servers do store data, and enable access when other clients are all offline, but there was no comparison to a peer-to-peer based group coordination approach. If the server is relegated to a file store, there may be a way to use it in the same way from a lower-latency peer-to-peer system. From: Anupam Das [anupam009 SpamElide] Sent: Thursday, March 03, 2011 12:05 PM To: Gupta, Indranil Subject: 525 review 03/03 I. SPORC: Group Collaboration using Untrusted Cloud Resources The SPORC paper addresses the security and privacy issues of cloud-based collaborative services.
Cloud based collaborative services are popular due to their global accessibility, high availability and fault tolerance, but they lack security and privacy guarantees. If the servers are compromised they can access all the personal data available in them. To address this issue the paper moves application logic from servers to clients and uses servers for just storage and message ordering. Each client also maintains a local copy of the shared data and optimistically performs updates with the hope of having eventual consistency.  SPORC uses operational transformation (OT) and fork* consistency (hashing on historical actions) for this purpose. Since clients encrypt their data when sending it to the server and since servers perform no other computation on the data, it cannot access the real data. So, the basic philosophy of this paper is that it pushes responsibility towards the clients while making server functionality simpler (more like the thick client and thin server architecture.) However, there are some issues that have not been considered in the paper- 1. The threat model assumes that only servers are malicious. But clients too can be malicious and they can launch more complex attacks like collusion. 2. Since SPORC relies on eventual consistency there is always the possibility that the client may crash in which case it would lose its most recent non-committed (pending) operations. So, availability can be hampered. 3. In the experimental part of evaluating latency the number of clients considered is relatively small. It would have been interesting if they scaled the number of clients to a higher degree. 4. The checking of forks through out-of –band communication is totally dependent on clients. So, when does a client check for such forking? Should clients do this in a periodic fashion or just on the fly? Is there any indication as to when forking exists? 5. The extensions proposed like- recovering from fork have not been evaluated in the paper. It would have been nice to see how long it takes to identify a malicious server. ------------------------------------------------------------------------------------------------------------------------------------------- II. Cumulus: Filesystem Backup to the Cloud Cumulus talks about a backup system built on top of a thin cloud infrastructure. As cumulus is built on top of a thin cloud it provides only the very basic operations like- get, put, list and delete. As update/modify operation is not provided, any update is done through a delete operation followed by a put operation. Thus there is some cost associated with cleaning. Cumulus uses incremental update to lower this cost by taking snapshots at different point of time and then using them to determine the difference in the updates. For storage efficiency cumulus uses LFS (Log-structured File System) segment cleaning. It also uses aggregation to exploit inter-file redundancy. All-in-all the authors provide a nice system for backing up files for personal usage. Pros: 1. The system in simple and easy to use. It is not dependent on any vendor specific service and thus allows easy migration. 2. One of the strong points of this paper is that the system is fully implemented and is available for usage. They also provide experimental result based on their real deployment. 3. The authors have tuned various thresholds to obtain an optimal output. 4. More cost effective than other system (like Jungle Disk, Brackup). 5. Achieves low storage overhead through aggregation and incremental update. Cons: 1. 
Cumulus is more suited to WORM (write once, read many) workloads as it does not support a direct update operation. The overhead associated with storing snapshots will also increase if updates are very frequent. 2. There is a lot of metadata that needs to be maintained. This could become a potential bottleneck. 3. It does not support simultaneous updates from multiple nodes. It is basically meant for single-user updates (unlike Dropbox, which allows multiple parties to update at the same time). 4. Cumulus does not support file versioning. It would have been nice if they maintained different versions of a file so that a user could roll back to previous versions if required. ------------Anupam From: kevin larson [kevinlarson1 SpamElide] Sent: Thursday, March 03, 2011 12:01 PM To: Gupta, Indranil Subject: 525 review 03/03 Cumulus allows users to back up their local storage media to a “thin cloud”, a cloud which provides only basic storage interfaces. Cumulus only needs put, get, delete, and list in order to operate. The current application space lacked a network storage back-up solution which incorporated multiple snapshots, simple (thin) servers, continuous increments, aggregated storage, and encryption, all of which are important to the performance, cost, and security of a backup system. In evaluation, it was shown that Cumulus was only marginally worse than an optimal solution (one optimized for the task, without many of the restrictions imposed on Cumulus), coming within 2-5% of its storage and upload sizes (and therefore costs). It was interesting to see how a system running on such a simple interface managed to perform so comparably to an optimized system. In general, the evaluation was very solid, comparing Cumulus not just to either the optimized solution or the common competitors (comparing against only one of these seems too common in many evaluations), but to both. As admitted in the conclusion, the specific problems addressed by Cumulus are not terribly interesting. However, the general concept of using clouds for conventional tasks, as well as optimizations based on the specific systems and clouds, is probably going to continue to see increased attention. Brewer presents a strong case for availability in his presentation. Inktomi is a company which focuses on availability, and managed to keep their systems operational through a variety of problems, ranging from disk failures, to power/network outages, to physical location changes. Most classic distributed systems make trade-offs in order to maximize consistency. This comes at the expense of either availability or tolerance to network partitions. Basically Available Soft-state Eventual consistency (BASE) is proposed as an alternative to Atomicity, Consistency, Isolation, Durability (ACID), compromising consistency and isolation in order to maximize availability. Harvest, the fraction of the complete result, and yield, the fraction of queries answered, are another interesting pair of metrics for distributed systems. Conventionally, 100% harvest was chosen at the expense of yield. However, newer systems, including the Internet, prioritize 100% yield at the expense of harvest. It was interesting to see a fundamentally different set of priorities and how they affected the operation of a company. Highly available systems are an interesting shift from the convention of consistency. The uses for highly available systems were well presented and discussed.
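As a toy illustration of the harvest and yield metrics mentioned above (Python; a rough sketch, not code from the talk), a fault can be absorbed either as lost yield, by replicating data, or as lost harvest, by partitioning it:

    def degraded_harvest_yield(nodes_total, nodes_failed, strategy):
        # harvest = fraction of the full data reflected in each answer
        # yield_  = fraction of offered queries that get answered
        alive = (nodes_total - nodes_failed) / nodes_total
        if strategy == "replicate":    # full copies: answers stay complete,
            return 1.0, alive          # but overall capacity (yield) shrinks
        if strategy == "partition":    # disjoint shards: every query answered,
            return alive, 1.0          # but answers miss the failed shards
        raise ValueError(strategy)

    print(degraded_harvest_yield(100, 10, "replicate"))  # (1.0, 0.9)
    print(degraded_harvest_yield(100, 10, "partition"))  # (0.9, 1.0)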
There are still a variety of areas to research in order to further improve highly available systems, but little attention was given to this. It would have been interesting to see more about the prospective research opportunities. From: Shen LI [geminialex007 SpamElide] Sent: Thursday, March 03, 2011 11:49 AM To: Gupta, Indranil Subject: 525 review 03/03 Name: Shen Li SPORC: Group Collaboration using Untrusted Cloud Resources This paper proposes an idea for building a wide variety of collaborative applications with untrusted servers. The two basic contributions of SPORC are: 1) preventing malicious servers from revealing the content of communications to unauthorized entities, 2) preventing malicious servers from reordering the real sequence of events. To fulfill the first requirement, every piece of data sent to a client is encrypted under that client's own public key, so the server cannot understand the data it is processing. But the server still has a chance to modify the order of events. To prevent this, SPORC employs two sequence numbers: a client sequence number (clntSeqNo) and a global sequence number (seqNo). On receiving an operation, a client verifies that the clntSeqNo is greater than the previous clntSeqNo submitted by the same client, and applies the same check to the seqNo. When the order is violated, the client will know that the server is doing something bad, so it can switch to another server. Pro: 1. As they discussed in the paper, it is better not to remove the server role from this kind of application. Otherwise, there will be no arbiter who can give deterministic feedback to users' actions. Under this circumstance, this paper guarantees the security of applications even if the server itself is not trusted. Con: 1. It is a little bit weird that SPORC proposes a concurrent editing model while it cannot guarantee consistency. The operational transformation cannot handle all updates properly. In some cases it still calls for manual negotiation. 2. The meta-operations are sent to the server in the clear. So, it can be a potential threat that the server can reveal this data to some interested entities. 3. If one client wants to make some updates to the file, it should encrypt the operation under all other users' public keys. This can be costly. Cumulus: Filesystem Backup to the Cloud This paper proposes a system with minimal interfaces (put, get, delete, and list) to provide cloud file backup functionality. The authors point out that current cloud-based backup services implement integrated solutions that call for specific software installed on both the client side and the server side. This kind of approach is able to achieve greater storage and better bandwidth efficiency. However, it also impedes the portability of cloud backup services, which can be one main concern in future Internet-based applications. Under this motivation, they propose the thin-cloud concept in which the cloud side exposes only minimal operations, so that it can be easily ported to any online storage service. Pros: 1. This idea is somewhat similar to the design of the IP layer, which connects both higher-layer protocols and lower-layer implementations. Their thin-cloud idea can be easily supported by lower-level service providers as well as higher-level applications. Cons: 1. This architecture is not versatile enough to support all applications, at least not with the APIs they currently expose. For example, in some applications, users need to continuously update a very large file.
And for safety reasons, they want to the file stored in cloud side always be up-to-date. It is be very costly to delete and re-upload the whole file frequently, since they do not support ability to read and write arbitrary byte ranges within a file. 2. The segment cleaning policies should involve human efforts since the server alone is not sure about whether a file is OK to be deleted. From: trowerm SpamElide on behalf of Matt Trower [mtrower2 SpamElide] Sent: Thursday, March 03, 2011 11:46 AM To: indy SpamElide Subject: 525 review 03/03 Cumulus Cumulus is a cloud based storage system for filesystem backups. The paper describes cloud services as either thin or thick based on the number of services they offer. The authors combined features of several previous implementations and deployed the service on top of Amazon's S3 storage service. Cumulus identifies five features which it possesses but no previous work completely possesses. Most modern backup systems have the ability to take multiple snapshots (except rsync). About half of the surveyed works support simple server interfaces and encryption of data. The remaining two features are common although never found with the other features. Cumulus offers the ability to take incremental snapshots forever after an initial snapshot and the ability to store sub-file deltas. These combined features make cumulus an efficient, safe, and reliable backup system. The authors took an interesting twist to previous work in that they optimized the backup service for cost on modern storage services such as Amazon's S3. This is certainly of concern to most end-users and something that has been overlooked in the past. SPORC SPORC is a secure framework for building cloud-based applications. Most cloud applications require the end user to trust the server. SPORC offers operational transformations and fork* consistency to allow it to be resilient to malicious server operations. In order to ensure data is secure the framework uses cryptographic keys on the clients similar to many existing protocols. This alone does not guarantee your data is safe though. The server could rearrange PUT requests to make the data reach an inconsistent state. The first feature operational transformations allows for operations to be reordered in a commutative manner while maintaining causal consistency. The second property fork* consistency is used to detect when two copies of the data deviate. If this deviation happens, then the clients must either not see changes from the other client or realize that the server is malicious. Based on these ideas the authors created a KV store and a simple collaborative text editor. The one clear disadvantage to this model is that the client now carries the burden of the processing load rather than the cloud. This runs against the notion of dumb devices connected to a smart cloud that many envision. From: muntasir.raihan SpamElide on behalf of muntasir raihan rahman [mrahman2 SpamElide] Sent: Thursday, March 03, 2011 11:33 AM To: Gupta, Indranil Subject: 525 review 03/03 Towards Robust Distributed Systems Summary: The talk discusses three classical problems in distributed systems: state, consistency vs availability, and understanding boundaries. It proposes to use BASE (Basically Available Soft-state Eventual Consistency) instead of ACID by forfeiting C and I for Availability. The CAP theorem says that for a distributed web-service, you can only guarantee at-most two of Consistency, Availability, and Partition tolerance. 
The author also points out that the boundary of a system could be ill-defined due to false transparency guarantees. Overall, classical distributed systems are fragile due to an exclusive focus on computation without thinking about data, ignoring location distinctions, loosely defined consistency/availability trade-offs, and poor understanding of boundaries. The author proposes the DQ principle which states that Data/query multiplied by Queries/sec is a constant (or, capacity * completeness = constant). A good take-home message of the talk is that all systems are probabilistic in nature. As a result, to build better systems, we should think probabilistically, maximize symmetry, try to make faults independent, and use randomness in system design. We need to stop dreaming about 100% working systems and be prepared to accept partial results. Pros: (1) A first look at the CAP theorem, which postulates fundamental trade-offs for distributed systems. As a result, researchers won’t run after the holy grail of 100% available systems. (2) Encourages probabilistic thinking about systems. Cons: (1) Is BASE good enough for an internet service that deals with sensitive data, e.g. an HPC cluster that deals with financial transactions, or high-frequency trading? Future Work: (1) Does the CAP theorem also hold for cloud data centers? SPORC: Group Collaboration using Untrusted Cloud Resources Summary: Cloud deployment is attractive for distributed systems that collaborate on shared state, e.g. Google Docs, Google Calendar and so on, mainly due to scalability, availability, and the possibility of real-time collaboration. However, the trade-off is that clients must trust the third-party cloud with potentially sensitive data. SPORC overcomes this by generalizing to the case where cloud servers are untrusted. It assumes that servers have very limited functionality. In this case they are only required to store encrypted data and order client updates. As a result the clients are forced to store local state. So the first major problem that arises is keeping local copies consistent even with off-line access. This problem is solved with Operational Transformation (OT), which can synchronize arbitrarily divergent clients. The second issue is dealing with a malicious server. This is addressed using fork* consistency, which ensures serializability for honest servers, and tamper detection within 2 message exchanges in case of a malicious server. Clients maintain fork* consistency by computing a hash chain over their view of committed operations and sending it with any new updates. Another client can check the hash chain to detect a malicious server. The authors evaluated SPORC by building a key-value store and a browser-based collaborative text editor with minimal code requirements. Pros: (1) SPORC allows users to work off-line and still maintain synchronized states. (2) It can deal with untrusted servers even when the server isolates clients into distinct partitions and equivocates. Cons: (1) The overhead of dealing with untrusted servers might be too much. Is it possible to ignore the possibility of malicious servers and still have good performance? Future Work: (1) The paper argues that OT can ensure commutative operations. Does it also work for associative operations? (2) SPORC servers are lightweight; in that case, is it possible to eliminate servers and work with a P2P system? The peers could use reputation mechanisms to deal with malicious peers and eliminate the overhead of fork* consistency.
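The hash-chain check described in this summary can be sketched roughly as follows (Python; a simplified illustration of the fork*-style check with invented class and method names, not SPORC's actual implementation):

    import hashlib

    def chain_hash(prev_hash, op_bytes):
        # Extend the hash chain with one committed operation.
        return hashlib.sha256(prev_hash + op_bytes).digest()

    class ClientView:
        def __init__(self):
            self.history_hash = b"\x00" * 32  # hash of the empty history
            self.committed = 0                # committed operations seen so far

        def apply_committed(self, op_bytes):
            self.history_hash = chain_hash(self.history_hash, op_bytes)
            self.committed += 1

        def check_peer(self, peer_committed, peer_history_hash):
            # If a peer reports the same history length but a different hash,
            # the server must have shown the two clients diverging histories.
            if peer_committed == self.committed and peer_history_hash != self.history_hash:
                raise RuntimeError("fork detected: the server has equivocated")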
-- Muntasir From: nicholas.nj.jordan SpamElide on behalf of Nicholas Jordan [njordan3 SpamElide] Sent: Thursday, March 03, 2011 11:12 AM To: Gupta, Indranil Subject: 525 review 03/03 Nicholas Jordan njordan3 03/03 Cumulus: Filesystem Backup to the Cloud Cumulus is a way to organize your data such that users can group the data and store backups on any cloud provider. This scheme allows people not to be locked into a certain provider, since you're not relying on the provider to maintain the metadata of your backups. All cloud providers allow you to put/get/list/delete files. The scheme is to store the data and the metadata separately. A cluster of files/data is contained in a segment. And there is metadata, called snapshot descriptors, which records which segments a snapshot uses and where data in other segments is referenced. Pros: - data is portable - incremental backups don’t incur a lot of overhead - can encrypt the data Cons: - puts a lot of the management of the packaging of data into the hands of the client; they have to do all the packing - it's not really a service, more like a best practice Additional thoughts: I like the fact that incremental changes don’t make you have to create entirely new snapshots. You can just reference previous data from previous snapshots. The separation of the data and metadata allows this. Towards Robust Distributed Systems: CAP Theorem I looked at the slides of the speech; however, I couldn’t get much content from the slides. This speech is very famous and brought about the CAP theorem for distributed systems. CAP Theorem In a distributed system, there are three desired qualities: (C)onsistency, (A)vailability and tolerance to (P)artitions. A distributed system can only have 2 of the 3. To give an example, if your system is CA, then you can’t handle network partitions, because during a partition the whole system can’t agree on a change and stay consistent. If your system is AP, like DNS, when a new entry is added not all servers see the change right away. Even during a network partition, the system still works and is available. Finally, for CP, if you are going to be totally consistent and handle network partitions, you give up availability. When a network partition happens, the servers can’t come to a total system consensus, so there’s no availability. Counter Example By Daniel Abadi – Yale Professor http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html This blog post is a critique of the CAP theorem. 1) If a system is CP, it just means that availability is not a top priority, but in the presence of a network partition there is still some degree of availability. 2) CA ~ CP. As noted above, CP systems give up availability only when there is a network partition. CA systems are “not tolerant of network partitions”. But what if there is a network partition? What does “not tolerant” mean? In practice, it means that they lose availability if there is a partition. But there is still some degree of availability. There are really CA/CP and AP systems. The fundamental difference is: when there is a network partition, which do you value more, consistency or availability? The first two points are critiques of the theorem. But Abadi also gives an example of a system that doesn’t fit at all into the CAP theorem. Known as PNUTS in the academic world (Sherpa inside Yahoo!), the system gives up consistency not for improving availability, but to improve the latency parameter (or minimize latency).
Sending a message over a WAN increases the latency of a transaction on the order of hundreds of milliseconds. To avoid this latency hit in performance, they move to a concept of “timeline consistency”. There’s no real bound on how long it takes for the system to become consistent. Whenever machines receive updates, they will receive the updates in the same order. My final thoughts: Both sources are valuable. Brewer’s talk gives you a good foundation for understanding the barriers in a distributed system. Abadi gives scenarios where systems don’t quite fit the rigid CAP framework. I feel that his PNUTS example is still covered under Brewer’s CAP theorem. Lowering latency is still closely related to availability; it might be QoS of availability instead of percentage of machine availability. -- Thanks, Nick Jordan From: Jason Croft [croft1 SpamElide] Sent: Thursday, March 03, 2011 11:00 AM To: Gupta, Indranil Subject: 525 review 03/03 SPORC: Group Collaboration using Untrusted Cloud Resources SPORC is a framework that provides collaborative editing of documents, automated merging, and off-line updates, stored on an untrusted cloud server. To do so, it leverages operational transformation (OT) and fork* consistency, which allow for lock-free concurrent operations that converge to a consistent state and ensure the untrusted server cannot modify the data without detection. Specifically, OT allows each client to edit the document, and any updates are mapped to a set of operations that can be performed on it. Since clients' updates may result in their respective documents diverging, a transformation function can be used to return both documents to a consistent state. All operations are signed with a client's private key to prevent the server from attempting to modify any operations. With fork* consistency, clients can detect if a server misbehaves since clients share their individual views of the history of the document by including it with all operations. This prevents a server from notifying only some clients about operations and not others without being detected. With this design, clients can perform updates to a document while offline, and later merge changes when online. Another strong point of SPORC's design was support for version control where conflicts must be resolved manually. This was a small addition, but covers an important potential usage of SPORC. Companies wanting to store source code in the cloud or online (e.g., GitHub) could do so safely, without worrying about malicious tenants or cloud providers trying to steal their code. I also liked the brief section explaining the benefits of having an untrusted server, justifying why a design like SPORC is needed. However, the authors assume that when a server begins misbehaving, there will be a properly behaving server to switch to or use. Doesn't this imply there is some trust in at least one of the servers not to misbehave? Though it may be outside of the scope of their work, SPORC should probably support the case where there is no server that will behave. Maybe there could be some way in which clients could still edit the document with the server misbehaving? In addition, meta-operations are sent to the server in the clear, which can include changes to the access control list. Couldn't this allow a malicious server to edit the metadata to add itself as an editor? In their implementation, the authors also note performance issues with JavaScript and encryption.
This may be an appropriate fit for Google's NativeClient, if an implementation of their web-based client in native code was possible. Towards Robust Distributed Systems This talk by Eric Brewer explains three issues why distributed systems do not work: state, consistency vs. availability, and boundaries. With state, persistence is difficult (more difficult than computations) and data centers exist so we can have consistency and availability. In the trade-off between consistency and availability, Brewer notes that there is a fundamental trade-off between consistency and isolation and we trade these two for availability, graceful degradation, and performance. He also describes the CAP theorem, in which one can have at most two of three properties: consistency, availability, and tolerance to network partitions. For example, single site data centers forfeit tolerance to partitions, distributed databases forfeit availability, and DNS forfeits consistency. For boundaries, Brewer describes problems in trust between the two sides, partial failures, and multiplexing clients. Kernels, for example, solve this issue of trust by using opaque references and copying arguments, but many systems like TCP and browser do not properly solve this problem. In addition, Brewer argues that due to sloppy boundaries, systems become fragile, and the root cause of many of the problems is false transparency. He expands on his ideas, and further claims that classic distributed systems are fragile because of lack of focus on data, ignoring location distinctions, and poor definitions of goals for consistency and availability. Even though his talk is over a decade old, I think it is just as relevant today as it was in 2000 with so much focus now on cloud computing. With collaborative and online editing becoming popular (e.g., SPORC, Google Docs, Microsoft Office Live), there is becoming an important need and focus on data, consistency, and availability (and not as much on computation). From: iitb.ankit SpamElide on behalf of Ankit Singla [singla2 SpamElide] Sent: Thursday, March 03, 2011 10:50 AM To: Gupta, Indranil Subject: 525 review 03/03 1. Cumulus ----------------- Summary: Cumulus is a system designed and optimized to take advantage of cheap cloud storage for data backup. This is a cheap way to get off-site data storage, thus guarding against catastrophic failure at one site. Simplicity of design seems to be key element: a) Cumulus requires only four simple operations from the cloud server - put/get/list/delete. It doesn't use any 'modify' functions or read/write functions on parts of a file. Their write-once model has the advantage of reducing the possibility of corrupting old backups while attempting to do a new backup run. The key idea is to group files into segments and maintain appropriate metadata so that data that is duplicated across snapshots can be reused instead of creating multiple copies. There are some other interesting ideas in Cumulus to reduce costs; for instance, their use of segment cleaning (like defragmentation) doesn't require downloading the segment and putting back an updated copy; instead, they just write new segments with only useful data to the remote storage, garbage collecting the old expired segments later. Comments: The idea of building a tool like Cumulus on top of a thin API brings about the trade-off of provider lock-in versus efficiency. Exploiting a more refined API might enable more efficient function, but limits portability. 
For a backup utility, it seems the latter concern is more important and Cumulus' choice seems justified in that regard. 2. SPORC ----------------- Summary: SPORC is targeted at making possible real-time collaboration between multiple clients while making sure that a) client-data is unreadable by the server b) the server can't tamper with the data without risking detection c) clients can fix such tampering after detection. To fulfill these goals, SPORC addresses the problems of keeping client copies of data consistent and handling malicious servers. The key idea of SPORC is to combine two paradigms: a) operational transformation, which mutates operations in such a way that their independent application by different clients reaches a consistent state; b) fork*-consistency, which ensures that clients which see each other's updates after a malicious 'forking' of their views by the server will be able to detect such activity. SPORC essentially applies OT to recover after detection of said forks. Comments: Given the concerns about confidentiality and integrity of cloud-hosted applications, this seems a very timely effort. It would also be interesting to see what happens when a few of the clients collude with the malicious server against other clients. But I think this concern might be largely separable as a Byzantine consensus kind of problem. Also, how does one really distinguish between natural/genuine multiple client failures and a malicious server forking a set of clients from the rest? I'm unfamiliar with much work in this direction, but this seems like a pretty nice paper. Ankit -------------------------------------------------------------------- Ankit Singla Graduate Student, Computer Science University of Illinois at Urbana-Champaign (UIUC) http://www.cs.illinois.edu/homes/singla2/ From: Agarwal, Rachit [rachit.ee SpamElide] on behalf of Rachit Agarwal [agarwa16 SpamElide] Sent: Thursday, March 03, 2011 10:46 AM To: Gupta, Indranil Subject: 525 review 03/03 ----- Cumulus: Filesystem Backup to the Cloud The paper presents a file system backup solution. The most interesting feature of Cumulus is its simplicity and "flexibility". It allows the users to migrate from one storage provider to another without significant overhead. The interface is simple, which in turn provides flexibility in terms of implementing a more optimized storage service on top of Cumulus. 1. Cumulus is restricted to write-once storage model. It may not be efficient for applications that require frequent modifications and back-up of the same files. Incremental back-up is mostly offloaded to the client end, which seems like a tricky design decision. 2. Cumulus made the design decision of storing many small files as a single segment. It seems to me that reading a particular small file will require reading the whole segment, which can be pretty inefficient. In particular, how does one extract a particular file efficiently? If compression is used, does one need to decompress the whole segment to extract a single file? 3. Segment cleaning seems to bring the real overhead for Cumulus. It writes a new copy of the data every time a snapshot is updated. This could be very space-inefficient (but definitely quick and dirty fix to the problems with LFS). In general, leaving segment cleaning to the client-side can lead to inefficiencies at the server end. ----- Towards Robust Distributed Systems The talk outlines various fundamental issues in distributed systems. 
The main idea is to highlight poorly defined concepts like consistency and availability (in terms of achievable performance). In general, the insights provided on building robust distributed systems (especially from the perspective of an industrial implementation) are really interesting. Finally, it is interesting that the author puts forward the idea of focusing on data rather than computation, a paradigm which seems to have gained more focus recently. In general, the CAP theorem (law?) is very thought-provoking. In today's systems, it may be possible to avoid giving up any of consistency, availability, and partition tolerance by giving up on latency. I wonder if latency falls into "availability"? If not, why did the author not consider talking about latency? If yes, does availability mean zero tolerance for latency? ----- Best regards, Rachit -- Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Web: http://agarwa16.wikidot.com ---- Sing as if No one is Listening, Dance as if No one is Watching, Dream as if You'll Live Forever, Live as if You'll Die Today !! From: yanenli SpamElide on behalf of Yanen Li [yanenli2 SpamElide] Sent: Thursday, March 03, 2011 10:37 AM To: Gupta, Indranil Subject: 525 review 03/03 paper 1. Towards Robust Distributed Systems Summary: This talk summarizes several fundamental limitations of previous distributed systems, and proposes several novel ideas for future DS research. It is surprising that many of the ideas in this talk have actually been implemented successfully in recent systems. The fundamental limitations of classic DS include a focus on computation but not data, poor definitions of ACID vs BASE goals, ignoring location distinctions, and poor understanding of boundaries. One of the most important tradeoffs discussed in this talk is consistency vs availability. Database research is focused on higher consistency guarantees (ACID), but the distributed systems community focuses on availability. This trade-off leads to the Consistency-Availability-Network Partitions (CAP) Theorem, which states that an application can only choose any two out of the above three characteristics. The CAP theorem doesn't mean that only two configurations in the CAP space are usable. In fact, one can design systems across the whole CAP space with a clear ACID/BASE goal in mind. However, this area was largely unexplored at the time; one reason is that the database community and the DS community are separate. Pros - Summarizes many interesting limitations in the distributed systems field. - Highlights some important ideas, such as the consistency vs availability trade-off. - Emphasizes the notion of thinking probabilistically to account for random failures Cons - The author claims that message-passing will be a winning strategy; however, we have only observed limited success of this strategy in this decade. paper 2: Cumulus: Filesystem Backup to the Cloud summary: This paper presents Cumulus, a cloud-based filesystem backup service. The primary motivation for designing Cumulus is that current backup systems have specific formats for data storage and protocols for performing backups. The lack of generic formats requires a heavy cloud layer for the user. Cumulus has a simple storage server interface with a thin server layer to provide portability. The storage server interface has four operations: get, put, list, delete. Cumulus backs up the file system via snapshots of the storage system at various time points.
Both metadata and files are broken into objects and stored on server. Cumulus is optimized for a complete restore. When a system failure happens, the stored files can be rebuilt using the snapshots. Cumulus is built on the Amazons S3 ( Simple Storage Service). Pros - A cost effective solution for backup. - building a generic service on top of a thin cloud computing resources layer. This design decouples service from infrastructure, having more flexibility for choosing infrastructure providers. cons - only optimizing for a complete restore. Partial restores might be inefficient. - no inbuilt compression or encryption module in the system, heavy reliance on external tools, which might not be efficient and effective. -- Yanen Li Ph.D Candidate Department of Computer Science University of Illinois at Urbana-Champaign Tel: (217)-766-0686 Email: yanenli2 SpamElide From: Andrew Harris [harris78 SpamElide] Sent: Thursday, March 03, 2011 10:27 AM To: Gupta, Indranil Subject: 525 review 03/03 Review of “Cumulus: Filesystem Backup to the Cloud”, Vrable et al, and “SPORC: Group Collaboration using Untrusted Cloud Resources”, Feldman et al Cumulus is a system for so called “thin cloud” backups; that is, backups with only a minimal set of operations exposed to clients (get, put, delete, list) that are not reliant on vendor-supplied client software to operate. Underneath this client layer, a number of technologies common to thick-client backup services are also implemented: delta snapshots, sub-file deltas, encryption, multiple snapshot points, automated garbage collection for deleted snapshots. They build a prototype system on top of Amazon’s S3, and demonstrate a relatively low cost of operation based on S3’s pricing model, due to the low backup overhead required for their system. They also show that their system is technically superior to a handful of existing backup systems, in that their system requires overall lower storage, lower upload bandwidth use, and a lower total number of operations. Though decried by some in the computing community for reasons of privacy, data control, and data freedom, cloud backups seem a natural use of distributed computing systems. Large companies have been doing this for years, with nightly backups being sent between locations to ensure data fastness in the event of natural disasters or otherwise. With storage prices continuing to fall, and with services such as S3 exposed to the public, it is a perfectly reasonable progression to see consumer backup taking place in massive storage systems. A curious conundrum exists, however, in encrypting data onto massive distributed storage shares. While it is the case that data stored on such machines is technically secure from even a very large crypto brute forcing, it is also the case that people are still terrible at choosing strong passwords. Presumably, a backup service would like to offer users a password recovery mechanism, in the event of a lost decryption key - that way, their old backups would still be accessible. The host of such a distributed storage system would also have access to relatively large computing power (as would be the case with Amazon’s EC2 here). As such, the user may as well store their information as cleartext, as it becomes relatively trivial for this hosting provider to break a user’s encryption using their recovery password. 
This is a limitation for any backup provider that I have yet to see solved; so far, the only answers seem to be “use a password and risk being broken” or “don’t use a password and risk losing backups”. I would like to see more work that attempts to rectify this issue. SPORC is a system designed with security as a primary concern, rather than what seems to be a secondary concern in Cumulus. SPORC allows a client to collaborate with other clients on arbitrary computation tasks, the privacy and security of which are guaranteed to a degree by the use of crypto keys. It provides a shared file space for users in collaboration, it has mechanisms by which to detect and recover from tampering or other malicious behavior, and it tolerates version conflicts and unstable network connections. Rather than casually assume some degree of trust with the remote system and links between it, the team here takes a typically crypto approach and assumes that everything is malicious apart from the users themselves. By this, they apply crypto schemes to all data moving in and out of the shared space, as well as to the metadata of what is currently stored in the shared space. This of course implies a greater crypto overhead to use of this system, which the implementers leave to the client side (because the server cannot be trusted). Their micro-benchmarks show a promising trend of resource overhead stability for small numbers of users, with only a text editing task showing relatively higher overhead. This approach is suitable for live collaboration, as locally stored copies on each user’s machine would suffice in the event of a network outage; the system is designed specifically to handle this, even. Where this system has issues, however, is in scalability: propagating changes to a small number of users (their experiments used 16 clients) is simple and incurs a low overhead relative to file size. By contrast, propagating those same changes to, say, tens of thousands of users becomes much more costly, as each user must expend resources to decrypt what comes out of the shared server space. A use case with such a number of users is not clear at this moment - perhaps a corporate system that outsources its digital collaboration spaces across multiple physical offices. However if this scheme were to be transplanted to a larger user base, the suitability of this system becomes unclear.From: w SpamElide on behalf of Will Dietz [wdietz2 SpamElide] Sent: Thursday, March 03, 2011 4:15 AM To: Gupta, Indranil Subject: CS525 Review 03/03 Will Dietz cs525 3-3-2011 "Cumulus: Filesystem Backup to the Cloud" This paper presents 'Cumulus' which is both an idea and an implementation of using a 'thin-cloud' for filesystem backup. They built a system that uses a very minimal storage API (put, get, delete, list) to implement a filesystem backup without requiring much server-side help. The key benefit to this 'thin-cloud' approach (even if using a provider that supports other features) is that it doesn't tie you to particular provider, which when combined with the cost analysis the authors did makes a compelling case for an open market of competing providers. Paying someone else to manage the storage (which is managed for you, comparably cheap, and located remotely, all good things for backups) makes sense when using their model, but also the API they choose ensures that what you pay for (dumb storage, mostly) is what gives you the benefit, not the logic involved in the backup itself. 
Of course a custom-built end-to-end system has benefits (as they note in the paper), but in terms of what normal users can make use of in today's offerings, Cumulus seems rather compelling. They use this simple abstraction to back up files, including features such as combining small files and incremental updates. It also supports encryption and compression through the use of external tools such as gzip and gpg. My favorite part of the paper was the question they pose at the end: "Can one build a competitive product economy around a cloud of abstract commodity resources, or do underlying technical reasons ultimately favor an integrated service-oriented infrastructure?"

Pros:
+ Thin cloud reduces lock-in
+ Cheap storage for many use cases
+ Simple, straightforward design

Cons:
- A custom-built solution might fare better
- Long-term data retention seems to scale poorly

-----------------------------------

"Towards Robust Distributed Systems"

This set of slides takes a more applied perspective on distributed systems than we generally find in the literature. It starts by observing that many key distributed systems we all rely on are hardly 'classic' distributed systems, and launches from this to explore what makes a 'working' (read: robust) distributed system. The most interesting parts were the discussion of consistency/availability/partitions (pick two) and the discussion of 'boundaries'. The C/A/P description drives home the trade-offs one makes when building a system. The boundary discussion highlights what works and what doesn't--that while APIs seem useful for programmers (given their procedure-like semantics), it's really the protocols that are more effective (as an example). Regardless, the big takeaway from the boundary section is the observation that we often define boundaries poorly and end up with 'fragile' systems--trying too hard to provide transparency, for example, causes potentially more trouble than it's worth. ~Will

From: david.m.lundgren SpamElide on behalf of David Lundgren [lundgre4 SpamElide] Sent: Thursday, March 03, 2011 2:49 AM To: Gupta, Indranil Subject: 525 review 03/03

SPORC: Group Collaboration using Untrusted Cloud Resources

Feldman et al. present their cloud framework, SPORC, for collaborative applications and cloud-based services using untrusted servers. SPORC's guiding principle is the relation between operational transformation and fork* consistency. Operational transformation defines both operators and their respective transforms. Local data can be modified by a series of operators and then brought to a consistent state by the repeated application of transformed remote operations. The untrusted server is delegated only the task of properly ordering such operations. To prevent a malicious server from reordering data, the authors stipulate fork* consistency, such that a client's view of the operation history is encoded within each operation sent to the server. This allows clients to detect a malicious server if it ever presents them with conflicting views of another client's operations. The remainder of the paper details SPORC's threat model, system design, membership management, conflict resolution, and fork recovery algorithms.

Pros:
1. The paper presents an inspired amalgam of public-key cryptography and distributed-systems concurrency techniques to address their threat model.

Cons:
1. The authors did not evaluate the time taken to recover from a fork. The burden of detecting malicious server actions is shifted to out-of-band client communication.
This assumes that clients are indeed able to communicate securely. It would be interesting to attempt to shift this burden back onto the server by incorporating multiple untrusted servers into the system.

----------------------------------------------------------------------

Cumulus: Filesystem Backup to the Cloud

Cumulus is a "thin cloud" system for data backup to the cloud. The authors propose a minimalist, RESTful-like interface with the atomic operations put, get, delete, and list. Filesystem backup logic is shifted entirely to the clients' machines. Cumulus uses a write-once storage model in which system snapshots are composed of segments (which are themselves aggregates of groups of smaller files). This is done to reduce bandwidth costs, enhance the security afforded by encryption, and allow greater file compression. System snapshots are made up of metadata headers and the data payload (both of which are bundled into the aforementioned segments). Snapshots are stored with pointers to their respective segments. To reduce excessive data usage, new snapshots include new or modified segments plus pointers to extant, unmodified segments. Garbage (segment) cleaning is introduced to reclaim space when snapshots are retired. Vrable et al. differentiate between two segment cleaning schemes: in-place cleaning and no-segment-modification cleaning. The system is empirically evaluated and shown to perform well.

Pros:
1. Time to recover is empirically evaluated, and is (reasonably) rapid.
2. The cost to consumers based on Amazon's S3 pricing model seems extremely competitive compared to Dropbox and other consumer storage services.

Cons:
1. The authors mention designing heuristics to "group data by expected lifetime when a backup is first written in an attempt to optimize segment data for later cleaning" as part of their future work. I believe any heuristics for such predictive data analysis would have to be performed on the client side and would unnecessarily increase the client's overhead.
2. By shifting all application logic to the client side, system overhead is shifted to the client, which could result in dramatic variance among client devices (e.g., how does Cumulus perform on a mobile device vs. my PC?).
3. Designed more for full system backups than for tracking individual files.

From: lewis.tseng.taiwan.uiuc SpamElide on behalf of Lewis Tseng [ltseng3 SpamElide] Sent: Thursday, March 03, 2011 12:52 AM To: indy SpamElide Subject: 525 Review 03/03

CS 525 - Review: 03/03 Storage II

Cumulus: Filesystem Backup to the Cloud, M. Vrable et al, FAST 2009

This paper first identifies the trade-off between thin-cloud and thick-cloud designs. Most current backup services provide integrated, application-specific solutions in order to optimize storage capacity and bandwidth efficiency. The paper suggests a design choice at the other end of the spectrum – a generic, lightweight, and highly portable solution that can run on any cloud-based storage service (a so-called thin cloud). Their system, Cumulus, sacrifices performance for portability and provides less functionality than thick-cloud solutions. However, the paper argues that, as long as performance remains good enough, portability is more desirable, because Cumulus (the prototype runs on Amazon S3) is much cheaper and backing up to multiple providers reduces business risk. The first and major contribution of the paper is identifying the differing preferences over cloud computing.
For the general public or small businesses, performance (bandwidth and storage capacity, in the case of storage) is not always the most important concern. Since the cloud is a fairly young business, no one knows the single right answer for the service and business model. As the paper argues at the end, their portable solution might be a desirable alternative to an integrated, service-oriented infrastructure. As for the architecture, it seems to me that the paper does not have any novel elements; Cumulus just picks up and puts together many well-developed concepts and techniques. However, this might be one of its advantages, since a simple and effective implementation of the storage server is always a plus. The final contribution is the experiment and the prototype evaluation.

Comments/questions/critiques: To provide very high portability, the paper adopts a write-once storage model. As a result, Cumulus provides a very limited interface that supports only four operations. Even file attributes cannot be changed or fetched, which seems a very frustrating design from the user's point of view, since people might want to share backup files or check versions in order to do some sort of synchronization of the data storage. In this sense, would there be many consumers who want to use this portable service? Metadata is stored in a text format to keep it transparent, the importance of which I do not understand. Moreover, on the downside, this plain-text scheme compromises privacy and security: since the metadata contains cryptographic hashes, a malicious party may be able to pollute the files via a dictionary attack. The paper identifies segment cleaning as an important parameter (actually, there are two: the segment size and the cleaning threshold). However, what about dynamically scheduling the snapshots themselves? In the paper, it seems only periodic snapshots are taken (weekly or daily). Would an algorithm that generates snapshots dynamically, based on the number of files changed, the time elapsed since the last snapshot, or other signals (like user preferences), help increase efficiency?

SPORC: Group Collaboration using Untrusted Cloud Resources, A. J. Feldman et al, OSDI 2010

The paper deals with the issue of a malicious server and proposes a system, SPORC, to support generic collaboration services in which fork* consistency is guaranteed among clients even in the presence of a malicious server. Moreover, clients can dynamically control access permissions (dynamic membership), and the data is kept confidential from the server and from unauthorized clients at all times. Finally, if the server's misbehavior is detected, clients can automatically switch to a new server and recover the previous good data. The idea is similar to practical Byzantine fault-tolerant (PBFT) systems in the sense that the server is never trusted, and its only functions are to assign an execution order and redistribute operations to the clients. However, SPORC does not need the usual threshold of non-faulty replicas that PBFT requires, thanks to its two main elements: operational transformation (OT) and fork* consistency. The paper's most important contribution is to integrate these two techniques seamlessly. OT is used to resolve conflicts. Fork* consistency and the use of a chain of hashed history ensure that SPORC can avoid PBFT's threshold (since, in general, PBFT requires strong consistency) and be much more efficient.
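To make the hashed-history idea concrete, here is a small sketch of my own (not SPORC's actual message format): each client folds every operation it sees into a running hash, so two clients that compare digests out of band will notice if the server ever showed them diverging histories.

import hashlib

def extend_history(prev_digest: bytes, operation: bytes) -> bytes:
    # Chain the new operation onto the hash of everything seen so far.
    return hashlib.sha256(prev_digest + operation).digest()

# Two clients fed the same operations in the same order end up with equal digests...
a = b = hashlib.sha256(b"genesis").digest()
for op in [b"insert(0, 'h')", b"insert(1, 'i')"]:
    a = extend_history(a, op)
    b = extend_history(b, op)
assert a == b

# ...but if the server reorders (or drops) an operation for one client,
# that client's digest diverges and the fork is exposed on comparison.
c = hashlib.sha256(b"genesis").digest()
for op in [b"insert(1, 'i')", b"insert(0, 'h')"]:
    c = extend_history(c, op)
assert c != a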
Comments/questions/critiques: The system relies too much on the client, which might be a big disadvantage. First, it might be tedious to authorize a large number of clients. Second, the client side might be much easier to compromise, especially for users without much knowledge of computer security, and when a huge number of clients want to access the same application (this is the reason the paper argues against a peer-to-peer design). Since SPORC cannot tolerate any faulty behavior by clients, I am not convinced that this is a very practical system. If the number of clients is moderate (in which case all of them are more likely to be non-faulty), would SPORC necessarily outperform a peer-to-peer design or the naive design (flooding everything)? It would be interesting to see the comparison, which is not performed in the paper. In the evaluation of the key-value store, only 16 clients are involved. SPORC might not be the best solution here, since there is considerable overhead (hashed history). Why bother tolerating an untrusted server? Just let clients that always trust each other (an assumption of the paper) flood everything.
From: Qingxi Li [cs.qingxi.li SpamElide] Sent: Wednesday, March 02, 2011 11:34 PM To: Gupta, Indranil Subject: 525 review 03/03

Cumulus: Filesystem Backup to the Cloud

This paper introduces Cumulus, which is designed to back up a system to the cloud. Cumulus minimizes the storage cost on the cloud and the bandwidth cost over the Internet. Cumulus requires the cloud to provide only four whole-file operations: get, put, list, and delete. The backup software is based on segments, each made up of many small files. The reason Cumulus uses segments is that small files are inefficient: without pipelining, transmitting them to the server costs much more than transmitting segments. The other reasons are that compressing segments is more efficient and that grouping files can hide individual file sizes. However, one disadvantage of segments is that when restoring from backup, we have to download every segment that includes a wanted file, which may be much larger than the original size of the data. The other disadvantage is that it makes it harder for a segment to be cleaned as garbage: with per-file storage, a file can be cleaned as soon as no snapshot references it, but with segments we have to wait until none of the files inside the segment is referenced by any snapshot. Cumulus snapshots consist of a metadata log and the file data. Both are separated into blocks, and these blocks are packed together into segments. The metadata log lists all files, and for each file it includes properties such as modification time, ownership, and a cryptographic hash to verify the file's integrity. Besides this, it also includes pointers to the blocks of the file. Metadata is stored as text. The metadata is pointed to by a snapshot descriptor, which lists the segments referenced by the snapshot. For garbage collection, when Cumulus deletes a backup, it only deletes the snapshot descriptor and starts the garbage collection process. This process checks all segments to find those with no remaining references and deletes them. For segments in which only a few files are still referenced by some snapshot, it collects those files into a new segment. The new snapshot then references the new segment, and once all references to the old segment are gone, the old segment can be deleted. This wastes some space and increases the number of segments that need to be checked during garbage collection. The other problem is that the authors argue that current backup software locks customers into a particular provider while Cumulus does not. However, Cumulus also requires that the cloud provide only the four whole-file operations: get, put, list, and delete. This requirement also limits the software to particular clouds, since some clouds additionally provide methods to change the content of a file. For Cumulus, if the cloud provides methods to change files, there may be security problems: for example, since the metadata is stored as text, a malicious node that hacks into the cloud could use such a method to change it.

Towards Robust Distributed Systems

This slide deck is interesting. It mentions some problems we should care about when we want to design a new distributed system. It also gives some data from real data centers. I was impressed that crashes and disk failures happen weekly, which is very frequent.
There were also several power outages, which I would have expected to happen less often. The other thing the presenter mentions is the ACID vs. BASE tradeoff, which also appears in Amazon's distributed systems. Besides this, the author gives us some suggestions about interface design. I find some of them very useful, such as considering whether the other side can be trusted and handling partial failures. The system should also support partial evolution.

From: mark overholt [overholt.mark SpamElide] Sent: Wednesday, March 02, 2011 6:11 PM To: Gupta, Indranil Subject: 525 review 03/03

Mark Overholt CS525 03/03/2011

Cumulus: Filesystem Backup to the Cloud

Summary: Cumulus is a cloud-based backup storage system. It relies on a thin-cloud assumption; that is to say, the cloud needs only to provide put, get, list, and delete operations on its files. Also, Cumulus works at the granularity of whole files at the storage interface: it does not modify stored files in pieces, but rather maintains entire files. Since the interface doesn't provide modify functionality, the way to modify a file is to delete it and store the new version. Cumulus revolves around the idea of snapshots. Each snapshot logically contains two parts: the metadata log that lists all the files backed up, and the file data itself. Both parts are broken into blocks (objects), which are numbered sequentially and packed together into segments with unique names. The stored files can be rebuilt using such a snapshot in case of a failure. Cumulus is optimized for a complete restore; a partial restore is inefficient and requires fetching a number of segments. When segments from old snapshots are no longer needed, they are cleaned once no newer snapshots point to or depend on them. In order to facilitate such cleaning, Cumulus may have old and new snapshots refer to different copies of the same data, thus storing redundant information. A working prototype is built on Amazon's S3 (Simple Storage Service). In implementing the prototype, the authors make a number of design decisions to balance the tradeoffs between efficiency and resulting storage costs.

Discussion:

Pros:
- The backup service supports modification of backed-up files. Although this is done by storing redundant information, supporting such a feature enables the use of the service in a range of applications.
- Cumulus does not modify files in place and hence provides failure guarantees against snapshot corruption.
- It requires only a minimal interface from the cloud, so it can easily use any cloud storage provider.

Cons:
- There is no built-in compression or encryption in the system. There is a heavy reliance on external tools, which evidently will not adapt to the requirements of a particular system. Also, a vulnerability in an external tool becomes a vulnerability of the backup system too.
- Segment maintenance is disorganized, and new snapshots may point to parts of old segments that could otherwise be cleaned up. This introduces a lot of redundancy and wastes storage, a disadvantage that is hard to tolerate in a storage system where clients pay on the basis of their usage.

SPORC: Group Collaboration using Untrusted Cloud Resources

Summary: SPORC is a cloud-based system for collaborative applications using untrusted servers. SPORC challenges the belief that applications must sacrifice strong security and privacy of data to enjoy the benefits of cloud deployment.
SPORC's cloud servers see only encrypted data, and clients will detect any deviation from correct operation. The users of SPORC put their trust in their own cryptographic keys, not in the cloud provider's good intentions. SPORC is a flexible framework that could be used for a broad range of collaborative services. It claims it can propagate modifications quickly amongst many users, that it can tolerate slow or disconnected networks during updates, and that it can keep data confidential from the server and from unauthorized users. If a server misbehaves, clients can detect that server's actions and recover from the attack. At a high level, a SPORC application is synchronized between multiple clients, using a server to collect updates from clients, order them, and then redistribute them to the others.

Discussion:

Pros:
- I like how they thought about many different types of security threats, instead of focusing on just one, building a system around it, and leaving other potential holes.
- Encrypting the data transferred in and out of the server did not seem to have much effect on latency.

Cons:
- SPORC still has a slightly centralized portion to it. While the central server is assumed to be untrusted and the system takes action against that, it could still be a bottleneck as well as a target of DDoS attacks.

From: Tony Huang [tonyh1986 SpamElide] Sent: Wednesday, March 02, 2011 2:39 AM To: Gupta, Indranil Subject: 525 review 03/03

Zu C Huang zuhuang1

* Cumulus: Filesystem Backup to the Cloud

Core Idea: In this paper, the authors introduce a thin-cloud system to back up files in a cluster. The system interface is minimal, consisting of only four basic operations: get, put, list, and delete. Business logic is implemented wholly on the client side. The system uses a write-once storage model, which means a file is never modified once stored, until its deletion. Internally, Cumulus aggregates smaller files into large units called segments. Files are broken into blocks (objects, in the paper) and packed into segments. The metadata of a snapshot contains pointers to the latest objects. Retrieving a snapshot can be achieved by walking the DAG of objects rooted at the snapshot metadata node. The system backs up only the small portion of a file that was modified. Unneeded blocks are automatically garbage collected. The system exposes a parameter that lets the user decide the percentage of unused space at which the system will try to clean a segment and free up space (a rough sketch of this decision appears after the pros and cons below).

Pros:
* A simple system interface.
* Efficient use of the system by breaking files into blocks and backing up only changed blocks.
* Easy complete restoration of large files.
* The optional encryption scheme is effective and practically useful.

Cons and points missing from the paper:
* The system is single-purpose.
* The paper does not provide a throughput test.
* The paper starts with a discussion of the thin-cloud vs. thick-cloud trade-off, but it never returns to fully discuss it. What exactly the authors mean by thin-cloud and thick-cloud is never made clear in the paper.
* I think the architecture described in the paper is general, not just for cloud systems. It does not provide normal cloud services such as replication or fault tolerance. I would like to see a more in-depth discussion of what functionality of the cloud is being leveraged in this work.
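As a rough illustration of that cleaning decision (my own sketch with made-up names, not code from the paper): a segment becomes a candidate for cleaning once the fraction of its bytes still referenced by live snapshots falls below the chosen threshold, after which its live objects are repacked into a new segment and the old one is deleted.

# Hypothetical illustration of threshold-based segment cleaning.
def segments_to_clean(segment_sizes, live_bytes, threshold=0.6):
    """Return names of segments whose live-data fraction fell below `threshold`.

    segment_sizes -- dict: segment name -> total size in bytes
    live_bytes    -- dict: segment name -> bytes still referenced by any snapshot
    """
    stale = []
    for name, total in segment_sizes.items():
        utilization = live_bytes.get(name, 0) / total
        if utilization < threshold:
            stale.append(name)  # repack its live objects into a new segment, then delete it
    return stale

# Example: segment "s01" is only 40% live, so it is selected for cleaning.
print(segments_to_clean({"s01": 1000, "s02": 1000}, {"s01": 400, "s02": 950}))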
* SPORC: Group Collaboration using Untrusted Cloud Resources

Core Idea: In this paper, the authors present a system that uses untrusted cloud-based servers to implement a collaboration service. It uses techniques from operational transformation and fork* consistency protocols. In short, operational transformation models changes to the shared objects as a series of transformations; transformations made by different nodes in the system can be merged by predefined transformation functions. Fork* consistency means that if the order of two operations is swapped, this is detected by constructing a hash chain and comparing its value to the correct hash value.

Pros:
* Uses mature techniques to implement what the authors want.

Cons:
* Operational transformation, while clean in theory, can get very messy for any meaningful application beyond the most trivial ones, because of the complexities of most real-world applications.
* The authors do not address network connection issues and the random-ordering problems they entail.
* The authors assume that when a client detects a malicious server, it can switch to another one. This assumption may be violated in real life and introduces complexities that the authors do not address. If we are using more than one server to serve a file, how do we reconcile differences between servers? How do we handle Byzantine servers? If one user identifies a malicious server, how should it contact other users to switch? None of these real-world complexities is addressed in the paper.
* The design places too much computational burden on end users, since they have to do all the encoding, decoding, and transformation. It may be very power-hungry for the clients described in the paper, such as mobile phones.

This paper presents some interesting ideas on how to design a distributed collaboration system, but there are a lot of real-world considerations that should be addressed before it can be a useful infrastructure.

-- Regards -- Tony