From: Shivaram V [shivaram.smtp@gmail.com] on behalf of Shivaram Venkataraman [venkata4@illinois.edu]
Sent: Tuesday, April 13, 2010 12:26 PM
To: Gupta, Indranil
Subject: CS 525 review 04/13

Shivaram Venkataraman - 13 April 2010

The Chubby lock service for loosely-coupled distributed systems

This paper presents Chubby, a coarse-grained, distributed lock service for use in large-scale distributed systems. Chubby allows other distributed systems such as GFS and Bigtable to synchronize activities like leader election and discovery of the master replica. The distributed consensus problem is addressed using Paxos, with clocks introduced to overcome the impossibility of consensus in asynchronous systems. The major design highlights of Chubby are:

Coarse-grained locking: Chubby supports coarse-grained locks, which are usually held for long periods of time, such as hours or days. This permits clients of the lock service to survive lock-server failures and reduces the amount of state that needs to be maintained in Chubby.

UNIX-like file system: Locks are implemented in Chubby using a UNIX-like file system which is maintained consistently across clients. In addition to providing support for locks, the files can be used to disseminate small pieces of configuration data.

Sequencers and events: Chubby is designed for use cases where locks are passed from one system to another in a large cluster and where a large number of readers act on any changes to the locked file. To support this, Chubby implements sequencers: strings that can be passed to other servers and used to check whether the lock is still valid. Clients can also subscribe to notifications (like a pub-sub system) and act on any changes to file metadata and data.

Pros:
- Designed and implemented to work on large clusters within Google. Presents real-world information about the distribution of files that were used and the failovers encountered.

Cons:
- The paper does not present any quantitative evaluation of the system; it only describes experiences with how the system was used.
- It would be interesting to know the maximum file size that can be maintained in Chubby and what the limiting factors are (Paxos vs. DB write throughput, etc.).

Interesting points:
- The authors chose to write their own simple database in place of Berkeley DB.
- Chubby has become more popular as a name service than as a lock service.

From: Rini Kaushik [rinikaushik@yahoo.com]
Sent: Tuesday, April 13, 2010 12:24 PM
To: indy@cs.uiuc.edu
Subject: CS525 review 04/13

CS 525
Rini Kaushik

Memcached

Memcached is a high-performance, distributed caching system. Although application-neutral, it is most commonly used to speed up dynamic web applications by alleviating database load.

Pros:
1) Open-source tools like memcached allow other members of the worldwide software community to leverage the code and foster fast development.
2) Memcached can be deployed in a clustered fashion across any number of machines and hence is not constrained in size, unlike typical caches. This property makes memcached very scalable.
3) Hardware caches such as L1 and L2 have to balance several conflicting trade-offs, such as lookup time and miss rate, in deciding the size of the cache, and they typically show diminishing returns as the size is increased beyond a certain point. Memcached is able to get past these issues.
4) It is very fast and efficient and performs well even under heavy load.
It uses non-blocking networking libraries that allow asynchronous I/O.

Cons:
1) Memcached doesn't support access control and hence is not secure. While it is possible to use encryption in a shared environment, encryption adds substantial overhead and is not always viable. Another option suggested on the memcached website is to use obscure names for the keys. The caveat is that key names won't be as intuitive for users, who would then need to maintain a good mapping of the key names.
2) Memcached only supports single-value lookups. Since range queries are so common in databases, it would be good to support multiple values. In addition, it only supports small-sized results, which further limits its applicability.
3) It is not clear how the cache is kept synchronized. Say I cache the count of rows in a certain table: how will this number be kept accurate across multiple inserts and deletes to the underlying table? It seems that the responsibility lies with the programmer using memcached to purge and update the cache to maintain coherency.
4) It is not magically scalable; it needs changes in the networking stack etc., as illustrated by the Facebook example, to really scale.
5) It doesn't provide redundancy and failover. All such complexities need to be handled by the applications using memcached.

Discussion:
1) It is not clear to me how memcached handles data reliability and integrity. What happens if data is corrupted, or if the machine that holds a particular key starts misbehaving?

From: pooja.agarwal.mit@gmail.com on behalf of pooja agarwal [pagarwl@illinois.edu]
Sent: Tuesday, April 13, 2010 12:06 PM
To: Indranil Gupta
Subject: 525 review 04/13

DS REVIEW 04/13
By: Pooja Agarwal

Paper – Dynamo: Amazon’s Highly Available Key-value Store
Conference – SOSP 2007

Main Idea:
Dynamo is a scalable storage system that trades consistency for high availability. Dynamo is currently used by many of Amazon’s services. The focus of the paper is how to efficiently employ distributed-systems ideas from the literature and build a production system out of them; it provides key insight into how the basic ideas are enhanced to make them work in real-world, large distributed systems. The authors use techniques like consistent hashing, vector clocks, and quorums to achieve higher availability. For consistency, Dynamo resolves conflicts among replicas at read time rather than at write time.

In further detail, Dynamo eliminates relational database features and supports only primary-key-based operations. An object in Dynamo is stored as a (key, object) pair, and consistent hashing with replication across virtual nodes is used to store the data on the various nodes. Vector clocks designate causality between two versions of the same object and hence help in collation of previous versions of the object. Dynamo uses a sloppy quorum to assign a data object to the first N healthy nodes in the ring, which is more relaxed than the standard quorum technique in the literature; this is done to allow for higher availability in the face of failures. For permanent failures, Dynamo uses anti-entropy based on Merkle hash trees to find inconsistencies between two replicas quickly.

Pros:
1) Achieves high availability, targeting the 99.9th percentile of requests; the system tries to optimize itself for all requests rather than just the majority.
2) Eliminates general relational-database requirements in order to capture just what the system actually needs.
3) Load balancing and availability are maintained during node failures by combining the virtual-nodes idea with consistent hashing.
4) The paper also discusses three load-balancing strategies and evaluates them against each other.

Cons:
1) Multiple versions of an object are allowed to remain in the system, which can lead to increased work during collation, and some versions may not be successfully collated due to conflicts.
2) To achieve its full-membership model, Dynamo uses gossip between neighbors so that every node knows the data hosted at each other node. This is not scalable to large systems, as the routing-table size increases linearly with the number of nodes.

With Regards,
Pooja

From: Jayanta Mukherjee [mukherj4@illinois.edu]
Sent: Tuesday, April 13, 2010 10:35 AM
To: Gupta, Indranil
Subject: 525 review 04/13

More Industrial Systems
Jayanta Mukherjee
NetID: mukherj4

The Chubby lock service for loosely-coupled distributed systems, by M. Burrows:

Chubby is a distributed lock service intended for coarse-grained synchronization of activities within Google’s distributed systems. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability. As the author mentions, Chubby has become Google’s primary internal name service; it is a common rendezvous mechanism for systems such as MapReduce; the storage systems GFS and Bigtable use Chubby to elect a primary from redundant replicas; and it is a standard repository for files that require high availability, such as access control lists. The purpose of the lock service is to allow its clients to synchronize their activities and to agree on basic information about their environment. It uses the Paxos protocol to solve the asynchronous consensus problem. As mentioned in the paper, the authors made two major design decisions:
1. They chose a lock service, as opposed to a library or service for consensus.
2. They chose to serve small files to permit elected primaries to advertise themselves and their parameters, rather than build and maintain a second service.

Pros:
1. Chubby provides a coarse-grained locking service and reliable storage for loosely coupled systems.
2. It achieves fault tolerance through distributed consensus among a few replicas.
3. Chubby uses consistent client-side caching to reduce server load while retaining simple semantics, and it provides timely notification of updates.
4. Chubby provides reliability and availability to a moderately large set of clients, with easy-to-understand semantics.
5. The lock-based interface provided by Chubby is more familiar for programmers to use and deploy.

Cons:
1. Chubby is not a high-performance solution, and it does not scale very well.
2. It is an engineering effort; no new science (or algorithm) was developed to implement Chubby.
3. The paper puts more emphasis on how the system was implemented to facilitate programming than on introducing a new concept.

Comments:
Some of the claims made in this paper regarding scalability and performance, and the engineering approach taken, reveal the limitations of the approach. A better locking protocol or strategy might solve the scalability issues, rather than just improving the partitioning.
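The two design decisions above combine into a simple leader-election pattern: every replica races for the same exclusive lock, and the winner advertises itself in the small file behind the lock. Below is a minimal sketch of that pattern; the ToyLockService class and its method names are invented stand-ins for illustration, not the real Chubby API.

```python
import threading

class ToyLockService:
    """In-process stand-in for a coarse-grained lock service (not real Chubby)."""
    def __init__(self):
        self._guard = threading.Lock()
        self._holders = {}   # lock path -> current holder
        self._files = {}     # file path -> small-file contents

    def try_acquire(self, path, owner):
        with self._guard:
            if path not in self._holders:
                self._holders[path] = owner
                return True
            return False

    def write(self, path, data):
        self._files[path] = data

    def read(self, path):
        return self._files.get(path)

# Every replica races for the same exclusive lock; the winner advertises
# itself in the small file, which other clients read to find the primary.
svc = ToyLockService()

def elect(node_id):
    if svc.try_acquire("/ls/cell/service/leader", node_id):
        svc.write("/ls/cell/service/leader", node_id)
        return True
    return False

elect("replica-1")                            # wins the race
elect("replica-2")                            # loses, does nothing
print(svc.read("/ls/cell/service/leader"))    # -> replica-1
```

In the real system the lock additionally carries a lease and a sequencer, so a primary that loses its session cannot keep acting as leader.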
Dynamo: Amazon’s Highly Available Key-value Store, by DeCandia et al.:

Dynamo is a highly available and scalable distributed data store built for Amazon’s platform. It uses a synthesis of well-known techniques to achieve scalability and availability: data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning. Dynamo relies on a gossip-based distributed failure-detection and membership protocol. Some of the interesting features of Dynamo are as follows:
1. Dynamo is targeted mainly at applications that need an “always writable” data store where no updates are rejected due to failures or concurrent writes.
2. Dynamo is built for an infrastructure within a single administrative domain where all nodes are assumed to be trusted.
3. The applications that use Dynamo do not require support for hierarchical namespaces (the norm in many file systems) or complex relational schemas (supported by traditional databases).
4. Dynamo is built for latency-sensitive applications that require at least 99.9% of read and write operations to be performed within a few hundred milliseconds.

Pros:
1. Dynamo is a completely decentralized system with minimal need for manual administration. Storage nodes can be added to and removed from Dynamo without requiring any manual intervention.
2. Dynamo provides an “always-on” experience by assuring better reliability and availability. The system is designed to meet Service Level Agreements (SLAs) and has been successful in handling server failures, data-center failures, and network partitions.
3. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers.
4. As the authors mention, Dynamo scales to extreme peak loads efficiently, without any downtime during the busy shopping seasons.
5. Services can configure Dynamo so that they consistently achieve their latency and throughput requirements.
6. Dynamo uses a very smart strategy in deciding when to perform the process of resolving update conflicts.
7. Dynamo is incrementally scalable and allows service owners to scale up and down based on their current request load.
8. Dynamo allows service owners to customize their storage system to meet their desired performance, durability, and consistency SLAs by letting them tune the parameters N, R, and W.

Cons:
1. Dynamo sacrifices consistency under certain failure conditions.
2. Since it is used only for Amazon's internal purposes, Dynamo's security features are not very robust: there are no security-related requirements, such as authentication and authorization.

Comment:
Dynamo has been developed by combining different techniques to provide a highly available system. Some details of the underlying algorithms are not present in the paper.

-With regards,
Jayanta Mukherjee
Department of Computer Science
University of Illinois, Urbana-Champaign
Urbana-61801, IL, USA
Mobile: +1-217-778-6650

From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu]
Sent: Tuesday, April 13, 2010 9:53 AM
To: Gupta, Indranil
Subject: 525 review 04/13

CS525 review on More Industrial Systems
Liangliang Cao (cao4@illinois.edu)
April 8, 2010

1. DeCandia et al.,
Dynamo: Amazon's highly-available key-value store, SOSP 2007

This paper presents the design and implementation of Dynamo, a highly available key-value storage system for the Amazon.com platform, where the infrastructure is composed of tens of thousands of servers and network components located in many datacenters around the world. In this scenario, scalability and availability are both primary concerns. Dynamo uses a replicated DHT with consistency management, which leads to a decentralized system with minimal need for manual administration. Storage nodes can be added to and removed from Dynamo without requiring any manual partitioning or redistribution.

Pros
• The authors recognize that an RDBMS is clumsy for a real-life large-scale storage system, and design a specialized data store with lightweight functionality that is effective for high availability and good user-perceived scalability.
• Dynamo effectively uses a synthesis of well-known techniques: data partitioned and replicated using consistent hashing, consistency via object versioning, and quorum-like consistency techniques during updates.
• The DHT-based infrastructure makes incremental scalability possible. This shows that distributed-systems techniques are useful for building a mission-critical production system.
• The 99.9th-percentile latency is impressive at this scale. Moreover, the system can tune the trade-off between consistency and availability.

Cons
• Dynamo is not fit for storing large objects (>1 MB).
• As a lightweight storage system, it provides a limited interface compared with a general RDBMS; traversal and streaming operations are not supported.
• Developing Dynamo required extensive engineering effort.

Open-source counterpart: Voldemort

Voldemort is another key-value storage system. It is motivated by Dynamo but developed in the open-source community. In Voldemort, data is automatically replicated and partitioned over multiple servers, so that each server contains only a subset of the total data; server failure is handled transparently; and data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system. Voldemort is employed at LinkedIn. Compared with Dynamo, it enjoys some new features:
• Voldemort allows rich keys and values, including lists and tuples with named fields, and integrates with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization. These features allow pluggable serialization.
• Voldemort supports pluggable data-placement strategies, for example distribution across data centers that are geographically far apart.

2. Scaling memcached at Facebook

Memcached is a high-performance, distributed memory-object caching system. Facebook alleviates database load by using memcached on more than 800 servers supplying over 28 terabytes of memory. The scalability issue lies in the fact that memcached uses a per-connection buffer to read and write data over the network, which adds up to gigabytes of memory when there are hundreds of thousands of connections. To handle the scaling issues, Facebook made the following modifications:
1. To reclaim this memory for user data, Facebook implemented a per-thread shared connection buffer pool for TCP and UDP sockets, which reclaimed multiple gigabytes of memory per server.
2. Facebook used separate UDP sockets for transmitting replies (with one of these reply sockets per thread).
Under load on Linux, UDP performance was downright horrible, due to considerable lock contention on the UDP socket lock when transmitting through a single socket from multiple threads.
3. Facebook introduced “opportunistic” polling of the network interfaces, which combines interrupt-driven and polling-driven network I/O. By doing network transmission on every core and polling for network I/O from the scheduler’s idle loop, Facebook distributes network processing evenly across all cores.
4. Facebook also optimized a few bottlenecks found on multicore machines. First, memcached's stats collection relied on a global lock, which accounted for 20-30% of CPU usage on 8-core machines. Facebook eliminated this bottleneck by moving stats collection per-thread and aggregating results on demand. Second, to handle the decreased performance observed when increasing the number of threads transmitting UDP packets, Facebook changed the dequeue algorithm to batch dequeues for transmit, drop the queue lock, and then transmit the batched packets, which amortizes the cost of lock acquisition over many packets and reduces lock contention significantly.

Pros:
• With all these changes, Facebook scaled memcached to handle 200,000 UDP requests per second with an average latency of 173 microseconds, compared to the original 50,000 UDP requests/s using the stock versions of Linux and memcached.

Cons:
• Modifications 2 and 4 are built on Linux internals, which may be unstable in the long run as the Linux kernel changes.
• Security is not considered in the current implementation, which might be crucial for applications at Facebook.

From: Virajith Jalaparti [jalapar1@illinois.edu]
Sent: Tuesday, April 13, 2010 9:11 AM
To: Gupta, Indranil
Subject: 525 Review 04/13

Review of “The Chubby Lock Service for Loosely-Coupled Distributed Systems”:

This paper presents the architectural and implementation issues encountered in designing a reliable lock service called Chubby, which provides coarse-grained synchronization mechanisms to distributed applications deployed at Google. Chubby provides a lock-based interface to its users, which is a familiar paradigm. It allows users to store and fetch small files, which the service can use to share information among its clients. It provides a coarse-grained locking mechanism since this leads to lower load on the locking service, and the lock-acquisition rate in the applications being considered is typically quite low compared to the clients' transaction rates. Each Chubby client in a cell communicates with the Chubby server through a client library which provides an API and uses RPCs. Each Chubby cell consists of a set of replicas which function as servers and use a distributed consensus protocol for master election. Chubby further provides a file system with the notion of directories, files, and ACLs, similar to the Linux file system; clients perform file operations by obtaining handles. Each file can be associated with an exclusive writer lock or a shared reader lock. Chubby does not enforce mandatory locks, in order to allow clients which do not hold locks to access the files. Chubby uses an event-based mechanism to deal with events like file modifications, saving the processing that would have been wasted if polling were used.
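A tiny sketch of why the event mechanism saves work compared to polling: clients register a callback once and are told when something changes. The ToyFile class and its method names here are invented for illustration, not Chubby's actual client API.

```python
class ToyFile:
    """Invented stand-in for a Chubby node; not the real client library."""
    def __init__(self, contents=b""):
        self.contents = contents
        self._watchers = []

    def subscribe(self, callback):
        # Analogous to registering for the "file contents modified" event.
        self._watchers.append(callback)

    def write(self, contents):
        self.contents = contents
        for cb in self._watchers:
            cb(contents)   # clients are notified; nobody has to poll

cfg = ToyFile(b"v1")
cfg.subscribe(lambda c: print("config changed to", c))
cfg.write(b"v2")   # fires the callback exactly once
```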
Sessions with clients are maintained through KeepAlives, which ensure that a client's state is valid only while its session is alive, and which help deal with failures. The paper further presents results on how Chubby is typically used at Google.

Comments:
- The paper presents a thorough examination of the difficulties faced by a practical lock service and how they are dealt with through mechanisms like leader election, KeepAlives, caching, etc. Chubby is developed as a generic system that can be used by a variety of applications and exposes details of the underlying system through the necessary API. However, the paper doesn’t examine to what extent such information would be useful to an application and doesn’t discuss how much information should be exposed. In general, the more information the underlying system exposes, the better an application can perform by using whatever is relevant to it.
- The results indicate that KeepAlives account for nearly 90% of the network traffic. While they are necessary to determine the state (failed/active) of a client, Chubby uses them quite liberally and doesn’t discuss any optimizations that could reduce the rate at which they are sent. While the use of proxies decreases the load they create on the server, they still generate network traffic, which can lead to congestion if the network is the bottleneck. Chubby should have had “knobs” to tune the rate at which KeepAlives are sent, which could be exposed to the application and modified as required.
- Chubby aims to provide a coarse-grained lock service. However, to increase the scope of its usage by various applications, it might have to provide a finer-grained lock service. In general, granularity should be defined by the application, and the lock service should adapt to it in a generic manner.

From: Ashish Vulimiri [vulimir1@illinois.edu]
Sent: Monday, April 12, 2010 11:59 PM
To: Gupta, Indranil
Subject: 525 review 04/13

Dynamo: Amazon's highly-available key-value store, DeCandia et al, SOSP 2007

This paper describes Dynamo, a key-value storage system developed and used internally at Amazon. Dynamo chooses the availability and performance legs of the CAP triangle, sacrificing consistency guarantees. In Dynamo, write requests are always allowed to succeed immediately, and if there are multiple parallel updates to the same key-value pair, all the conflicting versions are stored. Conflicts are resolved by the applications during reads (read requests return all the available data versions). The K-V store itself is maintained using relatively standard consistent-hashing techniques, with the key distinguishing feature being that data-access requests terminate at a single node (as opposed to, say, Pastry, where a single request may end up being forwarded through many nodes) -- this is done to guarantee low latency.

Comments:
1. It is interesting that even with their focus on optimizing write latency at the expense of reads, write requests are still the more time-consuming of the two.
2. They ignore security issues, arguing that Dynamo is designed to operate only in trusted environments.
3. While it is indeed interesting to see how they analyse their application requirements, identify the appropriate techniques for these requirements, and then put everything together, the novelty of the work seems questionable.
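The "read requests return all the available data versions" behavior can be made concrete with vector clocks. Below is an illustrative sketch (not Dynamo's actual code) of how a read can filter out superseded versions and hand only the genuinely concurrent ones to the application for merging; all names are invented.

```python
def dominates(a, b):
    """True if vector clock a descends from b (componentwise >=, not equal)."""
    keys = set(a) | set(b)
    return all(a.get(k, 0) >= b.get(k, 0) for k in keys) and a != b

def frontier(versions):
    """Keep only versions that no other version dominates (the conflict set)."""
    return [(v, vc) for v, vc in versions
            if not any(dominates(other, vc)
                       for _, other in versions if other != vc)]

versions = [
    ("cart-v1", {"nodeA": 1}),
    ("cart-v2", {"nodeA": 2}),              # supersedes v1
    ("cart-v3", {"nodeA": 1, "nodeB": 1}),  # concurrent with v2
]
print(frontier(versions))  # v2 and v3 survive; the application must merge them
```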
From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu]
Sent: Monday, April 12, 2010 10:04 PM
To: Gupta, Indranil
Subject: 525 review 04/13

=====================================================================
Dynamo: Amazon’s Highly Available Key-value Store (SOSP 2007)
=====================================================================

Amazon, one of the largest e-commerce systems in the world, maintains a huge infrastructure (hundreds of services, thousands of commodity servers, tens of millions of customers at peak times, and so on). Hence, reliability is an extremely important issue for Amazon, since even a small outage can contribute to a huge monetary loss. Amazon's best-known storage technology is Amazon S3 (Simple Storage Service), which tries to ensure both reliability and scalability. In the paper titled "Dynamo: Amazon’s Highly Available Key-value Store", the authors (who are also part of Amazon) present another highly available and scalable data storage system named Dynamo. Dynamo is a key-value storage system used for storing the state of a number of core services of Amazon’s e-commerce platform. Dynamo achieves an "always-on" experience by sacrificing consistency under certain failure scenarios.

Dynamo stores data as (key, object) pairs and uses consistent hashing. Like Chord, each node gets an ID from the space of keys, and nodes are arranged in a ring. Data is stored on the first node available in the clockwise direction. In a Chord-like scheme, nodes are placed randomly on the ring, which may lead to uneven data and load distribution. To address this issue, Dynamo uses “virtual nodes”: each physical node hosts multiple virtual nodes, which are distributed across the ring. To improve data availability, several replicas of each data object are maintained. Dynamo also uses data versioning to achieve eventual consistency.

Pros:
1. Unlike many other distributed systems, Dynamo is deployed in real industry (Amazon), which surely shows its effectiveness.
2. Dynamo is an ideal example of a distributed system in which many popular existing technologies are combined to build a large and effective system. Dynamo exploits DHTs, consistent hashing, vector clocks, quorum protocols, versioning, anti-entropy-based recovery, and so on, in an efficient way.
3. Incremental scalability is one of the key principles embraced in Dynamo's design, which minimizes the impact on both the system operator and the system itself.
4. Dynamo provides eventual consistency, which makes the implementation easier.

Cons/Discussion Points:
1. Dynamo is only applicable to web-based applications, and all queries need to use the HTTP framework. Hence Dynamo is not applicable as a generalized storage system.
2. It would be interesting to see a detailed comparison of Dynamo with similar systems, especially with Bigtable. Bigtable is a distributed storage system for managing petabytes of data across thousands of commodity servers, and is used by many projects at Google, such as web indexing, Google Earth, and Google Finance.
3. How does Dynamo achieve a zero-hop DHT?

From: Shehla Saleem [shehla.saleem@gmail.com]
Sent: Monday, April 12, 2010 9:34 PM
To: Gupta, Indranil
Subject: 525 review 04/13

Dynamo: Amazon's highly-available key-value store

Amazon.com is probably one of the largest e-commerce services in the world, and its operation and clientele are such that any outage, no matter how small, can have financially disastrous consequences.
Reliability, therefore, is a prime concern. The authors thus present Dynamo, a highly available key-value storage system that aims to provide an always-on experience to users. The main idea is to sacrifice consistency to provide high availability: instead of strong consistency, they choose eventual consistency. For failure detection, recovery, and availability, they use a combination of many popular and well-understood distributed-systems techniques. They use consistent hashing to deal with partitioning, and vector clocks to provide high availability for writes. Gossip-based protocols are used for failure detection. They use a quorum-based solution to deal with temporary failures, and anti-entropy with Merkle trees to deal with permanent failures. The system seems promising in terms of scalability. They also achieve load balancing through the use of virtual nodes: if some node becomes unavailable, its load is distributed among other nodes, and once it comes back, it accepts load back from them. They push the responsibility of conflict resolution to the application, which further simplifies the design and avoids single points of failure. Conflict resolution is carried out during reads rather than writes, giving the notion of "always writable".

Although the paper presents a workable and practical system, the level of innovation and novelty is fairly limited. But that’s probably typical of papers coming from outside academia.

From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu]
Sent: Monday, April 12, 2010 7:14 PM
To: Gupta, Indranil
Subject: 525 review 04/13

525 review 04/13
Hyun Duk Kim (hkim277)

* Dynamo: Amazon's highly-available key-value store, DeCandia et al, SOSP 2007

This paper presents Dynamo, Amazon's highly available key-value storage system. For a large e-commerce operation such as Amazon, even a slight outage may cause significant financial consequences and erode customer trust. Dynamo provides a highly available system, and for that high availability it sacrifices some consistency. Dynamo handles various failure situations with extensive object versioning and application-assisted conflict-resolution methods. According to the experimental results, Dynamo satisfies its performance requirements with high availability.

This paper gives a fairly complete exploration of the techniques behind Dynamo. For each of its many sub-components, the authors explain the design in detail and also discuss possible alternatives where they exist. Thorough consideration of many cases makes Dynamo a more complete system. However, this also makes the system very complex; in some sense, Dynamo handles each case individually without an overall theory, and this complexity may make maintenance and upgrades difficult. Still, despite its complexity, Dynamo shows sufficient performance for Amazon's system. Users can control the system with parameters depending on the application: as the authors mention, requirements differ across applications, and whether reads or writes are prioritized can be changed efficiently by tuning the parameters N, R, and W.

* Memcached

These articles introduce memcached. Memcached is a memory cache that can be used in a distributed environment. Caching helps improve performance by letting users access recently used data faster. Memcached lets users use a memory cache in a distributed system with a simple structure: users simply save and retrieve items in the cache using keys.
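For the save-and-retrieve pattern just described, here is a minimal sketch using the pymemcache client library, assuming a memcached server already running on localhost:11211; the key name is an invented example following the app-name-plus-id convention the first article suggests.

```python
# Requires a memcached server on localhost:11211 and `pip install pymemcache`.
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))

key = "user:1337:profile"         # invented key, following the naming convention
mc.set(key, b"alice", expire=60)  # keep the item for 60 seconds
print(mc.get(key))                # b'alice' on a hit, None on a miss
mc.delete(key)                    # explicit invalidation after an update
```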
The second article is the memcached project page, which links many related articles. The first article introduces the basic concept of memcached and how to use it, with pseudocode, and the third article shows a practical example of how memcached is used at Facebook.

Memcached provides a very easy-to-use as well as effective mechanism: users can access cached items by key from any machine in the distributed system. With this simple logic, memcached can save a great deal of the cost of accessing the same (or recent) data repeatedly. If there were a formal report about memcached, it would be very helpful in deciding whether to use it or not. There is a trade-off in using a cache: frequent reads of the same data can surely be improved by caching, but if few reads are served by the cache, performance may degrade, because every operation must go through the cache. Therefore, we should use memcached only where it can help. One can reason about the situation in general terms, but a good benchmark report would still be useful for users.

Using a single key space can lead to conflicts. Memcached uses a key to look up items in memory; however, because there is no mechanism preventing the use of the same key for different objects, users may be confused or keys may collide. The author of the first article suggests using a naming convention (e.g., application name + id). Programmers should be very careful not to reuse keys. In addition to key conflicts, as the third article shows, there are many things to adjust for actual deployment.

------
Best Regards,
Hyun Duk Kim
Ph.D. Candidate
Computer Science
University of Illinois at Urbana-Champaign
http://gildong2.com

From: Ghazale Hosseinabadi [gh.hosseinabadi@gmail.com]
Sent: Monday, April 12, 2010 5:29 PM
To: Gupta, Indranil
Subject: 525 review 04/13

Paper 1: The Chubby lock service for loosely-coupled distributed systems

This paper analyzes the performance of the Chubby lock service and describes the differences between the theoretical design and the practical implementation. Chubby is designed to provide locking and reliable storage for loosely-coupled distributed systems, in which large numbers of small machines are connected by a high-speed network. A lock service provides a mechanism for synchronizing clients’ activities and for agreeing on basic information about their environment. The main design goals are reliability, availability, easy-to-understand semantics, and throughput and storage capacity. The authors present the benefits of Chubby over a simple client library. Chubby is composed of Chubby clients and Chubby servers, which communicate via the Chubby library. Five server replicas are present in each Chubby cell, and they use a distributed consensus protocol to elect a master among themselves. Each client is mapped to exactly one master at any time instant, though its master may change over time; a client’s requests are sent to its current master. The main locking rule is as follows: at most one client handle may hold the lock in exclusive (writer) mode, while any number of client handles may hold the lock in shared (reader) mode. The events that might occur in Chubby are: file contents modified; child node added, removed, or modified; Chubby master failed over; a handle (and its lock) has become invalid; lock acquired; and a conflicting lock request from another client. The Chubby design provides methods for handling all of these events.
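The writer/reader rule above is easy to state in code. Here is a toy sketch of those handle semantics; the AdvisoryLock class is invented for illustration and ignores real-world concerns like sessions, caching, and lock delays.

```python
class AdvisoryLock:
    """Toy advisory lock: one exclusive holder, or any number of shared holders."""
    def __init__(self):
        self.writer = None
        self.readers = set()

    def acquire(self, handle, exclusive):
        if exclusive:
            if self.writer is None and not self.readers:
                self.writer = handle
                return True
            return False          # someone already holds it
        if self.writer is None:
            self.readers.add(handle)
            return True
        return False              # a writer excludes all readers

lock = AdvisoryLock()
assert lock.acquire("r1", exclusive=False)
assert lock.acquire("r2", exclusive=False)      # many readers are fine
assert not lock.acquire("w1", exclusive=True)   # blocked while readers hold it
```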
In Chubby, scaling issues are resolved by caching, protocol-conversion servers, and load adaptation.

Pros: It describes a complete system design for a distributed lock service. Many issues are considered and resolved in the system design.

Cons: No comparison with existing work is presented. It is not clear what the advantage of Chubby is over other methods for achieving mutual exclusion.

Paper 2: How to Dramatically Speed Up Your Web Application: An Introduction to memcached

This article introduces memcached, a tool for speeding up web applications. Memcached acts as a single cache for storing frequently accessed data. Although the memcached cache might grow very large, it still provides fast responses, mainly thanks to highly efficient, non-blocking network libraries. In memcached, the cache is treated as one big associative array in which each item is indexed by some arbitrary string. A key is used to uniquely identify data stored in the cache, and also to retrieve and remove data from it. Memcached does not include any security mechanism, and users must take their own precautions to make it secure, for example by preventing external access, encrypting the data stored in the cache, or choosing obscure keys.

Pros: This article introduces a very simple method for caching information in order to speed up web applications. Using memcached is simple, and users do not need any specific knowledge or skill, except for adding simple security mechanisms.

Cons: No time-complexity analysis of memcached is presented, and it is not clear how response time trends with cache size. No comparison with the scenario in which memcached is not used is presented, so it is not clear how much improvement memcached achieves. It would be better to design a secure memcached instead of requiring users to add security mechanisms themselves.

From: Fatemeh Saremi [samaneh.saremi@gmail.com]
Sent: Monday, April 12, 2010 3:36 PM
To: Gupta, Indranil
Subject: 525 review 04/13

Paper 1: Dynamo: Amazon's Highly Available Key-value Store

Dynamo is a data store used for storing the state of a number of core services of Amazon's e-commerce platform. Dynamo uses a synthesis of different techniques to achieve scalability and availability. Data is partitioned and replicated using consistent hashing, which enables incremental scalability. Consistency is facilitated by object versioning using vector clocks, with reconciliation during reads, which decouples version size from update rates. A quorum-like technique is used to maintain consistency among replicas during updates and temporary failures. A decentralized replica-synchronization protocol using Merkle trees runs in the background. Failure detection and membership are based on a distributed gossip-style protocol. Dynamo preserves symmetry and avoids a centralized registry for storing membership and node-liveness information. Dynamo is fully decentralized and needs very little manual administration; for example, storage nodes can be added and removed from the system without requiring manual partitioning or redistribution. The statistics reported in the paper show the system's significant availability and scalability, and the fact that the approach is used at such scale is a good indication of its efficiency.
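The consistent-hashing-with-virtual-nodes partitioning that this and several other reviews mention can be sketched in a few lines. This is illustrative only (Dynamo's later partitioning strategies are more refined), and all names are invented.

```python
import bisect
import hashlib

def h(s):
    """Hash a string onto the ring (md5 used here just for the sketch)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each physical node is placed on the ring vnodes times.
        self.ring = sorted((h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def owner(self, key):
        # The first virtual node clockwise from the key's hash owns it;
        # replicas would go to the next distinct physical nodes clockwise.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["nodeA", "nodeB", "nodeC"])
print(ring.owner("cart:42"))   # the same key always maps to the same node
```

Because each physical node owns many small arcs of the ring rather than one large one, adding or removing a node moves only a small, evenly spread fraction of the keys.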
The way Dynamo exploits application-specific requirements and relaxes some features in favor of others that the application actually needs is interesting (e.g., relaxing strong consistency for more availability). Providing client systems with meaningful tuning parameters that let them make their own application-induced trade-offs is valuable. On the other hand, by not layering the design and instead accounting for all concerns (availability, reliability, scalability, etc.) in the same layer, Dynamo has made its design very complicated. The size of vector clocks can grow considerably, which makes the suitability of this choice (in its current exact form) questionable. The authors also have not clearly explained how exactly Dynamo achieves a zero-hop DHT (some sort of proof, probability calculation, etc. would help).

Paper 2: Memcached

Memcached is a distributed memory-object caching system intended to speed up dynamic web applications by alleviating database load. To use memcached, before querying the database, one first checks memcached for the data. If it is found, the data can be taken from memcached instead of the database; if not, the database is queried and the data is pushed into memcached so that subsequent calls fetch it much faster and more efficiently. When the data is updated, the copy in memcached must be deleted to keep the cached data accurate and fresh; afterwards, when a user tries to access the data, it is loaded again from the database and a fresh copy is added to memcached. Memcached behaves like a big associative array in which each item is indexed by a unique identifier (e.g., a SHA-1 hash of a unique string).

Facebook is (likely) the largest user of memcached. However, Facebook made some changes to memcached to address its shortcomings for their application with regard to efficiency and speed. Memcached uses a per-connection buffer to read and write data over the network, resulting in a considerable amount of memory use at Facebook's scale. To reclaim this memory for user data, they implemented a per-thread shared connection buffer pool for TCP and UDP sockets. To reduce network traffic and implement application-level flow control for multigets, they moved to UDP for get operations, and used separate UDP sockets for transmitting replies in order to mitigate lock contention on the UDP socket lock when transmitting through a single socket from multiple threads. In addition, they combine interrupt-driven and polling-driven network I/O to prevent some cores from becoming saturated under load. They also moved stats collection per-thread, aggregating results on demand, instead of relying on a global lock as stock memcached does.

Memcached's extensive use at Facebook indicates its usability, and the variety of client libraries to choose from, including libraries for the popular programming languages, supports its widespread adoption. However, security is not a built-in component of memcached, and external methods must be used to address security concerns. Explicitly loading data from the database, storing it in memcached, and then deleting it is not elegant, and it is unclear exactly how updates are identified. Presenting some results and diagrams would be very helpful for getting a good picture of the effect of using memcached itself and of the changes Facebook made on performance.
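The check-cache / query-database / invalidate-on-update flow described in the review above is the standard cache-aside pattern. A minimal sketch follows, assuming the pymemcache client library, a memcached server on localhost:11211, and an invented dictionary standing in for the real database.

```python
# Requires a memcached server on localhost:11211 and `pip install pymemcache`.
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))
db = {"user:1": b"alice"}               # invented stand-in for the real database

def get(key):
    value = mc.get(key)                 # 1. check the cache first
    if value is None:
        value = db[key]                 # 2. on a miss, query the database
        mc.set(key, value, expire=300)  # 3. populate the cache for later reads
    return value

def update(key, value):
    db[key] = value                     # write to the database...
    mc.delete(key)                      # ...and invalidate, so the next get() reloads

print(get("user:1"))        # miss, then hit on subsequent calls
update("user:1", b"bob")
print(get("user:1"))        # reloads the fresh value
```

Note that the invalidation in update() is exactly the programmer-managed coherency burden that Rini's review criticizes: nothing in memcached itself keeps the cache and the database in sync.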