From: Shivaram V [shivaram.smtp@gmail.com] on behalf of Shivaram Venkataraman [venkata4@illinois.edu] Sent: Tuesday, April 20, 2010 12:13 PM To: Gupta, Indranil Subject: 525 review 04/20

Shivaram Venkataraman - 20 April 2010

On the Energy (In)efficiency of Hadoop Clusters

Summary: This paper presents a study of energy consumption in clusters that run Hadoop and makes a case for modifying Hadoop so that many machines can be turned off during periods of inactivity. Files processed with Hadoop are stored in HDFS, a distributed file system. To tolerate machine failures, HDFS replicates each file 'n' times in the cluster. Typically, HDFS creates three replicas: the first on the node where the request is handled, a second on another machine in the same rack, and a third on a machine in a different rack. This scheme tolerates both machine- and rack-level failures. The authors propose a new replication scheme that allows energy savings on inactive nodes. In this scheme, at least one replica of every data block is stored on a set of nodes known as the 'covering subset'. The important property of the covering subset is that it contains a sufficient number of nodes to ensure the availability of data even if all other nodes were disabled. The size of the covering subset is an important factor: if it were too small, storage capacity or I/O bandwidth could become a bottleneck. The authors also suggest that the covering subset can be chosen on a per-file basis by the user, and that covering subsets in a large cluster can be intelligently managed across multiple users and applications (a toy sketch of this placement invariant follows this review).

Some future research ideas (proposed in the paper):
- Reliability and durability - A background job might periodically check data integrity by waking up the sleeping nodes.
- Scheduling policies - The Hadoop scheduler and the power management system need to cooperate in order to disable the right set of nodes.

Pros:
- Tackles a very important and relevant problem in cloud computing.
- The idea of a covering subset is interesting, and the authors present a good initial evaluation using real-world Hadoop deployments.

Cons:
- It is not clear how covering subsets across users will be merged (a central authority?) or how one can ensure that the overall energy savings stay above a certain threshold.
- It would be interesting to consider not only homogeneous Hadoop clusters but also heterogeneous clusters hosting web servers, databases, Hadoop jobs, etc.
- Disabling some machines has other associated costs:
a. All replicas for new or updated files will be stored only on the machines that remain enabled, which may lead to an uneven distribution of data in the cluster.
b. It would be interesting to study how much availability is sacrificed by disabling some machines (e.g., if a machine in the covering subset fails and a disabled node cannot be accessed for some reason).
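A minimal sketch of the covering-subset placement invariant described in this review, assuming a toy flat cluster; the function names, rack-free layout, and replica count are illustrative, not from the paper:

```python
import random

def place_replicas(block_id, nodes, covering_subset, n_replicas=3):
    """Pick nodes for a block's replicas while keeping the paper's
    invariant: at least one replica lives in the covering subset, so
    every node outside it can be powered down without losing data
    availability. Rack-awareness is omitted for brevity."""
    assert covering_subset and set(covering_subset) <= set(nodes)
    primary = random.choice(sorted(covering_subset))  # always-on replica
    others = [n for n in nodes if n != primary]
    rest = random.sample(others, min(n_replicas - 1, len(others)))
    return [primary] + rest

def nodes_safe_to_disable(nodes, covering_subset):
    """Any node outside the covering subset may enter standby."""
    return [n for n in nodes if n not in covering_subset]

if __name__ == "__main__":
    cluster = [f"node{i:02d}" for i in range(12)]
    covering = set(cluster[:4])  # e.g. one third of the cluster stays on
    print(place_replicas("blk_0001", cluster, covering))
    print(nodes_safe_to_disable(cluster, covering))
```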
From: pooja.agarwal.mit@gmail.com on behalf of pooja agarwal [pagarwl@illinois.edu] Sent: Tuesday, April 20, 2010 11:59 AM To: Indranil Gupta Subject: 525 review 04/20

DS REVIEW 04/20 By: Pooja Agarwal

Paper – Cost and Energy Aware Load Distribution across Data Centers Conference – HotPower 2009

Main Idea: This paper presents an optimization metric that considers the cost of operating multiple data centers in terms of power consumption. Load balancing is optimized based on the price of electricity available at the different datacenters, the type of energy source (green vs. brown), and the method of electricity billing. All of these metrics are combined into an overall cost of processing a request at a given datacenter (a toy sketch of such a cost function follows this review). The idea is to forward requests to the datacenter with the lowest overall energy cost while still providing the availability needed to meet SLA agreements. The authors also propose a heuristic algorithm that tries to minimize the overall cost by greedily choosing the least costly datacenter for each request.

Pros:
1) The optimization algorithm is decentralized; hence, different frontends need not collaborate to compute the optimization equation for each request.
2) The idea of incorporating green energy sources into the optimization equation is interesting, given the current drive to increase the share of renewable energy in daily electricity consumption.
3) The prediction methods used and the evaluation methodology chosen present a fairly good model for evaluation.

Cons:
1) The prediction-based load balancing is best-effort; it does not work well during sudden bursts of traffic. Latency might increase drastically during such events due to the added overhead of waking sleeping machines to process new requests.
2) The SLA requirements need to be relaxed because locality is not considered in this approach; the latency figure in the SLA will potentially be the worst-case latency a request can incur (due to a request being mapped to the farthest datacenter). The increased chance of poor latency might be unacceptable to certain real-time or soft real-time applications.
3) There are session-dependent applications in which all requests of a session need to be handled by the same datacenter. For this, the paper proposes to assume the maximum number of requests that can potentially fall in a session; however, the number of requests per session is highly application-dependent, so it is unclear how well using the maximum would work.

With Regards, Pooja Graduate Student Department of Computer Science University of Illinois at Urbana-Champaign
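To make the review's "overall cost of processing a request" concrete, here is a hedged sketch; the field names, discount, and peak-billing adjustment are invented for illustration and are not the paper's actual model:

```python
def request_cost(dc):
    """Toy per-request cost combining the factors named above:
    electricity price, energy source (green vs. brown), and the
    billing method. All constants are illustrative."""
    cost = dc["price_per_kwh"] * dc["kwh_per_request"]
    if dc["source"] == "green":
        cost *= 0.5            # assumed discount favoring renewables
    if dc["billing"] == "peak":
        cost *= 1.5            # assumed peak-hour billing penalty
    return cost

def pick_datacenter(dcs):
    """Forward a request to the cheapest datacenter that currently
    meets its SLA, per the greedy idea described in the review."""
    eligible = [d for d in dcs if d["meets_sla"]]
    return min(eligible, key=request_cost)

if __name__ == "__main__":
    dcs = [
        {"name": "us-east", "price_per_kwh": 0.12, "kwh_per_request": 1e-4,
         "source": "brown", "billing": "flat", "meets_sla": True},
        {"name": "europe", "price_per_kwh": 0.10, "kwh_per_request": 1e-4,
         "source": "green", "billing": "peak", "meets_sla": True},
    ]
    print(pick_datacenter(dcs)["name"])
```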
From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu] Sent: Tuesday, April 20, 2010 11:03 AM To: Gupta, Indranil Subject: 525 review 04/20

CS525 review on Green Clouds Liangliang Cao (cao4@illinois.edu) April 20, 2010

Paper 1: Managing Energy and Server Resources in Hosting Centers, J. Chase et al, SOSP 2001.

This paper presents an architecture for an Internet hosting center operating system, with an emphasis on energy as a resource. The key idea is to automatically adapt to the offered load, dynamically resize the active server set, and respond to power supply disruptions or thermal events by degrading service in accordance with negotiated SLAs. Within an economic "bidding" framework, the proposed method manages shared server resources by adjusting resource prices to balance supply and demand and by allocating resources to their most efficient use.

Pros:
• Experimental results from a prototype confirm that the system reduces server energy usage by 29%.
• The "bidding" strategy seems very useful in distributed systems.

Cons:
• There are parameters in the system that may take effort to tune.
• The paper focuses on CPU assignment and says little about I/O resources. In practice, network and I/O might matter more for efficiency and deserve more attention.

Paper 2: On the Energy Inefficiency of Hadoop Clusters, J. Leverich et al, HotPower 2009

This paper considers the energy-efficiency problem in Hadoop. The energy efficiency of a cluster can be improved in two ways: (1) by matching the number of active nodes to the current needs of the workload and placing the remaining nodes in low-power standby modes; (2) by engineering the compute and storage features of each node to match its workload and avoid energy waste on oversized components. The inefficiency of energy usage in Hadoop lies in three aspects. First, MapReduce frameworks associate so much state with each node that state-of-the-art techniques for managing the number of active nodes become impractical; even idle nodes remain powered on to ensure data availability. Second, MapReduce frameworks lead to energy waste from idling components on nodes that support both MapReduce compute/data-storage requirements and other workloads hosted on the same cluster (e.g., front-end web serving). Finally, given the unreliability of commodity hardware, MapReduce frameworks incorporate mechanisms to mitigate hardware and software failures and load imbalance, and such mechanisms may negatively impact energy efficiency. Based on these observations, the paper presents early work on a modified Hadoop that allows scale-down of operational clusters and shows that running Hadoop clusters in fractional configurations can save between 9% and 50% of energy consumption (a back-of-the-envelope sketch of where such savings come from follows this review).

Pros:
• This paper is one of the first papers to improve MapReduce's energy efficiency.
• The arguments for improving Hadoop efficiency are convincing: (1) Hadoop has the global knowledge necessary to manage the transition of nodes to and from low-power modes, so Hadoop should be, or cooperate with, the energy controller for a cluster; (2) it is possible to recast the data layout and task distribution of Hadoop to enable significant portions of a cluster to be powered down while remaining fully operational.
• The identified tradeoff between performance and energy consumption is very insightful and should inspire more research in this direction.

Cons:
• The proposed method is not general, and I am not sure it works in every scenario.
• The tradeoff between performance and energy consumption is not well handled.
• A minor shortcoming is that the paper does not consider the energy spent on network communication. I wonder whether there is room to improve in that aspect.
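A back-of-the-envelope sketch of where scale-down savings can come from, assuming illustrative power draws and an illustrative busy fraction (the paper's 9%-50% range depends on the actual workload and hardware):

```python
def daily_energy_kwh(nodes, p_active_w=300.0, p_standby_w=10.0,
                     busy_frac=0.6, scale_down=True):
    """Energy over 24h for a cluster busy `busy_frac` of the time.
    With scale_down, otherwise-idle nodes drop to standby power
    instead of idling at near-full draw. All figures are assumed."""
    busy_h = 24 * busy_frac
    idle_h = 24 - busy_h
    idle_power = p_standby_w if scale_down else p_active_w
    return nodes * (p_active_w * busy_h + idle_power * idle_h) / 1000.0

if __name__ == "__main__":
    always_on = daily_energy_kwh(10, scale_down=False)
    scaled = daily_energy_kwh(10, scale_down=True)
    print(f"scale-down saves {100 * (1 - scaled / always_on):.0f}%")
```

With these assumed numbers the saving lands near the middle of the paper's reported band; the point is only that idle hours multiplied by idle draw dominate the opportunity.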
From: Shehla Saleem [shehla.saleem@gmail.com] Sent: Tuesday, April 20, 2010 10:45 AM To: Gupta, Indranil Subject: 525 review 04/20

Managing Energy and Server Resources in Hosting Centers

Internet hosting centers continue to offer a larger and larger set of services to a fast-growing user community. While users enjoy greater computational capacity, better latency, and higher throughput, something else is growing at an alarming rate: the energy requirement of the underlying network devices and infrastructure. By some estimates, network infrastructure accounts for 2-7% of total electricity consumption in the USA alone. A major reason is that infrastructure is usually built with future growth and peak loads in mind; as a result, most networks and hosting centers are highly over-provisioned relative to average loads.

The authors try to ameliorate this problem by proposing the design and implementation of MUSE, an energy-aware resource management architecture for a hosting system. They look at the classical client-server paradigm and present a simple economic approach in which a client has to 'bid' to receive certain resources, and the cost of the resources is adapted dynamically depending on load and contention in order to match supply and demand. Their experimental results show that 29% of energy can be saved in certain cases by their design. The main components of their system are: a pool of shared servers that cooperate to serve the load; a reconfigurable switching fabric used to redirect different amounts of load to servers capable of handling it efficiently; and a load monitoring and estimation module together with a policy for reallocating servers in response to changing load conditions (a toy sketch of such load smoothing follows this review). Finally, they also allow for graceful degradation in service upon power or cooling failures.

My main concern lies with the idea of powering servers down under periods of low load. It is true that idle power draw remains considerable, and it makes sense to power down elements that are not fully utilized. However, there is a cost to doing that. First, there might be issues with loss of state when powering down; second, the cost and latency of coming back up must also be considered, and neither may be insignificant. Finally, Wake-on-LAN approaches have long been criticized because, on most hardware, a device that goes down loses its network presence. For servers this means that other servers might think they have lost connectivity to a server that has actually only gone into sleep or low-power mode. There would be a need for an omniscient central authority that knows when and which servers are powered down and how to reach them in order to wake them. Usually, additional hardware support is required for a powered-down server to accept a wake-up packet, process it, and power up, and of course this process takes time as well. Their executive performs this function as the central authority right now, but it is clearly a single point of failure.

A few interesting directions would be to see whether servers could be assigned to groups according to different services, with each group adapted separately. The adaptation could also be performed over different resources, e.g., storage, CPU, etc. The cooling system and applications could be integrated with MUSE to compute weighted feedback from all the entities.
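Since the review highlights MUSE's load monitoring and estimation module, here is a minimal sketch of load smoothing with a hysteresis band; this EWMA stand-in is for illustration only and is not the paper's actual filter:

```python
class LoadEstimator:
    """Smooths observed load and signals reallocation only when the
    smoothed value leaves a tolerance band, damping reaction to
    transient spikes. Gain and band width are illustrative."""

    def __init__(self, alpha=0.2, band=0.1):
        self.alpha = alpha      # EWMA gain
        self.band = band        # fractional tolerance around the target
        self.smoothed = None
        self.target = None      # load the current server set was sized for

    def update(self, observed):
        if self.smoothed is None:
            self.smoothed = self.target = observed
            return False
        self.smoothed = self.alpha * observed + (1 - self.alpha) * self.smoothed
        if abs(self.smoothed - self.target) > self.band * self.target:
            self.target = self.smoothed
            return True         # caller should resize the active server set
        return False

if __name__ == "__main__":
    est = LoadEstimator()
    for load in [100, 102, 99, 130, 135, 140, 138]:
        if est.update(load):
            print(f"reallocate servers for load ~{est.target:.0f}")
```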
From: Jayanta Mukherjee [mukherj4@illinois.edu] Sent: Tuesday, April 20, 2010 10:27 AM To: Gupta, Indranil Cc: indy@cs.uiuc.edu Subject: 525 review 04/20

Green Clouds Jayanta Mukherjee NetID: mukherj4

Cost- and Energy-Aware Load Distribution Across Data Centers, by Kien Le et al.

In this work the authors demonstrate the opportunity of exploiting geographically distributed data centers to optimize energy consumption. The main reasons to keep data centers geographically apart are uniform access times, higher availability, and disaster tolerance, along with the provision of a large computational infrastructure. The unique contribution of this work is to exploit the fact that geographically separated data centers fall in different time zones and may have different tariffs and energy sources. The optimization they propose is interesting both for its economic benefit and from an environmental point of view.

Pros:
1. The analysis considers the use of green energy, which is good for the environment.
2. They do not propose moving the datacenters or making any major change to the current infrastructure, so no additional cost is involved in following their proposed heuristics.
3. Changes are needed only in the applications that decide which datacenter to use. Implementing the algorithm by updating the application software is relatively simple and cost-effective.
4. They compare their scheme with a cost-unaware version to show the improvement obtainable when the cost of serving requests is considered.
5. The On/Off and Dynamic schemes enable significant cost reductions. The paper concentrates on reducing the cost of energy usage and motivates the use of renewable sources. The On/Off mode also mitigates the energy wasted by idling processors.

Cons:
1. Redirecting requests to cheaper (cost-effective) datacenters will inevitably cause load imbalance. The cost-effective data centers will face many accesses, so their machines will be heavily loaded and there will be network congestion due to the large number of requests.
2. The authors consider the usefulness of green energy, but green energy sources are limited, so their availability (as of now) cannot be compared with conventional energy sources. Exploiting green energy is theoretically possible, but in practice not that many green energy sources are available now or even in the near future.
3. Accessing data from a remote data center where energy is cheaper will increase the delay in getting the data, and for requests such as playing a media file, a longer delay reduces the entertainment value of the service. To minimize the delay, the data has to be replicated (mirrored) at a nearby server.
4. They consider six 4-hour epochs while running the optimization module every hour, which does not fit together. They should either run the optimization every 4 hours or use 24 one-hour epochs.

Comments: This study rightly motivates the use of renewable energy and low-cost data centers, but it overlooks the load-balancing, latency, and network congestion issues that can severely impact the service's user base. No service provider would want to lose its users to shave a few percent off its energy cost.

On the Energy (In)efficiency of Hadoop Clusters, by Jacob Leverich et al.

In this paper the authors suggest modifying Hadoop (an open-source MapReduce framework) to improve energy consumption. As mentioned in the paper, this is early work on modifying Hadoop to allow scale-down of operational clusters. The energy efficiency of a cluster can be improved in two ways:
1. By matching the number of active nodes to the current needs of the workload, placing the remaining nodes in low-power standby modes;
2. By engineering the compute and storage features of each node to match its workload and avoid energy waste on oversized components.
They argue that Hadoop has the global knowledge necessary to manage the transition of nodes to and from low-power modes; hence, Hadoop should be, or cooperate with, the energy controller for a cluster.
It is possible to recast the data layout and task distribution of Hadoop to enable significant portions of a cluster to be powered down while remaining fully operational.

Pros:
1. The authors demonstrate that running Hadoop clusters in fractional configurations can save between 9% and 50% of energy consumption.
2. They provide good motivation via the plots on the first page and the observation that 38% of the time a node was idle for 40 seconds or longer.
3. The authors identify the characteristics of the MapReduce framework that impede an energy-efficient implementation, e.g., the use of commodity nodes, a distributed data store comprised of the disks in each node, and the unreliability of commodity clusters.
4. They propose a new invariant for use during block replication to remove the shortcoming of Hadoop's replication strategy.
5. They outline further research into the energy efficiency of these frameworks, which is useful for other researchers working on similar problems.

Cons:
1. There is a tradeoff between performance and energy consumption, but performance is often considered most important: users will not wait longer to download a file (a song or a video), nor will they tolerate waiting while playing online games or watching a movie.
2. The arguments they make are questionable, as transitioning to a low-power mode is not a built-in capability of Hadoop; if it were, the energy-efficiency issue they discuss would not even arise.
3. If we power down a node that was set as a mirror or held the requested data, then some other node that is alive (on) has to record which node has the data. This strategy will increase the metadata size and result in longer times to find the data.
4. Repeatedly switching nodes on and off adds cost and may reduce the lifetime of the system's power controllers.

Comments: That switching off a node improves energy efficiency is a well-known fact; it is neither a new concept nor specific to Hadoop. All the improvements suggested at the end of the paper come with many associated issues (larger metadata, more latency, geographically distant mirrors, and load-balancing problems) that are not considered. It is more of a philosophical paper, missing the details of the implementation issues.

From: Virajith Jalaparti [jalapar1@illinois.edu] Sent: Tuesday, April 20, 2010 10:13 AM To: Gupta, Indranil Subject: 525 Review 04/20

Review for "Cost- and Energy-Aware Load Distribution across Data Centers"

This paper explores the problem of decreasing the energy costs of organizations that maintain multiple geographically diverse data centers with access to energy whose prices vary dynamically. It also addresses how such organizations can balance load among their data centers so as to prioritize energy from green sources, leading to more efficient energy usage. It formulates this as a non-linear optimization problem in which the company tries to minimize the energy costs ($) incurred by a particular partition of incoming requests among the available data centers, while ensuring that the Service Level Agreements are satisfied. The SLA considered in this paper is quite simple: on average, a given percentage of the total number of requests must experience a latency smaller than a given threshold.
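The SLA just described reduces to a one-line check over observed latencies; a minimal sketch with illustrative numbers:

```python
import random

def meets_sla(latencies_ms, threshold_ms=100.0, required_frac=0.95):
    """True if at least `required_frac` of requests saw latency below
    `threshold_ms`, matching the SLA form described in this review."""
    if not latencies_ms:
        return True
    under = sum(1 for lat in latencies_ms if lat < threshold_ms)
    return under / len(latencies_ms) >= required_frac

if __name__ == "__main__":
    sample = [random.expovariate(1 / 40.0) for _ in range(10_000)]  # ~40 ms mean
    print(meets_sla(sample))  # ~92% fall under 100 ms, so this sample fails
```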
The front-ends of the data centers periodically solve this optimization problem and divide the incoming requests among the data centers according to its solution. The paper proposes mechanisms to estimate the future values of various factors, such as energy costs and load intensities, in order to obtain an accurate estimate of the optimal solution. It further proposes simpler greedy heuristics that achieve a similar goal with much less computation/complexity. The experiments, performed using real traces from a large website, show that the optimal scheme can achieve as much as a 25% decrease in the cost incurred by the company under dynamic pricing.

Comments:
- While this paper takes the first step toward cross-data-center optimization (w.r.t. the total cost incurred by an organization), it does not consider the efficiency with which the data centers can serve a particular request. The paper does not differentiate requests by type: for example, a fetch of a large video file might have to be routed to a data center with a higher-bandwidth connection to the client. In short, different requests can have different requirements, these must be met by the service on average, and optimizing cost alone may not be the correct way to do this.
- It is clearly seen that when the base energy of the machines in a data center is not zero (which is the case in reality), the cost savings provided by the optimal allocation are not very significant. Further, the paper presents results for base energy values of 75W and 150W but does not explain why those values were chosen.
- The paper does not evaluate the effect of its mechanisms for predicting load and energy prices. These could be an important factor in the advantages obtained, and it is not completely clear why the authors adopted the methods they did.
- The authors propose to perform the optimization periodically, recalculating the optimal solution every hour. It would be interesting to see the effects of recalculating at a finer granularity.
- The paper uses traces from Ask.com but does not justify why these are representative of a typical website's load. For example, Google or YouTube might have very different traffic characteristics, and it is not clear how the proposed mechanisms perform under such loads.

From: Kurchi Subhra Hazra [hazra1@illinois.edu] Sent: Tuesday, April 20, 2010 10:02 AM To: Gupta, Indranil Subject: 525 review 04/20

Cost and Energy-Aware Load Distribution Across Data Centers
--------------------------------------------------------------------------
Summary
------------
In this paper, the authors present a framework for optimization-based request distribution, on the basis of which they propose two policies for distributing requests from the front-ends to the data centers. Policy EPrice aims at exploiting time zones and variable electricity prices: the fraction of requests sent to each mirror data center during an epoch should minimize the overall energy cost. Policy GreenDC, on the other hand, uses a cost optimization function that covers energy obtained both from electricity and from renewable sources. Many of the parameters in their formulation involve predictions, and they solve the problem using simulated annealing. Recomputations occur when some of the predictions prove inaccurate, when a data center becomes unavailable, or when some green energy expires.
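Since the summary notes that the (non-linear) cost optimization is solved with simulated annealing, here is a generic annealing sketch over request fractions; the perturbation scheme, cost function, and cooling schedule are illustrative, not the paper's:

```python
import math
import random

def anneal_split(cost_fn, n_dcs, steps=20_000, temp0=1.0):
    """Simulated-annealing sketch for choosing the fraction of requests
    sent to each of `n_dcs` data centers (fractions sum to 1).
    `cost_fn(fracs)` should fold in energy cost and SLA penalties, as
    in the paper's formulation; this loop is illustrative, not theirs."""
    cur = [1.0 / n_dcs] * n_dcs
    cur_cost = cost_fn(cur)
    best, best_cost = cur[:], cur_cost
    for step in range(steps):
        temp = temp0 * (1 - step / steps) + 1e-6   # linear cooling
        i, j = random.sample(range(n_dcs), 2)
        delta = min(cur[i], random.uniform(0, 0.05))
        cand = cur[:]
        cand[i] -= delta                            # move a little mass
        cand[j] += delta                            # between two centers
        c = cost_fn(cand)
        if c < cur_cost or random.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand[:], c
    return best, best_cost

if __name__ == "__main__":
    prices = [0.8, 1.0, 1.3]       # $/unit of load, illustrative
    capacity = [0.4, 0.5, 0.5]     # max fraction each DC can absorb
    def cost(f):
        over = sum(max(0.0, fi - ci) for fi, ci in zip(f, capacity))
        return sum(p * fi for p, fi in zip(prices, f)) + 100.0 * over
    print(anneal_split(cost, 3))
```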
The authors also propose a heuristic-based request distribution in which, during each hour, the policy first exploits the data centers with the best power source and then those with the cheapest electricity. In all the above policies, care is taken that the policies meet the requirements of the service level agreements. In addition, the authors evaluate their system using a day-long request trace from Ask.com and compare their policies against a cost-unaware policy. According to their evaluations, EPrice outperforms the other policies in terms of cost.

Pros
------
1. They take into consideration the variable electricity prices across countries and time zones. This is a useful consideration.
2. They strive to honor service level agreements, which makes their policies more practical.

Cons
-------
1. For their evaluation, they use a day-long trace. However, it is not clear how this trace can be taken as representative of the typical traffic seen by data centers. The authors do not mention whether the trace belonged to a day of heavy, low, or expected traffic.
2. The evaluation merely uses PlanetLab nodes with a simple service installed, which therefore does not involve much processing at the nodes, unlike real data centers. Can this show accurate results?
3. They mention that 2-3 mirror data centers are enough for replication, but do not back up this statement with any research or facts.

Thanks, Kurchi Subhra Hazra Graduate Student Department of Computer Science University of Illinois at Urbana-Champaign http://www.cs.illinois.edu/homes/hazra1/

From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu] Sent: Tuesday, April 20, 2010 1:53 AM To: Gupta, Indranil Subject: 525 review 04/20

=====================================================================
Paper 1: Cost- and Energy-Aware Load Distribution Across Data Centers
=====================================================================
Large Internet service-oriented organizations such as Google, iTunes, and Microsoft have multiple data centers. These data centers are used for business distribution and to achieve high availability, disaster tolerance, uniform access times to widely distributed client sites, etc. Despite these tremendous advantages, data centers come with major problems such as huge energy consumption and financial and environmental cost. In the paper titled "Cost- and Energy-Aware Load Distribution Across Data Centers", the authors exploit the geographical distribution of data centers to optimize energy consumption. They exploit different and variable electricity prices (hourly pricing), the different time zones of the data centers (peak/off-peak demand pricing), and the availability of renewable "green" electricity at different data centers. The paper makes the following major contributions: it proposes a framework for optimization-based request distribution policies; it proposes a greedy heuristic distribution policy; and it evaluates the proposed approaches using a day-long trace from a commercial service (Ask.com). For optimization-based distribution, the authors discuss two approaches: Policy EPrice, which leverages time zones and variable electricity prices, and Policy GreenDC, which leverages data centers powered by green energy. To instantiate the parameters, the authors avoid the typical approach (each front-end communicating and coordinating with every other) in order to reduce overhead.
Rather, they adopt a very simple approach in which each front-end runs the optimization independently; the authors claim that such an approach can still satisfy the global constraints. To solve the optimization problem, the authors use Ameren data for electricity price prediction, ARMA (Auto-Regressive Moving Average) models for load intensity prediction, and the current CDFi tables for CDFi prediction. Since the cost function and constraints contain some non-linear parameters (e.g., BCosti, CDFi), the authors use simulated annealing rather than simple linear programming to solve the optimization problem. The authors then describe their cost-aware heuristic (CA-heuristic), in which each front-end orders the data centers having CDFi >= P by their Costi/CDFi ratio, from lowest to highest. The remaining data centers are ordered by the same ratio, and the two lists are concatenated to create the final list, MainOrder (sketched at the end of this review). Client requests are forwarded to the first data center in MainOrder until its capacity is met; new requests are then forwarded to the next data center on the list, and so on. After a front-end has served R requests within time L, it can disregard MainOrder and start forwarding requests to the cheapest data center until its capacity is met. If the prediction is inaccurate, the front-end adjusts R for the next epoch. The authors implemented a simulator for a large Internet service. In their experiments, they simulate only a single front-end (East US) that distributes requests to three data centers (West US, East US, and Europe). The authors consider three pricing schemes: constant rate, on/off-peak rate, and dynamic hourly rate. To mimic different brown electricity prices at each data center, the front-end shifts the default prices 3 hours earlier or 6 hours later. For comparison, the authors also propose a cost-unaware (CU-heuristic) distribution policy. The CU-heuristic follows an approach similar to the CA-heuristic: each front-end orders the data centers by performance (CDFi) from highest to lowest; client requests are forwarded to the first data center until its capacity is met, then to the next data center, and so on. Finally, the authors present the results of their proposed approaches, mainly showing the effect of cost-awareness and pricing scheme, the effect of the green data center, and the effect of base energy.

Pros:
1. This is the first paper to propose cost- and energy-aware load distribution across data centers.
2. The proposed policies take advantage of time zones, variable electricity prices, and green energy.
3. The proposed approaches are evaluated on realistic data.

Cons/Discussion points:
1. The authors consider only one front-end in their experiments. Is it correct to claim that when more front-ends are used, they will satisfy the global constraints?
2. How can end-to-end QoS be guaranteed? Can the SLA guarantee be combined with QoS requirements provided by clients?
3. How should services with session state be handled? With soft state, a user's session lasts only with the service, and all requests of a session must be sent to the same data center.
4. Can a similar concept be applied to a multi-cloud structure to optimize power and monetary cost for an online service provider?
5. In multi-cloud computing, is it reasonable to assume that data will be available in the clouds beforehand? What are the pros and cons of such an assumption?
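A sketch of the CA-heuristic (MainOrder) ordering and forwarding described in the review above; the CDFi values, costs, and capacities are invented, and the R/L early-exit refinement is omitted:

```python
def _ratio(dc):
    return dc["cost"] / dc["cdf"]

def ca_heuristic_order(datacenters, p_target):
    """Data centers whose predicted CDF at the SLA latency (CDFi) meets
    the target P come first, sorted by Costi/CDFi; the rest follow,
    sorted by the same ratio. Field names are illustrative."""
    good = [d for d in datacenters if d["cdf"] >= p_target]
    rest = [d for d in datacenters if d["cdf"] < p_target]
    return sorted(good, key=_ratio) + sorted(rest, key=_ratio)

def forward(requests, main_order):
    """Fill each data center in MainOrder to capacity before moving on."""
    sent = {}
    for dc in main_order:
        take = min(requests, dc["capacity"])
        sent[dc["name"]] = take
        requests -= take
        if requests == 0:
            break
    return sent

if __name__ == "__main__":
    dcs = [
        {"name": "west-us", "cdf": 0.97, "cost": 0.9, "capacity": 400},
        {"name": "east-us", "cdf": 0.99, "cost": 1.1, "capacity": 500},
        {"name": "europe",  "cdf": 0.90, "cost": 0.7, "capacity": 500},
    ]
    order = ca_heuristic_order(dcs, p_target=0.95)
    print([d["name"] for d in order], forward(1000, order))
```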
From: Nathan Dautenhahn [dautenh1@illinois.edu] Sent: Tuesday, April 20, 2010 12:02 AM To: indy@cs.uiuc.edu Subject: 525 review 04/20

*********************************************************
* CS525 Reviews : 4.20.10 : Nathan Dautenhahn *
*********************************************************
===================== Paper 1 =====================
1. Cost- and Energy-Aware Load Distribution Across Data Centers Authors: Kien Le, Ricardo Bianchini, Margaret Martonosi, and Thu D. Nguyen

1.1 Summary and Overview
This paper discusses a new breed of request distribution for large-scale distributed systems. Its primary focus is on systems that span multiple data centers, and on selecting the most power- and cost-effective distribution of requests among them. The proposed framework takes several factors into consideration when distributing an Internet request, including variable electricity costs at the data centers, data centers in different time zones, and data centers that may be located close to renewable energy sources.

1.2 Contributions
This paper is a "first of its kind" in the field. It identifies an area, in our current world of huge distributed systems spanning multiple physical locations, that is ripe for savings. In addition, the authors take great care in applying rigorous mathematical formalisms to the problem. They use both an optimization approach and a heuristic-based approach, which appear to do well.

1.3 Limitations
- The authors' approach is good but lacks real depth. It seems as though they found a prime problem with a lot of low-hanging fruit; their selection of characteristics and subsequent processing of them is limited, and could be improved quite easily with better controls and parameters to optimize over. I think they chose good high-level ideas but have not developed them fully.
- The introduction is really slow. I had to infer a lot before they actually said what they were doing in paragraph five, and even that paragraph is somewhat weak in its description. They keep talking about things as though they were symptoms, leading the reader on; the paper would be much stronger with a better intro.
- The authors include functions that they never really refer to, which is confusing.
- I don't like their evaluation. They only simulate a single front-end, which does not make up for the lack of a distributed solution. They also do not include server response times, simulating them via PlanetLab systems instead, which may or may not be close to the real thing.

1.4 Comments
- I did not see much discussion or mathematical proof that the authors' algorithms will in fact meet the SLA requirements.
- In the optimal approach, does each front-end have access to the same data from the different sites? If not, then they really cut a corner by not providing a distributed optimized solution.

===================== Paper 2 =====================
2. Managing Energy and Server Resources in Hosting Centers Authors: J. Chase et al.

2.1 Summary and Overview
This paper investigates optimizing energy resources when allocating jobs in a server environment for Internet activities such as web hosting. The goal is to provision servers and resources such that optimal energy efficiency is achieved with a flexible management architecture.
The authors have produced a framework called Muse as their implementation of this idea, and they introduce the concept of adaptive resource provisioning.

2.2 Contributions
This paper has several contributions, listed as follows:
- The development of Muse, a flexible resource management architecture.
- The use of reconfigurable switches in provisioning.
- The development of adaptive resource management.
- The use of an economic cost-function framework with which to produce optimal results.

2.3 Limitations
There are few limitations to this work. The authors have done a thorough job of identifying and motivating the problem, but one area could use improvement.
- The authors use an operating-system abstraction, which does not really fit the problem that well. An OS abstraction is arguably semantically the same, but it introduces a few limitations: namely, a single point of failure and a potentially large bottleneck in the one central operating system. This could be distributed across multiple tiers for better distribution of jobs.
- Some of the authors' motivating data is a bit skewed (e.g., Internet traffic being represented by one website, the World Cup, which could be a very bad estimator of reality).

2.4 Comments
- Overall this paper does a good job of approaching the problem.
- I wonder if this is a first glimpse of something related to cloud computing, in terms of providing an "elastic" service for end users?

3.0 Common Themes
Each of these papers discusses the use of energy/cost metrics to manage the scheduling of jobs in a server environment. I feel that Chase et al. did a much better job of identifying the right parameters to base their algorithms on, and thus have created the better paper. They include a much more in-depth analysis of the issue and provide a more accurate model and representation of the actual data.
--- Nathan Dautenhahn ---

From: arod99@gmail.com on behalf of Wucherl Yoo [wyoo5@illinois.edu] Sent: Monday, April 19, 2010 11:29 PM To: Gupta, Indranil Subject: 525 Review 4/20

Green Clouds, Wucherl Yoo (wyoo5)

On the Energy Inefficiency of Hadoop Clusters, J. Leverich et al, HotPower 2009

Summary: The authors observe that a large part of a Hadoop cluster is inactive and thus wastes energy. By turning off the inactive machines, they can reduce energy consumption; however, this increases the running time of a job, so there is a tradeoff between energy consumption and performance. They propose a new replication invariant, the covering subset: at least one replica of every data block must be stored in this subset. To maintain this invariant, a node that holds the only replica of a block in the covering set will not be disabled.

Pros:
1. Power consumption can be reduced with only slightly weakened data availability (from the covering set) and increased running time; neither the weakened availability nor the runtime overhead was severe.
2. The discussion of future work in Section 4 is interesting and looks promising, though there is no verification for now.

Cons:
1. If the usage level is dynamic, rebooting machines causes more overhead; there is a lack of discussion about the dynamicity of the utilization level.
2. The covering set also weakens the level of redundancy from replication, so recovery from failure takes longer if the remaining replicas are unavailable (which is more likely given the reduced number of replicas on active nodes).
3. Multiple mechanisms exist to reduce power, but the authors use only the most drastic one (turning off nodes). More dynamic control combining additional power-down mechanisms with integration into the job scheduler would be more promising.

-Wucherl

From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu] Sent: Monday, April 19, 2010 10:21 PM To: Gupta, Indranil Subject: 525 review 04/20

525 review 04/20 Hyun Duk Kim (hkim277)

* On the Energy (In)efficiency of Hadoop Clusters, J. Leverich et al, HotPower 2009

This paper explores the energy-efficiency problem in Hadoop and proposes a method for more energy-efficient clusters. Although Hadoop is widely used as a distributed-systems tool, its effect on data center energy efficiency has not been studied. This paper proposes saving energy in Hadoop clusters by putting inactive nodes into a power-save mode. The authors suggest a covering-subset-based method for replica allocation. According to the experimental results, although there was some performance degradation, the proposed method saves from 9% to 50% of power consumption.

This paper raises an interesting and important issue for future computing. Hadoop-style clustering encourages building big data centers, and considering energy consumption in the data center can cut expenses and eventually help save the earth. Decreasing computation will also decrease the heat the machines generate, saving energy in the cooling system. Efforts to reduce Hadoop's energy consumption may also enable it on mobile systems. These days, mobile devices are everywhere and well connected, and some people have suggested clustering on mobile devices. Because power is a big issue in mobile systems, studies of energy saving may help accelerate Hadoop on mobile.

Energy-saving efforts may decrease reliability as well as general performance. The switching cost between active and inactive modes is mentioned in the paper. Also, if all replicas are in power-save mode, speculative task execution (for a task on a slow node, Hadoop lets another node do the same job and uses the faster result) may be limited. Moreover, because the new replica placement method loosens the constraints of the current invariant, it will decrease reliability. For example, under the covering-subset scheme, all replicas may end up on the same rack, and then the data cannot be recovered when that rack crashes. If the authors had performed some experiments with failures, it would have been great for showing the tradeoff between power saving and reliability.

* Cost- and Energy-Aware Load Distribution Across Data Centers, Kien Le et al, HotPower 2009

This paper presents a framework for distributing load across data centers for cost and energy efficiency. The geographical distribution of the data centers is a factor in cost and efficiency optimization: electricity prices vary across places, different time zones have different peak/off-peak times, and nearness to green energy sources can decrease the use of brown energy. With these considerations, the authors frame an optimization problem that decides how to distribute computational load over data centers in different locations for cost- and energy-efficient operation. According to the experimental results, the proposed methods show better performance in both cost and energy efficiency.

This paper introduces the geographical factor into the data-center efficiency issue. Although there have been some studies of data-center cost and energy consumption, most previous work considered only one data center.
The authors identify an interesting factor we can exploit: geographical location. Owing to differences in daily life and natural resources, different locations have different characteristics, and the authors make a clever suggestion in using these differences for data-center load distribution.

In reality, data centers around the world are not tightly connected. The main assumption of this proposal is that there are many data centers in different locations that we can control and distribute tasks across. However, data centers are usually not that well connected. Many data centers are operated by commercial companies; if the centers are developed by companies, they must keep their trade secrets and will not share their capacity. If we want to build data centers in other countries, there may be additional international legal issues. Building centers or using power sources in a different time zone is not always easy, and there can even be legal issues. Therefore, in reality, this load distribution over data centers in different locations will be quite limited.

------ Best Regards, Hyun Duk Kim Ph.D. Candidate Computer Science University of Illinois at Urbana-Champaign http://gildong2.com

From: Ghazale Hosseinabadi [gh.hosseinabadi@gmail.com] Sent: Monday, April 19, 2010 5:00 PM To: Gupta, Indranil Subject: 525 review 04/20

Paper 1: Managing Energy and Server Resources in Hosting Centers

In this paper, an architecture for resource management (specifically energy management) in a hosting center operating system is designed. The architecture is called Muse. Muse determines server allocation and the routing of requests to servers based on customers' bids for resources. Muse is composed of the following four blocks: 1) Generic server appliances: the collection of shared servers. 2) A reconfigurable network switching fabric: used for dynamically switching requests to servers. 3) Load monitoring and estimation modules: responsible for continuously monitoring the load in order to detect load shifts and to estimate aggregate service performance. 4) The executive: dynamically reallocates server resources and reconfigures the network, considering the load in the system, resource availability, and service value.

Resource allocation is performed by economic resource-management mechanisms. The parameters used to determine the resources allocated to requests are: 1) the average cost to provide one unit of resource per unit of time, and 2) each customer's utility function, which reflects its "bid" for the service volume and service quality resulting from its resource allotment; each customer's bid per unit of time is a function of its delivered throughput. Resource allocation in Muse uses an incremental greedy algorithm called Maximize Service Revenue and Profit (MSRP), whose objective is to maximize profit (total utility minus total cost). A method for estimating resource demands from continuous performance observations is also presented; this estimation is necessary to predict the effects of planned resource adjustments.
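A hedged sketch of incremental greedy allocation in the spirit of MSRP as summarized above: units go to the customer with the highest marginal bid until marginal revenue falls below marginal cost; the utility functions and constants are illustrative, not from the paper:

```python
import math

def msrp_allocate(total_units, customers, unit_cost):
    """Hand out resource units one at a time to whichever customer's bid
    gains the most marginal revenue, stopping when marginal revenue
    drops below the marginal cost of keeping the resource powered.
    `customers` maps name -> utility function over allotted units
    (concave here, purely for illustration)."""
    alloc = {c: 0 for c in customers}
    for _ in range(total_units):
        def gain(c):
            util = customers[c]
            return util(alloc[c] + 1) - util(alloc[c])
        best = max(customers, key=gain)
        if gain(best) <= unit_cost:
            break              # cheaper to leave the remaining capacity off
        alloc[best] += 1
    return alloc

if __name__ == "__main__":
    bids = {
        "customerA": lambda x: 10 * math.log1p(x),  # diminishing returns
        "customerB": lambda x: 6 * math.log1p(x),
    }
    print(msrp_allocate(20, bids, unit_cost=0.5))
```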
Pros:
1) The paper designs a mechanism for allocating requests to servers in hosting centers, and the designed resource allocation mechanism is energy-conscious: its goal is to reduce server energy consumption.
2) Muse is an adaptive policy, capable of responding to dynamically varying resource availability and resource cost.
3) It uses classical results from auction theory, and the parameters that determine utility and cost are chosen so that the real objective (minimizing power consumption) is achieved in practice.

Cons:
1) Muse requires collecting bids from users. In a highly loaded cluster with numerous users, collecting bids is time-consuming; it adds significant delay to resource allocation, and in some applications this delay might not be tolerable. Game theory offers methods that achieve the same objective without requiring users to submit bids, by embedding users' willingness to pay indirectly in the utility function. It seems more efficient to use game-theoretic approaches rather than auction-based methods for resource allocation in clusters, to reduce overhead.
2) In a highly dynamic environment in which users join and leave the system quite often, an auction mechanism is inefficient, because the resource allocation method must be re-run each time the state of the system changes. In such cases, hybrid (static allocation + dynamic allocation) methods perform better.

Paper 2: Cost- and Energy-Aware Load Distribution Across Data Centers

In this paper, a framework for optimization-based request distribution in multi-data-center Internet services is designed, enabling services to manage their energy consumption and costs while respecting their service-level agreements (SLAs). The optimization-based request distribution determines the fraction of the clients' requests that should be directed to each data center; the objective is to minimize energy costs while guaranteeing good performance and availability. The formulation considers the effect of time zones, variable electricity prices, and the existence of data centers powered by green energy. The authors also propose a greedy heuristic policy that needs less computation than the optimization-based approach.

Pros:
1) This paper designs frameworks for request distribution across multiple data centers; previous related work considered only the single-data-center scenario.
2) The problem is first formulated as an optimization problem, and then a simple heuristic with reasonable performance is presented.

Cons:
1) The overall computational complexity of the optimization framework is not presented.
2) The effect of the number of data centers on the proposed methods is not described.

From: Fatemeh Saremi [samaneh.saremi@gmail.com] Sent: Monday, April 19, 2010 4:28 PM To: Gupta, Indranil Subject: 525 review 04/20

Paper 1: On the Energy (In)efficiency of Hadoop Clusters

This paper discusses energy-related aspects of Hadoop clusters and proposes some preliminary changes to decrease Hadoop's energy consumption. The work is motivated by the observation of frequent periods of node inactivity with non-negligible durations. The method they adopt to improve energy efficiency is to match the number of active nodes to the current needs of the workload and place the remaining nodes in low-power standby modes. They argue that Hadoop has the global knowledge necessary to manage the transition of nodes to and from low-power modes, and they show that it is possible to power down a significant number of nodes in a Hadoop cluster while achieving acceptable performance. They introduce the notion of a covering subset, which contains a sufficient number of nodes to ensure the immediate availability of data even when all nodes outside the set are disabled.
The work is valuable, as it addresses one of the main concerns of large data centers and distributed systems, one that must be covered for Hadoop clusters as well. It also points out various ways to improve the energy efficiency of Hadoop beyond the current work. The authors present some experiments; however, an extensive set of results is missing. A wide variety of effective concepts and features could also be taken into account to improve energy efficiency, e.g., dynamic power management, on-demand enabling of nodes, data-locality properties, etc. A theoretical analysis of Hadoop's energy optimization would be valuable. As scalability plays a very important role in Hadoop clusters and similar systems, evaluating this aspect of the work is very important (the current version of the work seems to have problems here). The introduced concept of the covering subset is not completely specified: decisions such as which nodes to include in (or exclude from) the set, how to manage the set, and other details are not explained.

Paper 2: Cost- and Energy-Aware Load Distribution Across Data Centers

This paper proposes to exploit the geographical distribution of data centers and distribute the computational load across different time zones in order to decrease the associated cost and energy. The authors investigate three angles: exploiting data centers that pay different and variable electricity prices, exploiting data centers located in different time zones, and exploiting data centers located near sites that produce green electricity, in order to reduce brown energy consumption. They propose two request distribution policies: EPrice, which leverages time zones and variable electricity prices, and GreenDC, which leverages data centers powered by green energy. They also propose a heuristic-based request distribution policy that is simpler, in terms of computational complexity, than the aforementioned policies.

The idea is interesting and necessary for cloud computing, whose sites span different time zones and geographical locations. It would be valuable to take networking cost into account as well. The approach is implicitly based on the assumption that the workload can be distributed across different sites; what if this property does not hold and some sort of dependency exists? It would also be more appropriate not to omit other relevant factors (the work focuses solely on cost and energy) and to perform a comprehensive sensitivity analysis of the approach. The length of the accounting period is crucial to the effectiveness of the approach and should be resolved and evaluated more specifically. Performance-related experimental results such as latency and computational complexity are missing, so it is not clear what the overall efficiency of the approach is.

From: Ashish Vulimiri [vulimir1@illinois.edu] Sent: Saturday, April 17, 2010 12:46 AM To: Gupta, Indranil Subject: 525 review 04/20

On the Energy Inefficiency of Hadoop Clusters, J. Leverich et al, HotPower 2009

This paper studies the energy efficiency of Hadoop clusters executing MapReduce jobs. The authors demonstrate that the traditional file replication heuristics Hadoop uses (designed to maximise fault tolerance) have poor energy efficiency, and they suggest that some control over data placement be passed to the users (via an additional user-tunable parameter they name the "covering subset").
They then demonstrate via a simple benchmark that when these covering subsets are suitably defined, energy efficiency can be improved by turning off nodes during periods of disuse.

Comments:
+ This seems to be the first paper (and the only one thus far?) studying energy efficiency in Hadoop.
+ In their final section they do a very thorough job of identifying other ways in which the energy efficiency of Hadoop might be improved.
- The primary technical contribution of this paper seems to be, in its entirety, the observation that turning nodes off can reduce power consumption. There is no discussion about exactly /how/ one might go about doing this: how would the covering sets be defined and modified over time for a given set of applications? The examples they cite in their evaluation section are very limited.