From: Jayanta Mukherjee [mukherj4@illinois.edu]
Sent: Tuesday, April 27, 2010 12:30 PM
To: Gupta, Indranil
Cc: indy@cs.uiuc.edu
Subject: 525 review 04/27

Old Wine: Stale or Vintage?
Jayanta Mukherjee
NetID: mukherj4

On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing, by Ian Foster et al:

In this paper, the authors compare Grid computing and peer-to-peer (P2P) systems. They define Grids as sharing environments implemented via the deployment of a persistent, standards-based service infrastructure that supports the creation of, and resource sharing within, distributed communities. They define P2P as a class of applications that takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. The authors argue the following:
1. Grids and P2P systems are both concerned with the same general problem, namely the organization of resource sharing within virtual communities;
2. Both take the same general approach to solving this problem, namely the creation of overlay structures that coexist with, but need not correspond in structure to, underlying organizational structures;
3. Each has made genuine technical advances, but each also has, in current instantiations, crucial limitations.

Pros:
1. The paper gives a well-balanced review of Grid and peer-to-peer systems.
2. The authors distinguish Grids from P2P systems by the size of each system's user base and the type of service offered. The quality of the services available on these systems is also characterized as a distinguishing factor.
3. They distinguish the two systems by the kind of users they have. This is an interesting perspective, given that a P2P system can be deployed fully or partially on top of a Grid.
4. They address the point that Grids and P2P systems generally use different kinds of resources.
Grids are more integrated, administered, and reliable, whereas P2P relies on other users, so P2P services are never reliable. To assure reliability, P2P systems add redundancy, which degrades performance through higher latency and bandwidth requirements.

Cons:
1. The paper does not tell us anything new. Most people working in this field are already aware of most of the features it discusses.
2. The distinction between the two systems based on their users (grass-roots users versus a sophisticated scientific research community) does not seem very compelling to me.

Comments:
The paper does not offer anything fundamentally new. It could be a blog post rather than a "research paper". Some of the points are opinions without much scientific reasoning behind them.

From: pooja.agarwal.mit@gmail.com on behalf of pooja agarwal [pagarwl@illinois.edu]
Sent: Tuesday, April 27, 2010 12:16 PM
To: Indranil Gupta
Subject: 525 review 04/27

DS REVIEW 04/27
By: Pooja Agarwal

Paper – On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing

Main Idea: This paper brings out the similarities and differences between P2P and grid computing as distributed systems. Both try to solve the same problem of efficient resource allocation and management across virtual organizations. However, they take different approaches based on target communities, types of resources and applications, and scale of use. Grid computing focuses on complex applications such as scientific computing and analysis, requires a certain quality of service, and is used across a modest number of nodes belonging to an organization. P2P, on the other hand, provides access to less complex applications such as file sharing and content distribution, and does not take extensive care of quality of service, security, or authority.
In terms of resources, P2P systems are generally composed of personal laptops and computers, whereas grid computing comprises dedicated servers and more powerful work computers. The amount of data transferred in grid computing is much higher (about 4.5 TB per day) than in P2P (1-2 TB), and grid computing is mostly based on a centralized approach while current P2P systems are based on a more decentralized approach. P2P and grid computing provide complementary solutions: P2P achieves fault tolerance and scalability but lacks infrastructure, while grid computing provides infrastructure but not fault tolerance and scalability. Hence, there is potential for hybrid techniques that employ the strengths of both.

Pros:
1) The paper brings out some interesting differences between P2P and grid computing and prompts thinking about how the two approaches could be combined into a hybrid technique that takes the best from both worlds.
2) The paper presents the comparison in engaging language, which made it both a fun and an interesting read.

Cons:
1) The paper hints at combining both techniques to achieve a fault-tolerant, scalable, persistent, and multipurpose infrastructure. However, it would have been more useful if it had discussed at least some of the issues and complexity encountered when pursuing such a solution.
2) With the recent increase in scalability and fault tolerance of grid computing (the newer version of grid computing being cloud computing), the need for a hybrid of P2P and grid computing may no longer be urgent.
With Regards,
Pooja

From: Shivaram V [shivaram.smtp@gmail.com] on behalf of Shivaram Venkataraman [venkata4@illinois.edu]
Sent: Tuesday, April 27, 2010 12:06 PM
To: Gupta, Indranil
Subject: 525 review 04/27

Shivaram Venkataraman - 27 April 2010

On Death, Taxes and the Convergence of Peer-to-Peer and Grid Computing

Summary:
This paper compares and contrasts two approaches to distributed computing: Grid computing and P2P systems. Both approaches attempt to solve the problem of coordinated use of distributed resources, but their design elements differ based on their requirements. The authors compare Grids and P2P along the following dimensions:

a. Target Communities: Grid technologies were initially developed for professional organizations to pool resources and perform large-scale computations. Hence a grid has a smaller number of participants, and they are trusted and accountable to a certain degree. P2P systems became popular through music-sharing networks, and the individuals who are part of the network do not have much incentive to cooperate.

b. Resources: Grid systems generally have better connected and more powerful resources, such as clusters used for storage or databases. P2P systems deal with intermittent participation and less powerful machines.

c. Applications: Grid applications often provide access to large-scale computing resources for the scientific and professional community, which involves data-intensive applications. P2P applications are specialized for resource sharing (compute cycles or files) and tend to be less data intensive due to poorer network connectivity.

d. Scale and Failure: Since Grid communities initially consisted of professional organizations, there were only a limited number of participants (on the order of hundreds). Traditional grid implementations were therefore not designed with scalability as the highest priority and relied on a few central nodes for services such as resource management.
P2P communities, on the other hand, often scale up to hundreds of thousands of nodes and have evolved from centralized structures like Napster to completely decentralized designs using Distributed Hash Tables.

e. Services and Infrastructure: There has been a lot of work in the Grid community on protocols and standards for communicating across institutions. Some highly available infrastructure services have also been deployed for common activities such as monitoring and scheduling. P2P systems tend to define their own protocols (e.g. Gnutella's search and ping), and most of the applications are written from scratch.

Pros:
- Presents a thorough comparison of Grid and P2P across many dimensions
- Concludes with an interesting point of view that the two technologies are converging and can share a lot of existing research

Cons:
- It would be interesting to perform a similar comparison of P2P systems with cloud computing. In fact, cloud computing is pretty close to the vision the authors present: "that of a worldwide computer within which access to resources and services can be negotiated as and when needed".

From: Giang Nguyen [nguyen59@illinois.edu]
Sent: Tuesday, April 27, 2010 11:56 AM
To: Gupta, Indranil
Subject: 525 review 04/27

CS525 review nguyen59 04/27/10

On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing

Two popular approaches to distributed computing are peer-to-peer and grid computing. This paper compares and contrasts the two approaches based on characteristics of deployed systems instead of unverified claims in the literature.

Grid audiences are professional communities willing to devote significant resources to building and operating the grid. Grids were initially targeted at scientific communities, but commercial interest is growing. There is a degree of trust among grid users. P2P systems were popularized by grass-roots, mass-culture applications such as file sharing, used by typically anonymous users.
Trust has to be built into the system.

Grid systems integrate more powerful and better connected resources that provide more consistent and better availability and quality of service. Grid applications are considerably diverse and tend to be far more data intensive. P2P systems are vertically integrated solutions to specialized resource-sharing problems, involving CPU cycles or files. Diversification comes from differing design goals, such as scalability, anonymity, or availability. Grids have a smaller number of users but higher activity.

Grids typically use the Globus toolkit, and there is significant effort on standardizing grid protocols and interfaces to enable interoperability between different grid deployments. P2P systems focus on integrating smaller, simple resources like home computers and provide vertically integrated functionality.

The paper concludes that Grids and P2P systems share a common goal: the pooling and coordinated use of resources within distributed communities that are constructed independently of institutional structures. The two types of system will converge: grid computing will grow in scale, self-configuration ability, and fault tolerance, which P2P systems do well, and P2P systems will increasingly use standardized tools and services as well as provide more powerful resources.

Pros:
- Has concrete examples to illustrate the differences between the characteristics of grids and P2P systems.

Cons:
- In the case of P2P systems, has no concrete proposal for which particular area would have the highest impact if standardized.

From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu]
Sent: Tuesday, April 27, 2010 11:32 AM
To: Gupta, Indranil
Subject: 525 review 04/27

CS525 review on Stale or Vintage?
Liangliang Cao (cao4@illinois.edu)
April 27, 2010

Paper 1: A comparison of approaches to large-scale data analysis, A. Pavlo et al, ACM SIGMOD 2009.

This is a missed-target paper: MapReduce is not designed primarily for efficiency.
MapReduce is mainly hosted on machines of varying quality and scales to large systems. Sophisticated DBMSs are more expensive and are usually deployed on good machines or high-performance clusters. It is not surprising that a DBMS performs better than MapReduce.

Good insights:
· It is interesting to view the DBMS procedure in contrast to MapReduce. A command processed in a parallel DBMS consists of three phases: (1) a filter sub-query is first performed in parallel at the sites, similar to the filtering performed in a Map function; (2) one of two common steps follows, replicating the data (for a small amount of data) or redistributing it (for a large amount), which is similar to the processing that occurs between the Map and the Reduce functions; (3) a roll-up aggregates the result, similar to the Reduce step.
· The analysis of architectural trade-offs is insightful: parallel DBMSs require data to fit into the relational paradigm of rows and columns. When no sharing is anticipated, the MR paradigm is quite flexible. If sharing is needed, however, it might be advantageous for the programmer to use a data description language and factor schema definitions and integrity constraints out of application programs.
· The observation that MapReduce does not have a good indexing scheme is a really good catch.
· The criticism of the execution strategy is reasonable: when a large number of Reduce instances run simultaneously, it is inevitable that two or more Reduce instances will attempt to read their input files from the same map node at the same time, inducing large numbers of disk seeks and slowing the effective disk transfer rate.

Cons and discussions
· As we said, it is a missed-target paper. Performance is not the first concern of MapReduce. Other issues, such as robustness and scalability, are of more interest.
· This paper might motivate a new question: parallel distributed large-scale data processing systems, including DBMSs and MapReduce, span many types of designs. Will there be one final winner? It seems to me that specialized database systems will win in specific domains. For example, high-dimensional data may call for specialized designs, and social media sites might also develop their own tools. MapReduce seems to fit the requirements of current search engines, but it is not necessarily the sole winner for future cloud computing.

From: Kurchi Subhra Hazra [hazra1@illinois.edu]
Sent: Tuesday, April 27, 2010 10:48 AM
To: Gupta, Indranil
Subject: 525 review 04/27

On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing
-----------------------------------------------------------------------

Summary
-------
In this paper, the authors present the differences and commonalities between grid computing and peer-to-peer computing. The authors point out that both of these computing trends emerged from a need to share resources among a set of users. However, they have made advances in orthogonal directions: peer-to-peer systems address scale and failure but suffer from a lack of infrastructure, while grid computing guarantees infrastructure but has yet to achieve scalability and failure tolerance.

Grid computing is targeted towards scientific and professional communities that need resources for large-scale simulations and data analyses. As such, grid computing has organized, high-end resources at its disposal. Grid computing applications also tend to be more data intensive. In addition, significant effort has been invested in standardizing the protocols used to access and use resources. However, there is not much scope for utilizing such resources in the absence of trust.

On the other hand, peer-to-peer systems evolved from the need of mass Internet users to share files and other resources.
As such, these systems scale to hundreds, thousands, and sometimes millions of users and are highly fault tolerant. However, such systems depend on the resources that users make accessible, and thus infrastructure is lacking. Malicious users may misrepresent information, causing a further decline in available infrastructure. In addition, not much effort has been put into standardizing the protocols and configurations used across different peer-to-peer systems.

As is evident, the achievements of peer-to-peer systems are still only goals of grid computing, and vice versa. The authors hence point out that the two computing trends can in fact benefit from each other. New research directions should therefore aim to converge the two systems instead of carrying out independent research.

Pros
----
-- Perhaps for the first time in the literature, the authors point out how grid computing and peer-to-peer computing emerged from slightly differing requirements but have common goals. The differences between the two trends are due more to the orthogonal directions in which research and development efforts have been invested. However, the two systems can benefit from each other when integrated into one system. In this way, the authors point out a new and beneficial research direction.

Cons
----
-- The kinds of applications that the two are targeted towards may be a hindrance to integration. For example, peer-to-peer applications will demand anonymity, whereas grid computing applications would want authorization. Again, a simulation to detect a possible earthquake will almost always be given higher priority than a mere music file-sharing application. Such conflicting interests may prevent the two communities from converging.
Thanks,
Kurchi Subhra Hazra
Graduate Student
Department of Computer Science
University of Illinois at Urbana-Champaign
http://www.cs.illinois.edu/homes/hazra1/

From: Nathan Dautenhahn [dautenh1@illinois.edu]
Sent: Tuesday, April 27, 2010 12:29 AM
To: indy@cs.uiuc.edu
Subject: 525 review 04/27

##########################################################
CS525 Reviews -- 4.27.10 -- Nathan Dautenhahn
##########################################################

Title: A Comparison of Approaches to Large-Scale Data Analysis
==================================================
Authors: Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, Stonebraker

1. Summary and Overview
This paper provides an analysis of the differences between the MapReduce framework and parallel DBMSes (PDBMS). The authors argue that the database community has provided a better option than MapReduce. They perform an extensive set of benchmarks focused on identifying the upsides of both mechanisms, and ultimately determine that DBMSes are better.

2. Contributions
This paper provides a thorough analysis of both DBMSes and MapReduce. The authors carry out a thorough evaluation of each system, comparing essential performance characteristics and thus identifying the detailed strengths and weaknesses of each approach.

3. Limitations
This paper fails to clearly identify the problem domain. The authors claim that clusters of over 100 nodes are really not useful, but large data warehouses such as those at Google, Facebook, Yahoo!, and many others provide the counterexample to this argument. I think this is in fact the primary domain where MapReduce is most useful, and it is an area where DBMSes perform very poorly (as the authors have provided evidence for).

4. Comments and Questions
- How can we compare something such as Hadoop to a system that has had over thirty years to mature?
Simple things such as special load balancers and data distribution/partitioning schemes are still in development for Hadoop, so this paper pits a mature system against a very new one.
- The authors did a poor job of clearly identifying the key points of their argument. It appears as though they want people to realize that DBMSes are a great option and should be selected, but they provide too many unproven arguments for their approach being better. It would have been better if they had framed their work by identifying the areas and applications where the DBMS approach is better.
- The authors mention that DBMSes require a schema, which helps their systems perform better, but who says that a schema and a strict data format are necessary to get good performance?

--<(nathan dautenhahn)>--

From: Shehla Saleem [shehla.saleem@gmail.com]
Sent: Monday, April 26, 2010 11:45 PM
To: Gupta, Indranil
Subject: 525 review 04/27

On death, taxes and the convergence of peer-to-peer and grid computing

This paper presents a short but interesting take on how P2P and Grid computing can be compared and contrasted. The authors observe that even though these systems currently seem to focus on different aspects, a major underlying objective is common to both: the coordinated use of large sets of distributed resources that are widespread in both time and space. Grid computing mainly originated with the aim of connecting a small number of sites, mainly of scientific professionals with some level of trust amongst them, who wanted to access resources remotely or share storage and computing resources in order to provide sophisticated applications and services and a platform for experimenting with new research ideas. P2P systems, on the other hand, do not claim to be very research-focused, nor do they support or provide highly sophisticated or complex applications and services.
The trust element in P2P systems is minimal, and they focus on ‘popular’ rather than ‘sophisticated’ services. Because of the trust aspect, some form of penalty for misbehaving elements may be possible in a Grid computing scenario, but the same is quite intractable for P2P systems. The magnitudes of a Grid resource and a P2P resource are also very different. A grid resource might be a cluster, and even if it is a desktop computer, it will more commonly belong to a research organization and tend to be more powerful than the resources shared in P2P systems. Grids also support a highly diverse set of applications, whereas P2P systems revolve around simple resource sharing. The scale and amount of activity that grid systems and P2P systems encounter differ as well.

The authors argue that Grid and P2P computing have the same vision and that, as time passes, the two will evolve and converge to a common point. They back this claim by observing that Grid systems are increasing in scale and seeing more and more commercial participation. This would lead to a system with the same kind of heterogeneity and trust issues that P2P systems experience. P2P systems, on the other hand, have become quite mature at providing simple file-sharing services and are now moving towards more sophisticated services and infrastructure. From these trajectories, it does seem that Grid and P2P may reach a point of intersection; when this will happen remains to be seen.

The paper makes an interesting observation, but that is about all it offers. The idea may come to be known as pioneering and visionary sometime in the future, but for now much more has to happen for it to materialize and reach the point where grid researchers and P2P developers share common interests and work towards common goals.

From: Virajith Jalaparti [jalapar1@illinois.edu]
Sent: Monday, April 26, 2010 9:37 PM
To: Gupta, Indranil
Subject: 525 Review 04/27

Review of “On death, taxes and the convergence of peer-to-peer and grid computing”:

This paper tries to compare and contrast two seemingly different concepts in distributed systems, namely P2P computing and Grid computing. The paper argues that while the two were developed quite independently, they have several things in common: both are concerned with resource sharing, both use overlays, and they complement each other. While P2P computing addresses scale and failures, grid computing is infrastructure based and has no widely deployed mechanism for addressing failure or scaling to a large number of resources. The paper goes on to point out several important differences between grids and P2P networks. While grids have been developed primarily for scientific computing, where large-scale simulations process large amounts of data and require huge computing power, P2P networks are on the lighter side and concerned with more “common uses” such as sharing music and videos. Grid networks typically have large dedicated resources that are monitored continuously, while P2P networks consist of users who join intermittently and are quite variable in their behavior. Further, grid computing networks are typically infrastructure oriented (small groups of people coordinate) and limited to smaller communities, on the order of a few thousand, providing generic services, while P2P networks often consist of a large number of hosts (millions) that join and leave, function autonomously, and often operate in an application-specific manner.
The authors argue that in spite of all these differences, both P2P and grid networks can achieve their full potential as a framework for resource sharing only if they adopt methods used in the other. The most important gaps are that grid computing does not address failures and that P2P computing does not address infrastructure, i.e. a persistent information pool that can be used without high reliability.

Comments:
- While the paper presents a complete list of differences between P2P networks and grid computing, it does not provide strong arguments for the fusion of the two directions of computing. Although both grid and P2P computing are concerned with sharing resources, they have several differences, as pointed out in the paper, and integrating them is not as straightforward as the paper seems to suggest.
- One major difference between grids and P2P networks is that the former consist of dedicated systems while the latter consist of “home users” who share their hosts intermittently. This is a fundamental difference that dictates how the respective forms of computing must operate. Developing a generic framework that encompasses both is quite challenging, and it is not clear (even from this paper) whether this is achievable.
- The paper argues that grid and P2P computing are each currently limited by factors (scale and failures) that they have not addressed but that are the basis of the other. While this is definitely true, the paper does not provide any directions for adopting the techniques used in one into the other. It would be interesting to see how far the techniques adopted in one can be used in the other, especially because they were developed with different requirements and for diverse uses.
- Another interesting question is whether, even if we had a generic framework that fits both paradigms, it would be as efficient as frameworks designed specifically for each of them. This again brings the end-to-end argument into question.

From: Vivek [vivek112@gmail.com]
Sent: Monday, April 26, 2010 7:55 PM
To: indy@cs.uiuc.edu
Subject: 525 Review 04/26

A Comparison of Approaches to Large-scale Data Analysis (Pavlo, Paulson, et al)

Summary:
This paper evaluates and compares two very different methods for large-scale data analysis: parallel databases and MapReduce (Hadoop). Both kinds of systems assume a "shared nothing" setup, in which no data is shared across nodes. Through their experimental studies, the authors illuminate some very important tradeoffs between the two systems. Through this work, they suggest how future systems can improve their performance as well as their fault tolerance.

Pros:
- The paper presents very thorough experimentation with three different systems. Each was tested on the same machines, and great care was taken to maintain a standard, consistent benchmark for each of the three systems.
- The paper breaks down each component of MR and parallel databases and does a cross-comparison between the similar components. The comparison is not only useful to understand, but the simple act of associating the components of MR and parallel databases is also very useful.
- Both performance aspects and software aspects were compared in this paper. The results showed that the setup for Hadoop is much easier than the setup for the parallel databases. DBMS-X required additional support phone calls in order to get it set up.

Cons:
- The tests are done for only 1, 25, 50, and 100 nodes. It is not clear why only these numbers of nodes were tested. The trends might be clearer if the authors had tested with 15, 20, 75, 85, 90, 100, etc. nodes.
- This was a very thorough experimental study.
However, with these tests, a thorough "rules of thumb" or "lessons learned" section would be very helpful. It is clear that the two methods of data analysis are converging, particularly with the advent of Pig for Hadoop and more advanced functionality for parallel databases that supports reliability. But what would have been more useful here is implementation tips. For what applications would you use a parallel database? For what applications would you use MR? The results may seem to tell us this, but the paper would be stronger if the authors explained how they would apply what they learned to an example real-world application.

From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu]
Sent: Monday, April 26, 2010 7:32 PM
To: Gupta, Indranil
Subject: 525 review 04/27

=====================================================================
On Death, Taxes, and the Convergence of P2P and Grid Computing
=====================================================================

Cloud computing is, without any doubt, the current hype in the distributed systems area. However, only a few years ago, that hype was around grid computing and P2P. The objectives of grid computing and P2P are pretty much the same (the pooling and coordinated use of large sets of distributed resources). Keeping that in mind, in the paper titled "On Death, Taxes, and the Convergence of P2P and Grid Computing", the authors try to find the major differences between the two approaches. The bases of the comparison are target communities, resources, scale, applications, and technologies. Here are some major points that are essential for differentiating P2P and grid computing.

Grid computing covers a significant number of complex applications and computation models, while P2P is mainly applicable to a few applications such as file sharing and content distribution. So, when we consider applicability, grid is surely the winner.
On the contrary, grid scales only to tens of institutions and thousands of users, while P2P surely has millions of users. So, from a scalability perspective, P2P defeats Grid. Grid follows standard protocols and maintains robust infrastructure, while P2P lacks both. So, in that case, Grid is again the winner. On the other hand, from a coolness-factor point of view, P2P is undoubtedly the winner by a wide margin. So it is hard to say which is better, since it is largely context specific.

Considering the above comparisons, the authors come to some interesting conclusions: (1) both grid and P2P address the same problem; (2) both take the same general approach to the problem; (3) both have made great technical advances but have crucial limitations (grid provides infrastructure but does not address failure, while P2P does just the opposite); and (4) the complementary nature of the strengths and weaknesses of the two approaches shows that both communities are likely to grow in the future.

Pros:
1. This paper nicely shows the similarities and dissimilarities between P2P and Grid computing.

Cons / Discussion Points:
1. Is cloud computing the friend or foe of grid computing and P2P?
2. Which concepts of grid computing can we apply in P2P?
3. Which concepts of P2P can we apply in grid computing?
4. Which concepts of grid and P2P can we apply in cloud computing?

From: Ghazale Hosseinabadi [gh.hosseinabadi@gmail.com]
Sent: Monday, April 26, 2010 6:32 PM
To: Gupta, Indranil
Subject: 525 review 04/27

On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing

This paper addresses and compares the issues related to failure and infrastructure maintenance in peer-to-peer systems and Grids. Grids are environments in which a permanent service infrastructure is present. The main task of the infrastructure is to share the available resources among the distributed users.
A virtual organization (VO) is defined as a set of individuals or institutions defined by the sharing rules of the grid computing system. In this paper, the authors define peer-to-peer systems as applications that exploit resources available at the edges of the Internet without the presence of any central server.

The main differences between grids and P2P systems are the number of users, the resources available (more resources are available in grids), applications, scalability, robustness to failures, and services. P2P systems usually have many more users than Grid environments. Resources offered by Grid systems are usually more powerful, more diverse, and better connected than P2P resources. In terms of applications, P2P systems provide specialized resources; the existing applications share only compute cycles and files. Grids, on the other hand, provide a much more diverse set of applications. The number of users of a Grid environment is moderate, while the number of users of P2P systems is enormous, namely several million. In terms of the amount of activity, both systems might be the same in some cases, although this parameter is highly dependent on the application. Grids provide persistent, multipurpose, and diverse infrastructure services such as authentication, authorization, discovery, resource access, and data movement. The services offered by P2P systems, on the other hand, are simpler and more specific.

Pros: In this paper, different aspects of Grids and P2P systems are compared.

Cons: It would be interesting to compare the two systems more analytically. Different parameters could be considered, such as how the provided service scales as the number of users increases, the time of convergence in a specific application, and the time complexity or message complexity of failure detection and recovery. The two systems are also not compared with respect to security issues and the way malicious attacks are handled.
From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu]
Sent: Monday, April 26, 2010 5:01 PM
To: Gupta, Indranil
Subject: 525 review 04/29

525 review 04/29
Hyun Duk Kim (hkim277)

* Mapping the Gnutella network, M. Ripeanu et al, IEEE Internet Computing Journal 2002

This paper presents a macroscopic analysis of, and experiments on, the Gnutella peer-to-peer (P2P) system. The authors developed a crawler that joins the network and collects data such as network topology and traffic information. From the measurements and analysis, the authors found the following. First, although Gnutella was not intended to be a power-law network, the deployed Gnutella system showed a power-law distribution. Second, they present a traffic measurement method that can be applied to the analysis of any P2P system. Third, the virtual network topology of Gnutella does not match the underlying Internet topology, which increases costs.

This paper shows interesting findings about P2P systems, and the findings point to future directions for better systems. Contrary to its original intention, Gnutella showed a power-law distribution, which makes the structure more vulnerable to attack. The mismatch between the virtual structure and what Internet service providers expect increases costs. These findings motivate more structured fault tolerance and routing mechanisms.

It would be better if the authors showed the same experimental results for a structured P2P system. At the end of the paper, the authors mention structured P2P systems such as CAN and Tapestry. The authors think these can solve some of Gnutella's issues, but they do not show concrete experiments. If we could compare the analysis results against them, the picture would be clearer. Moreover, with experimental results on existing structured systems, we might be able to find other problems and suggest a next-generation structured P2P system.

------
Best Regards,
Hyun Duk Kim Ph.D.
Candidate Computer Science University of Illinois at Urbana-Champaign http://gildong2.com

From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu]
Sent: Monday, April 26, 2010 3:32 PM
To: Gupta, Indranil
Subject: 525 review 04/27

525 review 04/27
Hyun Duk Kim (hkim277)

* On death, taxes and the convergence of peer-to-peer and grid computing, I. Foster et al, IPTPS 2003

This paper compares Peer-to-Peer (P2P) with Grid computing. Both systems address similar general problems and use similar approaches to solve them. Each has made its own technical advances, but each also has limitations. The authors explain the general similarities between P2P and Grid systems and show their differences from five perspectives (target communities and incentives, resources, applications, scale and failure, services and infrastructure). Through these comparisons, the authors show the converging and complementary trends of the two.

This paper raises an interesting question and discusses it well. The authors point out the converging trend between P2P and Grid, and they explain the similarities and differences between the two with easy-to-follow discussion. The authors use a fun analogy at the beginning: death and taxes. Death (failure) and taxes (establishment and maintenance of infrastructure) are important issues in distributed systems.

I think P2P and Grid currently differ because of their different initial purposes. The main purpose of P2P was file sharing, and that of Grid computing was large-scale computation power. Starting from these motivations, they developed as separate systems, and the initial motivations produced the differences. After each succeeded in its initial specialty, people from other areas tried to adopt the technology. Also, as P2P and Grid grew and tried to become more versatile, they needed techniques from other areas. Now they share many techniques, so they look like they are converging.

------
Best Regards,
Hyun Duk Kim Ph.D.
Candidate Computer Science University of Illinois at Urbana-Champaign http://gildong2.com

From: Fatemeh Saremi [samaneh.saremi@gmail.com]
Sent: Monday, April 26, 2010 1:53 PM
To: Gupta, Indranil
Subject: 525 review 04/27

Paper 1: On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing

This paper studies and contrasts Grid computing and Peer-to-Peer computing from different angles. The authors state that, given the trends in these two distributed computing approaches, a comprehensive computing paradigm is needed that satisfies the requirements of both and benefits from the appropriate features of each. They compare the two approaches in different aspects: target communities and incentives, resources, applications, scale and failure, and services and infrastructures. Grid technologies were initially developed to address the needs of scientific collaborations; however, commercial interest is growing, and requirements are arising for an infrastructure that provides some degree of trust, accountability, and opportunities for sanctions in response to inappropriate behavior. In contrast, the communities underlying P2P technologies are mostly individuals with little incentive to act cooperatively. While grid systems integrate more powerful and diverse resources with better connectivity (such as a cluster, storage system, database, or scientific instrument of significant value), P2P systems often deal with intermittent participation and highly variable behavior. Applications of grid technologies vary significantly in range and scope, depending on the target community, and they also tend to be far more data intensive. In comparison, P2P systems are mostly vertically integrated solutions to specialized resource-sharing problems, mainly file sharing, and are not nearly as data intensive. P2P systems exhibit considerable scalability and failure handling, as opposed to grid systems, which involve only a modest number of participants.
And finally, while grid systems have expended much work on the technical and organizational issues of providing persistent and multipurpose infrastructure services, P2P systems have focused on integrating simple resources to provide specific vertically integrated functionality.

The idea of convergence between grid computing and P2P computing is interesting and valuable, and the authors have looked at the issues from different aspects. It would be useful to include a tradeoff analysis to see how much the value of such a comprehensive and integrated system outweighs its cost. It is not clear how feasible it would be to integrate all the valuable features of both technologies in the same system. The idea might result in some clients paying for services that they do not really need. Considering this issue as well as the performance cost, would the clients be at least as satisfied with the service as before?

From: arod99@gmail.com on behalf of Wucherl Yoo [wyoo5@illinois.edu]
Sent: Monday, April 26, 2010 11:59 AM
To: Gupta, Indranil
Subject: 525 Review 4/27

Old Wine: Stale or Vintage?, Wucherl Yoo (wyoo5)

On death, taxes and the convergence of peer-to-peer and grid computing, I. Foster et al, IPTPS 2003

Summary: This paper compares P2P and grid and claims that the two approaches are complementary in nature. Both have the same objective: to provide pooling and usage services for large sets of distributed resources. Both organize resource sharing within virtual organizations (VOs) by creating overlay structures for the services that need not correspond to the underlying organizations. The target communities of grid are usually professional communities, such as scientific collaborators, who are willing to create and operate the required infrastructure. The target communities of P2P, by contrast, are diverse and anonymous individuals who have little incentive to participate in providing the services.
Therefore, the resource availability of grid tends to be higher and more uniform than the intermittent and highly variable resource availability of P2P. The scalability of P2P (due to decentralized mechanisms such as DHTs and gossiping) is much better than that of grid, since most grid services are highly centralized. The grid has a standards-based service infrastructure that can be easily reused, and its applications are mostly legal. P2P, however, lacks such infrastructure, and its best-known applications have been illegal.

Pros:
1. Good summary and comparison of P2P and grid in Section 3.
2. Adapting P2P mechanisms to grid computing seems promising for scalability and self-stabilization after failures, since grid computing was not designed for Internet scale.

Cons:
1. This paper only points out possibilities for complementary adaptation and combination of P2P and grid, without concrete ideas.
2. Adapting grid mechanisms to P2P, such as building infrastructure, seems promising, since P2P still has difficulty getting users to fully cooperate and participate in the system. However, given users' lack of willingness and the short service durations caused by churn, maintaining infrastructure such as resource discovery services may be overkill without much benefit.

-Wucherl

From: Ashish Vulimiri [vulimir1@illinois.edu]
Sent: Saturday, April 17, 2010 12:47 AM
To: Gupta, Indranil
Subject: 525 review 04/27

A comparison of approaches to large-scale data analysis, A. Pavlo et al, ACM SIGMOD 2009

This paper compares the performance and ease of use of the MapReduce framework and traditional RDBMSes for data analysis tasks, benchmarking Hadoop's MapReduce implementation against two RDBMSes: Vertica and a "parallel SQL database from a major relational database vendor" (Oracle?).
They show that on their (limited) benchmarks both tested RDBMS implementations significantly outperform Hadoop on the same hardware when running relatively simple jobs, although with more complex jobs, manual performance tuning by the user affects how well the job runs on any of the three systems (indeed, on their most complex, non-standard job, Hadoop actually outperforms one of the RDBMSes). They note, however, that Hadoop tends to be better in terms of ease of use, deployment and optimisation.

Comments:
* They do not adequately justify their basic thesis -- that RDBMSes can be used for the same tasks as MapReduce. The benchmarks they consider all deal with data analysis tasks (which traditional databases would understandably be good at), but MapReduce has also been used in, for instance, compute-intensive scientific applications (examples can be found on the Hadoop website).
* How much of Hadoop's inefficiency results from the authors' inexperience with the framework? They clearly seem to be experts on relational databases (indeed, some of them seem to have been involved in building the Vertica database they evaluate in this paper).
* It is not clear why they chose to compare RDBMSes against MapReduce instead of against something like Pig Latin (they do mention Pig several times in the paper).
* They do not discuss cost anywhere. They present examples showing that a 100-node cluster of powerful servers running a traditional RDBMS can perform as well as a 1000-node commodity Hadoop cluster, but the tradeoff is not clear, since each individual commodity node should be much cheaper.
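To make the cost question in the last comment concrete, here is a minimal Python sketch of the break-even arithmetic. It uses only the node counts quoted above (100 vs. 1000) and assumes that cost is simply node count times a per-node price, ignoring licences, power and administration. The function names and the prices in the usage lines are hypothetical, chosen for illustration, and are not taken from the paper.

```python
def cluster_cost(nodes: int, price_per_node: float) -> float:
    """Hardware cost of a cluster under the simplifying assumption
    that cost = number of nodes * price per node."""
    return nodes * price_per_node


def break_even_price_ratio(rdbms_nodes: int, hadoop_nodes: int) -> float:
    """How many times more expensive an RDBMS server may be than a
    commodity node before the RDBMS cluster costs more overall."""
    return hadoop_nodes / rdbms_nodes


# Node counts quoted in the review: 100 RDBMS servers vs. 1000 Hadoop nodes.
ratio = break_even_price_ratio(100, 1000)

# Hypothetical prices in arbitrary units (not from the paper).
commodity_price = 1.0
server_price = 8.0  # below the break-even ratio, so the RDBMS cluster is cheaper
rdbms_is_cheaper = cluster_cost(100, server_price) < cluster_cost(1000, commodity_price)
```

Under this simplified model the RDBMS cluster is the cheaper option only while each of its servers costs less than ten commodity nodes, and this is the kind of comparison the paper leaves out.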