From: harshitha.menon SpamElide on behalf of Harshitha Menon
Sent: Thursday, April 28, 2011 12:20 PM
To: Gupta, Indranil
Subject: 525 review 04/28
Exploring Complex Networks
This paper explores various forms of networks prevalent around us and studies the effect of their structure on their functionality. For example, the topology of a social network affects the spread of information or diseases. These networks exhibit various complexities that make them difficult to comprehend. The paper reviews what is known about dynamical complexity in regular networks of nonlinear systems and the impact of network structure on collective dynamics. As a next step, it tackles networks that combine dynamical and structural complexity. Researchers have identified two properties of such networks, namely short paths and high clustering. Dynamical systems networked this way display enhanced signal-propagation speed, synchronizability, and computational power.
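The two properties highlighted here, short paths and high clustering, are easy to measure directly. Below is a small Python sketch (my own illustration, not code from the paper) that computes the average clustering coefficient and the mean shortest-path length of a toy graph made of two triangles joined by a bridge:

```python
from collections import deque
from itertools import combinations

def avg_path_length(adj):
    """Mean shortest-path length over all connected node pairs (BFS from each node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if v != src:
                total += d
                pairs += 1
    return total / pairs

def clustering(adj):
    """Average local clustering coefficient: the fraction of a node's
    neighbour pairs that are themselves linked."""
    coeffs = []
    for u, nbrs in adj.items():
        nbrs = list(nbrs)
        if len(nbrs) < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(links / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(coeffs) / len(coeffs)

# Toy graph: two triangles joined by a bridge (high clustering, short paths).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(round(clustering(adj), 3), round(avg_path_length(adj), 3))  # → 0.778 1.8
```

On a true small-world graph, rewiring a few lattice edges keeps the clustering high while collapsing the average path length.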
Pros:
-The paper explores various networks and is very useful for understanding the nature of a network and how its structure affects its functionality. It also points out the properties that make a network more effective.
-This paper does not restrict itself to P2P networks; rather, it explores various fields and analyses the networks methodically.
-The paper provides models for the networks being analysed.
Cons:
-Other properties could be considered, such as how the network evolves when partitions occur and how it handles node failures. These are important aspects of distributed systems and could be explored in that context.
-The other aspect that could have been analysed is the scalability of the network.
Mapping the Gnutella network
Gnutella builds, at the application level, a virtual network with its own routing mechanisms. The topology of this virtual network and its routing mechanisms have a significant influence on application properties such as performance, reliability, and scalability. The authors of the paper built a crawler to extract the topology of Gnutella’s application-level network, which they analyse along with network traffic information. There are two main findings: Gnutella does not follow a power law, and the Gnutella virtual topology does not match the underlying topology.
A common characteristic of P2P networks is that they build a virtual network over the underlying topology, which decides various factors like communication cost, performance, and reliability. Since the churn rate in a P2P network is high, these characteristics are dynamic and evolve over time; such systems are essentially self-organized networks of independent entities. Since Gnutella’s virtual network doesn’t map onto the underlying topology, it does not make efficient use of the network.
Pros:
-Their approach yielded a lot of information about the P2P system and its behavior.
-Their measurement study could be deployed on other P2P networks.
-They suggested improvements to the current Gnutella network.
From: Jason Croft
Sent: Thursday, April 28, 2011 12:16 PM
To: Gupta, Indranil
Subject: 525 review 04/28
Scaling Properties of the Internet Graph
This paper uses analysis and simulation to argue that congestion in the Internet
scales poorly as its size grows. The authors focus on graphs with power law degree
distribution and analyze both shortest path routing and policy routing (BGP) schemes.
Three different traffic vectors are considered: any-2-any (unit traffic between all
pairs of nodes), leaf-2-leaf (traffic between stub nodes, i.e., those without
customers), and clout (where highly connected nodes send larger amounts of traffic,
especially to other highly connected nodes). Simulations are run on a graph based on real AS
topologies, as well as a synthetically generated graph. The real AS graph, also
called an accurately labeled real graph (ALR), contains at most 13000 nodes; the
synthetically generated graph, or heuristically labeled synthetic graph (HLS) varies
with up to 50000 nodes.
Both the analytical results and simulation results show maximum congestion scales
poorly in power law graphs. For shortest path routing with an any-2-any model,
maximum congestion scales as n^2 for large values of n, or the worst possible rate
of growth of congestion. The distribution of congestion also becomes uneven as the
number of nodes increases. Leaf-2-leaf is about 0.8x any-2-any, and the clout model
scales worse than n^5. BGP scales better than shortest path, as n^5, since policies
may move traffic away from hot-spots.
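These congestion figures come from counting how many shortest-path routes cross each edge. As a hedged illustration (my own sketch, not the authors' code), the Python below routes unit any-2-any traffic over BFS shortest paths on a toy star graph, where every leaf-to-leaf path is forced through the hub:

```python
from collections import deque

def edge_congestion(adj):
    """Route one unit of traffic between every ordered node pair along a
    BFS shortest path; return per-edge load (the any-2-any model)."""
    load = {}
    for src in adj:
        parent = {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    q.append(v)
        for dst in adj:
            if dst == src:
                continue
            v = dst  # walk back along the shortest-path tree
            while parent[v] is not None:
                e = frozenset((v, parent[v]))
                load[e] = load.get(e, 0) + 1
                v = parent[v]
    return load

# Star graph: hub 0 with 5 leaves -- every leaf-to-leaf path crosses the hub.
n = 6
adj = {0: set(range(1, n))}
for i in range(1, n):
    adj[i] = {0}
load = edge_congestion(adj)
print(max(load.values()))  # each hub edge carries 2*(n-1) = 10 units
```

On the star, each hub edge carries 2(n-1) units while the hub as a whole handles on the order of n^2 route segments: a small-scale picture of why highly connected nodes become hot-spots in power-law graphs.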
Based on these results, the authors claim the uniform scaling in the link capacities
according to Moore's law as the Internet grows will likely not be enough to handle
increasing congestion. One solution they propose is the use of parallel links between
nodes. While this could be a simple method to handle increasing congestion, it could
be costly for highly connected ISPs that may have many potential hot-spots.
Exploring Complex Networks
In this paper, the difficulties in characterizing and understanding networks in different
fields are discussed. The author argues networks are inherently difficult to
understand because of their structural complexity, evolution, connection diversity,
dynamical complexity, node diversity, and meta-complications where complications
can influence each other. Networks of dynamical systems with regular topologies
(e.g., grids, lattices, fully-connected) and their problems with stability and
synchronization are examined. Biological examples, such as flashing fireflies,
inspired a large amount of research in this area.
The author then discusses more complex network architectures, while leaving aside
the challenges of dynamics. Random graphs, though studied in pure math, have
applications to ecosystems and the spread of diseases and computer viruses.
Small-world networks fall in between the order of regular graphs and the randomness
of random graphs. Graphs associated with search problems tend to have a small-
world topology. Scale-free networks, with skewed distributions, are seen in the
Internet backbone and the Web. Generalized random graphs, such as bipartite
graphs, model boards of directors and their interlocking into one large governance network.
Aside from these, there is no further discussion of the six points/complexities he
outlines earlier in the paper (e.g., node diversity or meta-complications).
From: Long Kai
Sent: Thursday, April 28, 2011 12:04 PM
To: Gupta, Indranil
Subject: CS 525 Reviews 04/28
Exploring complex networks
Summary:
This paper reviews the common properties and features of the many kinds
of network systems that scientists are exploring. Though the
question is critical to applications and development of such systems,
the problem is hard to understand because of structural complexity,
network evolution, connection diversity, dynamical complexity, node
diversity, and meta-complication. The paper first reviews what is
known about dynamical complexity in regular networks of nonlinear
systems. The author offers a few rules of thumb about the impact of
network structure on collective dynamics, especially for arrays of
coupled limit-cycle oscillators. This article also uses graph theory
to explore the structure of complex networks, an approach that has led
to some encouraging progress, especially when combined with the tools
of statistical mechanics and computer simulations.
The article is an inspiring review of the common features that
computer networks share with other network systems. However, there are
many differences between the Internet and other networks. The
fundamental reason is that the Internet is man-made and did not
evolve in a purely natural way: the links are not connected directly,
and the load is not distributed as evenly as in other networks.
Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer
Systems and Implications for System Design
Summary:
This paper examines two questions about Gnutella: its connectivity
structure and how the Gnutella virtual network topology maps to the
physical Internet infrastructure. It then provides
suggestions for improving the Gnutella system based on the features
they found. The authors have built a “crawler” to extract the topology
of Gnutella’s application level network to study the topology and
traffic of the system. They found that Gnutella node connectivity
follows a multi-modal distribution, combining a power law and a
quasi-constant distribution. A power-law structure arises in systems that
organize themselves so that most nodes have few links while a tiny
number of nodes, called hubs, have a large number of links. Networks
following this organizational pattern display an unexpected degree of
robustness. As for the mismatch between the Gnutella virtual topology and
the network infrastructure, the authors make the point that this mismatch
causes an unnecessary burden and stress on the network infrastructure. The
authors further discuss potential improvements for Gnutella.
A problem with this paper, though, is that matching Gnutella's topology
to the infrastructure topology may not improve efficiency much,
because the query distribution of Gnutella is very different from that
of normal websites. The authors assume that the distribution of Gnutella
queries is similar to the distribution of HTTP requests in the
Internet, which may not hold.
--
Larry
From: Michael Ford
Sent: Thursday, April 28, 2011 11:56 AM
To: Gupta, Indranil
Subject: 525 review 04/28
Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design
The authors examine the architecture of the network built by the Gnutella system. Since peer-to-peer systems are essentially virtual overlays with their own routing schemes, their performance is highly dependent on how well the overlay aligns with the underlying topology. The authors build a distributed crawler using a client-server architecture to map Gnutella networks. This greatly reduces the amount of time required to crawl the network, leading to more consistent snapshots and higher scalability.
Analysis shows that Gnutella peers have a high rate of churn, continued to grow in number, and generated a significant amount of traffic. Roughly one third of the traffic was user-generated, while the remainder was overhead; after an update in June 2001, the overhead was reduced to eight percent. The authors report that this is a significant portion of US traffic, but they only give statistics from before the June 2001 update, so it is not known whether the trend will hold in the future. Node connectivity remains constant as the network grows, allowing for good scalability but adding increased latency.
Scaling Properties of the Internet Graph
The Internet is continuously growing, and although link speeds also improve, there may be congestion bottlenecks. The authors consider BGP and shortest-path routing in their analysis and show that BGP routing does not improve congestion on the worst links as opposed to shortest path routing, though there are certainly other motivations for using BGP. They argue that the maximum congestion in the Internet scales poorly with the growing size of the Internet graph.
To this end, the authors combine theoretical analysis with simulation. The simulations are seeded with real Internet AS topologies and synthesized networks. The size of these networks is questionable: the largest AS topology contains only 13,000 nodes, and the largest synthesized network only 50,000. The previous paper, published a year earlier, observed networks as large as 50,000 nodes consisting only of Gnutella clients. It is unclear if their simulations can capture the hierarchical nature of the entire Internet with so few nodes. Regardless, they observe that congestion on the most congested link scales poorly in power-law graphs. They present a solution of doubling the links between some nodes. However, the authors do not show that maximum congestion is a good measure of overall performance, and their solution is specifically designed to counter this one problem (by doubling the capacity of that edge).
From: Nipun Sehrawat
Sent: Thursday, April 28, 2011 11:55 AM
To: Gupta, Indranil
Subject: 525 review 04/28
Scaling Properties of the Internet Graph
This paper presents an interesting study of how congestion in the Internet scales as it grows in size. The worst congestion in the Internet was found to grow as O(n^(1 + const)), which is a matter of concern considering the high value of n (the number of nodes in the Internet). This behavior shows up in both shortest-path and policy-driven (BGP) routing. Such congestion is expected in the ‘core’ links of the Internet, and can lead to performance degradation of large parts of the graph. Supplementing such hot-spot links with redundant links can help alleviate the congestion at these links. Alternatively, the routing policies in place can be changed to take these hot-spots into account.
The authors assume the Internet graph follows a power law, where the number of nodes having degree d is inversely proportional to a power of d; this seems reasonable, but could have been backed by empirical findings. The problem is thus reduced to finding the congestion in a power-law graph under shortest-path and policy-driven routing. Three types of traffic patterns are assumed: any2any (unit traffic between every pair of nodes), leaf2leaf (unit traffic between every pair of nodes that don’t have a customer, called stubs) and clout (which assumes stubs with high degree are likely to send more traffic).
Based on the simulations, it is observed that congestion at an edge grows with the degree of the nodes on either end of that edge. This seems reasonable, as such edges are expected to appear in the core of the Internet, such as between two border routers of ASes. Adding parallel links to supplement such links resulted in the maximum relative congestion growing only linearly with the number of nodes, which is a big improvement.
Pros:
- Provides insights into the growth of congestion in the Internet graph as the number of end nodes grows.
- Proposes a solution that contains congestion to linear growth by adding redundant links at key points in the Internet graph.
Cons:
- The authors could have backed their power-law assumption about the Internet graph with empirical findings.
--
Exploring Complex Networks
This article first motivates the importance of networks and how the study of networks is relevant to diverse fields ranging from computer science to neurobiology. The author also argues that the structure of a network plays an important role, as it affects the functionality of the concerned entity. For example, the topology of a social network affects the spread of an epidemic, and the topology of a power grid determines its robustness. Some of the difficulties involved in studying/understanding networks are:
- Structural complexity: It becomes difficult to represent a complex structured network
- Network evolution: It gets difficult to capture the dynamism of networks like the world wide web, where pages are added and removed continuously.
- Connection diversity: Links between nodes might be diverse in type, direction, weight, etc., e.g., in a computer network.
- Node diversity: A network may feature heterogeneous nodes, such as nodes being routers, bridges, PCs etc in a computer network.
The article describes random graphs, which are a collection of n nodes and m randomly generated links between these nodes. An important characteristic of such graphs is how the connectivity of nodes changes as m grows. For m > n/2, the graph has a connected component with O(n) nodes, with the closest rival to this chunk having only O(log n) nodes. This kind of network is seen in the spread of computer viruses, where keeping m low contains the spread.
Scale-free networks capture the fact that some nodes in a network are more connected than others, with node degrees following a power law. Such networks are resistant to random failures, as the connectivity is mainly dominated by a few hubs. However, the network is then vulnerable to targeted attacks on this limited number of hub nodes. Such graphs represent the Internet topology to a high degree.
Pros:
- The article presents the study of networks in totally different fields; computer science has a lot to borrow from the study of networks in these fields.
From: wzhou10 SpamElide
Sent: Thursday, April 28, 2011 11:49 AM
To: Gupta, Indranil
Subject: CS525 Review 04/28
CS525 Review 04/28 Wenxuan Zhou
Review on “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-
Peer Systems and Implications for System Design”
Core idea
The popularity of P2P systems, such as the Gnutella network, makes
quantitative evaluations of them important. This paper analyzes the results
of two types of crawls, a sequential one and a distributed client/server one,
on the Gnutella network and extracts some properties of it. Gnutella is
designed with the following goals: the ability to operate in a dynamic environment,
performance and scalability, reliability, and anonymity. The following aspects of
Gnutella are analyzed: growth trends and dynamic behavior, estimate of
Gnutella generated traffic, connectivity and reliability in Gnutella, and
mapping between the Internet infrastructure and Gnutella. It draws the following
conclusions about Gnutella: 1) it generates a significant fraction of the total
Internet traffic; 2) its node connectivity follows a multi-modal distribution; 3)
its topology mismatches that of the Internet.
Pros
This paper provides some interesting statistics and results about the
properties of the Gnutella network. In particular, it reveals that Gnutella's
topology is completely independent of the underlying Internet structure.
Cons
1. The crawls may change the dynamics of the Gnutella network. It would be
nice to know how much effect the crawls had on the original Gnutella
network. In the extreme case, they change the very properties of the network
that are the paper's major interest, especially if the crawlers behave very
differently from normal Gnutella clients, which would render the results
meaningless.
2. There is also some confusion. It is found that in Nov. 2000, only 1/3 of the
traffic was useful user traffic, while the rest was for maintenance. Then, after
June 2001, the problem was reportedly solved, with 92% of traffic being user
traffic. However, as said earlier, no crawls were run after June 2001, so this
figure isn't supported by their own experimental results. Also, the fact that the
authors fail to mention any details about the solution other than “newer Gnutella
implementations” makes it even less convincing.
3. The mismatch between the topology of Gnutella and the Internet,
according to the authors, can cause serious problems. However, the paper
makes no effort to mitigate this problem or offer suggestions.
Review on “Scaling Properties of the Internet Graph”
Core idea
The authors discover that the worst congestion in Internet-like graphs scales
poorly with graph size when shortest-path routing is used, based on
both analysis and simulations. Moreover, policy-based routing is found,
surprisingly, to perform no worse in terms of worst congestion. To improve
the Internet graph, the authors suggest introducing parallel edges, which is
shown to be an effective remedy.
Pros
The conclusions are supported both by theoretical proofs and simulation
results. Some interesting discoveries are presented, and an effective way to
improve the Internet graph's scaling properties is proposed.
Cons
1. In Section 3.1, the traffic on each edge is calculated without considering the
capacity of each link. Moreover, load-balancing mechanisms could be
taken into account as well.
2. The assumption motivating the clout model, that “well placed” sources are
likely to send larger amounts of traffic than other sources, seems only somewhat
plausible. A better placement of a source only implies its ability to send more
traffic, not its likelihood of doing so.
3. The paper suggests adding parallel links according to simple functions
of node degrees to reduce congestion, saying that “in this case it might not be
necessary for some links in the graph to grow in capacity at a faster rate than
the others”. But I fail to see the fundamental difference between these two
approaches.
4. The entire paper is about worst-case congestion, but how representative this
worst case is goes unmentioned. Policy-based routing and shortest-path routing
are compared based on worst congestion alone, which may not be fair.
Thanks.
~Wenxuan
From: trowerm SpamElide on behalf of Matt Trower
Sent: Thursday, April 28, 2011 10:53 AM
To: indy SpamElide
Subject: 525 review 04/28
Mapping the Gnutella Network
The authors present in this paper a measurement study of the Gnutella P2P network application. Gnutella manages a virtual network topology through which queries and data are disseminated. This topology can be significantly different from the underlying physical topology, which leads to inefficiencies in routing hops. Later P2P networks such as Pastry have addressed this issue. Gnutella has to maintain peer connectivity in this graph and does so using PING and PONG messages. The study showed that these messages produced the majority of the traffic present in Gnutella networks until a fix was released later. Other topics, such as fault tolerance, are also covered in the paper.
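The crawler's core idea, repeatedly asking known peers for their neighbours until no new peers appear, amounts to a breadth-first traversal of the overlay. The sketch below is a hypothetical, in-memory version: the overlay table and the `ping` function are stand-ins for illustration, not Gnutella protocol code.

```python
from collections import deque

# Hypothetical in-memory overlay: peer -> set of neighbour peers.
overlay = {
    "a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"},
    "d": {"b", "c", "e"}, "e": {"d"},
}

def ping(peer):
    """Stand-in for a Gnutella PING: the PONG replies reveal a peer's
    current neighbours (here, just a table lookup)."""
    return overlay[peer]

def crawl(seed):
    """BFS crawl from a seed peer, recording every discovered overlay
    edge -- the same idea behind the paper's topology snapshots."""
    snapshot, frontier, seen = set(), deque([seed]), {seed}
    while frontier:
        peer = frontier.popleft()
        for nbr in ping(peer):
            snapshot.add(frozenset((peer, nbr)))
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return snapshot

edges = crawl("a")
print(len(edges))  # 5 distinct overlay links discovered
```

The real engineering problem the authors faced is hidden in `ping`: live peers join and leave mid-crawl, which is why they distributed the crawl to shorten each snapshot.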
This paper presents an interesting view into the Gnutella network, which was one of the predominantly used networks at the time. Many of the scalability issues discussed have since forced ISPs to change pricing models, as predicted. Furthermore, new P2P systems such as Pastry have emerged to correct issues in network topology present in Gnutella.
Scaling Properties of the Internet Graph
This paper presents a theoretical study of the congestion in the Internet graph as the number of edge nodes grows. It has long been understood that the Internet graph grows in the number of nodes and capacity in the core at rates similar to those expressed by Moore's Law. This paper shows that while the outer edges of the network indeed grow at this rate, internal nodes in the graph are required to grow much faster to accommodate the new traffic. The second interesting finding is that this holds true whether SPF or policy-based routing is used, which is perhaps not what one would initially expect. The authors suggest that the congestion in the network could be largely relieved by the use of a few parallel links between nodes identified as hotspots. Several strategies are presented for how these links should be selected, each of which provides similar benefits.
This paper provides an important contribution to our view of the Internet's growth. Under current architectures, it cannot be assumed that the end-to-end speeds of the Internet will continue to grow as it has previously. This paper seems to promote the idea of trying to keep as much traffic as possible close to the edges and avoid the hotspots in the core of the network.
From: kevin larson
Sent: Thursday, April 28, 2011 10:35 AM
To: Gupta, Indranil
Subject: 525 review 04/28
Akella et al investigate the scaling properties of the Internet graph. They characterize the Internet and look at the worst congestion, showing that it scales poorly, growing super-linearly with respect to increasing numbers of nodes. They model the scaling and congestion with respect to 3 models: any-2-any, leaf-2-leaf, and clout. Any-2-any assumes unit traffic between every pair of nodes. Leaf-2-leaf assumes traffic between all the leaf (stub) nodes in the graph. Clout attempts to best model the actual Internet graph, leveraging the knowledge that high-degree sources are likely to send larger quantities of traffic than other sources. They show the derivation of their formulas and simulate these three models with Inet-3.0. They then graph the maximum edge congestion with respect to the number of nodes for each of the three models. They show that while any-2-any and leaf-2-leaf show similar scaling, clout shows significantly higher congestion. They present policy-based routing, show the improvements it provides for scaling under the clout model, and suggest the addition of parallel links on these high-congestion links (suspecting that this is done in practice).
The authors strongly motivate their clout model and explain the shortcomings of the other two models in respect to a true model of the Internet. They also did a good job of motivating the parallel link implementation in respect to the actual implementation. Evaluation was strong and motivated, and graphs were well explained.
The authors presented both constant and dynamic values for alpha in the implementation of Inet-3.0, the exponent of the power law graph. They used the constant values for their experimental support and the dynamic values for their simulation results. This was confusing (to me), as it was not clear why the different usages of alpha were necessary. Additionally, they presented a graph of edge congestion vs the average degree, which had a hump at degree values of approximately 500. Insight into the cause of this would have been interesting, but was not provided.
Ripeanu et al presented a mapping of the Gnutella network, and explored its growth and traffic. They presented other attempts to characterize Gnutella, and explained the shortcomings of these works. They summarized Gnutella's goals in terms of dynamic configuration, performance/scalability, reliability, and anonymity, and explained the ping, pong, query, response, get, and push messages. They explained how their crawler captures the state of Gnutella, and presented growth trends and message behavior over time. They also mapped the characteristics of the connections, exploring the shortest paths and the relation between the number of nodes and the number of links (with respect to both the network as a whole and a single network topology). The relation between the number of nodes and links was interesting, as the number of nodes in a topology increased significantly between Nov 2000 and Mar 2001. They explored the infrastructure and showed many of the inefficiencies in how nodes were connected. Finally, they suggested improvements to Gnutella based on their findings.
The authors did a good job of motivating the improvements over previous characterizations that resulted from their techniques. They provided a lot of interesting data about how the Gnutella implementation and network changed and scaled over the period of their observations. They provided deep insight into the trends presented in their graphs, and explained these insights clearly and in detail. Additionally, they found areas that seemed promising for improvements.
The authors provided an evaluation of only a short interval of Gnutella. Although it is hard to argue this years later, it would be very interesting to see how these behaviors have changed and improved since their work.
From: Andrew Harris
Sent: Thursday, April 28, 2011 9:44 AM
To: Gupta, Indranil
Subject: 525 review 04/28
Review of “Mapping the Gnutella Network” and “Exploring Complex Networks”
In the Gnutella mapping project, the main goal was to determine a rough structure for that particular filesharing network. The researchers had in mind a supposed model - a power-law network - but had no empirical evidence to support this supposition. To explore this, they built a network crawler that, over a six month period, recorded various aspects of the network that would hint at its shape. These metrics included network joins and parts (churn), bandwidth utilization by the Gnutella protocol, and overall traffic volume. Their particular findings are curious; a preponderance of nodes enter and leave the network within about 4 hours, while a quarter remain alive for multiple days at a time. Traffic estimates for just the Gnutella protocol approach 330 TB/month, for a 50,000 node network, indicating issues of scalability with large networks. Locality of nodes is an aspect introduced as a method of improving Gnutella. Also introduced is the notion of a better query broadcasting mechanism, with CAN and Tapestry cited as methods of improving upon the Gnutella model.
The Strogatz article provides some insight as to the observed structure of the Gnutella network, as its own self-organization is rather similar to other self-organizing networks in nature and elsewhere. The article makes the point that knowing how certain networks tend to be arranged can provide new perspectives as to their analysis. Complex networks, such as randomly arranged networks and small-world networks, emerge in nonlinear systems, and their study is important. Now, with social networking at an apex of popularity, it seems more critical than ever to better understand networks, and to better understand how to apply the right sort of network model to a given unknown structure.
With social networking in mind, the transfer of information between persons follows similarly the patterns of information transfer observed between filesharing clients. Person A finds some interesting piece of information and makes it known that they have such to share; Persons B, C, etc., contact A for the information, an exchange is (potentially) made, and the data propagates across the network. Person N, connected to A and B through multiple intermediaries may send a query across their network, asking if anyone has heard of a particular topic - this would be a query flood in Gnutella terms. Persons A, etc., respond in kind, and a transfer is again made. Knowledge, like data, is easily replicated and shared between persons, so the same sort of analyses made on networks like Gnutella can be used on networks of live people as well (with caveats, of course).
What of newly emergent network structures, such as BitTorrent swarms and computing clouds? The former, BitTorrent, is much more anarchistic, in that the only common point is the tracker (and even that tends to vary, and is no longer completely necessary). Thus, you end up with a network that is quite random, though highly efficient at passing along information. The latter, computation clouds, tends to be easier to enforce a structure upon, as the machines themselves tend to be human-managed rather than automatically arranged. This isn’t always the case, though, as some allow for arbitrary network configurations, so to their clients the network may appear as complex as is desired for the application at hand.
All of these in mind, though, underscore the overall importance of understanding the overlap of disciplines within the sciences. Networks have been modeled within mathematics, biology, sociology, media studies, computer science, and other disciplines; surely no one discipline can fully claim it as their own. Thus, network analysis is a truly multidisciplinary skill, deserving of consideration from multiple angles when found in literature and in research.
--------------------
Andrew Harris
PhD Student, Social Spaces Group
Dept. of Computer Science
University of Illinois at Urbana-Champaign
harris78 SpamElide
From: Curtis Wang
Sent: Thursday, April 28, 2011 4:00 AM
To: Gupta, Indranil
Subject: 525 review 04/28
Curtis Wang (wang505)
4/28/11
Structure of Networks
Exploring Complex Networks
The review gives a very brief overview of the nonlinear dynamics of systems and goes into deeper detail on three types of systems—small-world, scale-free, and generalized random connectivity. The small-world model lies somewhere between a completely ordered model and a completely random model. It takes a regular lattice and randomly replaces edges with a certain probability. The main observation is that the resulting graph has short paths between any two nodes. The scale-free network’s degree distribution follows a power law. The generalized random connectivity model permits more variation in the degree of nodes. The author briefly speculates about how networks of these three different types may have advantages, such as being more synchronizable or more fault-tolerant.
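The lattice-rewiring construction described above is straightforward to sketch. The following Python code (an illustration in the spirit of the Watts-Strogatz model, not taken from the article; parameters and seed are arbitrary) builds a ring lattice and rewires each edge with probability p:

```python
import random

def small_world(n, k, p, seed=7):
    """Watts-Strogatz style construction: start from a ring lattice where
    each node links to its k nearest neighbours on each side, then rewire
    each lattice edge to a random new endpoint with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):                      # build the regular ring lattice
        for j in range(1, k + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    for v in range(n):                      # rewire each edge with prob. p
        for j in range(1, k + 1):
            w = (v + j) % n
            if rng.random() < p and w in adj[v]:
                candidates = [u for u in range(n)
                              if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

lattice = small_world(100, 2, 0.0)   # p=0: fully regular ring
rewired = small_world(100, 2, 0.1)   # p=0.1: a few shortcuts appear
print(sum(len(s) for s in lattice.values()) // 2,
      sum(len(s) for s in rewired.values()) // 2)
```

At p = 0 the graph stays a regular ring; even a small p introduces shortcuts that sharply reduce path lengths while the total edge count stays fixed at n*k.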
Pros
- General introduction into an interesting field
- Motivates many relevant examples from several different fields
- Very illustrative diagrams
Cons
- As a general overview, it seemed too short to be so technical. It could have been either longer with more details, or shorter without going into the mathematical details
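As an aside, the small-world construction the review describes (a regular ring lattice whose edges are rewired at random) is simple enough to sketch directly. Below is a minimal plain-Python sketch; the lattice size, neighbor count, and rewiring probability are arbitrary choices for illustration, not values from the paper:

```python
import random

def small_world(n, k, p, seed=0):
    """Ring lattice with k neighbors on each side; every edge is
    rewired to a random endpoint with probability p (Watts-Strogatz
    style, simplified)."""
    rng = random.Random(seed)
    edges = {(i, (i + j) % n) for i in range(n) for j in range(1, k + 1)}
    rewired = set()
    for u, v in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            while w == u or (u, w) in rewired or (w, u) in rewired:
                w = rng.randrange(n)
            rewired.add((u, w))
        else:
            rewired.add((u, v))
    return rewired

def avg_path_length(n, edges):
    """Mean shortest-path length over reachable pairs (BFS per node)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = pairs = 0
    for s in range(n):
        dist, frontier = {s: 0}, [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

L_lattice = avg_path_length(200, small_world(200, 3, 0.0))
L_rewired = avg_path_length(200, small_world(200, 3, 0.1))
print(L_lattice, L_rewired)  # a little rewiring shortens paths markedly
```

Even a small rewiring probability noticeably shortens the average path length, which is exactly the "short paths between any two nodes" observation the review highlights.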
Scaling Properties of the Internet Graph
The authors discuss the problem of scaling the Internet graph: new end-nodes are being added every day, which puts increased load on the links in the network. In addition, the load may grow faster for certain bottleneck links in the network. Through theoretical analysis, the authors show that the worst congestion of the Internet scales quite poorly with the network size and that policy-based routing is no worse than shortest-path routing. In addition, they offer a potential solution, showing that by adding parallel links in the existing network, the maximum relative congestion in the graph scales linearly instead.
Pros
- Addresses an important (and seemingly easy-to-overlook) problem that could have lots of impact on the future of the Internet
- Theoretical analysis with supporting simulation results
- Offers a potential solution—adding parallel links
Cons
- Could offer more possible solutions, maybe regarding routing policies or other ways to alter the graph structure
Adding parallel links seems like it would be a very slow and costly process. Are there other methods of changing the graph structure that could be used? Is there analysis about which links should be reinforced first (with a parallel link)?
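To make the congestion measure concrete, here is a toy sketch (not the authors' model): route one unit of traffic between every ordered pair of nodes along a single shortest path, and count the load on each edge. On a hypothetical two-hub topology, the inter-hub link carries all cross traffic, and a parallel link doubling its capacity would halve its relative congestion:

```python
from collections import defaultdict, deque

def edge_loads(adj, n):
    """Route one unit of traffic between every ordered node pair along
    a single shortest path (first-found BFS parent) and tally the load
    carried by each undirected edge."""
    load = defaultdict(int)
    for s in range(n):
        parent, q = {s: None}, deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    q.append(v)
        for t in range(n):
            u = t
            while parent.get(u) is not None:
                load[tuple(sorted((u, parent[u])))] += 1
                u = parent[u]
    return load

# Toy topology: two hubs (0 and 1) joined by one inter-hub link, each
# serving five leaves. All numbers are arbitrary.
n = 12
adj = {i: set() for i in range(n)}
def link(a, b):
    adj[a].add(b)
    adj[b].add(a)
link(0, 1)
for leaf in range(2, 7):
    link(0, leaf)
for leaf in range(7, 12):
    link(1, leaf)

load = edge_loads(adj, n)
worst_edge, worst = max(load.items(), key=lambda kv: kv[1])
print(worst_edge, worst)  # the inter-hub link carries all cross traffic
print(worst / 2)          # relative congestion if a parallel link doubles capacity
```

The graph, uniform traffic matrix, and routing rule are all simplifications chosen for illustration; the paper's actual analysis works with power-law graphs and asymptotic arguments.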
From: Tengfei Mu
Sent: Thursday, April 28, 2011 1:07 AM
To: Gupta, Indranil
Subject: 525 review 04/29
1. Exploring Complex Networks
This paper presents an overview of the modeling of complex networks. The author starts by illustrating how networks are structured in the real world, with examples from biology and sociology. He then points out the issues that make the study of networks complex, such as evolution, dynamics, structure, node diversity, connection diversity, and meta-complication. The author also analyzes the connections between different parts of the network mathematically. In the second part, the author describes networks that combine dynamical and structural complexity.
Pro:
1. The authors have introduced and discussed different examples of complex networks, as well as the individual dynamics of the components and the coupling behavior. It is a good review paper overall.
2. I agree with the author’s conclusion that network thinking will become essential to all branches of science in the long term.
Con:
1. The authors only studied 3 aspects of a complex network.
2. Mapping the Gnutella network: properties of large-scale peer-to-peer systems and implications for system design
This paper focuses on characterizing Gnutella’s connectivity and how it relates to improving the system’s resilience. The paper first finds that the Gnutella network is not exactly a power-law network. Pure power-law networks, despite their advantages, are vulnerable to DoS attacks targeting their high-degree nodes; Gnutella’s flattened degree distribution means it does not fully fall into this vulnerable category. The second finding is that Gnutella’s logical topology does not match the underlying network structure, which means Gnutella wastes network resources when sending messages between nodes.
Pro:
1. Different aspects of the Gnutella network that affect the scalability and performance of the Internet are discussed, which could potentially inspire future work to overcome those limitations.
Con:
1. No possible solutions are offered for the problems discussed in the paper.
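On the "not exactly power-law" question, the standard quick check is to estimate the exponent of the degree distribution by maximum likelihood and inspect how well the tail fits. A minimal sketch of the continuous (Hill) estimator follows; the degree sample is synthetic stand-in data, not Gnutella's, and real degree data are integers, for which the discrete estimator differs slightly:

```python
import math
import random

def powerlaw_alpha(degrees, dmin=1):
    """Continuous maximum-likelihood (Hill) estimate of the power-law
    exponent alpha, fitted to the tail degrees >= dmin."""
    tail = [d for d in degrees if d >= dmin]
    return 1.0 + len(tail) / sum(math.log(d / dmin) for d in tail)

# Synthetic degrees drawn from a Pareto tail with true alpha = 2.5,
# standing in for a crawled snapshot (hypothetical data).
rng = random.Random(42)
degrees = [(1 - rng.random()) ** (-1 / 1.5) for _ in range(50_000)]
alpha_hat = powerlaw_alpha(degrees, dmin=1)
print(round(alpha_hat, 2))  # close to the true exponent 2.5
```

On a real crawl, a flattened head like Gnutella's would show up as a poor fit at low degrees even when the tail estimate looks clean.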
From: david.m.lundgren SpamElide on behalf of David Lundgren
Sent: Thursday, April 28, 2011 12:55 AM
To: Gupta, Indranil
Subject: 525 review 04/28
Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer
Systems and Implications for System Design
Ripeanu et al. explore Gnutella's virtual topology as well as the
correspondence between Gnutella's virtual structure and the underlying
physical Internet infrastructure. The authors characterize Gnutella as a
``semi-scale-free'' network, where the degree distribution seems to
follow more or less a power law, but with a flattened head. The authors
posit that the flattened head is the result of users being early
technology adopters and having a different distribution of high-speed
Internet access than what is typical of most network users. It is then
demonstrated that the virtual topology is independent of the underlying
physical network infrastructure. This independence along with a
power-law-esque distribution make Gnutella robust to network outages and
other more malicious failures. The authors recommend implementing a
mechanism similar to proxy caching for HTTP requests on the assumption
that Web requests and P2P requests follow similar distributions. I
believe that this assumption is slightly incorrect as shown by Krishna
et al. in 2003. More innovative protocols for query dissemination are
suggested to reduce the overhead of the network.
Pros, Cons, Comments, and Questions:
- The authors' measurement and analysis techniques seem rigorous and will
most likely generalize to other P2P networks.
- I thought it was quite interesting that the number of connections per
node remained constant as network size increased and that the average
shortest hop is less than seven nodes, while most nodes are connected
by two steps.
- I think future work could (or may already have, given the paper's age)
examine other salient network characteristics such as edge
weight/frequency of contact, network diameter, clustering coefficient,
etc.
- I did not understand the reason for only sampling traffic along a
single edge. I think more samples would have improved my confidence in
this measurement.
- I am also not convinced by the early-adopter explanation for the flattened
head of Gnutella's degree distribution. I am also not sure the authors
presented enough evidence to claim that all graph statistics are
preserved in their snapshots.
-------------------------------------------------------------------------
Exploring Complex Networks
In his review article published in Nature, Strogatz proceeds to explain
the mechanics and certain unique characteristics of a variety of graph
families. He begins with regular networks and discusses our
understanding of coupled dynamical systems of oscillators. More complex
network structures are then discussed, such as random graphs, scale-free
networks (graphs with power law degree distributions), and generalized
random graphs. Small-world networks are examined as a middle ground
between regular networks and more random architectures. Lattice networks
with links replaced with random ones probabilistically are shown to
accurately model many real world graphs such as biological networks.
Graphs generated in this manner often have a single, massive connected
component and few ``dominant'' central hubs. Scale-free networks or
power law networks form a family of graphs extremely resilient to random
node failures, but brittle to targeted attacks at key hub nodes. Real
examples include Web page link networks. Finally generating functions
are introduced for generalized random graphs of arbitrary degree
distributions. Examples of bipartite corporate, academic, and celebrity
graphs are generated in this manner. The clustering coefficient of
these models is compared to that of empirical networks. The corporate
model is shown to be an ideal fit, while the academic and celebrity
networks are slightly off.
Pros, Cons, Comments, and Questions:
- Overall I thought this was an excellent review article at the nascent
stage of network analysis popularity in computer science. It would be
interesting to see an updated version of this paper discussing more
recent work given the explosion of Twitter, Facebook, and other social
media. ``Computational social science'' is an emerging and fascinating
field.
- The mathematical framework for constructing general random graphs
seems extremely potent and applicable to research in distributed
systems.
- I would like to see an overview of theoretical models of network
structure focusing purely on distributed systems and other networks
related to computer science. I also think that there are many more
advanced metrics for graph analysis that would be well suited for an
additional survey article.
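The clustering coefficient compared above has a short direct definition: for each node, the fraction of its neighbor pairs that are themselves connected, averaged over the graph. A minimal sketch on a hypothetical four-node graph:

```python
def clustering_coefficient(adj):
    """Average local clustering: for each node with degree >= 2, the
    fraction of its neighbor pairs that are themselves linked."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is undefined for degree < 2
        nbrs = sorted(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Hypothetical graph: a triangle (0, 1, 2) plus a pendant node 3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c = clustering_coefficient(adj)
print(round(c, 3))  # (1/3 + 1 + 1) / 3 = 7/9
```

The generating-function models in the article predict this quantity analytically for a given degree distribution, which is what makes the empirical comparison possible.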
From: Agarwal, Rachit
Sent: Wednesday, April 27, 2011 9:25 PM
To: Gupta, Indranil
Subject: 525 review 04/28
Rachit Agarwal
----
Scaling Properties of the Internet graph
The paper argues that the worst-case congestion in the Internet scales poorly with the network size. Using analytical and empirical arguments, the paper also shows that in the realistic case of using policy-induced routing, the worst-case congestion has similar behavior as that with shortest-path routing. My thoughts/comments on the paper:
1. Such results typically depend on traffic patterns in the Internet. While it is a good start to understand the congestion using simple traffic patterns, such patterns could be very unrealistic in practice. It would have been more interesting to see analytical results for a more general class of traffic patterns, which may be a difficult task to achieve.
2. The notion of policy routing used in the paper is very loose. Typically, ASes do not *just* use valley-free routing and have a more sophisticated local preferences over the set of valley-free paths (even if we choose to forget about other policy settings). It would have been interesting to see some discussion on how these results carry over to the more general case.
3. The analytical case seems a bit tricky. In particular, is the analysis modeling an AS-level topology? If not, how is the core of the network defined? (If my understanding is right, each router in the core of the network does not have to have a very large degree.) If it is, then the analysis should have considered the fact that the AS-level topology is already a multigraph with multiple edges between pairs of ASes (pairs of ASes may have different points of connection).
----
Exploring Complex Networks
An interesting read that surveys results on how structural properties of graphs/networks play a role in functionality of these networks. The focus is, in general, on how several key ideas apply to the general understanding of a wide variety of networks. The author argues that among various aspects of these networks, some are easy to model and leaves exploring the other aspects as major open problems in the field. Focusing on the more interesting case of data networks, some comments:
1. Modeling realistic data networks simply based on their structure may lead to very counter-intuitive (and possibly incorrect?) conclusions. For instance, the author cites a work that argues the following - "any node [in scale-free networks] that fails probably has small degree". While this may be true for failures due to device aging, in practice the failures in data networks are often due to the "load" on the device, which is usually higher on nodes with high degree. How would one model such failures based solely on the structure of the underlying graph?
2. Among the various complications discussed by the author, I found connection diversity, dynamical complexity and meta-complication as the most interesting ones. Too bad that it turns out that we do (?) not have good tools to model these complications. Does that mean that we should fall back to simulations and empirical approaches? Can we make realistic assumptions while not completely ignoring these complications in our analysis?
3. I am curious if structural properties can be used to describe the effect of correlated failures and load-induced failures?
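On point 1, the structural side of the resilience claim is easy to probe by simulation: grow a preferential-attachment graph, then compare random node removals against removals targeting the highest-degree nodes. The sketch below uses plain Python with arbitrary sizes; it illustrates the structural argument only and does not model load-induced or correlated failures:

```python
import random

def pref_attach(n, m, seed=1):
    """Preferential-attachment (Barabasi-Albert-style) graph: each new
    node links to m existing nodes chosen proportionally to degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    targets, repeated = list(range(m)), []
    for v in range(m, n):
        for t in set(targets):
            adj[v].add(t)
            adj[t].add(v)
        repeated.extend(targets)        # degree-weighted urn of endpoints
        repeated.extend([v] * m)
        targets = [rng.choice(repeated) for _ in range(m)]
    return adj

def giant_frac(adj, removed):
    """Fraction of all nodes left in the largest connected component
    after deleting the given node set."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)

adj = pref_attach(500, 2)
hubs = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
targeted = giant_frac(adj, set(hubs[:50]))   # remove the 50 biggest hubs
rand_frac = giant_frac(adj, set(random.Random(7).sample(sorted(adj), 50)))
print(targeted, rand_frac)  # targeted attack fragments the graph far more
```

Targeted removal fragments the graph far more than random removal of the same number of nodes, which is exactly the asymmetry the "small degree" quote relies on, and why load-induced failures that preferentially hit hubs would break it.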
----
--
Siebel Center for Computer Science,
University of Illinois at Urbana-Champaign,
Web: http://agarwa16.wikidot.com
----
Sing as if No one is Listening,
Dance as if No one is Watching,
Dream as if You'll Live Forever,
Live as if You'll Die Today !!
From: Ankit Singla
Sent: Wednesday, April 27, 2011 4:12 PM
To: Gupta, Indranil
Subject: 525 Review 04/28
1. Scaling Properties of the Internet Graph
-----------------------------------------------------------------
Summary: The paper attempts to study the impact of increasing number
of nodes in the Internet AS graph on the congestion in the network. It
looks at shortest path routing as well as "valley-free" policy
compliant routing on the AS-graph as well as synthetic graphs. The
authors analytically prove that the maximum congestion scales as
Ω(n^(1+1/α)), where α is the power-law exponent of the degree
distribution. Through simulations, they show that policy-compliant routing
doesn't worsen this significantly. They conclude by noting that adding
parallel edges between high-degree nodes seems to solve the congestion
scaling problem.
Comments: The preferential attachment model may not really be
appropriate for network evolution in this setting - ISPs are likely to
look at the traffic they see to decide the best connectivity
augmentations to make. Likewise, the unit-flow between each node-pair
is far from ideal in this setting - traffic can be expected to be
fairly skewed instead of uniform. The "clout" model seems insufficient
because the destination's degree should also matter, if degrees are
indicators at all. The result that congestion is relatively independent
of the policy restrictions is fairly surprising.
The notions of cause and effect are somewhat mixed-up here: ISP
contracts probably factor congestion into them anyway, augmenting
connectivity as required. Possibly the Internet graph already has
enough edge multiplicity to avoid poor congestion scaling. ISPs
probably look at existing traffic conditions rather than simplistic
heuristics like node degree to decide where to add connectivity.
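To get a feel for how steep the Ω(n^(1+1/α)) lower bound is, a quick numeric check, assuming a degree exponent α of about 2.1 (a commonly cited estimate for the Internet graph; treat the value as an assumption here, not a figure from the paper):

```python
# Growth of the Omega(n^(1 + 1/alpha)) worst-case congestion bound.
alpha = 2.1

def bound(n):
    return n ** (1 + 1 / alpha)

for n in (1_000, 10_000, 100_000):
    print(n, round(bound(n)))

# A 10x increase in n inflates the bound by 10^(1 + 1/alpha), i.e. ~30x,
# which is the super-linear scaling the paper warns about:
print(round(bound(10_000) / bound(1_000), 1))
```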
2. Exploring Complex Networks
--------------------------------------------------
Summary: The paper discusses complex networks - random graphs,
small-world phenomena and scale-free graphs included. It describes
very intuitively the properties of these networks and to some extent,
the rationale for how these properties come about. In some ways, it is
a survey of results as well as applications of graph models and
techniques.
Comments: It's strange how some parts of the article are watered down
to avoid detail, and how in some parts entire formulae and equations
are thrown in. The note towards the end of the article is a nice
observation: theoretical graph models can separate properties that
occur naturally across many different settings, simply because of
the somewhat universal characteristics of connectivity, from
features that are special to a particular setting.
Ankit
From: Qingxi Li
Sent: Wednesday, April 27, 2011 2:15 PM
To: Gupta, Indranil
Subject: 525 review 04/28
Exploring complex networks
This paper introduces two mathematical models for analyzing network structure: regular networks of dynamical systems and random graphs. The author also discusses the reasons why network structure is difficult to analyze, such as structural complexity, network evolution, connection diversity, and node diversity. However, regarding network evolution, the author argues that links are created and lost every minute; in fact, the physical links between and within networks do not change frequently, and the creation or loss of a connection between two overlay nodes does not affect the underlying links. Likewise, node diversity need not affect the structure of the network: the bandwidth of a link is determined by the type of the link, so whether the node using it is a supercomputer or a laptop, the bandwidth assigned to it is the same. Additionally, real networks lie somewhere between a random graph and a regular dynamical system. Unlike a pure random graph, there are still principles that can be used to model their structure, but they are not so regular that they fit neatly into a single model. So, compared with creating a theoretical model to analyze the complex structure of a network, directly evaluating and measuring the structure may be more useful, since the structure does not change so frequently that it cannot be measured.
Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design
This paper analyzes the structure and traffic of Gnutella. It uses a distributed crawler to take a snapshot of the Gnutella topology, but there are two problems. First, although the authors do not mention it in the paper, it seems they used IP addresses to distinguish nodes. Since some nodes may obtain their addresses via DHCP, this may underestimate the average lifetime of nodes; as another paper we read this semester suggests, the underestimation may be around 20% to 25%. The other problem, also mentioned by the authors, is that a crawl takes a couple of hours, which is long compared with node lifetimes (40% of nodes live less than 4 hours and only 25% live more than 24 hours). The results are interesting. After Gnutella changed its protocol, membership messages decreased from 55% to 8% of traffic. And even though the authors think there are too few nodes to form a power-law structure, figures 5 and 6 show that most nodes still have small degree compared with the number of high-degree nodes. The number of nodes with degree less than 10 may appear small simply because the total number of nodes increased, or because of the protocol change. The other result found by this paper is that Gnutella does not use the underlying network structure efficiently, which may simply be because Gnutella is built as an overlay network.