Lab 6 - Social Networks as Graphs

Peter Kairouz and Pramod Viswanath

In this lab, you will learn how to: (a) represent social networks as graphs, (b) visualize them using a Python package called NetworkX, and (c) spread information over a social network.

You will need to install NetworkX in order to complete this lab. Use the command: pip install networkx --user to install NetworkX on your machine. Many of you may already have NetworkX because it comes with the Anaconda installation of iPython. Click here for a good tutorial on NetowrkX.

After filling this notebook and running all the cells, rename the file lab6.ipynb to firstname_lastname_lab6.ipynb, include your well commented code, and submit it by email. Avoid unneeded steps/computations and make sure your code runs before submitting it. Grading is based on your submission which is due at 4 p.m. March 2, 20016. Your grade will be deducted 20 points for each day after the due date (late penalty).

Exercise 1: Representing Social Networks in Software (30 pts)

A social network is mathematically represented by a graph $$G = (V,E)$$ where $V$ represents the set of nodes and $E$ denotes the set of edges in the graph. A user is represented by a node $u \in V$ and a friendship between users $u$ and $v$ is represented by an edge $(u,v) \in E$. In other words, if users $u$ and $v$ are friends, then there is an edge between nodes $u$ and $v$ (i.e., $(u,v) \in E$).

In software, a graph $G$ is represented by an adjacency list A of size $|V|$, the number of nodes in $G$. The $u^{th}$ entry of A contains the list of nodes $u$ is connected to. Therefore, A is a list of lists. In this lab assignment, you are required to read a file which describes a social network and form its adjacency list. Each line in the file contains data of the form u v c d. This means that nodes $u$ and $v$ are connected to each other (i.e., for nodes $u,v \in V$ we have that $(u,v) \in E$). Ignore c and d as they do not contain any relevant information.

Write a function called build_adjacency that accepts filename, min_degree, and num_nodes as input parameters and returns adjacency as output. Start by declaring a list (of empty lists) of size num_nodes and then open the file filename which contains the social network. The social network data is stored in out.facebook-wosn-links which can be found on the course webpage. When reading the file out.facebook-wosn-links, consider the first num_nodes and ignore the rest. After forming the adjacency list for the first num_nodes nodes, ignore (delete) all the nodes that have less than min_degree friends. For example, assume node u has less than min_degree neighbors, then set adjacency[u] = [] and remove node u from its neighbors' lists. Finally, return the updated adjacency list.

In [2]:
def build_adjacency(filename, min_degree, num_nodes):
    
    adjacency = [[] for i in range(num_nodes)]
    
    # your code goes here
    
    return adjacency

Exercise 2: Visualizing Social Networks (20 pts)

You will need the following basic imports and function to draw graphs for you. You should know how to use draw_graph, but you don't really need to know how it works.

In [3]:
%matplotlib inline
from pylab import *
import random as rnd
import networkx as nx
from __future__ import division

rcParams['figure.figsize'] = 12, 12  # that's default image size for this interactive session

def draw_graph(graph, labels=None, graph_layout='shell',
               node_size=1600, node_color='blue', node_alpha=0.3,
               node_text_size=12,
               edge_color='blue', edge_alpha=0.3, edge_tickness=1,
               edge_text_pos=0.3,
               text_font='sans-serif'):
    """ 
    Based on: https://www.udacity.com/wiki/creating-network-graphs-with-python
    We describe a graph as a list enumerating all edges.
    Ex: graph = [(1,2), (2,3)] represents a graph with 2 edges - (node1 - node2) and (node2 - node3)
    """
    
    # create networkx graph
    G=nx.Graph()

    # add edges
    for edge in graph:
        G.add_edge(edge[0], edge[1])

    # these are different layouts for the network you may try
    # shell seems to work best
    if graph_layout == 'spring':
        graph_pos=nx.spring_layout(G)
    elif graph_layout == 'spectral':
        graph_pos=nx.spectral_layout(G)
    elif graph_layout == 'random':
        graph_pos=nx.random_layout(G)
    else:
        graph_pos=nx.shell_layout(G)

    # draw graph
    nx.draw_networkx_nodes(G,graph_pos,node_size=node_size, 
                           alpha=node_alpha, node_color=node_color)
    nx.draw_networkx_edges(G,graph_pos,width=edge_tickness,
                           alpha=edge_alpha,edge_color=edge_color)
    nx.draw_networkx_labels(G, graph_pos,font_size=node_text_size,
                            font_family=text_font)
    # show graph
    plt.show()
In [ ]:
# run this cell for a few graphs of your choice
graph = [(1,2),(2,3),(1,3)]
draw_graph(graph)

Write a function that accepts an adjancency list (called adjacency) and outputs a list of tuples (called graph).

In [38]:
def adjacency_to_graph(adjacency):
    graph = []
    
    # your code goes here 
    
    return graph

Tweak and run the code below to visualize a small social netowrk.

In [ ]:
adjacency = build_adjacency('C:\Users\kairouz2.UOFI\Dropbox\Big Data\out.facebook-wosn-links',4,50)
graph = adjacency_to_graph(adjacency)
draw_graph(graph)

Exercise 3: Spreading Rumors over Social Networks (50 pts)

Susceptible-Infected Model. The susceptible-infected (SI) model states that the rumor starts at node $v^* \in G$ at $n=0$. We refer to $v^*$ as the rumor source. At iteration $n=1$, $v^*$ can forward the rumor (uniformly at random) to one and only one of its neighbors. At iteration $n>1$, if node $v$ has the rumor, it can pass it to a node $u$ if and only if there is an edge between them. This process is repeated in discrete time steps (iterations) until we have a total of $N$ nodes with the rumor. If a node $v$ has the rumor, we say that node $v$ is infected. If node $v$ passes the rumor to node $u$, we say that node $v$ infects node $u$.

Implementing the SI Model in Python. Given a social network represented by adjacency and a fixed number N, use the SI model to infect N nodes starting from a randomly chosen rumor source. In order to do so in Python, you can create a list (of empty lists) called who_infected of N entries. This will store the adjacency list of the infected sub-graph. You will also need to define a list of size N that will keep track of the indices of the infected nodes in the original underlying graph. Call this list infected_nodes.

Here is how you can fill up who_infected and infected_nodes:

  • At iteration n = 0, choose a source node v* uniformly at random from {0, ..., num_nodes -1}. If adjacency[v*] = [] ignore the selected v* and redraw a new sample. Redo the aforementioned procedure (if needed) until you find a v* for which adjacency[v*] is not empty. Set infected_nodes[0]=v*.
  • At iteration n = 1, choose (uniformly at random) one of v*'s neighbors and infect it. Let's say the index of this infected node is v_1. Set infected_nodes[1]=v_1 and update who_infected as follows: who_infected[0]=[1] and who_infected[1]=[0]. Notice that the node indices v* and v_1 have been relabelled in the infected subgraph to become 0 and 1, respectively.
  • At iteration n > 1, consider the set of all susceptible nodes and infect one of them uniformly at random. The set of susceptible nodes: (a) contains the set of uninfected nodes that are connected to infected nodes: v*, v_1, ..., v_{n-1} (b) if an uninfected node is connected to $K$ infected nodes, then it must appear $K$ times in the list of susceptible nodes (each time for one of the infected nodes it's connected to). Let's say the index of the randomly chosen susceptible node is v_n and the index of the node that infected it is v_k. Set infected_nodes[n]=v_n and update who_infected as follows: append n to who_infected[k] and set who_infected[n]=[k].
  • Stop at iteration n = N-1.

Simple Example. To better understand the details of the SI model, let's consider the following simple example.

In [28]:
graph = [(0,1),(0,2),(0,3),(1,2),(1,5),(2,4)]
draw_graph(graph)

Let's assume that node 0 is the rumor source. So, infected_nodes[0] = 0 in this case. At n = 1, the susceptible list includes nodes {1, 2, 3}. Let's say you randomly select node 2 to be the second infected node. So, infected_nodes[1] = 2, who_infected[0] = 1, and who_infected[1] = 0. At n = 2, the susceptible list includes nodes {1, 3, 1, 4}. Notice that node 1 appears twice since it can be infected by node 0 (hence the first occurrence) and by node 2 (hence the second occurrence). Let's say you randomly select the second occurrence of node 1 to be the third infected node. So, infected_nodes[2] = 1, who_infected[1] = [0, 2], and who_infected[2] = 1. Make sure to remove the first occurrence of node 1 because you do not want to reinfect and already infected node at later iterations.

In [40]:
def generate_source(adjacency):
    # your code goes here
    return source

def si_model_rumor_spreading(source, adjancency, N):
    infected_nodes = [-1]*N;
    who_infected = [[] for i in range(N)]
    
    # adding the source node to the list of infected nodes
    infected_nodes[0] = source
    susceptible_nodes = adjacency[source]
    susceptible_indices = [0]*len(susceptible_nodes)
        
    for i in range(1,N):
        
        # your code goes here

    return who_infected, infected_nodes
In [ ]:
# run this script to test the above functions
N = 10
num_nodes = len(adjacency)
source = generate_source(adjacency)
who_infected, infected_nodes = si_model_rumor_spreading(source, adjacency, N)
graph = []
for i in xrange(N):
    for j in xrange(len(who_infected[i])):
        graph.append((infected_nodes[i], infctn_pattern[who_infected[i][j]]))
draw_graph(graph, node_color = 'red')