In this lab, you will learn how to: (a) represent social networks as graphs, (b) visualize them using a Python package called NetworkX, and (c) spread information over a social network.
You will need to install NetworkX in order to complete this lab. Use the command: pip install networkx --user
to install NetworkX on your machine. Many of you may already have NetworkX because it comes with the Anaconda installation of iPython. Click here for a good tutorial on NetowrkX.
After filling this notebook and running all the cells, rename the file lab6.ipynb to firstname_lastname_lab6.ipynb, include your well commented code, and submit it by email. Avoid unneeded steps/computations and make sure your code runs before submitting it. Grading is based on your submission which is due at 4 p.m. March 2, 20016. Your grade will be deducted 20 points for each day after the due date (late penalty).
A social network is mathematically represented by a graph $$G = (V,E)$$ where $V$ represents the set of nodes and $E$ denotes the set of edges in the graph. A user is represented by a node $u \in V$ and a friendship between users $u$ and $v$ is represented by an edge $(u,v) \in E$. In other words, if users $u$ and $v$ are friends, then there is an edge between nodes $u$ and $v$ (i.e., $(u,v) \in E$).
In software, a graph $G$ is represented by an adjacency list A
of size
$|V|$, the number of nodes in $G$. The $u^{th}$ entry of A
contains the
list of nodes $u$ is connected to. Therefore, A
is a list of lists. In this
lab assignment, you are required to read a file which describes a social network and form its adjacency list. Each line in the file contains
data of the form u v c d
. This means that nodes $u$ and $v$ are
connected to each other (i.e., for nodes $u,v \in V$ we have that $(u,v) \in
E$). Ignore c
and d
as they do not contain any relevant information.
Write a function called build_adjacency
that accepts
filename
, min_degree
, and num_nodes
as input
parameters and returns adjacency
as output. Start by declaring a
list (of empty lists) of size num_nodes
and then open the file
filename
which contains the social network. The social network data
is stored in out.facebook-wosn-links
which can be found on the
course webpage. When reading the file out.facebook-wosn-links
,
consider the first num_nodes
and ignore the rest. After forming the
adjacency list for the first num_nodes
nodes, ignore (delete) all the nodes
that have less than min_degree
friends. For example, assume node
u
has less than min_degree
neighbors, then set
adjacency[u] = []
and remove node u
from its neighbors' lists.
Finally, return the updated adjacency
list.
def build_adjacency(filename, min_degree, num_nodes):
adjacency = [[] for i in range(num_nodes)]
# your code goes here
return adjacency
You will need the following basic imports and function to draw graphs for you. You should know how to use draw_graph
, but you don't really need to know how it works.
%matplotlib inline
from pylab import *
import random as rnd
import networkx as nx
from __future__ import division
rcParams['figure.figsize'] = 12, 12 # that's default image size for this interactive session
def draw_graph(graph, labels=None, graph_layout='shell',
node_size=1600, node_color='blue', node_alpha=0.3,
node_text_size=12,
edge_color='blue', edge_alpha=0.3, edge_tickness=1,
edge_text_pos=0.3,
text_font='sans-serif'):
"""
Based on: https://www.udacity.com/wiki/creating-network-graphs-with-python
We describe a graph as a list enumerating all edges.
Ex: graph = [(1,2), (2,3)] represents a graph with 2 edges - (node1 - node2) and (node2 - node3)
"""
# create networkx graph
G=nx.Graph()
# add edges
for edge in graph:
G.add_edge(edge[0], edge[1])
# these are different layouts for the network you may try
# shell seems to work best
if graph_layout == 'spring':
graph_pos=nx.spring_layout(G)
elif graph_layout == 'spectral':
graph_pos=nx.spectral_layout(G)
elif graph_layout == 'random':
graph_pos=nx.random_layout(G)
else:
graph_pos=nx.shell_layout(G)
# draw graph
nx.draw_networkx_nodes(G,graph_pos,node_size=node_size,
alpha=node_alpha, node_color=node_color)
nx.draw_networkx_edges(G,graph_pos,width=edge_tickness,
alpha=edge_alpha,edge_color=edge_color)
nx.draw_networkx_labels(G, graph_pos,font_size=node_text_size,
font_family=text_font)
# show graph
plt.show()
# run this cell for a few graphs of your choice
graph = [(1,2),(2,3),(1,3)]
draw_graph(graph)
Write a function that accepts an adjancency list (called adjacency
) and outputs a list of tuples (called graph
).
def adjacency_to_graph(adjacency):
graph = []
# your code goes here
return graph
Tweak and run the code below to visualize a small social netowrk.
adjacency = build_adjacency('C:\Users\kairouz2.UOFI\Dropbox\Big Data\out.facebook-wosn-links',4,50)
graph = adjacency_to_graph(adjacency)
draw_graph(graph)
Susceptible-Infected Model. The susceptible-infected (SI) model states that the rumor starts at node $v^* \in G$ at $n=0$. We refer to $v^*$ as the rumor source. At iteration $n=1$, $v^*$ can forward the rumor (uniformly at random) to one and only one of its neighbors. At iteration $n>1$, if node $v$ has the rumor, it can pass it to a node $u$ if and only if there is an edge between them. This process is repeated in discrete time steps (iterations) until we have a total of $N$ nodes with the rumor. If a node $v$ has the rumor, we say that node $v$ is infected. If node $v$ passes the rumor to node $u$, we say that node $v$ infects node $u$.
Implementing the SI Model in Python.
Given a social network represented by adjacency
and a fixed number N
, use
the SI model to infect N
nodes starting from a randomly chosen rumor
source. In order to do so in Python, you can create a list (of empty lists)
called who_infected
of N
entries. This will store the
adjacency list of the infected sub-graph. You will also need to define a list
of size N
that will keep track of the indices of the infected nodes
in the original underlying graph. Call this list infected_nodes
.
Here is how you can fill up who_infected
and
infected_nodes
:
n = 0
, choose a source node v*
uniformly at random from {0, ..., num_nodes -1}
. If
adjacency[v*] = []
ignore the selected v*
and redraw
a new sample. Redo the aforementioned procedure (if needed) until you
find a v*
for which adjacency[v*]
is not empty. Set
infected_nodes[0]=v*
. n = 1
, choose (uniformly at random) one of
v*
's neighbors and infect it. Let's say the index of this
infected node is v_1
. Set
infected_nodes[1]=v_1
and update
who_infected
as follows:
who_infected[0]=[1]
and
who_infected[1]=[0]
. Notice that the node indices v*
and v_1
have been relabelled in the infected subgraph to become 0
and 1
, respectively.n > 1
, consider the set of all susceptible nodes and infect one of them uniformly at random. The set of susceptible nodes: (a) contains the set of uninfected nodes that are connected to infected nodes: v*
, v_1
, ..., v_{n-1}
(b) if an uninfected node is connected to $K$ infected nodes, then it must appear $K$ times in the list of susceptible nodes (each time for one of the infected nodes it's connected to). Let's say the index of the randomly chosen susceptible node is v_n
and the index of the node that infected it is v_k
. Set infected_nodes[n]=v_n
and update who_infected
as follows: append n
to who_infected[k]
and set who_infected[n]=[k]
.n = N-1
.Simple Example. To better understand the details of the SI model, let's consider the following simple example.
graph = [(0,1),(0,2),(0,3),(1,2),(1,5),(2,4)]
draw_graph(graph)
Let's assume that node 0
is the rumor source. So, infected_nodes[0] = 0
in this case. At n = 1
, the susceptible list includes nodes {1, 2, 3}
. Let's say you randomly select node 2
to be the second infected node. So, infected_nodes[1] = 2
, who_infected[0] = 1
, and who_infected[1] = 0
. At n = 2
, the susceptible list includes nodes {1, 3, 1, 4}
. Notice that node 1
appears twice since it can be infected by node 0
(hence the first occurrence) and by node 2
(hence the second occurrence). Let's say you randomly select the second occurrence of node 1
to be the third infected node. So, infected_nodes[2] = 1
, who_infected[1] = [0, 2]
, and who_infected[2] = 1
. Make sure to remove the first occurrence of node 1
because you do not want to reinfect and already infected node at later iterations.
def generate_source(adjacency):
# your code goes here
return source
def si_model_rumor_spreading(source, adjancency, N):
infected_nodes = [-1]*N;
who_infected = [[] for i in range(N)]
# adding the source node to the list of infected nodes
infected_nodes[0] = source
susceptible_nodes = adjacency[source]
susceptible_indices = [0]*len(susceptible_nodes)
for i in range(1,N):
# your code goes here
return who_infected, infected_nodes
# run this script to test the above functions
N = 10
num_nodes = len(adjacency)
source = generate_source(adjacency)
who_infected, infected_nodes = si_model_rumor_spreading(source, adjacency, N)
graph = []
for i in xrange(N):
for j in xrange(len(who_infected[i])):
graph.append((infected_nodes[i], infctn_pattern[who_infected[i][j]]))
draw_graph(graph, node_color = 'red')