CSCC46 Social and Information Networks Fall 2020 University of Toronto Scarborough
CSCC46 Assignment 2
Instructions
• These problems require thought, but do not require long answers. Please be as concise as possible.
• This assignment is due by 2pm on Thursday, October 29. You may use up to two flex days to submit
the assignment late, meaning the hard deadline after which no late assignments will be accepted is
2pm Saturday, October 31.
• Your assignment must be typed; handwritten assignments will not be marked. You may use any word-
processing software you like (many academics use LaTeX). Figures can either be drawn electronically,
or you may hand-draw, take pictures, and include them as images in your submission. Whatever
word-processing you choose, please submit in PDF format.
• All assignments must be submitted electronically on MarkUs (see the course webpage for the MarkUs
link). Well before the due date, try submitting with MarkUs. You can submit an empty file as a
placeholder, and then submit a new version of the file later by using the “Replace” column. Check that your submission has uploaded correctly.
Useful Mathematical Facts
This assignment is more rigorous than the previous one. Here are some mathematical facts that might be
useful.
• The Union Bound. This incredibly simple and intuitive fact comes in handy more often than you
might expect. It says that for any finite set of events, the probability that at least one of them happens
is no greater than the sum of the probabilities of the individual events:
P(⋃_i A_i) ≤ ∑_i P(A_i)
The union of a bunch of events simply means “the event that at least one of these events happens”,
and its probability cannot exceed the sum of the individual probabilities of the events, because that
sum is exactly the probability of the union in the extreme case that the events are mutually exclusive
(disjoint). It can be smaller if some of the events overlap. For example, say that A_1 is the event
that the Blue Jays win, and P(A_1) = 0.4, and say that A_2 is the event that it rains, and
P(A_2) = 0.25. The probability that at least one of A_1 and A_2 happens (either it rains or they win)
cannot be greater than P(A_1) + P(A_2) = 0.4 + 0.25 = 0.65. It could be smaller, though (for example,
if it only rains when the Blue Jays win).
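To make the example concrete, here is a small Monte Carlo sketch in Python. The conditional rain probabilities are invented purely for illustration, chosen so the marginals match the numbers above; they are not part of the fact itself.

```python
import random

random.seed(0)
trials = 100_000
union_count = 0
for _ in range(trials):
    win = random.random() < 0.4            # A1: Blue Jays win, P = 0.4
    # Illustrative assumption: rain is more likely on winning days, with
    # conditionals chosen so the marginal stays P(rain) = 0.25
    # (0.4 * 0.4 + 0.6 * 0.15 = 0.25).
    rain = random.random() < (0.4 if win else 0.15)
    if win or rain:
        union_count += 1

p_union = union_count / trials
print(f"P(A1 or A2) ~ {p_union:.3f}; the union bound gives 0.65")
```

Because the two events overlap here, the estimated union probability comes out well below the 0.65 bound.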
• A Useful Limit. Oftentimes you want to reason about the probability that a bunch of events don’t
happen. If the probability of one of the events happening can be expressed as 1/x, and there are x
independent events, then you can use:
lim_{x→∞} (1 − 1/x)^x = 1/e
• Change of Base Formula. To change the base of a logarithm from α to β, use:
log_α x = log_β x / log_β α
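A one-line numeric illustration using Python's math module:

```python
import math

# change of base: log_2 8 computed via natural logarithms
val = math.log(8) / math.log(2)
print(val)  # 3.0, up to floating-point rounding
```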
Question 1: Signed Networks [25 pts]
1.1 [10 pts] In this question, we will analyze a simple generative model of signed networks to see if balance
occurs randomly. Consider the following simple model for constructing random signed networks, which
we’ll call the G+ model. Start with a complete graph on n nodes. For each edge e mark its sign as
positive with probability p (and thus negative with probability 1− p). All edges are undirected.
Let GB denote the event that a graph G is balanced. In this question, we’ll show that P (GB) → 0 as
n→∞ for graphs generated according to the G+ model. Assume that p = 1/8.
(a) [2 pts] Let T be a maximum set of edge-disjoint triangles in G, that is, a set of triangles in
which no two triangles share an edge. Give a simple lower bound for |T|, the number of triangles
in such a set (the bound should be an increasing function of n).
(b) [2 pts] For any triangle in G, what is the probability that it is balanced?
(c) [4 pts] Using the simple lower bound from part (a), give an upper bound on the probability that
all of the triangles in T are balanced. Show that this probability approaches 0 as n→∞.
(d) [2 pts] Explain why the last part implies that P (GB)→ 0 as n→∞.
1.2 [5 pts] If balanced signed networks don’t show up by chance, as we showed in the previous question,
how do they arise? One class of mechanisms that researchers have proposed and studied are dynamic
processes, in which signed networks can evolve over time to become more balanced. The following
describes a very simple example of such a dynamic process:
I Pick a triad at random.
II If it’s balanced, do nothing.
III Otherwise, arbitrarily choose one of the edges and flip its sign so that the triad becomes balanced.
Consider the following claim: in this process, the number of balanced triads can never decrease. Is this
true? If so, give a proof, otherwise give a counterexample.
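To experiment with this dynamic process before committing to a proof or counterexample, here is a simulation sketch; the network size, random seed, and step count are arbitrary illustration choices. Note that flipping any one edge of an unbalanced triad flips its sign product, so step III always balances the chosen triad.

```python
import itertools
import random

random.seed(2)
n = 6
edges = [frozenset(e) for e in itertools.combinations(range(n), 2)]
sign = {e: random.choice([1, -1]) for e in edges}

def count_balanced():
    """Number of balanced triads (product of the three signs is +1)."""
    return sum(sign[frozenset((a, b))] * sign[frozenset((b, c))]
               * sign[frozenset((a, c))] == 1
               for a, b, c in itertools.combinations(range(n), 3))

history = [count_balanced()]
for _ in range(50):
    a, b, c = random.sample(range(n), 3)          # I: pick a triad at random
    triad = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
    if sign[triad[0]] * sign[triad[1]] * sign[triad[2]] != 1:
        flipped = random.choice(triad)            # III: flip one edge's sign;
        sign[flipped] = -sign[flipped]            # this triad becomes balanced
    history.append(count_balanced())

print(history)  # inspect whether the count ever drops between steps
```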
1.3 (a) [5 pts] One way signed networks can evolve over time is if new nodes join the network and create
new signed edges to nodes already in the network. Consider the simple network shown below. Is it
possible to add a node D such that it forms signed edges with all existing nodes (A, B, and C), but
does not become involved in any unbalanced triangles? Explain your answer.

[Figure: a triangle on nodes A, B, and C; edges A–B and A–C are positive, and edge B–C is negative.]
(b) [5 pts] Using your answer to the previous sub-problem, consider the following question. Take any
complete signed network, on any number of nodes, that is unbalanced. When (if ever) is it possible
for a new node X to join the network and form edges to all existing nodes in such a way that it
does not become involved in any unbalanced triangles? If it’s possible, give an example. If it’s not,
give an argument explaining why.
Question 2: Decentralized Search [50 pts]
In class, we saw a decentralized search algorithm based on “geography” that seeks a path between two nodes
on a grid by successively taking the edge towards the node closest to the destination.
Here we will examine an example of a decentralized search algorithm on a network where the edges are cre-
ated according to an underlying hierarchical tree structure. In particular, the nodes of our network will be
defined as the leaves of a tree, and the edge probabilities between two nodes will depend on their proximity
in this underlying tree structure. The tree may, for instance, be interpreted as representing the hierarchical
organization of a university where one is more likely to have friends inside the same department, a bit less
likely in the same school, and the least likely across schools.
Let us organize students at U of T into a tree hierarchy, where the root is University of Toronto and the
second level contains the different schools (engineering, arts and sciences, etc.). The third level represents
the departments and the final level (i.e., the leaves) are the U of T students. Fatima, a student from the
computer science department, wants to hang out with Max, who is in sociology. If Fatima does not know
Max, she could ask a friend in the sociology department to introduce them. If Fatima does not know anybody
in the sociology department, she may seek a friend in the U of T humanities school instead. In general, she
will try to find a friend who is “close” to Max in the tree.
One important thing to keep in mind: In this problem, the network nodes are only the leaf nodes in
the tree. Other nodes in the tree are virtual and only there to determine the edge creation probabilities
between the nodes of the network. So there are two networks: one is the observed network (i.e., “edges
between students”) and the other is the underlying tree structure that is used to generate the edges in the
observed network (“the hierarchy of U of T”).
Figure 1: Illustration of the graph in Question 2. Black nodes and edges are used to illustrate the hierarchy
and structure, but are not part of our network. Red nodes (leaf nodes) and red edges are the ones in our
network. The lowest common ancestor of s and t is the root of the tree. The decentralized search proceeds
as follows. Denote the starting node by s and the destination by t. At each step, the algorithm looks at
the neighbors of the current node and moves to the one “closest” to t, that is, the neighbor whose
lowest common ancestor with t is deepest in the tree. In this graph, from s we move to u.
2.1 This part covers some basic facts of the setting of the hierarchical model and defines the notion of
tree-based “distance” between the nodes.
Consider a complete, perfectly balanced b-ary tree T (each non-leaf node has b children and b ≥ 2), and
a network whose nodes are the leaves of T (the red nodes in Figure 1). Let the number of network nodes
(equivalently, the number of leaves in the tree) be N and let h(T ) denote the height of T (see Figure 1
for an example). Recall that a tree with one node has a height of zero.
Note: When answering questions in this section, feel free to cite well-known properties of trees (e.g.,
number of nodes in a complete binary tree of a certain height), but please provide evidence of reasoning
as well. In other words, we don’t expect you to prove basic tree properties by induction; just provide
some sound reasoning.
(a) (2 points) Write h(T ) in terms of N .
(b) (3 points) Next we want to define the notion of “tree distance.” The intuition we want to capture
is that students that share the same department are closer than, for example, students who only
share the same school. For instance, in the tree in Figure 1 nodes u and t are “closer” than nodes u
and s. We formalize the notion of “distance” as follows.
Given two network nodes (leaf nodes) v and w, let L(v, w) denote the subtree of T rooted at the
lowest common ancestor of v and w, and h(v, w) denote its height (that is, h(L(v, w))). In Figure 1,
L(u, t) is the tree in the circle and h(u, t) = 2. Note that we can think of h(v, w) as a “distance”
between nodes v and w.
For a given node v, what is the maximum possible value of h(v, w)?
(c) (5 points) Given a value d and a network node v, show that there are b^d − b^{d−1} nodes w
satisfying h(v, w) = d.
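One convenient way to experiment with this tree distance is to label the leaves 0..N−1 from left to right, so that integer division by b climbs one level of the tree. This labeling is an assumption of the sketch below, not part of the problem statement.

```python
from collections import Counter

def tree_distance(v, w, b):
    """h(v, w): height of the subtree rooted at the lowest common
    ancestor of leaves v and w. Assumes leaves of the complete b-ary
    tree are labeled 0..N-1 left to right, so v // b is the label of
    v's ancestor one level up."""
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

# In a binary tree of height 3 (N = 8 leaves), leaves 0 and 1 share a
# parent, while leaves 0 and 7 only meet at the root.
print(tree_distance(0, 1, 2))  # 1
print(tree_distance(0, 7, 2))  # 3

# Empirical check of the count in part (c): for b = 2 and height 4, the
# number of leaves w at distance d from a fixed v should be 2**d - 2**(d-1).
b, height = 2, 4
N = b ** height
counts = Counter(tree_distance(5, w, b) for w in range(N) if w != 5)
print(dict(counts))  # {1: 1, 2: 2, 3: 4, 4: 8}
```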
2.2 This part helps you design a decentralized search algorithm in the network.
We will generate a random network on the leaf nodes in a way that models the observation that a node
is more likely to know “close” nodes than “distant” nodes according to our university organizational
hierarchy captured by the tree T. For a node v, we define the probability that v creates an edge to
any other node w as:
p_v(w) = (1/Z) · b^{−h(v,w)}

where Z = ∑_{w≠v} b^{−h(v,w)} is a normalizing constant. By symmetry, all nodes v have the same normalizing
constant. Next, we set some parameter k and ensure that every node v has exactly k outgoing edges.
We do this with the following procedure. For each node v, we repeatedly sample a random node w
according to pv and create edge (v, w) in the network. We continue this until v has exactly k neighbors.
Equivalently, after we add an edge from v to w, we can set p_v(w) to 0 and renormalize with a new Z to
ensure that ∑_w p_v(w) = 1. This results in a k-regular directed network.
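The edge-creation procedure above can be sketched as follows. The leaf-labeling scheme (integer division by b climbs one level of the tree) is an assumed encoding, not part of the problem statement; discarding repeated draws is equivalent to zeroing out p_v(w) and renormalizing.

```python
import random

random.seed(3)

def lca_height(v, w, b):
    """Tree distance h(v, w) under the assumed 0..N-1 leaf labeling."""
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

def sample_out_edges(v, N, b, k):
    """Repeatedly sample w with probability proportional to
    b**(-h(v, w)) until v has k distinct neighbors, as described above."""
    weights = [0.0 if w == v else b ** (-lca_height(v, w, b))
               for w in range(N)]
    neighbors = set()
    while len(neighbors) < k:
        (w,) = random.choices(range(N), weights=weights)
        neighbors.add(w)
    return neighbors

b, height, k = 2, 5, 4
N = b ** height
nbrs = sample_out_edges(0, N, b, k)
print(sorted(nbrs))  # k = 4 distinct neighbors, biased toward nearby leaves
```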
(a) (7 points) Show that Z ≤ log_b N. (Hint: use the results in parts 2.1(a)–(c), and group
nodes by their distance. For each distance d, you know exactly how many nodes are at distance d,
and you know the probability weight of linking to each of them.)
(b) (8 points) For a node v and the target t, let T′ be the subtree of L(v, t) satisfying:
• T′ is of height h(v, t) − 1,
• T′ contains t,
• T′ does not contain v.
For instance, in Figure 1, T′ of L(s, t) is the tree in the circle. Let us consider an edge e from v to a
random node u sampled from p_v. We say that e points to T′ if u is one of the leaf nodes of T′. Show
that the probability of e pointing to T′ is no less than 1/(b log_b N). (Hint: note that all of the nodes in
T′ are the same distance away from v, by the notion of tree distance that we defined in 2.1(b). How
many nodes are in T′?)
2.3 (15 points) Let the out-degree k of each node be c · (log_b N)^2, where c and b are constants. Show that
when N grows very large, the probability that v has no edge pointing to T′ is asymptotically no more
than N^{−θ}, where θ is a positive constant which you need to compute. (Hints: use the result in part
2.2(b) and recall that each of the k edges is created independently. Also, use lim_{x→∞} (1 − 1/x)^x = 1/e.)
Argue why the above result indicates that for any node v, we can, with high probability, find an edge to
a (leaf) node u satisfying h(u, t) < h(v, t).
2.4 (10 points) Show that starting from any (leaf) node s, we can reach any (leaf) node t within O(log_b N)
steps. You do not need to give a strict probabilistic argument: you may assume that from any (leaf)
node v you can always get to a (leaf) node u satisfying h(u, t) < h(v, t), and argue why you can then
reach t in O(log_b N) steps.
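Putting the pieces together, here is an end-to-end sketch of the model and the greedy search. The leaf labeling, the constant c = 1, the random seed, and the random tie-breaking among equally close neighbors are all illustration choices, not part of the problem statement.

```python
import math
import random

random.seed(4)

def lca_height(v, w, b):
    """Tree distance h(v, w) under the assumed 0..N-1 leaf labeling."""
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

def build_network(N, b, k):
    """Give every node k distinct out-edges, sampling w ~ b**(-h(v, w))."""
    adj = {}
    for v in range(N):
        weights = [0.0 if w == v else b ** (-lca_height(v, w, b))
                   for w in range(N)]
        nbrs = set()
        while len(nbrs) < k:
            (w,) = random.choices(range(N), weights=weights)
            nbrs.add(w)
        adj[v] = nbrs
    return adj

def greedy_search(adj, s, t, b, max_steps=1000):
    """From s, repeatedly move to a neighbor whose LCA with t is deepest
    (smallest h(u, t)), breaking ties at random; return the step count,
    or None if t was not reached within max_steps."""
    v, steps = s, 0
    while v != t and steps < max_steps:
        best = min(lca_height(u, t, b) for u in adj[v])
        v = random.choice([u for u in adj[v]
                           if lca_height(u, t, b) == best])
        steps += 1
    return steps if v == t else None

b, height = 2, 7
N = b ** height                       # 128 leaves
k = int(math.log(N, b) ** 2)          # k = c * (log_b N)**2 with c = 1
adj = build_network(N, b, k)
steps = greedy_search(adj, 0, N - 1, b)
print("reached t in", steps, "steps; log_b N =", height)
```

Typical runs reach t in a number of steps on the order of log_b N, which is the behavior Question 2.4 asks you to argue for.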
Question 3: Power Laws [25 pts]
3.1 [10 pts] The main summary statistics of distributions are called moments. For example, the mean
is the first moment, and the variance is closely related to the second moment. In Lecture 6, we
calculated the mean of a power law distribution. In this question, calculate the third moment of a
power law distribution, which is closely related to its skewness. That is, calculate

∫_{x_min}^∞ x^3 p(x) dx,

where

p(x) = ((α − 1)/x_min) · (x/x_min)^{−α}
3.2 [15 pts] As we learned in class, the power law can be a great fit for the heavy-tailed data we en-
counter in real-world networks. But one problem we have to overcome is: what parameters are the
best fit? The power law is characterized by the minimum x-value x_min and the slope α. In this
question, we’ll learn how to estimate α from data.
Say you have a dataset {x_i | i ∈ {1, . . . , n}} of n numbers, and you suspect these numbers come from
a power law distribution and want to find the best fit α. The strategy we’re going to use to estimate
α is called the maximum likelihood approach. We’re going to compute the log-likelihood of the
data given a power law model, then calculate which value of α maximizes this log-likelihood (we
take the log of the likelihood to simplify the math).
(a) [5 pts] Let L(α) = ln(∏_{i=1}^n p(x_i)) be the log-likelihood of the data as a function of α.
Plug in the definition of a power law for p(x_i) (shown above in 3.1) and simplify the expression
for L(α).
(b) [10 pts] Now take the derivative of L(α) with respect to α and set it to zero:
dL(α)/dα = 0
This is now a simple single-variable calculus problem. Solve this equation to get the value of α
that maximizes the log-likelihood of the data. That is the value of α that fits the data best.
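Once you have derived the maximizing α, you can sanity-check it on synthetic data. The sketch below generates power-law samples by inverse-transform sampling (a standard technique; the parameter values and seed are arbitrary choices) without revealing the estimator itself.

```python
import random

random.seed(6)

def sample_power_law(n, alpha, x_min=1.0):
    """Inverse-transform sampling for
    p(x) = ((alpha - 1) / x_min) * (x / x_min)**(-alpha), x >= x_min.
    The CDF is F(x) = 1 - (x / x_min)**(1 - alpha), so inverting gives
    x = x_min * u**(-1 / (alpha - 1)) with u ~ Uniform(0, 1]."""
    return [x_min * (1 - random.random()) ** (-1 / (alpha - 1))
            for _ in range(n)]

data = sample_power_law(100_000, alpha=2.5)
tail = sum(x > 10 for x in data) / len(data)
print(min(data), tail)  # min >= x_min; P(X > 10) = 10**-1.5 ~ 0.032
```

Feeding such synthetic data (with a known α) to your estimator is a quick way to check your algebra in part (b).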