University of Bristol
EFIMM0095 (Data Science for Economics)
Professor Vincent Han
Sample Exam
• The duration of this exam is two hours.
• There are six questions in total. All questions must be answered. Make sure you allocate
your time appropriately and do not spend too much time on one question.
• Good luck!
1. (15 points) Consider the following list of machine learning methods: the lasso, K-nearest neigh-
bors (KNN), linear discriminant analysis (LDA), trees, support vector machine (SVM), principal
components analysis (PCA), and clustering.
(a) Among these methods, list all the supervised learning methods.
(b) Among the supervised learning methods, list all the methods that are designed only to
conduct classification (but not regression).
(c) Among the supervised learning methods, list all the methods that are local methods. Discuss
what it means for a method to be local and what advantages local methods have compared to global methods.
2. (25 points) Consider the following figure. The left panel depicts three different fitted lines/curves
for the data. The black curve depicts the true model. In the right panel, the downward-sloping
curve is the training MSE and the U-shaped curve is the test MSE. The color of each dot corresponds
to the color of a line/curve in the left panel.
[Figure: left panel plots Y against X, showing the three fitted lines/curves and the true model; right panel plots Mean Squared Error against Flexibility, showing the training MSE curve, the test MSE curve, and a horizontal dashed line.]
(a) Describe a single learning method that could have been used to estimate all the lines/curves
in the left panel. Also state what determines the flexibility of the model in this learning
method and discuss how to increase/decrease the flexibility.
(b) In the right panel, state in one or two sentences why the test MSE starts to increase as
the flexibility of the model increases, unlike the training MSE, which decreases monotonically in
flexibility.
(c) State what the horizontal dashed line indicates in the right panel. Discuss why the stated
quantity is an important benchmark in any machine learning method (in one or two sentences).
(d) Given the answer in (b), describe the general goal of cross-validation (CV).
(e) Describe the full algorithm of leave-one-out CV (LOOCV). Discuss which part of the
algorithm makes this procedure computationally intensive (compared to, say, 5-fold CV).
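For concreteness, here is a minimal Python sketch of LOOCV for a linear regression, using scikit-learn; the data and variable names are illustrative only, not part of the exam.

# Minimal LOOCV sketch: fit on n-1 points, test on the held-out point,
# repeat n times, and average the n squared errors.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 1))      # illustrative data
y = np.sin(X[:, 0] / 10) + rng.normal(0, 0.5, size=50)

n = len(y)
errors = np.empty(n)
for i in range(n):                         # n model fits: this loop is what
    train = np.delete(np.arange(n), i)     # makes LOOCV costly (vs 5 fits
    model = LinearRegression().fit(X[train], y[train])  # for 5-fold CV)
    pred = model.predict(X[i:i + 1])[0]
    errors[i] = (y[i] - pred) ** 2

cv_estimate = errors.mean()                # LOOCV estimate of the test MSE
print(cv_estimate)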
3. (15 points) Consider a categorical variable Y ∈ {1, ..., K} that indicates the label of each class.
Namely, there are K possible classes. Let X be the features that potentially contain information about
the class membership. Suppose we know the following conditional probabilities: for all k = 1, ..., K
and all possible values of x,
Pr[Y = k|X = x].
(a) Suppose K = 2. Then, you know the quantity Pr[Y = 1|X = x] (and trivially Pr[Y = 2|X =
x] = 1−Pr[Y = 1|X = x]) for all x. Given this quantity, construct a classifier that minimizes
the test error rate. That is, provide a decision rule (mathematically) that determines the
class of an individual with X = x0. [You don’t need to show that the test error rate is in
fact minimized with this classifier.]
(b) Motivated by the answer in (a), construct a classifier for general K.
(c) In practice, is the conditional probability above known or not? If yes, discuss how it is known.
If not, describe an example of a learning method that recovers it.
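For intuition, here is a minimal Python sketch of such a decision rule, assuming the conditional probabilities are already available as a table of hypothetical values (the numbers below are made up for illustration).

import numpy as np

# Hypothetical posterior probabilities Pr[Y = k | X = x0] for K = 3 classes,
# evaluated at four values of x0; each row sums to one.
posterior = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.3, 0.3, 0.4],
                      [0.5, 0.4, 0.1]])

# Assign each x0 to the class with the largest conditional probability
# (classes labelled 1, ..., K as in the question).
predicted_class = posterior.argmax(axis=1) + 1
print(predicted_class)   # [1 2 3 1]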
4. (20 points) Consider the following two different optimization problems that are associated with
two different learning methods:
min_{β0, β1, ..., βp}  Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{p} βj xij )^2 + λ Σ_{j=1}^{p} βj^2

and

min_{β0, β1, ..., βp}  Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{p} βj xij )^2 + λ Σ_{j=1}^{p} |βj| .
(a) Discuss the key similarity between the two methods.
(b) Discuss the key difference between the two methods.
(c) Describe whether the bias of your prediction will increase or decrease when you increase the
value of λ. Describe whether the variance of your prediction will increase or decrease when
you increase the value of λ.
(d) Motivated by your answers in (c), describe in one or two sentences the optimal way of
choosing λ.
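For reference, here is a minimal Python sketch contrasting the two penalized regressions, using scikit-learn (which calls the tuning parameter alpha and scales the objectives slightly differently); the data and the grid of λ values are illustrative only.

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta = np.array([3.0, -2.0] + [0.0] * 8)   # sparse true coefficients
y = X @ beta + rng.normal(size=100)

for lam in [0.01, 1.0, 100.0]:             # larger lambda => more shrinkage
    ridge = Ridge(alpha=lam).fit(X, y)     # squared (L2) penalty
    lasso = Lasso(alpha=lam).fit(X, y)     # absolute-value (L1) penalty
    # The lasso sets some coefficients exactly to zero; ridge only shrinks.
    print(lam, ridge.coef_.round(2), lasso.coef_.round(2))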
5. (15 points) Consider tree-based methods.
(a) In growing a tree, we typically use a top-down, greedy approach called recursive binary
splitting. This approach chooses two things in each step that minimize a certain criterion
function related to the MSE. What are the two choices made in each step?
(b) Describe in one or two sentences the motivation for using bagging, random forests, or boosting
instead of the single-tree method in (a).
(c) Even though the motivation is the same, bagging, random forests, and boosting implement
different learning algorithms. State in one sentence how a random forest differs from
bagging. Then state in one sentence why this feature of random forests improves
performance, namely, prediction accuracy on a test set.
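For reference, here is a minimal Python sketch using scikit-learn, treating bagging as a random forest that considers all p features at each split; the data and settings are illustrative only.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(size=200)

# Bagging: every split may use all p = 10 features.
bagging = RandomForestRegressor(n_estimators=200, max_features=None,
                                oob_score=True, random_state=0).fit(X, y)

# Random forest: each split considers a random subset of features
# (here a third of them), which decorrelates the trees.
forest = RandomForestRegressor(n_estimators=200, max_features=1/3,
                               oob_score=True, random_state=0).fit(X, y)

print(bagging.oob_score_, forest.oob_score_)  # out-of-bag R^2 of each ensemble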
6. (10 points) The support vector machine (SVM) improves over the support vector classifier, which
in turn improves over the maximal margin classifier. Suppose there are two possible classes. The
maximal margin classifier finds a separating hyperplane that separates the observations into the
two classes by maximizing the distance between the hyperplane and the closest observations on
each side of the hyperplane. This distance is called the margin.
(a) Given this description, state in one or two sentences how the support vector classifier improves
over the maximal margin classifier.
(b) Then state in one or two sentences how the SVM improves over the support vector classifier.
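For reference, here is a minimal Python sketch using scikit-learn, in which the cost parameter C controls the soft margin and the kernel choice distinguishes the SVM from the linear support vector classifier; the data and settings are illustrative only.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # nonlinear boundary

# Support vector classifier: linear boundary with a soft margin;
# smaller C tolerates more margin violations.
svc = SVC(kernel="linear", C=1.0).fit(X, y)

# SVM: the kernel trick allows nonlinear decision boundaries.
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

print(svc.score(X, y), svm.score(X, y))   # training accuracy of each fit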