02450 PROJECT 2 8 March, 2022
Project description for report 2
Objective: The objective of this second report is to apply the methods you have
learned in the second section of the course on ”Supervised learning: Classification
and regression” in order to solve both a relevant classification and regression problem
for your data.
Material: You can use the 02450Toolbox on Inside to see how the various methods
learned in the course are used in Matlab, R or Python. In particular, you should
review exercise 5 to 8 in order to see how the various tasks can be carried out.
Mandatory section
In order to have your report evaluated, it must contain the following two items:
• According to the DTU regulations, each student’s contribution to the report must be clearly specified. Therefore, for each section, specify which student was responsible for it (use a list or table). A report must contain this documentation to be accepted. The responsibility assignment must be individualized¹.
• Solutions, or attempted solutions, for at least four of the exam problems found at the end of this document². The solutions do not have to be long (a couple of lines, perhaps a calculation) but must show the gist of your reasoning so as to verify you have worked independently on the problem. We suggest they are given in an itemized format:
1. Option A/B/C/D: To see this ...
¹ For reports made by 3 students: Each section must have a student who is 40% or more responsible. For reports made by 2 students: Each section must have a student who is 60% or more responsible.
² We ask you to do this because, in our experience, some students are unfamiliar with the written exam format until days before the exam, and we think this is the best way to ensure the requirements of the written exam are made clear early on. We don’t evaluate your answers for correctness because that aspect of the course will be tested at the exam and would be redundant here.
2. Option A/B/C/D: We solve this by using ...

“Don’t know” is obviously not allowed, but you can take inspiration from the homework problems (and the solutions given at the end of the notes). The purpose is to demonstrate that you have worked on the exam problems, not to test for correctness, and you can therefore hand in solutions which describe your best attempt at solving the problem (even ones you know are wrong). Keep in mind that the correctness of the solutions (fraction correct, etc.) will not affect your evaluation, only whether the report is evaluated at all.
Your report cannot be evaluated unless it contains these items.
Handin checklist
• Make sure the mandatory section is included
• Make sure the report clearly displays the names and study numbers of all group members. Make sure the study numbers are correct.
• Your handin should consist of exactly two files: a .pdf file containing the report, and a .zip file containing the code you have used (extensions: .py, .R or .m; do not upload your data). The reports are not evaluated based on the quality of the code (comments, etc.); however, we ask that the code be included to avoid any potential issues of illegal collaboration between groups. Please do not compress or convert these files.
• Reports are evaluated based on how well they address the questions below. Therefore, to get the best evaluation, address all questions.
• Use the group handin feature. Do not upload separate reports for each team member, as this will lead to duplicate work and unhappy instructors.
• Handin is due no later than 19 April at 13:00. Late handins will not be accepted under normal circumstances.
Description
Project report 2 should naturally follow project report 1 on ”Data: Feature extraction, and visualization” and cover what you have learned in the lectures and exercises
of week 5 to 8 on ”Supervised learning: Classification and regression”. The report should therefore include two sections: a section on regression and a section on classification. The report will be evaluated based on how it addresses each of the questions asked below and an overall assessment of the report quality.
Regression, part a: In this section, you are to solve a relevant regression problem
for your data and statistically evaluate the result. We will begin by examining the
most elementary model, namely linear regression.
1. Explain which variable is predicted based on which other variables, and what you hope to accomplish by the regression. Mention your feature transformation choices, such as one-of-K coding. Since we will use regularization momentarily, apply a feature transformation to your data matrix X such that each column has mean 0 and standard deviation 1.³
2. Introduce a regularization parameter λ as discussed in chapter 14 of the lecture notes, and estimate the generalization error for different values of λ. Specifically, choose a reasonable range of values of λ (ideally one where the generalization error first drops and then increases), and for each value use K = 10 fold cross-validation (algorithm 5) to estimate the generalization error.

Include a figure of the estimated generalization error as a function of λ in the report and briefly discuss the result.
3. Explain how a new data observation is predicted according to the linear model with the lowest estimated generalization error from the previous question, i.e., what are the effects of the selected attributes in determining the predicted value? Does the result make sense?
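The standardization and λ-sweep in items 1–2 above can be sketched as follows. This is a minimal illustration on synthetic data: the data, the λ range, and the closed-form ridge solver (standing in for the toolbox’s rlr_validate) are all assumptions, not prescribed choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 5
X = rng.normal(size=(N, M)) * np.array([1.0, 10.0, 100.0, 0.1, 1.0])  # mixed scales
y = X @ rng.normal(size=M) + rng.normal(size=N)

# Item 1: transform X so each column has mean 0 and standard deviation 1
X = (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_fit(Xtr, ytr, lam):
    """Closed-form regularized least squares with an unpenalized bias term."""
    Xb = np.hstack([np.ones((len(Xtr), 1)), Xtr])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                       # do not regularize the bias
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ ytr)

def ridge_predict(Xte, w):
    return np.hstack([np.ones((len(Xte), 1)), Xte]) @ w

# Item 2: estimate the generalization error for each lambda with K = 10 fold CV
lambdas = np.power(10.0, np.arange(-3.0, 4.0))          # illustrative range
folds = np.array_split(rng.permutation(N), 10)
gen_error = np.empty(len(lambdas))
for j, lam in enumerate(lambdas):
    errs = []
    for k in range(10):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(10) if i != k])
        w = ridge_fit(X[train], y[train], lam)
        # squared loss per observation on the held-out fold
        errs.append(np.mean((y[test] - ridge_predict(X[test], w)) ** 2))
    gen_error[j] = np.mean(errs)

best_lambda = lambdas[int(np.argmin(gen_error))]
```

Plotting gen_error against lambdas (with a logarithmic x-axis) gives the figure asked for in item 2, and the weights returned by ridge_fit at best_lambda are what item 3 asks you to interpret.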
Regression, part b: In this section, we will compare three models: the regularized
linear regression model from the previous section, an artificial neural network (ANN)
and a baseline. We are interested in two questions: Is one model better than the other? Is either model better than a trivial baseline? We will attempt to answer these questions with two-level cross-validation.
1. Implement two-level cross-validation (see algorithm 6 of the lecture notes). We
will use 2-level cross-validation to compare the models with K1 = K2 = 10
³ We treat feature transformations and linear regression in a very condensed manner in this course. Note that for real-life applications it may be a good idea to consider interaction terms, and that the last category in a one-of-K coding is redundant (you can perhaps convince yourself why). We consider this out of the scope of this report.
Outer fold    ANN                  Linear regression      Baseline
i             h*_i    E^test_i     λ*_i    E^test_i       E^test_i
1             3       10.8         0.01    12.8           15.3
2             4       10.1         0.01    12.4           15.1
...           ...     ...          ...     ...            ...
10            3       10.9         0.05    12.1           15.9

Table 1: Two-level cross-validation table used to compare the three models
folds⁴. As a baseline model, we will apply a linear regression model with no features, i.e. it computes the mean of y on the training data and uses this value to predict y on the test data.

Make sure you can fit an ANN model to the data. As the complexity-controlling parameter for the ANN, we will use the number of hidden units⁵ h. Based on a few test runs, select a reasonable range of values for h (which should include h = 1), and describe the range of values you will use for h and λ.
2. Produce a table akin to Table 1 using two-level cross-validation (algorithm 6 in the lecture notes). The table shows, for each of the K1 = 10 folds i, the optimal number of hidden units and regularization strength (h*_i and λ*_i respectively) as found after each inner loop, as well as the estimated generalization errors E^test_i obtained by evaluating on D^test_i. It also includes the baseline test error, also evaluated on D^test_i. Importantly, you must re-use the train/test splits D^par_i, D^test_i for all three methods to allow statistical comparison (see next section).

Note the error measure we use is the squared loss per observation, i.e. we divide by the number of observations in the test dataset:
E = (1 / N^test) · Σ_{i=1}^{N^test} (y_i − ŷ_i)²
Include a table similar to Table 1 in your report and briefly discuss what it tells you at a glance. Do you find the same value of λ* as in the previous section?
⁴ If this is too time-consuming, use K1 = K2 = 5.
⁵ Note there are many things we could potentially tweak or select, such as regularization. If you wish to select another parameter to tweak, feel free to do so.
3. Statistically evaluate whether there is a significant performance difference between the fitted ANN, the linear regression model, and the baseline using the methods described in chapter 11. These comparisons will be made pairwise (ANN vs. linear regression; ANN vs. baseline; linear regression vs. baseline). We will allow some freedom in which test to choose. Therefore, choose either:

setup I (section 11.3): Use the paired t-test described in Box 11.3.4
setup II (section 11.4): Use the method described in Box 11.4.1

Include p-values and confidence intervals for the three pairwise tests in your report and conclude on the results: Is one model better than the other? Are the two models better than the baseline? Are some of the models identical? What recommendations would you make based on what you’ve learned?
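A minimal sketch of one setup I pairwise comparison (a paired t-test in the spirit of Box 11.3.4): gather the per-observation squared losses of two models on the re-used outer test splits, then compute a confidence interval and p-value for the mean difference. The loss arrays below are synthetic stand-ins for values you would collect during cross-validation.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
# Per-observation squared losses of two models on the same test observations,
# concatenated across the outer folds (synthetic stand-ins here)
zA = rng.gamma(shape=2.0, scale=5.0, size=200)        # e.g. ANN
zB = zA + rng.normal(loc=1.0, scale=2.0, size=200)    # e.g. linear regression

z = zA - zB                                           # paired loss differences
alpha = 0.05
n = len(z)
# Confidence interval for the mean difference in generalization error
ci_lo, ci_hi = st.t.interval(1 - alpha, n - 1, loc=z.mean(), scale=st.sem(z))
# Two-sided p-value for H0: the two models have the same generalization error
p = 2 * st.t.cdf(-abs(z.mean()) / st.sem(z), n - 1)
```

The p-value is equivalently obtained from scipy’s st.ttest_rel(zA, zB); if the interval excludes 0 and p is below α, the performance difference is significant. Repeat for each of the three model pairs.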
Classification: In this part of the report you are to solve a relevant classification problem for your data and statistically evaluate your result. The tasks will closely mirror what you just did in the last section. The three methods we will compare are a baseline, logistic regression, and one of the other four methods from below (referred to as method 2).

Logistic regression for classification. Once more, we can use a regularization parameter λ ≥ 0 to control complexity

ANN Artificial neural networks for classification. Same complexity-controlling parameter as in the previous exercise

CT Classification trees. Same complexity-controlling parameter as for regression trees

KNN k-nearest neighbor classification, complexity-controlling parameter k = 1, 2, . . .

NB Naïve Bayes. As complexity-controlling parameter, we suggest the term b ≥ 0 from section 11.2.1 of the lecture notes to estimate⁶ p(x = 1) = (n₊ + b) / (n₊ + n₋ + 2b)
1. Explain which classification problem you have chosen to solve. Is it a multi-class or binary classification problem?

⁶ In Python, use the alpha parameter in sklearn.naive_bayes, and in R, use the laplace parameter to naiveBayes. We do not recommend NB for Matlab users, as the implementation is somewhat lacking.
Outer fold    Method 2             Logistic regression    Baseline
i             x*_i    E^test_i     λ*_i    E^test_i       E^test_i
1             3       10.8         0.01    12.8           15.3
2             4       10.1         0.01    12.4           15.1
...           ...     ...          ...     ...            ...
10            3       10.9         0.05    12.1           15.9

Table 2: Two-level cross-validation table used to compare the three models in the classification problem.
2. We will compare logistic regression⁷, method 2, and a baseline. For logistic regression, we will once more use λ as a complexity-controlling parameter, and for method 2 a relevant complexity-controlling parameter and range of values. We recommend this choice is made based on a trial run, which you do not need to report. Describe which parameter you have chosen and the possible values of the parameters you will examine.

The baseline will be a model which computes the largest class on the training data, and predicts everything in the test data as belonging to that class (corresponding to the optimal prediction by a logistic regression model with a bias term and no features).
3. Again use two-level cross-validation to create a table similar to Table 2, but now comparing logistic regression, method 2, and the baseline. The table should once more include the selected parameters, and as an error measure we will use the error rate:

E = {Number of misclassified observations} / N^test

Once more, make sure to re-use the outer validation splits to admit statistical evaluation. Briefly discuss the result.
4. Perform a statistical evaluation of your three models similar to the previous section. That is, compare the three models pairwise. We will once more allow some freedom in which test to choose. Therefore, choose either:

setup I (section 11.3): Use McNemar’s test described in Box 11.3.2
setup II (section 11.4): Use the method described in Box 11.4.1

Include p-values and confidence intervals for the three pairwise tests in your report and conclude on the results: Is one model better than the other? Are the two models better than the baseline? Are some of the models identical? What recommendations would you make based on what you’ve learned?

⁷ In case of a multi-class problem, substitute logistic regression for multinomial regression.
5. Train a logistic regression model using a suitable value of λ (see the previous exercise). Explain how the logistic regression model makes a prediction. Are the same features deemed relevant as in the regression part of the report?
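The majority-class baseline, the error-rate measure, and a McNemar comparison from the items above can be sketched as follows. The predictions are synthetic stand-ins, and the exact-binomial p-value is a standard variant of McNemar’s test (the course toolbox ships its own mcnemar routine, which you may prefer).

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=150)                          # synthetic test labels
# Two classifiers that are right ~80% and ~60% of the time (stand-ins for
# logistic regression and "method 2")
y_hat_A = np.where(rng.random(150) < 0.8, y_true, 1 - y_true)
y_hat_B = np.where(rng.random(150) < 0.6, y_true, 1 - y_true)

# Baseline: predict the largest class (fit it on the training split in practice)
majority = int(np.bincount(y_true).argmax())
y_hat_base = np.full_like(y_true, majority)
error_rate = np.mean(y_hat_base != y_true)     # misclassified / N_test

# McNemar's test: count observations where exactly one model is correct
a_right = y_hat_A == y_true
b_right = y_hat_B == y_true
n12 = int(np.sum(a_right & ~b_right))          # A right, B wrong
n21 = int(np.sum(~a_right & b_right))          # A wrong, B right
p = min(1.0, 2 * st.binom.cdf(min(n12, n21), n12 + n21, 0.5))
```

A small p indicates the two classifiers disagree in a systematic way; run the test for each of the three model pairs on the re-used outer test splits.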
Discussion:
1. Include a discussion of what you have learned in the regression and classification
part of the report.
2. If your data has been analyzed previously (which will be the case in nearly all instances), find a study which uses it for classification, regression or both. Discuss how your results relate to those obtained in the study. If your dataset has not been published before, or the articles are irrelevant/unobtainable, this question may be omitted, but make sure you justify that this is the case.
The report should be 5-10 pages long (and no longer!) including figures and ta-
bles and give a precise and coherent account of the results of the regression and
classification methods applied to your data.
Transferring/reusing reports from previous semesters
If you are retaking the course, you are allowed to reuse your previous report. You can either have the report transferred in its entirety, or re-work sections of the report and have it evaluated anew.
To have a report transferred, do absolutely nothing. Reports from previous semesters
are automatically transferred. Therefore, please do not upload old reports to Inside
as this will lead to duplicate work. As a safeguard, we will contact all students who
are missing reports shortly after the exam.
If you wish to redo parts of a report you have already handed in as part of a group in a
previous semester, then to avoid any issues about plagiarism please keep attribution
to the original group members for those sections you choose not to redo.
1 Exam problems for the project
Problems
Question 1. Spring 2019 question 13:

Figure 1: ROC curve for a neural network classifier, where the predictions and true class labels are one of the options in fig. 2.

Figure 2: Four candidate predictions for the ROC curve in fig. 1. The observations are plotted horizontally, such that the position on the x-axis indicates the predicted value ŷ_i, and the marker/color indicates the class membership: black circles indicate the observation belongs to class y_i = 0 and red crosses to y_i = 1.

A neural network classifier is trained to distinguish between two classes y ∈ {0, 1} and produces a class probability ŷ, and the receiver operating characteristic (ROC) curve of the network when evaluated on a test set with N = 8 observations is shown in fig. 1. Suppose we plot the predictions on the N = 8 test observations by their ŷ value along the x-axis and indicate the class labels by either a black circle (class y = 0) or a red cross (y = 1); which one of the subplots in fig. 2 then corresponds to the ROC curve in fig. 1?

A Prediction A
B Prediction B
C Prediction C
D Prediction D
E Don’t know.
Question 2. Spring 2019 question 15: Suppose we wish to build a classification tree based on Hunt’s algorithm where the goal is to predict Congestion level, which can belong to four classes, y = 1, y = 2, y = 3, y = 4. We consider binary splits based on the value of x7, such that observations where x7 = z are assigned to the left branch and those where x7 ≠ z are assigned to the right branch. In table 3 we have indicated the number of observations in each of the four classes for the different values x7 takes in the dataset. Suppose we use the classification error impurity measure; which one of the following statements is true?

        x7 = 0   x7 = 1   x7 = 2
y = 1     33        4        0
y = 2     28        2        1
y = 3     30        3        0
y = 4     29        5        0

Table 3: Proposed split of the Urban Traffic dataset based on the attribute x7. We consider a two-way split where for each interval we count how many observations belonging to that interval have the given class label.
A The impurity gain of the split x7 = 2 is ∆ ≈ 0.0195
B The impurity gain of the split x7 = 2 is ∆ ≈ 0.0178
C The impurity gain of the split x7 = 2 is ∆ ≈ 0.0074
D The impurity gain of the split x7 = 2 is ∆ ≈ 0.0212
E Don’t know.
Question 3. Spring 2019 question 18: We will consider an artificial neural network (ANN) trained on the Urban Traffic dataset described in table 4 to predict the class label y based on the attributes x1, . . . , x7. The neural network has a single hidden layer containing nh = 10 units, and will use the softmax activation function (specifically, the over-parameterized softmax function described in section 14.3.2 (Neural networks for multi-class classification) of the lecture notes) to predict the class label y, since it is a multi-class problem. For the hidden layer we will use a sigmoid non-linearity. How many parameters have to be trained to fit the neural network?
No. Attribute description Abbrev.
x1 30-minute interval (coded) Time of day
x2 Number of broken trucks Broken Truck
x3 Number of accident victims Accident victim
x4 Number of immobile busses Immobilized bus
x5 Number of trolleybus network defects Defects
x6 Number of broken traffic lights Traffic lights
x7 Number of run over accidents Running over
y Level of congestion/slowdown (low to high) Congestion level
Table 4: Description of the features of the Urban Traffic dataset used in this exam. The dataset describes urban traffic behaviour of the city of Sao Paulo in Brazil. Each observation corresponds to a 30-minute interval between 7:00 and 20:30, indicated by the integer x1, such that x1 = 1 corresponds to 7:00-7:30 and so on up to x1 = 27, which corresponds to 20:00-20:30. The other attributes x2, . . . , x7 correspond to the number of occurrences of the given type in that 30-minute interval. We will consider the primary goal to be classification, namely to predict y, which is the level of congestion of the bus network in the given interval. The dataset used here consists of N = 135 observations, and the attribute y is discrete, taking values y = 1 (corresponding to no congestion), y = 2 (light congestion), y = 3 (intermediate congestion), and y = 4 (heavy congestion).
A Network contains 124 parameters
B Network contains 280 parameters
C Network contains 110 parameters
D Network contains 88 parameters
E Don’t know.
Question 4. Spring 2019 question 20:
Figure 3: Structure of decision tree. The goal
is to determine the splitting rules.
Figure 4: Classification boundary.
We will consider the Urban Traffic dataset pro-
jected onto the first two principal directions.
Suppose we train a decision tree to predict
which of the four classes an observation belongs
to. Since the attributes are continuous, we will
consider binary splits of the form bi ≥ z for dif-
ferent values of i and z, where b1, b2 refer to the
coordinates of the observations when projected
onto principal directions. Suppose the trained
decision tree has the form shown in fig. 3, and
that according to the tree the predicted label
assignment for the N = 135 observations are
as given in fig. 4, what is then the correct rule
assignment to the nodes in the decision tree?
A A: b1 ≥ −0.16, B: b2 ≥ 0.03, C: b2 ≥ 0.01, D: b1 ≥ −0.76
B A: b1 ≥ −0.76, B: b1 ≥ −0.16, C: b2 ≥ 0.03, D: b2 ≥ 0.01
C A: b2 ≥ 0.03, B: b1 ≥ −0.76, C: b2 ≥ 0.01, D: b1 ≥ −0.16
D A: b1 ≥ −0.76, B: b2 ≥ 0.03, C: b1 ≥ −0.16, D: b2 ≥ 0.01
E Don’t know.
Question 5. Spring 2019 question 22:

              ANN                Log. reg.
              n*_h   E^test_1    λ*      E^test_2
Outer fold 1   1     0.385       0.01    0.615
Outer fold 2   1     0.357       0.01    0.286
Outer fold 3   1     0.429       0.01    0.357
Outer fold 4   1     0.571       0.06    0.714
Outer fold 5   1     0.538       0.32    0.538

Table 5: Result of applying two-level cross-validation to a neural network model and a logistic regression model. The table contains the optimally selected parameters from each outer fold (n*_h, hidden units, and λ*, regularization strength) and the corresponding test errors E^test_1 and E^test_2 when the models are evaluated on the current outer split.
Suppose we wish to compare a neural network model and a regularized logistic regression model on the Urban Traffic dataset. For the neural network, we wish to find the optimal number of hidden neurons nh, and for the regression model the optimal value of λ. We therefore opt for a two-level cross-validation approach where, for each outer fold, we determine the optimal number of hidden units (or regularization strength) using an inner cross-validation loop with K2 = 4 folds. The tested values are:

λ : {0.01, 0.06, 0.32, 1.78, 10}
nh : {1, 2, 3, 4, 5}.
Then, given this optimal number of hidden units n*_h or regularization strength λ*, the model is trained and evaluated on the current outer split. This produces table 5, which shows the optimal number of hidden units/lambda as well as the (outer) test classification errors E^test_1 (neural network model) and E^test_2 (logistic regression model). Note these errors are averaged over the number of observations in the (outer) test splits. Suppose the time taken to train/test a single neural network model in milliseconds is

training time: 20 and testing time: 5

and the time taken to train/test a single logistic regression model is

training time: 8 and testing time: 1;

what is approximately the time taken to compose the table?
A 6800.0 ms
B 13600.0 ms
C 3570.0 ms
D 13940.0 ms
E Don’t know.
Question 6. Spring 2019 question 26: Consider again the Urban Traffic dataset. We consider a multinomial regression model applied to the dataset projected onto the first two principal directions, giving the two coordinates b1 and b2 for each observation. Multinomial regression then computes the per-class probability by first computing the 3 numbers:

ŷ_k = [1, b1, b2]^T w_k,  for k = 1, . . . , 3

and then subsequently using the softmax transformation in the form:

P(y = k | ŷ) = e^{ŷ_k} / (1 + Σ_{k'=1}^{3} e^{ŷ_{k'}})   if k ≤ 3
P(y = 4 | ŷ) = 1 / (1 + Σ_{k'=1}^{3} e^{ŷ_{k'}})

to compute the per-class probabilities. Suppose the weights are given as:

w1 = [1.2, −2.1, 3.2]^T,  w2 = [1.2, −1.7, 2.9]^T,  w3 = [1.3, −1.1, 2.2]^T.

Which of the following observations will be assigned to class y = 4?

A Observation b = [−1.4, 2.6]^T
B Observation b = [−0.6, −1.6]^T
C Observation b = [2.1, 5.0]^T
D Observation b = [0.7, 3.8]^T
E Don’t know.