Python代写-QBUS1040-Assignment 5
时间:2022-05-26
QBUS1040 Assignment 5
Semester 1, 2022
Out: 12th May 2022
Due: 27th May 2022 at 11:59pm
Instructions
• This assignment consists of six problems, some involving multiple parts. Some parts require a written
response and others involve coding. The parts that require a written response are described in this
document, while the coding questions are described in the associated Jupyter notebook (.ipynb file).
• You should submit a PDF to GradeScope for the written questions and match the page number with
the questions that you answered. You can find the detailed instructions on how to scan and submit
your assignments through GradeScope on Canvas. If you fail to match the page to the corresponding
question, the marker will not be able to view your response and thus you will be awarded 0 marks for the
question.
• You may not use notation, concepts or material from other classes (say, a linear algebra class you may
have taken) in your solutions. All the problems can be done using only the material from this class, and
we will deduct points from solutions that refer to outside material.
Question: 1 2 3 4 5 6 Total
Points: 10 15 10 20 20 20 95
i
QBUS1040 Assignment 5 Due 27th May 2022
1. (10 points) Algorithm 12.1 in the textbook uses the QR factorization to compute the least squares approx-
imate solution xˆ = A†b, where the m× n matrix A has linearly independent columns. It has a complexity
of 2mn2 flops. In this exercise we consider an alternative method: First, form the Gram matrix G = ATA
and the vector h = AT b, then compute xˆ = G−1h (using algorithm 11.2 in the textbook). What is the
complexity of this method? Compare it to algorithm 12.1.
2. In this question, we examine computing k-fold cross-validation on a least squares problem ∥Ax− b∥2, where
A is a N×p matrix and b is a N -vector. The least squares problem can arise, for example from a data-fitting
model with N data points and p basis functions, and we wish to use k-fold cross validation to assess our
choice of basis functions.
We assume for convenience that N can be split into k even-sized folds of size N/k (so k divides N).
Let A1, . . . , Ak denote the (N/k) × p blocks of data corresponding to each fold, and b1, . . . , bp denote the
corresponding right hand sides for each fold. We also assume N is much larger than p, i.e., N ≫ p, and
that columns of A remain linearly independent even when we remove one block Ai.
(a) (5 points) Analyse the complexity of the following na¨ıve method of performing k-fold cross-validation:
for each fold i = 1, . . . , k
• Construct A˜i to be the matrix A but with rows corresponding to Ai removed. (Note that A˜i has
size ((k − 1)N/k)× p.)
• Construct b˜i to be the right-hand side vector b but with entries corresponding to bi removed. (Note
that b˜i is a ((k − 1)N/k)-vector.)
• Compute θi = A˜

i b˜i.
(b) (5 points) Argue that for any fold i = 1, . . . , k,
A˜Ti A˜i = A
TA−ATi Ai
A˜Ti b˜i = A
T b−ATi bi.
It may help to write A and b in block form:
A =
A1...
Ak
 , b =
b1...
bk
 .
Then think about what the difference is between A˜i, b˜i and A, b.
(c) (5 points) Based on the observation in part (b), we consider an alternative method to perform k-fold
cross validation:
• Compute G = ATA, h = AT b.
• For each fold i = 1, . . . , k, compute
θi = (A˜
T
i A˜i)
−1A˜Ti b˜i = (G−ATi Ai)−1(h−ATi bi).
Analyse the complexity of this method and compare it to the na¨ıve method of part (a). When is this
method better than the na¨ıve one?
3. Suppose you have a dataset {(x(1), y(1)), . . . , (x(N), y(N))} on which you wish to fit a linear regression model
y ≈ xTβ + v. Naturally, you try a least squares data fitting approach: define
A =
1 (x
(1))T
...
...
1 (x(N))T
 , b =
 y
(1)
...
y(N)

and solve
min
β,v
∥∥∥∥A [vβ
]
− b
∥∥∥∥2 .
Unfortunately, you find that the model does not fit well, so you explore approaches to improve it. One
approach is to define an extra feature z as a linear combination of current features x, so z = xT γ for some
fixed vector γ. Let x˜(i) = (x(i), z(i)) where z(i) = (x(i))T γ for all i = 1, . . . , N , and define
A˜ =
1 (x˜
(1))T
...
...
1 (x˜(N))T
 .
Page 1 of 2.
QBUS1040 Assignment 5 Due 27th May 2022
You then compare the previous model with the one obtained by solving
min
β˜,v
∥∥∥∥A˜ [vβ˜
]
− b
∥∥∥∥2 .
(a) (5 points) Do you expect the new model to perform better than the old one? Why or why not? Here,
“perform better” means that the sum of squared errors is lower.
(b) (5 points) What computational issues do you expect to arise when you solve the new model?
4. Estimating the elasticity matrix. In this problem you create a standard model of how demand varies with
the prices of a set of products, based on some observed data. There are n different products, with (positive)
prices given by the n-vector p. The prices are held constant over some period, say, a day. The (positive)
demand for the products over the day is given by the n-vector d. The demand in any particular day varies,
but it is thought to be (approximately) a function of the prices. The units of the prices and demands don’t
really matter in this problem. Demand could be measured in 10,000 units, and prices in $100.
The nominal prices are given by the n-vector pnom. You can think of these as the prices that have been
charged in the past for the products. The nominal demand is the n-vector dnom. This is the average value
of the demand, when the prices are set to pnom. (The actual daily demand fluctuates around the value
dnom.) You know both pnom and dnom. We will describe the prices by their (fractional) variations from the
nominal values, and the same for demands. We define δp and δd as the (vectors of) relative price change
and demand change:
δpi =
pi − pnomi
pnomi
, δdi =
di − dnomi
dnomi
, i = 1, . . . , n.
So δp3 = +0.05 means that the price for product 3 has been increased by 5% over its nominal value, and
δd5 = −0.04 means that the demand for product 5 in some day is 4% below its nominal value. Your task is
to build a model of the demand as a function of the price, of the form
δd ≈ Eδp,
where E is the n× n elasticity matrix.
You don’t know E, but you do have the results of some experiments in which the prices were changed a bit
from their nominal values for one day, and the day’s demands were recorded. This data has the form
(p1, d1), . . . , (pN , dN ),
where pi is the price for day i, and di is the observed demand.
(a) (10 points) Explain how you would estimate E, given this price-demand data. Be sure to explain how
you will test for, and (if needed) avoid over-fit.
Hint. You might find it easier to separately fit the models δdi ≈ e˜Ti δp, where e˜i is the ith row of E.
(We use the tilde above ei to avoid conflict with the notation for unit vectors.)
(b) (10 points) Please refer to the Jupyter Notebook file for this problem.
5. (20 points) Please refer to the Jupyter Notebook file for this problem.
6. Handwritten digit classification.
(a) (5 points) Please refer to the Jupyter Notebook file for this problem.
(b) (5 points) Please refer to the Jupyter Notebook file for this problem.
(c) (5 points) Please refer to the Jupyter Notebook file for this problem.
(d) (5 points) Note that your dataset is slightly different from the one we used in the lecture example,
although both come from the same source. In particular, you haven’t done any preprocessing to the
images. Hence your confusion matrices will look slightly differently. Comment on what else causes the
differences between your results and the results presented in the lecture (there is more to this story
than preprocessing).
Page 2 of 2.
essay、essay代写