The Australian National University, School of Computing
COMP3670/6670: Introduction to Machine Learning
Semester 2, 2023
Assignment 4: Theory Questions
Release Date. October 5, 2023
Due Date. 11:59 pm, October 23, 2023
Maximum credit. 100
Question 1 Properties of Eigenvalues (10 + 5 = 15 credits)
Let A be an invertible matrix.
1. (a) Prove that all the eigenvalues of A are non-zero.
(b) Prove that for any eigenvalue λ of A, λ⁻¹ is an eigenvalue of A⁻¹.
2. Let B be a square matrix. Let x be an eigenvector of B with eigenvalue λ. Prove that for all
integers n ≥ 1, x is an eigenvector of Bⁿ with eigenvalue λⁿ.
Question 2 Distinct eigenvalues and linear independence (10 + 5 = 15 credits)
Let A be an n × n matrix.
1. Suppose that A has n distinct eigenvalues λ₁, . . . , λₙ, and corresponding non-zero eigenvectors
x₁, . . . , xₙ. Prove that {x₁, . . . , xₙ} are linearly independent.
Hint: You may use without proof the following property: if {y₁, . . . , yₘ} are linearly dependent,
then there exists some p with 1 ≤ p < m such that yₚ₊₁ ∈ span{y₁, . . . , yₚ} and {y₁, . . . , yₚ}
are linearly independent.
2. Hence, or otherwise, prove that any matrix B ∈ ℝⁿ×ⁿ can have at most n distinct
eigenvalues.
Question 3 Determinants (5 + 5 + 5 + 5 + 5 = 25 credits)
1. Compute the determinant of the matrix A using the Sarrus rule, where

A =
[ 1    2    4 ]
[ 2    3   −2 ]
[ 0   −1   −2 ]
2. Prove det(A⊺) = det(A).
3. Prove det(Iₙ) = 1, where Iₙ is the n × n identity matrix.
4. Prove det(A) = −det(σᵢ,ⱼ(A)) for i ≠ j, where σᵢ,ⱼ swaps the i-th and j-th rows of the input
matrix.
5. Let U be a square n × n upper triangular matrix. Prove that the determinant of U is equal
to the product of the diagonal elements of U.
Question 4 Trace Inequality (10 credits)
Prove that for arbitrary square matrices A and B,

tr((A + B)(A + B)⊺) ≤ 2 · tr(AA⊺ + BB⊺)
Question 5 Computations with Eigenvalues (3 + 3 + 3 + 3 + 3 = 15 credits)
Let
A =
[ 2   −2 ]
[ 0    1 ]
1. Compute the eigenvalues of A.
2. Find the eigenspace Eλ for each eigenvalue λ. Write your answer as the span of a collection of
vectors.
3. Verify that the set of all eigenvectors of A spans ℝ².
4. Hence, find an invertible matrix P and a diagonal matrix D such that A = PDP⁻¹.
5. Hence, find a formula for efficiently calculating Aⁿ for any integer n ≥ 0; that is, a closed-form
formula for Aⁿ, as opposed to multiplying A by itself n times over. Make your formula as simple
as possible.
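As an aside, the general idea behind part 5 can be checked numerically: if B = PDP⁻¹ with D
diagonal, then Bⁿ = PDⁿP⁻¹, and Dⁿ only requires raising the diagonal entries of D to the n-th
power. Below is a minimal NumPy sketch of this identity, deliberately using an arbitrary
diagonalisable matrix rather than the A above:

import numpy as np

B = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # an arbitrary diagonalisable matrix (not the A above)
eigvals, P = np.linalg.eig(B)         # columns of P are eigenvectors of B
n = 7

# Closed form: B^n = P D^n P^{-1}, where D^n = diag(eigenvalues ** n)
Bn_closed = P @ np.diag(eigvals ** n) @ np.linalg.inv(P)

# Naive form: multiply B by itself n times
Bn_naive = np.linalg.matrix_power(B, n)

assert np.allclose(Bn_closed, Bn_naive)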
Question 6 PCA as an optimisation problem (20 credits)
Principal component analysis (PCA) is a technique for increasing the interpretability of datasets
with a large number of dimensions/features per observation, while preserving the maximum
amount of information. Formally, PCA is a statistical technique for reducing the dimensionality of a
dataset. This is accomplished by linearly transforming the data into a new coordinate system where
(most of) the variation in the data can be described with fewer dimensions than the initial data
(see https://en.wikipedia.org/wiki/Principal_component_analysis).
A common technique used for finding a local max/min of a function f(x) subject to an equality
constraint of the form g(x) = c is the method of Lagrange multipliers. Naïvely, to find a local min
or max we would start by differentiating f(x) to get df/dx and then solving

df/dx = 0
But this does not take into account the constraint we have imposed. To account for this, we
instead create an expression called the Lagrangian:

L(x, λ) = f(x) − λ(g(x) − c)
Then, to find the local min/max, we differentiate this expression with respect to both x and λ and
set both derivatives equal to zero. This gives the equations

∂L/∂x = 0,    ∂L/∂λ = 0

Solving these simultaneous equations gives the local minima/maxima of f(x) subject to the
constraint g(x) = c. (You can read more about Lagrange multipliers if you want, but it is not
needed for this question.)
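As a purely illustrative worked example (not part of the question), take f(x, y) = x + y subject to
the constraint g(x, y) = x² + y² = 1. The Lagrangian is

L(x, y, λ) = x + y − λ(x² + y² − 1)

Setting ∂L/∂x = 1 − 2λx = 0 and ∂L/∂y = 1 − 2λy = 0 gives x = y = 1/(2λ), and substituting
into the constraint yields λ = ±1/√2. The candidate points are therefore (1/√2, 1/√2) and
(−1/√2, −1/√2); the constrained maximum f = √2 is attained at the first, and the minimum
f = −√2 at the second.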
In this question, our goal is to motivate why the vectors chosen for PCA (the principal components)
are eigenvectors of the covariance matrix. Suppose we have a dataset D = {x₁, x₂, . . . , xₙ} of vectors
with mean centered at 0 (for simplicity; we can always adjust by subtracting the mean if needed).
For a given vector v, the result of projecting a vector xᵢ onto v is given by v⊺xᵢ. The variance of the
projected data is given by (1/n) ∑ᵢ₌₁ⁿ (v⊺xᵢ − µ)², where µ is the mean of the projected data. Since
we are assuming the mean of the data is 0, this simplifies to (1/n) ∑ᵢ₌₁ⁿ (v⊺xᵢ)². Our goal is to find
the vector v that maximises this variance:
V = (1/n) ∑ᵢ₌₁ⁿ (v⊺xᵢ)²
  = (1/n) ∑ᵢ₌₁ⁿ v⊺xᵢxᵢ⊺v
  = v⊺ ( (1/n) ∑ᵢ₌₁ⁿ xᵢxᵢ⊺ ) v
  = v⊺Cv
Use the Lagrangian method to show that the vector v that maximises the resulting variance
V = v⊺Cv, subject to the constraint ∥v∥² = 1, is an eigenvector of the covariance matrix C.
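As a numerical sanity check (purely illustrative, and not required for the written proof), the claim
can be verified with a short NumPy sketch; the dataset size, dimension, and seed below are
arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
X = rng.normal(size=(500, 3))           # 500 observations, 3 features (arbitrary sizes)
X = X - X.mean(axis=0)                  # centre the data so its mean is 0
C = (X.T @ X) / X.shape[0]              # covariance matrix C = (1/n) ∑ xᵢxᵢ⊺

eigvals, eigvecs = np.linalg.eigh(C)    # eigh: C is symmetric; eigenvalues in ascending order
v_top = eigvecs[:, -1]                  # unit eigenvector for the largest eigenvalue

# The projected variance v⊺Cv of the top eigenvector equals its eigenvalue,
# and no other unit vector achieves a larger projected variance.
assert np.isclose(v_top @ C @ v_top, eigvals[-1])
for _ in range(1000):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)              # enforce the constraint ∥v∥ = 1
    assert v @ C @ v <= eigvals[-1] + 1e-12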