Prac W6 - Dimensionality Reduction using Principal
Component Analysis and t-SNE
COMP4702/COMP7703 - Machine Learning
Aims:
• To complement lecture material in understanding the operation of PCA.
• To complement lecture material in understanding the operation of t-SNE.
• To gain experience with simulating and implementing these techniques in software.
• To produce some assessable work for this subject.
Part I: Principal Component Analysis:
PCA can be implemented very simply in Matlab or Python. Given a dataset (as a matrix
X, with one observation per row), the covariance matrix can be found using the Matlab
covariance function (cov(X)). The eigenvectors of this covariance matrix are the principal
components, and the corresponding eigenvalues reflect the amount of variance accounted
for by each component; sorting the eigenvalues in decreasing order ranks the components
by importance. To perform dimensionality reduction (e.g. down to 2 dimensions), project
the (mean-centred) data onto the two eigenvectors with the largest eigenvalues, i.e.
multiply X by the matrix whose columns are those two eigenvectors.
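As a concrete illustration of these steps, a minimal Matlab sketch follows; the function
name mypca and its interface are illustrative assumptions only, and your own
implementation may differ.

    function [Z, V, lambda] = mypca(X, k)
    % Minimal PCA sketch (illustrative). X is an n-by-d data matrix with one
    % observation per row; k is the number of dimensions to keep.
    Xc = X - repmat(mean(X, 1), size(X, 1), 1); % centre the data
    C = cov(Xc);                                % d-by-d covariance matrix
    [V, D] = eig(C);                            % eigenvectors and eigenvalues
    [lambda, order] = sort(diag(D), 'descend'); % sort by variance explained
    V = V(:, order(1:k));                       % keep the top-k eigenvectors
    Z = Xc * V;                                 % project down to k dimensions
    end

Returning all of the sorted eigenvalues in lambda is convenient, since the ratio of each
eigenvalue to their sum gives the proportion of variance each component explains.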
• Read section 6.3 of the Alpaydin text.
• Write a Matlab or Python function implementing PCA.
(Q1) List the code you have written that implements PCA.
(Q2) Run your PCA function on the MNIST data.
(a) Produce a plot of the data in the space spanned by the first two principal components. Colour each point by its class.
(b) What percentage of the data variance is accounted for by the first two principal
components?
(c) From the results, produce a Scree graph similar to that shown in Fig 6.2 of the
Alpaydin text.
(Q3) Repeat the procedure in (Q2) using the Swiss roll dataset. Comment briefly on the
results (a few sentences).
Part II: t-SNE:
Suppose we are given the task of placing objects that exist in a high-dimensional space R^d
into a low-dimensional space R^l, in such a way that objects that are neighbours in the
high-dimensional space remain neighbours in the low-dimensional space. Stochastic
Neighbour Embedding (SNE) is an algorithm that accomplishes this. For each object x_i in
R^d, the conditional probability that x_i would pick x_j as its neighbour is computed from
a Gaussian distribution centred on x_i. Similarly, for each object y_i in R^l, the
conditional probability that y_i would pick y_j as its neighbour is computed from a
Gaussian distribution. The algorithm then minimises the sum of the KL-divergences between
the corresponding probability distributions in R^d and R^l. As we know from previous
practicals, the KL-divergence is a measure of how different two distributions are, so
minimising it is intuitively motivated. Gradient descent is used to perform this
minimisation.
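Concretely, in the notation of the t-SNE paper referenced below, the neighbour
probabilities and the cost function being minimised are:

    p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}, \qquad
    q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}

    C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}

Each bandwidth \sigma_i is set indirectly through a user-specified perplexity (the
parameter explored in (Q8)), and gradient descent moves each y_i in the direction of
-\partial C / \partial y_i.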
• Read the original paper on t-SNE available on blackboard at Learning Resources >
Articles (for COMP7703 Reading Assignments) > Week 7: L. van der Maaten
and G. Hinton, Visualizing Data using t-SNE.
• Download the Matlab implementation of t-SNE available at
https://lvdmaaten.github.io/tsne/. A Python version is also available and you can use
it if you wish, but refer to the Matlab version for some of the questions below. Also
download the MNIST data in .mat format.
• Run the t-SNE algorithm on 6000 datapoints from the MNIST dataset:
1 clear all; close all; clc;
2 load('mnist_train.mat');
3 idx = unidrnd(60000, 6000, 1);
4 x = train_X(idx, :);
5 labels = train_labels(idx);
6 tsne(x, labels, 2, 30, 30)
(Q4) In one or two sentences, explain how t-SNE differs from SNE.
(Q5) In one or two sentences, explain lines 3, 4 and 5 in the code snippet above.
(Q6) Provide a screenshot of the 2-dimensional visualisation after 300 iterations. Plot the
error at each iteration up to 300 iterations in steps of 10.
(Q7) In 3-4 sentences, explain lines 51-53 and lines 87-89 in tsne_p.m in relation to any
features you observe in your plot. Why has the code been written in this way? (Hint:
read the paper.)
(Q8) Comment out lines 51-53 and 87-89. Run t-SNE for 300 iterations for perplexity values
ranging from 10 to 300 in steps of 10. Produce a 3D plot with perplexity and iterations
on the horizontal axes and cost on the vertical axis. Think of and explain a simple but
appropriate heuristic for choosing the perplexity. Provide the visualisation in 2D space
after 300 iterations for your chosen perplexity and compare this with your result in
(Q6).
(Q9) Run t-SNE (without PCA as a preprocessing step) on the Swiss roll dataset. Comment
on the visualisation it produces.
Datasets:
MNIST is a very widely used dataset of handwritten digit images. You will find the data on
the course site (this version from https://lvdmaaten.github.io/tsne/). Use the training set
only. The original source of the data is: http://yann.lecun.com/exdb/mnist/
The Swiss Roll data set is on the course site. It was obtained from:
http://web.mit.edu/cocosci/isomap/isomap.html
Useful Matlab Commands:
The following commands may be useful when writing code to answer the questions in this
prac.
eig(), eigs(), mean(), cov(), repmat(), diag(), surf()
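For example (illustrative only; the variable names below are hypothetical), eigs can
return just the leading eigenpairs of a covariance matrix, and surf can draw the cost
surface asked for in (Q8):

    [V, D] = eigs(cov(X), 2);               % two largest-magnitude eigenpairs
    surf(perplexities, iterations, costs'); % costs(i, j): cost at perplexity
                                            % value i after iteration j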
Additional Resources:
• How to use t-SNE Effectively - http://distill.pub/2016/misread-tsne/
One of the first articles on the new research platform called Distill.
• Gradient-Based Optimization - Section 4.3 of Deep Learning, available online at
http://www.deeplearningbook.org/contents/numerical.html
We will be revisiting gradient descent later when training neural networks.