Big Data
AIML427
Manifold Learning/
Nonlinear Dimensionality Reduction (1)
Dr Bach Nguyen
(with thanks to Dr Andrew Lensen)
Bach.Nguyen@vuw.ac.nz
Dimensionality Reduction
[Diagram: the original dataset of N instances and D features is transformed by (nonlinear) dimensionality reduction into an “embedding” of dimension d ≪ D]
How do we do the (NL)DR?
• We’ve discussed a lot of supervised methods
– Filter: information gain, mutual information, correlation, …
– Wrapper: Training a classifier given a feature subset
– Embedded: Decision Trees, Genetic Programming
• But what if you have no labels?
• Or what if you want a (filter) method that considers the entire
set of feature relationships?
– Pairwise redundancies feel “naïve”
– What about the topology of the data?
• Let us make an “assumption”: our data is actually
(intrinsically) lower-dimensional than the D features we have
– We say it lies on a lower-dimensional manifold (embedding)
Swiss Rolls
[Image: the Swiss roll dataset. Photo by Unknown Author, licensed under CC BY-NC-ND]
…mathematically speaking
[Figure: a nonlinear transformation unrolls the 3D Swiss roll into a 2D embedding]
Did we lose any structure?
Well…
• The red and blue points were close in Euclidean (straight-line) space in 3D – but aren’t in 2D
• …but this is desirable, as there is a “gap” in the topology
• We have preserved the geodesic distance
– Geodesic Distance = shortest path in a graph
– E.g. distance from Auckland to London flying vs through the Earth
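Not from the slides: a minimal sketch of this idea, assuming scikit-learn and SciPy, that approximates geodesic distances as shortest paths through a k-nearest-neighbour graph. The Swiss roll data, k = 10, and the point indices are illustrative choices only.

```python
# Sketch: geodesic distance ~ shortest path through a neighbourhood graph.
# Assumes scikit-learn and SciPy; parameter values are illustrative only.
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Connect each point to its 10 nearest neighbours; edge weights are the
# Euclidean distances between connected points.
knn = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Geodesic distance between two points = length of the shortest path
# through the graph (the "flying around the Earth" route, not the
# straight line through it).
geodesic = shortest_path(knn, method="D", directed=False)

print(geodesic[0, 1])  # approximate geodesic distance between points 0 and 1
```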
Geodesic Distance
Manifolds
• In mathematics, a manifold is a topological space that locally
resembles Euclidean space near each point.
– Small portions of a circle look like a line
– “Small” portions of the Earth look flat
– Topology ignores bending
I thought this was an AI class…
• It is!
• The key takeaway: a manifold is a d-dimensional object
“living” in a D-dimensional world.
• If we can find or approximate this manifold, we can
represent our data in d dimensions instead of D!
• We call this d-dimensional space an embedding – it is
embedded within the D-dimensional space.
Manifold Learning
• We want to find a smaller embedding of our data
• Manifold learning (MaL): using machine learning to learn an
embedding
• Nonlinear dimensionality reduction (NLDR): a broader term:
any approach to reducing dimensionality through nonlinear
transformations
• In practice, the two terms are used pretty interchangeably
Manifold Learning
• How do we know our data lies on a manifold?
• There are ways to estimate the intrinsic dimensionality of data (see the sketch after this list)
• In practice, an approximation of the original topology is
perfectly useable.
• As with PCA – if we retain the majority of the
variance/structure, we likely retain the important patterns
• Feature construction? Yes…and no. We’ll come back to that
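A minimal sketch (my addition, not from the slides) of one rough estimate of intrinsic dimensionality: count how many principal components are needed to retain, say, 95% of the variance. This only captures linear structure, so it will over-estimate the dimensionality of curved manifolds such as the Swiss roll; the dataset and the 95% threshold are illustrative assumptions.

```python
# Sketch: a rough, linear estimate of intrinsic dimensionality via PCA.
# Assumes scikit-learn; the 95% threshold and dataset are illustrative only.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

pca = PCA().fit(X)
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 95%
d_estimate = int(np.searchsorted(cumulative_variance, 0.95)) + 1
print("Estimated (linear) intrinsic dimensionality:", d_estimate)
```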
“CLASSIC” MANIFOLD LEARNING ALGORITHMS
Multidimensional Scaling (MDS)
• Dates from as early as 1938 (!)
• Minimises the difference between the pairwise distance
(between instances) in the high-dimensional vs embedding
space:
\text{Stress} = \sum_{k < l} \left[\, d(k,l) - \delta(k,l) \,\right]^2
where k and l are two instances, d(k,l) is the distance in the embedded space and \delta(k,l) is the distance in the high-dimensional space
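A minimal sketch (not from the slides, assuming scikit-learn) of metric MDS, together with a direct computation of the stress defined above; the Swiss roll data and all parameter values are illustrative choices.

```python
# Sketch: metric MDS finds an embedding minimising the stress defined above.
# Assumes scikit-learn; the data and parameters are illustrative only.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X, _ = make_swiss_roll(n_samples=500, random_state=0)

mds = MDS(n_components=2, metric=True, random_state=0)
Z = mds.fit_transform(X)  # 2D embedded coordinates

# Stress: squared differences between embedded-space and high-dimensional
# distances, summed over pairs k < l (the full matrix counts each pair twice,
# hence the division by 2).
d_low = pairwise_distances(Z)
d_high = pairwise_distances(X)
stress = np.sum((d_low - d_high) ** 2) / 2
print(stress)
```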
Multidimensional Scaling (MDS)
• Nonmetric MDS uses ranks instead of raw distance (why?)
• Sammon’s Mapping normalises each distance (why?)
• How does it find the embedding?
– Optimisation problem…
– Numerical optimisation techniques
– Or Eigendecomposition…
Isomap
Tenenbaum, J. B., Silva, V. D., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality
reduction. Science, 290(5500), 2319-2323. https://www.science.org/doi/10.1126/science.290.5500.2319
1. Construct neighbourhood graph (connect i and j if d(i, j) < ε, or if j is one of i’s K nearest neighbours)
2. Compute shortest paths (geodesic distance matrix)
3. Construct d-dimensional embedding (eigenvectors of matrix)
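A minimal sketch (assuming scikit-learn, which bundles the three steps above into a single estimator); the Swiss roll data, K = 10 and d = 2 are illustrative choices only.

```python
# Sketch: Isomap = neighbourhood graph + geodesic distances + eigendecomposition.
# Assumes scikit-learn; parameter values are illustrative only.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

iso = Isomap(n_neighbors=10, n_components=2)  # K-neighbour graph, 2D embedding
Z = iso.fit_transform(X)                      # performs steps 1-3 internally

print(Z.shape)  # (1000, 2)
```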
Locally Linear Embedding (LLE)
1. Select K neighbours for each datapoint x_i
2. Compute weights W_ij to reconstruct x_i from its neighbours (least squares)
3. Do some fancy linear algebra to find the embedding that best minimises the reconstruction error. NB: W_ij is unchanged
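A minimal sketch (assuming scikit-learn) of the three LLE steps above, again with illustrative data and parameter values.

```python
# Sketch: LLE = pick K neighbours, learn reconstruction weights W_ij,
# then find the low-dimensional embedding that preserves those weights.
# Assumes scikit-learn; parameter values are illustrative only.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Z = lle.fit_transform(X)

print(lle.reconstruction_error_)  # how well the fixed weights are preserved
```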
Types of Manifold Learning
• Mapping vs non-mapping methods
– Embedding = f(high-dim space)? “See” the functional mapping; vs
– Embedding is optimised “with respect to” high-dim space
• Local vs global methods
– Preserve local neighbourhoods more; vs
– Ensure global structure is maintained
– (Both?)
• Deterministic/analytical vs (stochastic) search
– Eigendecomposition/numeric optimisation; vs
– Gradient descent/EC/…
PCA vs NLDR?
[Figure: 2D embeddings of the same data produced by PCA vs Manifold Sculpting]
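A sketch (not from the slides) contrasting a linear projection with a nonlinear embedding on the Swiss roll. Note that Manifold Sculpting, shown in the figure above, is not provided by scikit-learn, so Isomap is used here as a stand-in nonlinear method; all parameter choices are illustrative.

```python
# Sketch: linear (PCA) vs nonlinear (Isomap, standing in for Manifold Sculpting,
# which scikit-learn does not provide) embeddings of the Swiss roll.
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, colour = make_swiss_roll(n_samples=1000, random_state=0)

Z_pca = PCA(n_components=2).fit_transform(X)                       # linear
Z_nldr = Isomap(n_neighbors=10, n_components=2).fit_transform(X)   # nonlinear

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(Z_pca[:, 0], Z_pca[:, 1], c=colour, s=5)
axes[0].set_title("PCA (roll stays folded)")
axes[1].scatter(Z_nldr[:, 0], Z_nldr[:, 1], c=colour, s=5)
axes[1].set_title("Nonlinear DR (roll unrolled)")
plt.show()
```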
https://johnhw.github.io/umap_primes/index.md.html
One million integers embedded into 2D space with UMAP
We’ll meet UMAP tomorrow!