This is part of a series on a family of dimensionreduction algorithms called nonlinear dimension reduction. The goal here is to reduce the dimensions of a dataset (i.e. discard some columns in your data)
In previous posts, I discussed the MDS algorithm and presented some key ideas. In this post, I will describe how those ideas are leveraged in the Isomap algorithm. A clojure implementation based on core.matrix is also included.
Intuition
Isomap uses the same core ideas as the MDS algorithm:

Obtain a matrix of proximities (distances between points in a dataset).

This distance matrix is a matrix of inner products.

An eigendecomposition of this matrix gives us the lower dimension embedding.
Isomap differs from MDS in one vital way  the construction of the distance matrix. In MDS, the distance between two points is just the euclidean distance.
In Isomap, the distances between points are the weight of the shortest path in a pointgraph.
The point graph is constructed by placing an edge between two points if the euclidean distance between them falls under a certain threshold or between a point and its top $ k $ neighbors.
This distance matrix captures the underlying manifold more accurately than one constructed using euclidean distances. The following toy example demonstrates this:
The data shown here looks like a swirl that starts at point 1 and ends at point 9. We would like to recover this phenomenon in our lowerdimension embedding.
The first step is to build a distance matrix. Say we use euclidean distances between two points as the corresponding entry in the distance matrix.
In this figure, it is clear that euclidean_distance(1, 6) = euclidean_distance(1, 8)
and euclidean_distance(1, 5) = euclidean_distance(1, 9)
.
Clearly the distances computed here miss the “swirl” in the data entirely. Working with the point graph mentioned above helps us get around this problem.
Let us build a point graph by adding an edge between a node and its nearest neighbor (so $ 1NN $). The weight on the edge is the euclidean distance between the nodes. The distance between two points is the weight of the shortest path between these points. The point graph is shown below:
Observe that when we use this newer distance metric, distance(1, 6)
is indeed less than distance(1, 8) and distance(1, 5) is indeed less than distance(1, 9)
This distance function is clearly doing a better job of capturing the “swirl” in the data.
The Isomap algorithm uses a distance matrix constructed like this in place of one constructed with euclidean distances. This distance matrix is then plugged into the MDS framework and an eigendecomposition is run on the doublecentered matrix.
Implementation
Let us do a clojure implementation.
We have a pointset as a core.matrix
matrix. First, we compute the pointgraph. I am going to place edges between a point and its 3 nearest neighbors (so $ 3NN $). This routines expects a map of the type {pointindex pointvector, …}
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
(defn buildpointgraph "A point graph is a kNN graph. Edges between a point and its 3 nearest neigbors" ([indexedpoints] (buildpointgraph indexedpoints 3)) ([indexedpoints numneighbors] (reduce (fn [acc pt] (let [otherpoints (filter (fn [x] (not= (first x) (first pt))) indexedpoints)] (merge acc {(first pt) (map first (take numneighbors (sortby #(distance (first pt) (first %)) otherpoints)))}))) {} indexedpoints))) 
Then a simple Floyd Warshall algorithm implementation that computes the weights on the shortest paths. It takes the graph built in the previous step and the original indexed points and builds the graph.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 
(defn floydwarshalldistance "Expected graph representation: {V > neighboringpoints}" [agraph indexedpoints] (let [indexedpointsdict (into {} indexedpoints) edges (reduce (fn [acc [x neighbors]] (concat acc (map (fn [n] [x n]) neighbors))) [] agraph) infmatrix (+ Double/POSITIVE_INFINITY (zeromatrix (count indexedpoints) (count indexedpoints))) zerodiag (reduce (fn [acc i] (mset acc i i 0)) infmatrix (> indexedpoints count range)) weightsinit (reduce (fn [acc [x y]] (mset acc x y (distance (indexedpointsdict x) (indexedpointsdict y)))) zerodiag edges)] (reduce (fn [olddistances [k i j]] (if (< (+ (mget olddistances i k) (mget olddistances k j)) (mget olddistances i j)) (mset olddistances i j (+ (mget olddistances i k) (mget olddistances k j))) olddistances)) weightsinit (for [k (> indexedpoints count range) i (> indexedpoints count range) j (> indexedpoints count range)] [k i j])))) 
Once we have a distance matrix, we can simply feed it to MDS:
1 2 3 4 5 6 7 
(defn isomap "Takes indexedpoints and the target dimension" [points n] (let [indexedpoints (mapindexed (fn [i x] [i x]) points) graph (buildpointgraph indexedpoints) distances (floydwarshalldistance graph indexedpoints)] (mds/distances>points distances n))) 
And that’s it!
Examples
I will use wordvectors from word2vec for these 10 words: river
lake
city
town
actor
doctor
dog
cat
animal
home
The word vectors for these words are available in foo.csv.
Let us reduce these to two dimensions. We get:
ISOMAP Embeddings
The embeddings produced by the MDS algorithm are:
MDS Embeddings
Compared to the plot produced by MDS, we have more separation between terms  for instance cat
and dog
are place close by but they don’t overlap (unlike the MDS plot). This is a qualitative analysis, it is pretty hard to gauge which embedding is better.
Full Source
See this repo: https://github.com/shriphani/clojuremanifold