<?xml version="1.0" encoding="utf-8"?> 
<rss version="2.0">
 <channel>
  <title>SHRIPHANI PALAKODETY: Posts tagged 'isomap'</title>
  <description>SHRIPHANI PALAKODETY: Posts tagged 'isomap'</description>
  <link>http://blog.shriphani.com/tags/isomap.html</link>
  <lastBuildDate>Thu, 22 Jan 2015 03:31:29 UT</lastBuildDate>
  <pubDate>Thu, 22 Jan 2015 03:31:29 UT</pubDate>
  <ttl>1800</ttl>
  <item>
   <title>Dimension Analysis: A Recap</title>
   <link>http://blog.shriphani.com/2015/01/21/dimension-analysis-a-recap/?utm_source=isomap&amp;utm_medium=RSS</link>
   <guid>urn:http-blog-shriphani-com:-2015-01-21-dimension-analysis-a-recap</guid>
   <pubDate>Thu, 22 Jan 2015 03:31:29 UT</pubDate>
   <description>&lt;html&gt;
&lt;p&gt;In the past few blog posts, I covered some details of popular dimension-reduction techniques and showed some common themes. In this post, I will collect all these materials and tie them together.&lt;/p&gt;
&lt;!-- more--&gt;

&lt;h3 id="dimension"&gt;Dimension?&lt;/h3&gt;

&lt;p&gt;The best definition I&amp;rsquo;ve seen for the topic comes from Benoit Mandelbrot&amp;rsquo;s work on &lt;a href="http://en.wikipedia.org/wiki/Fractal_Dimension"&gt;fractal geometry&lt;/a&gt;. The fractal dimension is associated with the ability of a pattern to fill space. Here&amp;rsquo;s a good example to illustrate what we mean.&lt;/p&gt;

&lt;p&gt;Consider a curve viewed at three different scales (image stolen from Chris Burges&amp;rsquo;s document on dimension reduction):&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/rectangle_example.png" /&gt;&lt;/p&gt;

&lt;p&gt;Now, at a microscopic level, we begin observing. How do we observe? Assume that there&amp;rsquo;s a sphere around the observer. Now, let this sphere expand a bit. At the microscopic level, your sphere encounters more of the curve&amp;rsquo;s material in 2 dimensions. This is illustrated in the rightmost figure.&lt;/p&gt;

&lt;p&gt;Now, at a slightly different scale, when our sphere expands, we observe more material along just 1 dimension. This is illustrated in the middle figure.&lt;/p&gt;

&lt;p&gt;On a scale like the one in the leftmost pic, we encounter no material at all. This is akin to a zero-dimension figure (a point).&lt;/p&gt;

&lt;p&gt;An intuitive explanation of why scale matters is provided in this Wikipedia example. Using rulers of different lengths, we obtain different measures for the coastline of Great Britain. At various levels of scale, we acquire various measures of the coastline - using a ruler that is as long as the diameter of the Earth, the coastline of Britain is a negligible fraction of our instrument.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://upload.wikimedia.org/wikipedia/commons/7/78/Britain-fractal-coastline-200km.png" /&gt; &lt;img src="http://upload.wikimedia.org/wikipedia/commons/c/c8/Britain-fractal-coastline-100km.png" /&gt; &lt;img src="http://upload.wikimedia.org/wikipedia/commons/f/f9/Britain-fractal-coastline-50km.png" /&gt;&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a neat formula that can be used to estimate the fractal dimension of a dataset:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;$ n $: The number of pairs of points in our data.&lt;/li&gt;
 &lt;li&gt;$ r $: The radius of a sphere centered around the observer.&lt;/li&gt;
 &lt;li&gt;$ p $: The number of pairs of points in a sphere of radius $ r $.&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The estimate of the fractal dimension is given by the slope of $ \log(p) $ vs $ \log(r) $.&lt;/p&gt;

&lt;p&gt;For the curve in the example above, this value is some real number between 1 and 2 (so the points on the curve have more freedom than those on a line but less freedom than those on a 2D-plane).&lt;/p&gt;
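
&lt;p&gt;Here is a minimal sketch of this estimate in plain Clojure. The function names (&lt;code&gt;euclidean&lt;/code&gt;, &lt;code&gt;pairs-within&lt;/code&gt;, &lt;code&gt;slope&lt;/code&gt;, &lt;code&gt;fractal-dimension&lt;/code&gt;) and the least-squares fit over a handful of radii are assumptions made for illustration. Every radius must be large enough to contain at least one pair, otherwise $ \log(p) $ is undefined.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn euclidean
  "Euclidean distance between two equal-length numeric vectors."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn pairs-within
  "p: the number of pairs of points that are at most r apart."
  [points r]
  (let [pts (vec points)
        n   (count pts)]
    (count (for [i (range n)
                 j (range (inc i) n)
                 :when (&amp;lt;= (euclidean (pts i) (pts j)) r)]
             [i j]))))

(defn slope
  "Least-squares slope of ys against xs."
  [xs ys]
  (let [n   (count xs)
        mx  (/ (reduce + xs) n)
        my  (/ (reduce + ys) n)
        sxy (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx (reduce + (map (fn [x] (* (- x mx) (- x mx))) xs))]
    (/ sxy sxx)))

(defn fractal-dimension
  "Slope of log(p) vs log(r), fitted over the given radii."
  [points radii]
  (slope (map (fn [r] (Math/log r)) radii)
         (map (fn [r] (Math/log (pairs-within points r))) radii)))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For points spaced evenly along a straight line, say &lt;code&gt;(fractal-dimension (map (fn [i] [i 0]) (range 200)) [5.5 10.5 20.5])&lt;/code&gt;, the estimate should come out close to 1.&lt;/p&gt;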

&lt;h3 id="a-first-stab-at-dimension-reduction"&gt;A First Stab at Dimension Reduction&lt;/h3&gt;

&lt;p&gt;Working with intuitions we developed in the first section, we can develop a greedy algorithm:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;Estimate the fractal-dimension of the dataset.&lt;/li&gt;
 &lt;li&gt;Choose a dimension (column) to drop, drop it and recompute the  fractal-dimension. If the dimension doesn&amp;rsquo;t change too much (stays  within a certain tolerance), consider this dimension dropped.&lt;/li&gt;
 &lt;li&gt;Repeat until no more dimensions can be dropped without significantly  altering the fractal-dimension.&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The dimension estimate this procedure relies on is the Grassberger-Procaccia algorithm.&lt;/p&gt;

&lt;p&gt;It is intuitive to grasp.&lt;/p&gt;
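
&lt;p&gt;The greedy loop can be sketched as follows. This is only an illustration: &lt;code&gt;drop-column&lt;/code&gt; and &lt;code&gt;greedy-reduce&lt;/code&gt; are hypothetical names, &lt;code&gt;estimate-fn&lt;/code&gt; stands for whatever dimension estimator you plug in, and comparing against the estimate for the full dataset (rather than the previous step) is an assumption on my part.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn drop-column
  "Removes column idx from every point (each point is a vector)."
  [points idx]
  (map (fn [p] (vec (concat (subvec p 0 idx) (subvec p (inc idx))))) points))

(defn greedy-reduce
  "Drops columns one at a time as long as the estimate stays within tol
   of the estimate for the full dataset. Always keeps at least one column.
   Returns the indices of the original columns that survive."
  [points estimate-fn tol]
  (let [target (estimate-fn points)]
    (loop [pts  (map vec points)
           kept (vec (range (count (first points))))]
      (let [droppable (when (not= 1 (count kept))
                        ;; first column whose removal barely moves the estimate
                        (some (fn [i]
                                (when (&amp;lt;= (Math/abs
                                           (double (- (estimate-fn (drop-column pts i))
                                                      target)))
                                          tol)
                                  i))
                              (range (count kept))))]
        (if (nil? droppable)
          kept
          (recur (drop-column pts droppable)
                 (vec (concat (subvec kept 0 droppable)
                              (subvec kept (inc droppable))))))))))
&lt;/pre&gt;&lt;/div&gt;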

&lt;p&gt;However, &lt;strong&gt;in a high-dimension setting, our technique for estimating fractal dimensions falls apart&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In a high-dimensional setting, pairwise distances between the points in a dataset are tightly clustered about a mean. Essentially, the points seem to be equidistant from each other. A Hoeffding bound is provided in &lt;a href="/2013/11/29/a-comment-on-dimension-estimation/"&gt;this blog post&lt;/a&gt; that illustrates this point.&lt;/p&gt;
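
&lt;p&gt;A quick way to see this effect is to compare the spread of pairwise distances in a low and a high dimension. The sketch below uses points drawn uniformly from the unit cube, which is an assumption made purely for illustration, and the names &lt;code&gt;random-points&lt;/code&gt; and &lt;code&gt;relative-spread&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn euclidean
  "Euclidean distance between two equal-length numeric vectors."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn random-points
  "n points drawn uniformly from the unit cube in d dimensions (seeded)."
  [n d seed]
  (let [^java.util.Random rng (java.util.Random. seed)]
    (vec (repeatedly n (fn [] (vec (repeatedly d (fn [] (.nextDouble rng)))))))))

(defn relative-spread
  "Standard deviation of the pairwise distances divided by their mean."
  [points]
  (let [pts      (vec points)
        n        (count pts)
        ds       (for [i (range n) j (range (inc i) n)]
                   (euclidean (pts i) (pts j)))
        m        (/ (reduce + ds) (count ds))
        variance (/ (reduce + (map (fn [x] (* (- x m) (- x m))) ds))
                    (count ds))]
    (/ (Math/sqrt variance) m)))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Comparing &lt;code&gt;(relative-spread (random-points 50 2 42))&lt;/code&gt; with &lt;code&gt;(relative-spread (random-points 50 1000 42))&lt;/code&gt; should show a much smaller value in 1000 dimensions: the distances bunch up around their mean.&lt;/p&gt;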

&lt;h3 id="the-pca"&gt;The PCA&lt;/h3&gt;

&lt;p&gt;One common attempt at reducing dimensions is capturing directions of maximum variance. The PCA projects points in the dataset along the eigenvectors of the covariance matrix. Since this technique is well-known, I&amp;rsquo;ll just point to this &lt;a href="http://en.wikipedia.org/wiki/Principal_component_analysis"&gt;Wikipedia article&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id="from-proximities-to-datasets"&gt;From Proximities to Datasets&lt;/h3&gt;

&lt;p&gt;A family of techniques I like a lot operates on proximity matrices. A proximity matrix is a symmetric matrix containing similarity scores between the points in a dataset (thus this matrix contains $ n $ rows and $ n $ columns where $ n $ is the number of points in the dataset).&lt;/p&gt;

&lt;p&gt;A simple argument demonstrates that proximity matrices are Gram matrices (a Gram matrix is a close cousin of the covariance matrix). One can retrieve a collection of points for a given Gram matrix - see &lt;a href="/2015/01/15/multidimensional-scaling-and-pca-are-the-same-thing/"&gt;this blog post&lt;/a&gt; for a proof.&lt;/p&gt;

&lt;p&gt;This family of techniques formulates the dimension-reduction problem as such: &amp;ldquo;Find a configuration of points in a lower-dimensional space that preserves the proximities in the proximity matrix&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The standard MDS algorithm uses euclidean distances between points to populate the proximity matrix. &lt;a href="/2014/10/29/low-dimension-embeddings-for-visualization/"&gt;This blog post&lt;/a&gt; contains more info about this algorithm.&lt;/p&gt;
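
&lt;p&gt;Populating such a proximity matrix takes only a few lines. The sketch below uses plain Clojure vectors instead of &lt;code&gt;core.matrix&lt;/code&gt;, and the names &lt;code&gt;euclidean&lt;/code&gt; and &lt;code&gt;distance-matrix&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn euclidean
  "Euclidean distance between two equal-length numeric vectors."
  [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn distance-matrix
  "n x n matrix (a vector of row vectors) of pairwise Euclidean distances."
  [points]
  (let [pts (vec points)]
    (mapv (fn [p] (mapv (fn [q] (euclidean p q)) pts)) pts)))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The result is symmetric with zeros on the diagonal, which is what the MDS machinery expects.&lt;/p&gt;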

&lt;p&gt;A variant of this algorithm uses path-weights in a $k$-NN graph. This is the Isomap algorithm - covered &lt;a href="/2014/11/12/the-isomap-algorithm/"&gt;in this post&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id="the-kernel-trick"&gt;The Kernel Trick&lt;/h3&gt;

&lt;p&gt;The Kernel trick is leveraged in settings where we transform our points to a higher-dimensional space to make the desired insight pop out. This desired insight is a hyperplane to separate two different classes when working with a classifier. In Kernel PCA, the desired insight is capturing variances so you can run a PCA on the newer dataset in a higher-dimension.&lt;/p&gt;
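
&lt;p&gt;To make the trick concrete, here is a small sketch with a degree-2 polynomial kernel. The kernel choice and the names &lt;code&gt;dot&lt;/code&gt;, &lt;code&gt;poly-kernel&lt;/code&gt; and &lt;code&gt;feature-map&lt;/code&gt; are assumptions picked for illustration. The kernel gives the dot product in the higher-dimensional space without ever building the higher-dimensional points.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn dot
  "Dot product of two equal-length numeric vectors."
  [a b]
  (reduce + (map * a b)))

(defn poly-kernel
  "Degree-2 polynomial kernel: k(x, y) = (x . y)^2."
  [x y]
  (let [d (dot x y)]
    (* d d)))

(defn feature-map
  "Explicit map of a 2-d point into 3 dimensions. The dot product of two
   mapped points equals poly-kernel applied to the original points."
  [[x1 x2]]
  [(* x1 x1) (* (Math/sqrt 2.0) x1 x2) (* x2 x2)])
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Up to floating-point error, &lt;code&gt;(poly-kernel x y)&lt;/code&gt; and &lt;code&gt;(dot (feature-map x) (feature-map y))&lt;/code&gt; agree for any pair of 2-d points.&lt;/p&gt;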

&lt;p&gt;Interestingly, MDS and Isomap are both variants of the Kernel PCA - a topic explored in &lt;a href="/2015/01/20/the-kernel-pca/"&gt;this blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id="up-next"&gt;Up Next&lt;/h3&gt;

&lt;p&gt;In future blog posts, I will discuss scaling issues with spectral algorithms, insights that can be transferred to other domains and so on.&lt;/p&gt;&lt;/html&gt;</description></item>
  <item>
   <title>The Isomap Algorithm</title>
   <link>http://blog.shriphani.com/2014/11/12/the-isomap-algorithm/?utm_source=isomap&amp;utm_medium=RSS</link>
   <guid>urn:http-blog-shriphani-com:-2014-11-12-the-isomap-algorithm</guid>
   <pubDate>Wed, 12 Nov 2014 21:24:54 UT</pubDate>
   <description>&lt;html&gt;
&lt;p&gt;&lt;em&gt;This is part of a series on a family of dimension-reduction algorithms called non-linear dimension reduction. The goal here is to reduce the dimensions of a dataset (i.e. discard some columns in your data)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In previous posts, I discussed the &lt;a href="http://blog.shriphani.com/2014/10/29/low-dimension-embeddings-for-visualization/"&gt;MDS algorithm&lt;/a&gt; and presented &lt;a href="http://blog.shriphani.com/2014/11/02/powerful-ideas-in-manifold-learning/"&gt;some key ideas&lt;/a&gt;. In this post, I will describe how those ideas are leveraged in the Isomap algorithm. A Clojure implementation based on core.matrix is also included.&lt;/p&gt;
&lt;!-- more--&gt;

&lt;h3 id="intuition"&gt;Intuition&lt;/h3&gt;

&lt;p&gt;Isomap uses the same core ideas as the MDS algorithm:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;
  &lt;p&gt;Obtain a matrix of proximities (distances between points in a  dataset).&lt;/p&gt;&lt;/li&gt;
 &lt;li&gt;
  &lt;p&gt;Double-centering the squared distances turns this matrix into a matrix of inner products.&lt;/p&gt;&lt;/li&gt;
 &lt;li&gt;
  &lt;p&gt;An eigendecomposition of this matrix gives us the lower dimension  embedding.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Isomap differs from MDS in one vital way - the construction of the distance matrix. In MDS, the distance between two points is just the euclidean distance.&lt;/p&gt;

&lt;p&gt;In Isomap, the distance between two points is the weight of the shortest path between them in a point-graph.&lt;/p&gt;

&lt;p&gt;The point graph is constructed by placing an edge between two points if the euclidean distance between them falls under a certain threshold, or alternatively between each point and its $ k $ nearest neighbors.&lt;/p&gt;

&lt;p&gt;This distance matrix captures the underlying manifold more accurately than one constructed using euclidean distances. The following toy example demonstrates this:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/isomap_data.png" /&gt;&lt;/p&gt;

&lt;p&gt;The data shown here looks like a swirl that starts at point 1 and ends at point 9. We would like to recover this phenomenon in our lower-dimension embedding.&lt;/p&gt;

&lt;p&gt;The first step is to build a distance matrix. Say we use euclidean distances between two points as the corresponding entry in the distance matrix.&lt;/p&gt;

&lt;p&gt;In this figure, it is clear that &lt;code&gt;euclidean_distance(1, 6) = euclidean_distance(1, 8)&lt;/code&gt; and &lt;code&gt;euclidean_distance(1, 5) = euclidean_distance(1, 9)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Clearly the distances computed here miss the &amp;ldquo;swirl&amp;rdquo; in the data entirely. Working with the point graph mentioned above helps us get around this problem.&lt;/p&gt;

&lt;p&gt;Let us build a point graph by adding an edge between a node and its nearest neighbor (so $ 1-NN $). The weight on the edge is the euclidean distance between the nodes. The distance between two points is the weight of the shortest path between these points. The point graph is shown below:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/isomap_point_graph.png" /&gt;&lt;/p&gt;

&lt;p&gt;Observe that when we use this newer distance metric, &lt;code&gt;distance(1, 6)&lt;/code&gt; is indeed less than &lt;code&gt;distance(1, 8)&lt;/code&gt; and &lt;code&gt;distance(1, 5)&lt;/code&gt; is indeed less than &lt;code&gt;distance(1, 9)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This distance function is clearly doing a better job of capturing the &amp;ldquo;swirl&amp;rdquo; in the data.&lt;/p&gt;

&lt;p&gt;The Isomap algorithm uses a distance matrix constructed like this in place of one constructed with euclidean distances. This distance matrix is then plugged into the MDS framework and an eigendecomposition is run on the double-centered matrix.&lt;/p&gt;
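
&lt;p&gt;The double-centering step can be sketched in plain Clojure as below. This is an illustration on nested vectors, not the &lt;code&gt;core.matrix&lt;/code&gt; code from the repo, and the names &lt;code&gt;mean&lt;/code&gt; and &lt;code&gt;double-center&lt;/code&gt; are hypothetical. Each squared distance has its row mean and column mean subtracted, the grand mean added back, and the result is scaled by $ -\frac{1}{2} $.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn double-center
  "Turns a square matrix of distances (a vector of row vectors) into a
   matrix of inner products: B = -1/2 * J * D^2 * J, where D^2 holds the
   squared distances and J is the centering matrix."
  [distances]
  (let [sq        (mapv (fn [row] (mapv (fn [d] (* d d)) row)) distances)
        row-means (mapv mean sq)
        col-means (mapv mean (apply map vector sq))
        grand     (mean row-means)]
    (vec (map-indexed
          (fn [i row]
            (vec (map-indexed
                  (fn [j d2]
                    (* -0.5 (+ (- d2 (row-means i) (col-means j)) grand)))
                  row)))
          sq))))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For three points sitting at 0, 1 and 2 on a line, the distance matrix &lt;code&gt;[[0 1 2] [1 0 1] [2 1 0]]&lt;/code&gt; is mapped to the inner products of the centered coordinates -1, 0 and 1.&lt;/p&gt;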

&lt;h3 id="implementation"&gt;Implementation&lt;/h3&gt;

&lt;p&gt;Let us do a Clojure implementation.&lt;/p&gt;

&lt;p&gt;We have a point-set as a &lt;code&gt;core.matrix&lt;/code&gt; matrix. First, we compute the point-graph. I am going to place edges between a point and its 3 nearest neighbors (so $ 3-NN $). This routine expects a map of the type &lt;code&gt;{point-index point-vector, &amp;hellip;}&lt;/code&gt;.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;
 &lt;table class="sourcetable"&gt;
  &lt;tbody&gt;
   &lt;tr&gt;
    &lt;td class="linenos"&gt;
     &lt;div class="linenodiv"&gt;
      &lt;pre&gt; 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
    &lt;td class="code"&gt;
     &lt;div class="source"&gt;
      &lt;pre&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;defn &lt;/span&gt;&lt;span class="nv"&gt;build-point-graph&lt;/span&gt;
  &lt;span class="s"&gt;"A point graph is a k-NN graph. Edges between&lt;/span&gt;
&lt;span class="s"&gt;   a point and its 3 nearest neigbors"&lt;/span&gt;
  &lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;build-point-graph&lt;/span&gt; &lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="nv"&gt;num-neighbors&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;acc&lt;/span&gt; &lt;span class="nv"&gt;pt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;other-points&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;
                            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                              &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;not= &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;first &lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;first &lt;/span&gt;&lt;span class="nv"&gt;pt&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
                            &lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
          &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;
           &lt;span class="nv"&gt;acc&lt;/span&gt;
           &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nb"&gt;first &lt;/span&gt;&lt;span class="nv"&gt;pt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;
                        &lt;span class="nv"&gt;first&lt;/span&gt;
                        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;take &lt;/span&gt;&lt;span class="nv"&gt;num-neighbors&lt;/span&gt;
                              &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sort-by&lt;/span&gt;
                               &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;distance&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;first &lt;/span&gt;&lt;span class="nv"&gt;pt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                                          &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;first &lt;/span&gt;&lt;span class="nv"&gt;%&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                               &lt;span class="nv"&gt;other-points&lt;/span&gt;&lt;span class="p"&gt;)))})))&lt;/span&gt;
      &lt;span class="p"&gt;{}&lt;/span&gt;
      &lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Next comes a simple Floyd-Warshall implementation that computes the weights of the shortest paths. It takes the graph built in the previous step and the original indexed points, and builds the matrix of shortest-path distances.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;
 &lt;table class="sourcetable"&gt;
  &lt;tbody&gt;
   &lt;tr&gt;
    &lt;td class="linenos"&gt;
     &lt;div class="linenodiv"&gt;
      &lt;pre&gt; 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
    &lt;td class="code"&gt;
     &lt;div class="source"&gt;
      &lt;pre&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;defn &lt;/span&gt;&lt;span class="nv"&gt;floyd-warshall-distance&lt;/span&gt;
  &lt;span class="s"&gt;"Expected graph representation:&lt;/span&gt;
&lt;span class="s"&gt;    {V -&amp;gt; neighboring-points}"&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;a-graph&lt;/span&gt; &lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;indexed-points-dict&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;into &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;edges&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;
                  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;acc&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt; &lt;span class="nv"&gt;neighbors&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;concat &lt;/span&gt;&lt;span class="nv"&gt;acc&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt; &lt;span class="nv"&gt;n&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                                     &lt;span class="nv"&gt;neighbors&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
                  &lt;span class="p"&gt;[]&lt;/span&gt;
                  &lt;span class="nv"&gt;a-graph&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nv"&gt;inf-matrix&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+ &lt;/span&gt;&lt;span class="nv"&gt;Double/POSITIVE_INFINITY&lt;/span&gt;
                      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zero-matrix&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;count &lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                                   &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;count &lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

        &lt;span class="nv"&gt;zero-diag&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;acc&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mset&lt;/span&gt; &lt;span class="nv"&gt;acc&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="nv"&gt;inf-matrix&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;-&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="nb"&gt;count &lt;/span&gt;&lt;span class="nv"&gt;range&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="nv"&gt;weights-init&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;
                      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;acc&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
                        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mset&lt;/span&gt; &lt;span class="nv"&gt;acc&lt;/span&gt;
                              &lt;span class="nv"&gt;x&lt;/span&gt;
                              &lt;span class="nv"&gt;y&lt;/span&gt;
                              &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;distance&lt;/span&gt;
                               &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;indexed-points-dict&lt;/span&gt; &lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                               &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;indexed-points-dict&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
                      &lt;span class="nv"&gt;zero-diag&lt;/span&gt;
                      &lt;span class="nv"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;old-distances&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;k&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;&amp;lt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+ &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt; &lt;span class="nv"&gt;old-distances&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                 &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt; &lt;span class="nv"&gt;old-distances&lt;/span&gt; &lt;span class="nv"&gt;k&lt;/span&gt; &lt;span class="nv"&gt;j&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
              &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt; &lt;span class="nv"&gt;old-distances&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;j&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
         &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mset&lt;/span&gt; &lt;span class="nv"&gt;old-distances&lt;/span&gt;
               &lt;span class="nv"&gt;i&lt;/span&gt;
               &lt;span class="nv"&gt;j&lt;/span&gt;
               &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+ &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt; &lt;span class="nv"&gt;old-distances&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt; &lt;span class="nv"&gt;old-distances&lt;/span&gt; &lt;span class="nv"&gt;k&lt;/span&gt; &lt;span class="nv"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
         &lt;span class="nv"&gt;old-distances&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
     &lt;span class="nv"&gt;weights-init&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;for &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;k&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;-&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="nb"&gt;count &lt;/span&gt;&lt;span class="nv"&gt;range&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;-&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="nb"&gt;count &lt;/span&gt;&lt;span class="nv"&gt;range&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="nv"&gt;j&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;-&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="nb"&gt;count &lt;/span&gt;&lt;span class="nv"&gt;range&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;k&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]))))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Once we have a distance matrix, we can simply feed it to MDS:&lt;/p&gt;

&lt;div class="brush: clojure"&gt;
 &lt;table class="sourcetable"&gt;
  &lt;tbody&gt;
   &lt;tr&gt;
    &lt;td class="linenos"&gt;
     &lt;div class="linenodiv"&gt;
      &lt;pre&gt;1
2
3
4
5
6
7&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
    &lt;td class="code"&gt;
     &lt;div class="source"&gt;
      &lt;pre&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;defn &lt;/span&gt;&lt;span class="nv"&gt;isomap&lt;/span&gt;
  &lt;span class="s"&gt;"Takes indexed-points and the target dimension"&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;points&lt;/span&gt; &lt;span class="nv"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;indexed-points&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;map-indexed&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="nv"&gt;points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;graph&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;build-point-graph&lt;/span&gt; &lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;distances&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;floyd-warshall-distance&lt;/span&gt; &lt;span class="nv"&gt;graph&lt;/span&gt; &lt;span class="nv"&gt;indexed-points&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mds/distances-&amp;gt;points&lt;/span&gt; &lt;span class="nv"&gt;distances&lt;/span&gt; &lt;span class="nv"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;And that&amp;rsquo;s it!&lt;/p&gt;

&lt;h3 id="examples"&gt;Examples&lt;/h3&gt;

&lt;p&gt;I will use word-vectors from word2vec for these 10 words: &lt;code&gt;river
lake
city
town
actor
doctor
dog
cat
animal
home&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The word vectors for these words are available in &lt;a href="/img/foo.csv"&gt;foo.csv&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let us reduce these to two dimensions. We get:&lt;/p&gt;

&lt;h4 id="isomap-embeddings"&gt;ISOMAP Embeddings&lt;/h4&gt;

&lt;iframe src="https://docs.google.com/spreadsheets/d/1zb8WGpVov_aYSsWREG9gsjYKv1KB-kPQUNfRHmW9LLY/pubchart?oid=1644191968&amp;amp;amp;format=interactive" style="width:100%;height:50vh;"&gt;&lt;/iframe&gt;

&lt;p&gt;The embeddings produced by the MDS algorithm are:&lt;/p&gt;

&lt;h4 id="mds-embeddings"&gt;MDS Embeddings&lt;/h4&gt;

&lt;iframe src="https://docs.google.com/spreadsheets/d/1yDeqAtGh_7kHvmsdLGsCzFMwFno1iJ9fn7zVqkToVoI/pubchart?oid=762847171&amp;amp;amp;format=interactive" style="width:100%;height:50vh;"&gt;&lt;/iframe&gt;

&lt;p&gt;Compared to the plot produced by MDS, we have more separation between terms - for instance &lt;code&gt;cat&lt;/code&gt; and &lt;code&gt;dog&lt;/code&gt; are place close by but they don&amp;rsquo;t overlap (unlike the MDS plot). This is a qualitative analysis, it is pretty hard to gauge which embedding is better.&lt;/p&gt;

&lt;h3 id="full-source"&gt;Full Source&lt;/h3&gt;

&lt;p&gt;See this repo: &lt;a href="https://github.com/shriphani/clojure-manifold"&gt;https://github.com/shriphani/clojure-manifold&lt;/a&gt;&lt;/p&gt;&lt;/html&gt;</description></item>
  <item>
   <title>Powerful Ideas in Manifold Learning</title>
   <link>http://blog.shriphani.com/2014/11/02/powerful-ideas-in-manifold-learning/?utm_source=isomap&amp;utm_medium=RSS</link>
   <guid>urn:http-blog-shriphani-com:-2014-11-02-powerful-ideas-in-manifold-learning</guid>
   <pubDate>Sun, 02 Nov 2014 10:10:47 UT</pubDate>
   <description>&lt;html&gt;
&lt;p&gt;In a &lt;a href="/2014/10/29/low-dimension-embeddings-for-visualization/"&gt;previous post&lt;/a&gt;, I described the MDS (multidimensional scaling) algorithm. This algorithm operates on a proximity matrix which is a matrix of distances between the points in a dataset. From this matrix, a configuration of points is retrieved in a lower dimension.&lt;/p&gt;

&lt;p&gt;The MDS strategy is:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;We have a matrix $ D $ for distances between points in the data.  This matrix is &lt;em&gt;symmetric&lt;/em&gt;.&lt;/li&gt;
 &lt;li&gt;We express distances as dot-products (using a proof from Schoenberg).  This means that $ D $ is expressed as $ X^T X $. (Observe that $  X^T X $ is a matrix of dot-products).&lt;/li&gt;
 &lt;li&gt;Once we have $ X^T X $, dimension reduction is trivial. Running an  eigendecomposition on this matrix will produce centered  coordinates. The low-dimension embedding is recovered by discarding  the smallest eigenvalues (and their eigenvectors).&lt;/li&gt;&lt;/ul&gt;
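
&lt;p&gt;The last two steps can be spelled out as follows. The notation is introduced here only for illustration: the points $ x_i $ are the columns of a centered $ X $, $ D^{(2)} $ holds the squared distances $ d_{ij}^2 $, $ \mathbf{1} $ is the all-ones vector and $ k $ is the target dimension. Each squared distance expands into dot-products:&lt;/p&gt;

&lt;p&gt;$$ d_{ij}^2 = x_i^T x_i + x_j^T x_j - 2 x_i^T x_j $$&lt;/p&gt;

&lt;p&gt;Double-centering removes the first two terms and leaves the matrix of dot-products:&lt;/p&gt;

&lt;p&gt;$$ X^T X = -\frac{1}{2} J D^{(2)} J, \qquad J = I - \frac{1}{n} \mathbf{1} \mathbf{1}^T $$&lt;/p&gt;

&lt;p&gt;Keeping the $ k $ largest eigenvalues $ \Lambda_k $ and their eigenvectors $ V_k $ gives the embedding, one column per point:&lt;/p&gt;

&lt;p&gt;$$ X^T X = V \Lambda V^T, \qquad Y = \Lambda_k^{1/2} V_k^T $$&lt;/p&gt;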

&lt;p&gt;Thus, assuming that we work with &lt;em&gt;euclidean distances&lt;/em&gt; between points, we retrieve an embedding that PCA itself would produce. Thus, &lt;strong&gt;MDS with euclidean distances is identical to PCA&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Then what exactly is the value of running MDS on a dataset?&lt;/p&gt;

&lt;p&gt;First, the PCA is not the most powerful approach. For certain datasets, euclidean distances do not capture the shape of the underlying manifold. Running the steps of the MDS on a different distance matrix (at least one that doesn&amp;rsquo;t contain euclidean distances) can lead to better results - a technique that the &lt;a href="http://en.wikipedia.org/wiki/Isomap"&gt;Isomap algorithm&lt;/a&gt; exploits.&lt;/p&gt;

&lt;p&gt;Second, the PCA requires a vector-representation for points. In several situations, the objects in the dataset are not points in a vector space (like strings). We can retrieve distances between objects (say edit-distance for strings) and then obtain a vector-representation for the objects using MDS.&lt;/p&gt;
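
&lt;p&gt;As a rough sketch of the string case, the code below builds a distance matrix from Levenshtein edit-distances. The names &lt;code&gt;edit-distance&lt;/code&gt; and &lt;code&gt;string-distance-matrix&lt;/code&gt; are hypothetical, and Levenshtein is just one possible choice of edit-distance. The resulting matrix could then be handed to MDS.&lt;/p&gt;

&lt;div class="brush: clojure"&gt;&lt;pre&gt;(defn edit-distance
  "Levenshtein distance between strings a and b, computed one row of the
   dynamic-programming table at a time."
  [a b]
  (let [first-row (vec (range (inc (count b))))]
    (peek
     (reduce
      (fn [prev-row [i ca]]
        (reduce
         (fn [row [j cb]]
           (conj row (min (inc (peek row))                         ; insertion
                          (inc (prev-row (inc j)))                 ; deletion
                          (+ (prev-row j) (if (= ca cb) 0 1)))))   ; substitution
         [(inc i)]
         (map-indexed vector b)))
      first-row
      (map-indexed vector a)))))

(defn string-distance-matrix
  "n x n matrix (a vector of row vectors) of pairwise edit-distances."
  [strings]
  (mapv (fn [s] (mapv (fn [t] (edit-distance s t)) strings)) strings))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;(edit-distance "kitten" "sitting")&lt;/code&gt; is 3.&lt;/p&gt;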

&lt;p&gt;In the next blog post, I will describe and implement the Isomap algorithm that leverages the ideas in the MDS strategy. Isomap constructs a distance matrix that attempts to do a better job at recovering the underlying manifold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PROOFS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;&lt;em&gt;Distances are dot-products:&lt;/em&gt;  &lt;a href="http://www.math.pku.edu.cn/teachers/yaoy/Fall2011/lecture11.pdf"&gt;These notes from Peking U&lt;/a&gt;  are easy to follow. I have mirrored them &lt;a href="/img/mds_proof.pdf"&gt;here&lt;/a&gt; in case that link 404s.&lt;/li&gt;&lt;/ul&gt;&lt;/html&gt;</description></item></channel></rss>