SHRIPHANI PALAKODETY: Posts tagged 'dimensions-reduction'
http://blog.shriphani.com/tags/dimensions-reduction.html
Wed, 21 Jan 2015 02:28:09 UT
The Kernel PCA
http://blog.shriphani.com/2015/01/20/the-kernel-pca/
<html>
<p>The Kernel PCA is an extension of the PCA algorithm. In particular, we (i) transform our existing dataset into a higher-dimensional space and then (ii) perform PCA on the data in that space.</p>
<p>In this blog post, I will give a quick, non-rigorous overview of Kernel PCA and demonstrate some connections to other forms of dimension reduction.</p>
<p>Why would this be useful? Consider this dataset (stolen from the scikit-learn documentation):</p>
<p><img src="http://scikit-learn.org/stable/_images/plot_kernel_pca_001.png" /></p>
<p>As we can see, the original PCA projection is fairly useless. Applying a kernel produces a much better projection.</p>
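This comparison is easy to reproduce with scikit-learn. A minimal sketch (the dataset parameters and the `gamma` value are illustrative choices, in the spirit of the scikit-learn example):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: no direction in the plane separates them.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Plain PCA is just a rotation here; the circles stay entangled.
X_pca = PCA(n_components=2).fit_transform(X)

# An RBF kernel implicitly maps the points into a space where the
# two circles become (nearly) linearly separable.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# How far apart do the two classes land on the first component?
gap_kpca = abs(X_kpca[y == 0, 0].mean() - X_kpca[y == 1, 0].mean())
gap_pca = abs(X_pca[y == 0, 0].mean() - X_pca[y == 1, 0].mean())
```

On this dataset the class separation along the first kernel component is far larger than along the first ordinary principal component, which is exactly what the figure shows.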
<p>As with kernels in several other problems, the trick is to avoid explicitly transforming the data points and to instead leverage dot products.</p>
<p>The technique is as follows:</p>
<p>$ \Phi $ is a mapping from our existing point-space to a higher-dimensional space $ \mathcal{F} $.</p>
<p>After a certain amount of linear algebra, the PCA in space $ \mathcal{F} $ can be expressed as a PCA on the <a href="http://en.wikipedia.org/wiki/Kernel_method">kernel matrix</a>.</p>
<p>So, the algorithm is expressible as follows:</p>
<ul>
<li>
<p>Compute the kernel matrix $ K $ where $ K_{ij} = \Phi(x_i) \cdot \Phi(x_j) $.</p></li>
<li>
<p>This matrix can be constructed efficiently, since the kernel function yields the constituent dot products from computations in the original space, without ever evaluating $ \Phi $ explicitly.</p></li>
<li>
<p>The SVD of $ K $ gives $ USV^{T} $; $ U $ and $ S $ can then be used to construct a reduced-dimension dataset.</p></li></ul>
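The steps above can be sketched directly in numpy. A toy implementation (illustrative only; I add the feature-space centering step that the summary glosses over, and use a linear kernel so the result can be sanity-checked against ordinary PCA):

```python
import numpy as np

def kernel_pca(X, kernel, n_components):
    """Project X onto the top principal components of the feature
    space induced by `kernel`, without ever computing Phi explicitly."""
    n = X.shape[0]
    # Step 1: kernel matrix K_ij = Phi(x_i) . Phi(x_j).
    K = np.array([[kernel(a, b) for b in X] for a in X])
    # Center the implicit feature vectors: K <- J K J with J = I - 1/n.
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ K @ J
    # Step 2: SVD of the symmetric PSD matrix K gives U S U^T.
    U, S, _ = np.linalg.svd(K)
    # Step 3: reduced-dimension coordinates from U and S.
    return U[:, :n_components] * np.sqrt(S[:n_components])

# With a linear kernel this reduces to ordinary PCA on centered data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Z = kernel_pca(X, kernel=np.dot, n_components=2)
```

The nested-loop kernel matrix is the readable version; in practice one would vectorize it, and for non-linear kernels (e.g. an RBF) only the `kernel` argument changes.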
<p>Now that the intro is out of the way, I wanted to demonstrate some simple connections between algorithms I’ve covered recently:</p>
<h3 id="mds-with-euclidean-distances">MDS with Euclidean Distances</h3>
<p>In <a href="http://blog.shriphani.com/2015/01/15/multidimensional-scaling-and-pca-are-the-same-thing/">previous blog posts</a>, we covered that MDS and PCA are equivalent. A simple argument shows that MDS and Kernel PCA are also the same thing:</p>
<ul>
<li>The proximity matrix in the MDS algorithm is built from distances. We can express the squared distance between vectors $ x $ and $ y $ as:</li></ul>
<p>
<div>$$ d(x,y)^2 = x \cdot x + y \cdot y - 2 x \cdot y $$</div></p>
<ul>
<li>Thus, distances can be expressed in terms of dot products, i.e. a kernel. The upshot of this is that the MDS algorithm itself is an instance of Kernel PCA.</li></ul>
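This identity can be checked numerically: double-centering the squared-distance matrix, as classical MDS does, recovers exactly the centered Gram (linear-kernel) matrix. A small numpy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))

# Squared Euclidean distances via d(x,y)^2 = x.x + y.y - 2 x.y.
sq = (X ** 2).sum(axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T

# Classical MDS double-centers -D2/2 with J = I - 1/n ...
n = X.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J

# ... which equals the centered Gram matrix: MDS on Euclidean
# distances is Kernel PCA with a linear kernel.
Xc = X - X.mean(axis=0)
print(np.allclose(B, Xc @ Xc.T))  # True
```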
<h3 id="isomap">Isomap</h3>
<p>The Isomap algorithm (covered in a <a href="http://blog.shriphani.com/2014/11/12/the-isomap-algorithm/">previous post</a>) replaces the Euclidean distance with shortest-path distances in a nearest-neighbor graph. The entries of this proximity matrix are surrogates for geodesic distances, and thus the Isomap algorithm is an instance of Kernel PCA as well.</p>
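The same recipe with the swapped-in proximity matrix gives a bare-bones Isomap. A sketch under illustrative assumptions (points spaced along an arc, an arbitrary neighbor count, a one-dimensional embedding):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

# Points spaced along an arc: intrinsically one-dimensional data.
theta = np.linspace(0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Proximity matrix: shortest-path (geodesic) distances in a k-NN graph.
G = kneighbors_graph(X, n_neighbors=2, mode="distance")
D = shortest_path(G, directed=False)

# Same recipe as MDS / Kernel PCA: double-center -D^2/2, then
# take the top eigenvector scaled by the root of its eigenvalue.
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
embedding = V[:, -1] * np.sqrt(w[-1])
```

The one-dimensional embedding recovers the position of each point along the arc (up to sign), which is precisely the manifold coordinate Isomap is after.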
<p>It is amazing how several different approaches to dimension-reduction are variants of a single theme.</p></html>