The Kernel PCA
Shriphani Palakodety, 2015-01-21
<html> <p>The Kernel PCA is an extension of the PCA algorithm. In particular, we (i) map our existing dataset into another, higher-dimensional space and then (ii) perform PCA on the data in that space.</p> <p>In this blog post, I will give a quick, non-rigorous overview of Kernel PCA and demonstrate some connections to other forms of dimension reduction.</p> <p>Why would this be useful? Consider this dataset (stolen from the scikit-learn documentation):</p> <p><img src="http://scikit-learn.org/stable/_images/plot_kernel_pca_001.png" /></p> <p>As we can see, the original PCA projection is fairly useless. Applying a kernel produces a much better projection.</p> <p>As with kernel methods in other problems, the trick is to avoid explicitly transforming the data points and to work with dot products instead.</p> <p>The technique is as follows:</p> <p>$\Phi$ is a mapping from our existing point-space to a higher-dimensional space $\mathcal{F}$.</p> <p>After a certain amount of linear algebra, the PCA in space $\mathcal{F}$ can be expressed as a PCA on the <a href="http://en.wikipedia.org/wiki/Kernel_method">kernel matrix</a>.</p> <p>So, the algorithm is expressible as follows:</p> <ul> <li> <p>Compute the kernel matrix $K$ where $K_{ij} = \Phi(x_i) \cdot \Phi(x_j)$.</p></li> <li> <p>This matrix can be constructed efficiently because each entry is a kernel-function evaluation on a pair of points in the original space; $\Phi$ is never computed explicitly.</p></li> <li> <p>The SVD of $K$ (centered first, in practice) gives $USV^{T}$; $U$ and $S$ can be used to construct a reduced-dimension dataset.</p></li></ul> <p>Now that the intro is out of the way, I wanted to demonstrate some simple connections between algorithms I&rsquo;ve covered recently:</p> <h3 id="mds-with-euclidean-distances">MDS with 
Euclidean Distances</h3> <p>In <a href="http://blog.shriphani.com/2015/01/15/multidimensional-scaling-and-pca-are-the-same-thing/">previous blog posts</a>, we saw that MDS and PCA are equivalent. A simple argument shows that MDS and Kernel PCA are also the same thing:</p> <ul> <li>The proximity matrix in the MDS algorithm is built from distances. We can express the squared distance between vectors $x$ and $y$ as:</li></ul> <p> <div>$$d^{2}(x, y) = x \cdot x + y \cdot y - 2 x \cdot y$$</div></p> <ul> <li>Thus, squared distances can be expressed purely in terms of dot products, i.e. a kernel. The upshot of this is that the MDS algorithm itself is an instance of Kernel PCA.</li></ul> <h3 id="isomap">Isomap</h3> <p>The Isomap algorithm (covered in a <a href="http://blog.shriphani.com/2014/11/12/the-isomap-algorithm/">previous post</a>) replaces the Euclidean distance with shortest-path distances in a nearest-neighbor graph. The entries in this proximity matrix are surrogates for distances, and thus the Isomap algorithm is an instance of Kernel PCA as well.</p> <p>It is amazing how several different approaches to dimension reduction are variants of a single theme.</p></html>
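<p>The MDS connection can also be checked numerically: double-centering a squared-distance matrix (the standard step in classical MDS) recovers exactly the linear-kernel (Gram) matrix of the mean-centered data, which is what the squared-distance expansion above implies. The data here is random and purely illustrative.</p>

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(8, 3)

# Squared Euclidean distance matrix: D2[i, j] = ||x_i - x_j||^2
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Classical MDS double-centering: B = -1/2 J D2 J with J = I - (1/n) 11^T
n = X.shape[0]
J = np.eye(n) - np.full((n, n), 1.0 / n)
B = -0.5 * J @ D2 @ J

# B equals the Gram matrix of the centered data, i.e. a linear kernel
# matrix -- so classical MDS is Kernel PCA with this kernel
Xc = X - X.mean(axis=0)
print(np.allclose(B, Xc @ Xc.T))  # True
```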