Marco Polo describes a bridge, stone by stone.
“But which is the stone that supports the bridge”, Kublai Khan asks.
“The bridge is not supported by one stone or another”, Marco answers, but by the line of the arch they form.
Kublai Khan remains silent, reflecting, then he adds, “Why do you speak of the stones? It is only the arch that matters to me.”
Polo answers, “Without stones there is no arch.”
For my parents’ 26th anniversary, I decided to convert an online religious text they read into a beautiful, welltypeset book.
The online text was built by volunteers using an archaic version of Microsoft Word and looks like this:
Anyone who has read science or math literature is exposed to the highquality output LaTeX produces.
Fortunately LaTeX’s abilities extend far beyond the domain of mathematical symbols.
I was able to combine Clojure’s excellent HTML processing infrastructure (enlive) and LaTeX to produce a nice looking document.
The entire process took a few hours.
Here are two pages from the final output:
This blog post contains latex and clojure snippets to produce that output. I am not good at designing books or combining typefaces and would appreciate advice.
Given a graph $ G = (V, E) $, its adjacency matrix $ A $ contains an entry at $ A_{ij} $ if vertices $ i $ and $ j $ have an edge between them. The degree matrix $ D $ contains the degree of each vertex along its diagonal.
The graph laplacian of $ G $ is given by $ D  A $.
Several popular techniques leverage the information contained in this matrix. This blog post focuses on the two smallest eigenvalues.
First, we look at the eigenvalue 0 and its eigenvectors. A very elegant result about its multiplicity forms the foundation of spectral clustering.
Then we look at the second smallest eigenvalue and the corresponding eigenvector. A slightly more involved result (YMMV) allows us to partition the graph in question. A recent publication by John Urschel (who apparently moonlights as a sportsperson) focused on this quantity.
The insights provided here sacrifice some rigor for the sake of brevity. I find such descriptions help me study without getting bogged down too much with details. A bibliography provided at the end contains links to actual proofs.
Kimono Labs and I have parted ways; I will be working on problems at a stealth startup starting Monday.
Update: I have been working on a nicer fuller crawler in clojure  Pegasus
Nutch and Heritrix are battletested webcrawlers. ClueWeb9, ClueWeb12 and the CommonCrawl corpora employed one of these.
Toy crawlers that hold important datastructures in memory fail spectacularly when downloading a large number of pages. Heritrix and Nutch benefit from several manyears of work aimed at stability and scalability.
In a previous project, I wanted to leverage Heritrix’s infrastructure and the flexibility to implement some custom components in Clojure. For instance, being able to extract certain links based on the output of a classifier. Or being able to use simple enlive
selectors.
The solution I used was to expose the routines I wanted via a webserver and have Heritrix request these routines.
This allowed me to use libraries like enlive
that I am comfortable with and still avail the benefits of the infra Heritrix provides.
What follows is a library  sleipnir, that allows you to do all this in a simple way.
In the past few blog posts, I covered some details of popular dimensionreduction techniques and showed some common themes. In this post, I will collect all these materials and tie them together.
The Kernel PCA is an extension of the PCA algorithm. In particular, we desire to (i) transform our existing dataset to another highdimensional space and then (ii) perform PCA on the data in that space.
In this blog post, I will perform a very quick, nonrigorous overview of Kernel PCA and demonstrate some connections between other forms of dimensionreduction.
Why would this be useful? Consider this dataset (stolen from the scikitlearn documentation):
As we can see, the original PCA projection is fairly useless. Applying a kernel produces a much better projection.
Like kernels in several problems, the trick is to avoid transforming datapoints and leveraging dotproducts.
The technique is as follows:
$ \Phi $ is a mapping from our existing pointspace to a higherdimensional space $ \mathcal{F} $.
After a certain amount of linear algebra, the PCA in space $ \mathcal{F} $ can be expressed as a PCA on the kernel matrix.
So, the algorithm is expressible as follows:

Compute the kernel matrix $ K $ where $ K_{ij} = \Phi(x_i) \cdot \Phi(x_j) $.

This matrix is efficiently constructed since we can obtain the constituent dot products in the original space.

The SVD of $ K $ gives you $ USV^{T} $  $ U $ and $ S $ can be used to construct a reduced dimension dataset.
Now that the intro is out of the way, I wanted to demonstrate some simple connections between algorithms I’ve covered recently:
MDS with Euclidean Distances
In previous blog posts, we covered that MDS and PCA are equivalent. A simple proof exists to show that MDS and Kernel PCA are the same thing:
 The proximity matrix in the MDS algorithm is built with distances. We can express distances between vectors $ x $ and $ y $ as:
$$ d(x,y) = x \cdot x + y \cdot y – 2x \cdot y $$
 Thus, distances can be expressed as a kernel. The upshot of this is that the MDS algorithm itself is an instance of Kernel PCA.
Isomap
The Isomap algorithm (covered in a previous post) trades the Euclidean distance with edge weights in a nearest neighbor graph. The entries in this proximity matrix are surrogates for distances and thus the Isomap algorithm is an instance of Kernel PCA as well.
It is amazing how several different approaches to dimensionreduction are variants of a single theme.
There is a very simple argument that shows that MDS and PCA achieve the same results.
This argument has 2 important components. The first of these shows that an eigendecomposition of a gram matrix can be used for dimensionreduction.
PCA leverages the singular value decomposition (SVD). Given a matrix $ X $, the SVD is $ X = USV^{T} $.
We drop columns from X by using $ US_{t} $ where we drop some rows and columns from S.
This is also conviently obtained using an eigendecomposition of the covariance matrix.
Working with the gram matrix, we have $ XX^{T} $ and when expressed in terms of $ U $, $ S $ and $ V $, we have $ XX^{T} $ = $ (USV^{T})(VSU^{T}) $.
Simple algebra tells us that this is equal to $ US^{2}U^{T} $. The spectral theorem tells us that this the eigendecomposition of the gram matrix will return this decomposition. $ U $ and $ S $ can be retrieved and a dataset with fewer dimensions can be obtained.
The second part of the argument involves proving that a matrix of distances is indeed a gram matrix. This argument was discussed in a previous post.
Last week, I dropped out of the PhD program at LTI, SCS, Carnegie Mellon University with a masters degree. The specifics of this decision will remain private. If you are going through issues in your PhD program, I will be happy to talk to you and give you advice (should you desire any). You can email me at: shriphanip@gmail.com. You can also call me at 425–623–2604.