SHRIPHANI PALAKODETY (page 8)

On Empires

2014-08-16

It is the desperate moment when we discover that this empire which had seemed to us the sum of all wonders, is an endless, formless ruin, that corruption’s gangrene has spread too far to be healed by our scepter, that the triumph over enemy sovereigns has made us the heirs of their long undoing.

— Invisible Cities (Italo Calvino)

Subotai: Data Mining for HTML Documents

2014-06-24

clojure, html, data-mining, near-duplicate-detection, structural-similarity

I spent the last few months studying and implementing routines that take a raw HTML document (or a set of documents) and mine it (them) for useful information. Subotai is a library that consolidates some of these routines. In this blog post I will describe what is currently implemented and what the roadmap looks like.

Disco Dora Maar

2014-06-09

quil, art

Disco Dora Maar, made with Quil and Clojure.


Disco Rectangles

2014-05-31

art, quil

I was playing with Quil recently (I have a project planned that I will talk about later) and managed to throw this together in a short while:

Clojure source available here.

Gianni Agnelli

2014-05-28

gianni, agnelli, fiat, ferrari, mens-fashion, dapper

Of Fiat, Ferrari, Maserati and Sestriere fame. Also, the best-dressed man ever (IMO):

Visualizing the most powerful brands by industry

2014-05-21

d3, visualizations, visualization, forbes, brand, powerful


Data from Forbes, plotted using d3.js

Wikipedia Server Requests By the Hour

2014-05-18

d3, visualizations, wikipedia, visualization

I found this dataset of server requests to Wikipedia. This is a plot of the number of server requests made per hour on 19 September 2007.

The code and processed dataset used to generate this plot are in this repo.
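A minimal sketch of the per-hour aggregation behind a plot like this. It assumes (hypothetically; the repo's actual format may differ) that each request has already been reduced to a Unix timestamp in seconds and that hours are counted in UTC. `requests-per-hour` is an illustrative name, not a function from the repo.

```clojure
(defn requests-per-hour
  "Hypothetical helper: counts requests per hour of the day (0-23, UTC).
  Assumes each request is a Unix timestamp in seconds."
  [timestamps]
  (->> timestamps
       ;; seconds -> whole hours since the epoch -> hour of the day
       (map (fn [ts] (mod (quot (long ts) 3600) 24)))
       ;; tally how many requests fall into each hour
       frequencies
       ;; sort by hour so the result can be plotted in order
       (into (sorted-map))))
```

For example, 1190160000 is midnight UTC on 19 September 2007, so timestamps a few seconds or an hour after it land in hour 0 or hour 1 respectively.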

Augmenting enlive

2014-05-16

clojure, enlive, htmlcleaner, scraping

When extracting features from HTML documents, I keep needing the same operations: removing script tags, comments and the like. HtmlCleaner already provides this feature set, so I merged it with enlive to produce enlive-helper.

Now you can do:


(html-resource-steroids 
 (java.io.StringReader. "<html><body><a>hi</a></body></html>") 
 :prune-tags "a")

As a result, the a tag is not picked up:

({:tag :html,
  :attrs nil,
  :content
  ("\n"
   {:tag :head, :attrs nil, :content nil}
   "\n"
   {:tag :body, :attrs nil, :content nil})})

The options you can pass mirror those of HtmlCleaner. Full docs are available in this GitHub repo.

Also, the code is something I threw together during my research, so it is released under Matt Might’s CRAPL license.
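To make the pruning concrete: enlive represents a parsed document as nested maps with :tag, :attrs and :content keys (as in the output above), with text nodes as plain strings. The sketch below shows what pruning a tag amounts to on that tree. `prune-tags` is a hypothetical helper written for illustration only; it is not enlive-helper's implementation, which relies on HtmlCleaner for this.

```clojure
(defn prune-tags
  "Hypothetical helper: returns the enlive-style node seq with every element
  whose :tag is in tags (a set of keywords) removed, subtree included.
  Text nodes (strings) are kept as-is."
  [tags nodes]
  (for [node nodes
        ;; drop elements whose tag is in the prune set
        :when (not (and (map? node) (contains? tags (:tag node))))]
    (if (and (map? node) (seq (:content node)))
      ;; recurse into the children of elements we keep
      (update node :content #(prune-tags tags %))
      node)))
```

Calling `(prune-tags #{:script} nodes)` would strip script elements in the same way.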

Diagnosis by Google Doesn’t Work

2014-05-07

information-retrieval, SIGIR, healthcare, symptoms

I have often Googled my symptoms and visited WebMD (and concluded that I have a deadly disease). At SIGIR 2013, Ryen White’s paper, Beliefs and Biases in Web Search, provided empirical evidence for the poor success rate of diagnosis-by-Google.

The study mined medical yes/no questions (for example: "Can salmonella cause a bellyache?"), had physicians answer them, and then measured user bias after search: having perused the results, users answered their original questions with yes or no. The paper contains a very detailed description of the experiments conducted.

The accuracy of the final answers was the most interesting part of the paper: only about half of the questions were answered correctly. That is no better than flipping a fair coin for each question. The rest of the paper is an interesting read too (it won the SIGIR 2013 best paper award).

Consistent Hashing in Clojure

2014-05-01

clojure, consistent-hashing, hotspots

I wrote this post to teach myself consistent hashing, a simple hash family that Akamai’s founders came up with. I originally did this to prepare for a talk in my grad algorithms class (I made a horlicks of the talk, but whatever). I am going to provide intuition, analysis and a Clojure implementation.
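As a preview, here is a minimal sketch of the core idea, not the implementation developed in the rest of the post. The ring is a sorted map from hash positions to node names, each node is placed at several virtual positions, and a key belongs to the first node position at or after the key's hash, wrapping around to the start of the ring. The function names are hypothetical, and Clojure's built-in `hash` stands in for the random hash function the analysis assumes.

```clojure
(defn- point
  "Maps a string onto the ring. Clojure's built-in hash stands in for a
  proper random hash function (an assumption of this sketch)."
  [s]
  (hash s))

(defn add-node
  "Places `replicas` virtual copies of node on the ring, a sorted map from
  ring position to node name."
  [ring node replicas]
  (reduce (fn [r i] (assoc r (point (str node "#" i)) node))
          ring
          (range replicas)))

(defn remove-node
  "Removes every virtual position belonging to node."
  [ring node]
  (into (sorted-map) (remove (fn [[_ n]] (= n node)) ring)))

(defn lookup
  "Returns the node responsible for key: the owner of the first ring position
  >= the key's hash, wrapping around to the smallest position. Returns nil
  for an empty ring."
  [ring key]
  (when (seq ring)
    (val (or (first (subseq ring >= (point key)))
             (first ring)))))
```

Removing a node only reassigns the keys that node owned; every other key keeps its owner, which is what makes the scheme attractive for spreading load across caches.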