In 2010, I purchased my first Kindle, and since then, apart from GEB, I haven't bothered with physical copies. The Kindle store satisfies most of my needs (occasionally the paperback costs less than the digital copy, and in those cases I refuse to buy the book on principle).
The books can be read on just about any platform (OS X, and iOS on the iPad and iPhone in my case; I also remember a rather unpleasant Kindle app on WP7).
One of the benefits of a digital book is that it should be straightforward for me to collect a list of the highlights I've made in it. Amazon (in its infinite wisdom) has not provided an API in the three or so years I've used the Kindle ecosystem, and manually transcribing the quotes is not something I am interested in doing. That leaves scraping as the only alternative. I decided to use Clojure for this task.
For ClueWeb, I ruled out Selenium because driving a browser would have impeded the crawler. Here, though, Selenium is a good fit, since the problem can be summarized as:
(log into kindle.amazon.com -> download a list of book-specific s-expressions -> download the highlights for the desired book/author)
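The three stages compose naturally as a single function. The sketch below is a hypothetical illustration, not my actual routine: the browser-driven steps are passed in as plain functions (`login!`, `fetch-books` and `fetch-highlights` are placeholder names), so no particular Selenium API is assumed, and `wanted?` is a predicate that picks the book or author you care about.

```clojure
;; Hypothetical sketch of the three-stage pipeline. The browser-driven
;; steps are injected as plain functions: `login!`, `fetch-books` and
;; `fetch-highlights` stand in for whatever Selenium code actually talks
;; to kindle.amazon.com. No real Selenium API is assumed here.
(defn scrape-highlights
  "Logs in once, fetches the list of book s-expressions, keeps the books
  for which `wanted?` is truthy, and returns a map of book -> highlights."
  [{:keys [login! fetch-books fetch-highlights]} credentials wanted?]
  (let [session (login! credentials)]
    (->> (fetch-books session)
         (filter wanted?)
         (map (fn [book] [book (fetch-highlights session book)]))
         (into {}))))
```

Injecting the scraping steps this way also makes it easy to swap in stub functions at the REPL and test the filtering logic without hitting Amazon.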
The following routines accomplish that. I dump both to a file: my list of books read does not grow by the second, so it is feasible to work with a stale copy.
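Because s-expressions are just Clojure data, caching them needs nothing beyond `pr-str`, `spit`, `slurp` and `clojure.edn`. A minimal sketch, assuming the scraped data is plain edn (maps, vectors, strings, keywords); the function names `dump-sexps!`, `load-sexps` and `cached` are hypothetical:

```clojure
(require '[clojure.edn :as edn])

(defn dump-sexps!
  "Serializes `sexps` (any edn-compatible data) to the file at `path`."
  [path sexps]
  ;; Unbind print limits so a REPL setting cannot truncate the output.
  (binding [*print-length* nil
            *print-level*  nil]
    (spit path (pr-str (vec sexps)))))

(defn load-sexps
  "Reads the data written by `dump-sexps!`; returns nil when the file
  does not exist yet, signalling that a fresh scrape is needed."
  [path]
  (let [f (java.io.File. path)]
    (when (.exists f)
      (edn/read-string (slurp f)))))

(defn cached
  "Returns the (possibly stale) data cached at `path`, calling the
  zero-argument `fetch` and caching its result only if no file exists."
  [path fetch]
  (or (load-sexps path)
      (let [data (vec (fetch))]
        (dump-sexps! path data)
        data)))
```

Deleting the file is then the only way to force a re-scrape, which matches how rarely the reading list changes.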
The s-expressions look like this:
And the highlights:
Both of these are (slightly curated) s-expressions, from my reading list and from Kurt Vonnegut books respectively. You can curate the resulting s-expressions using your own techniques.
I have a command-line wrapper around it; details are on the GitHub repo wiki. What would be more interesting is making the filter routine a bit smarter than a linear scan plus a regular-expression match.
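One possible direction, sketched under assumptions: each book s-expression is taken to be a map with `:title` and `:author` keys (a hypothetical shape), and `filter-linear` is only a guess at what a scan-plus-regex filter looks like, not my actual code. The alternative builds an inverted index from lowercase words to books once, so a query touches only the sets for its own words instead of every book. The trade-off is that it matches whole words rather than arbitrary regular expressions.

```clojure
(require '[clojure.string :as str]
         '[clojure.set :as set])

;; Baseline (hypothetical): scan every book and regex-match title/author.
(defn filter-linear
  [books pattern]
  (let [re (re-pattern (str "(?i)" pattern))]   ; case-insensitive match
    (filter #(or (re-find re (str (:title %)))
                 (re-find re (str (:author %))))
            books)))

(defn- tokens
  "Splits a string into lowercase words, dropping empty pieces."
  [s]
  (->> (str/split (str/lower-case (or s "")) #"\W+")
       (remove str/blank?)))

(defn build-index
  "Maps each lowercase word in a book's title or author to the set of
  books containing it. Built once, e.g. right after loading the file."
  [books]
  (reduce (fn [idx book]
            (reduce #(update %1 %2 (fnil conj #{}) book)
                    idx
                    (concat (tokens (:title book)) (tokens (:author book)))))
          {}
          books))

(defn lookup
  "Returns the set of books containing every word of `query`."
  [idx query]
  (let [hits (map #(get idx % #{}) (tokens query))]
    (if (seq hits)
      (apply set/intersection hits)
      #{})))
```

For a reading list of a few hundred books the linear scan is already fast, so the index mainly pays off if the same file is queried many times, for example from a long-running REPL session.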