A Clojure DSL for Web-Crawling

When building crawlers, most of the effort is expended in guiding them through a website. For example, if we want to crawl all pages and individual posts on this blog, we extract links like so:

  1. Visit current webpage
  2. Extract pagination links
  3. Extract link to each blog post
  4. Enqueue extracted links
  5. Continue

In this blog post, I present a new DSL that allows you to concisely describe this process.

This DSL is now part of this crawler: https://github.com/shriphani/pegasus



Per Intellectum, Vis
(c) Shriphani Palakodety 2013-2016