When building crawlers, most of the effort goes into guiding them through a website. For example, to crawl all pages and individual posts on this blog, we process each page like so:
- Visit the current webpage
- Extract pagination links
- Extract link to each blog post
- Enqueue the extracted links and repeat
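The steps above amount to a breadth-first traversal of the site. Here is a minimal sketch in Python: the `SITE` dictionary is a hypothetical in-memory stand-in for fetching and parsing real pages, and the URL names are invented for illustration.

```python
from collections import deque

# Hypothetical site graph: each URL maps to (pagination links, post links).
# A real crawler would fetch the page and extract these from the HTML.
SITE = {
    "/page/1": (["/page/2"], ["/post/a", "/post/b"]),
    "/page/2": ([], ["/post/c"]),
    "/post/a": ([], []),
    "/post/b": ([], []),
    "/post/c": ([], []),
}

def crawl(start):
    """Visit pages breadth-first: extract pagination and post links, enqueue them."""
    queue, seen, visited = deque([start]), set(), []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)               # "visit" the current webpage
        pagination, posts = SITE.get(url, ([], []))
        queue.extend(pagination + posts)  # enqueue all extracted links
    return visited

print(crawl("/page/1"))
```

The DSL described below captures exactly the extraction step (which links to pull from a page); the queueing and deduplication are handled by the crawler itself.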
In this blog post, I present a new DSL that lets you describe this process concisely.
This DSL is now part of this crawler: https://github.com/shriphani/pegasus