For my parents’ 26th anniversary, I decided to convert an online religious text they read into a beautiful, well-typeset book.
The online text was built by volunteers using an archaic version of Microsoft Word and looks like this:
Anyone who has read science or math literature has seen the high-quality output LaTeX produces.
Fortunately LaTeX’s abilities extend far beyond the domain of mathematical symbols.
I was able to combine Clojure’s excellent HTML processing library (enlive) with LaTeX to produce a nice-looking document.
The entire process took a few hours.
Here are two pages from the final output:
This blog post contains the LaTeX and Clojure snippets used to produce that output. I am not good at designing books or combining typefaces and would appreciate advice.
The LaTeX Pieces
The inspiration for this book came from this TeX StackExchange thread.
A user was working on replicating a 16th-century Bible (image from LaTeX Stack Exchange):
Using that piece as inspiration, I converged on the following theme:
- A Garamond typeface. I think Garamond fits the theme of religious texts quite well. Fortunately, the ebgaramond package makes it easy to typeset your entire document in this beautiful font.

```latex
\usepackage{ebgaramond}
```

is all you need to put in your LaTeX document.
- Liberal use of ornaments on page borders, special pages, etc.
The pgfornament package comes with very beautiful ornaments. When combined with TikZ, a seasoned user can create very sophisticated and professional documents.
I am not a seasoned user, so I was perfectly satisfied with using something out of the box. Each page in the book was going to have these ornaments in the page corners:
The pgfornament package combined with eso-pic allows you to achieve exactly that.
```latex
\makeatletter
\AddToShipoutPicture{%
  \begingroup
  % 2mm margin from the page edge; each ornament is 2cm wide
  \setlength{\@tempdima}{2mm}%
  \setlength{\@tempdimb}{\paperwidth-\@tempdima-2cm}%
  \setlength{\@tempdimc}{\paperheight-\@tempdima}%
  % top-left corner
  \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimc}){%
    \pgfornament[anchor=north west,width=2cm]{63}}
  % bottom-left corner
  \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdima}){%
    \pgfornament[anchor=south west,width=2cm,symmetry=h]{63}}
  % top-right corner
  \put(\LenToUnit{\@tempdimb},\LenToUnit{\@tempdimc}){%
    \pgfornament[anchor=north east,width=2cm,symmetry=v]{63}}
  % bottom-right corner
  \put(\LenToUnit{\@tempdimb},\LenToUnit{\@tempdima}){%
    \pgfornament[anchor=south east,width=2cm,symmetry=c]{63}}
  \endgroup
}
\makeatother
```
Next, I decided that each chapter would begin on a new page.
Chapter numbers and subtitles (if any) would be adorned above and below with ornaments. Essentially I was going for:
Note that the ornaments in the corner are the result of eso-pic.
The borders in the north, south, east and west, and the styling around the chapter title are accomplished by:
```latex
\newpage
\newgeometry{left=0cm,bottom=0cm,top=0cm,right=0cm}
\begin{tikzpicture}[remember picture, overlay]
  \node[anchor=north] at (current page.north){\pgfornament[width=6cm,symmetry=h]{46}};
  \node[anchor=south] at (current page.south){\pgfornament[width=6cm]{46}};
  \node[anchor=north,rotate=90] at (current page.west){\pgfornament[width=6cm,symmetry=h]{46}};
  \node[anchor=north,rotate=-90] at (current page.east){\pgfornament[width=6cm,symmetry=h]{46}};
  \node[inner sep=6pt] (chapter) at (current page.center){\Huge Chapter I};
  \node[inner sep=12pt, below of=chapter, text width=10cm, align=center, outer sep=12pt] (title1) { };
  \node[inner sep=12pt, below of=title1, text width=10cm, align=center, outer sep=12pt] (title) { Salutations -- The Story of Grinding Wheat and Its Philosophical Significance};
  \node[anchor=north] at (title.south){\pgfornament[width=5cm]{60}};
  \node[anchor=south] at (chapter.north){\pgfornament[width=5cm,symmetry=h]{49}};
\end{tikzpicture}
\newpage
\restoregeometry
```
This forms the template for the book. Next, we populate the contents.
The Clojure Pieces
Enlive is a fantastic HTML parsing library for Clojure. The hierarchical structure of HTML is captured in a Clojure map:
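As a rough illustration of that shape, here is a hand-written map for a tiny fragment. The `:tag`, `:attrs` and `:content` keys are the ones the conversion code in this post relies on; the fragment itself and the helper `text-leaves` are made up for this sketch and are not taken from the book's HTML.

```clojure
;; Hand-written example of the node shape (an assumption for illustration,
;; not parser output copied from the book).
;; Roughly what <body><p>Hello <b>Title</b></p></body> would look like.
(def sample-body
  {:tag :body
   :attrs nil
   :content [{:tag :p
              :attrs nil
              :content ["Hello "
                        {:tag :b :attrs nil :content ["Title"]}]}]})

;; Element nodes are maps; text nodes are plain strings.
(defn text-leaves
  "Returns every text leaf under `node`, in document order.
  Hypothetical helper, only here to show how the tree nests."
  [node]
  (if (string? node)
    [node]
    (mapcat text-leaves (:content node))))

(text-leaves sample-body)
;; => ("Hello " "Title")
```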
To transform a single chapter, we traverse this map (tree) and transform the text as is appropriate. This is governed by where in the document the text occurs.
After manually inspecting a few chapters, I made a small table that mapped root - leaf paths in the DOM to handlers that would transform the text.
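In Clojure this can be succinctly described like so:

```clojure
;; Assumed setup (not shown in the original post):
;;   (:require [clojure.string :as string])
;;   (:import [java.util.regex Matcher]
;;            [org.apache.commons.lang3 StringEscapeUtils])

(defn convert-chapter-parse
  ;; Entry point: walk from the root and join all converted text.
  ([a-map]
     (->> (convert-chapter-parse a-map [])
          flatten
          (apply str)))

  ;; Recursive step: extend the path with this node, then visit its children.
  ([a-map parent-path]
     (let [current-node-path (conj parent-path
                                   (if (= (:tag a-map) :body)
                                     [(:tag a-map)]
                                     [(:tag a-map) (:attrs a-map)]))

           node-contents (:content a-map)]

       (map
        (fn [an-item]
          (if (map? an-item)
            ;; Element node: recurse with the extended path.
            (convert-chapter-parse an-item current-node-path)
            ;; Text node: unescape HTML entities, escape & for LaTeX.
            (let [fixed-item (-> an-item
                                 (StringEscapeUtils/unescapeHtml3)
                                 (string/replace #"\&"
                                                 (Matcher/quoteReplacement "\\&")))]
              (format-content fixed-item current-node-path))))
        node-contents))))
```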
Essentially you keep track of where you are in the tree (relative to the root element) and then fetch a function from a table that transforms your text appropriately.
The table itself looks like this:
```clojure
[[:body] [:p nil]]
identity

[[:body] [:p nil] [:font {:size 5}]]
identity

[[:body] [:p nil] [:b nil]]
(fn [text]
  (str "\\section*{" text "}"))
```
Simple.
Running this over the entire book gave me a neatly typeset book, hosted here.
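The post does not show `format-content`, the function that `convert-chapter-parse` calls for each piece of text. A minimal sketch of the lookup step might look like the following. The var name `path-handlers` and the choice to pass text through unchanged when a path is not in the table are assumptions of this sketch, not the original implementation.

```clojure
;; Hypothetical table: paths (vectors of [tag attrs] pairs) mapped to handlers.
;; The entries mirror the three shown above; the name is an assumption.
(def path-handlers
  {[[:body] [:p nil]]                   identity
   [[:body] [:p nil] [:font {:size 5}]] identity
   [[:body] [:p nil] [:b nil]]          (fn [text]
                                          (str "\\section*{" text "}"))})

(defn format-content
  "Looks up the handler for `path` and applies it to `text`.
  Assumption: text on an unlisted path is returned unchanged."
  [text path]
  (let [handler (get path-handlers path identity)]
    (handler text)))

;; Bold text directly inside a paragraph becomes an unnumbered section:
(format-content "Salutations" [[:body] [:p nil] [:b nil]])
;; => "\\section*{Salutations}"
```

Because Clojure vectors and maps compare by value, a whole path can be used directly as a map key, which keeps the table declarative.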
Remarks
Although this looks like a simple exercise (under 100 lines of code), HTML lets you produce the same visual output with different markup, so a handful of chapters does not reveal every path that needs a handler. I found that converting 10 chapters at a time and inspecting each batch for quirks was a better way to measure coverage.
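One way to make that batch inspection less manual is to list every root-to-leaf path in a batch that has no handler yet. The sketch below is not from the original code: `node-key`, `leaf-paths` and `unhandled-paths` are hypothetical names, and the path convention is copied from `convert-chapter-parse`.

```clojure
(require '[clojure.set :as set])

(defn node-key
  "Path segment for a node, using the same convention as the converter:
  [:body] for the body element, [tag attrs] for everything else."
  [node]
  (if (= (:tag node) :body)
    [(:tag node)]
    [(:tag node) (:attrs node)]))

(defn leaf-paths
  "Returns the set of paths leading to text leaves under `node`."
  ([node] (leaf-paths node []))
  ([node parent-path]
   (let [path (conj parent-path (node-key node))]
     (into #{}
           (mapcat (fn [child]
                     (if (map? child)
                       (leaf-paths child path)
                       [path])))
           (:content node)))))

(defn unhandled-paths
  "Paths seen in `chapters` (parsed body nodes) that are not in `known-paths`."
  [known-paths chapters]
  (set/difference (into #{} (mapcat leaf-paths) chapters)
                  known-paths))

;; Made-up chapter with italic text that the table does not cover yet.
(def sample-chapter
  {:tag :body :attrs nil
   :content [{:tag :p :attrs nil
              :content ["text" {:tag :i :attrs nil :content ["emph"]}]}]})

(unhandled-paths #{[[:body] [:p nil]]} [sample-chapter])
;; => #{[[:body] [:p nil] [:i nil]]}
```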
LaTeX isn’t particularly fond of how HTML, MS Word, etc. handle double quotes, apostrophes and so on. I have a couple of string/replace calls, but they clearly weren’t enough to deal with the entire book. This is a problem that can only be solved by actually reading the book.
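For reference, replacements of this kind might look roughly like the sketch below. It is not the code used for the book; `latex-quotes` is a hypothetical name, and the rule for straight quotes (opening if at the start of the text or after whitespace, closing otherwise) is a simplifying assumption that will misfire on unusual input, which is exactly why proofreading is still needed.

```clojure
(require '[clojure.string :as string])

(defn latex-quotes
  "Rewrites typographic and straight quotes into LaTeX quote syntax.
  Hypothetical helper; a heuristic, not a complete solution."
  [text]
  (-> text
      (string/replace "\u201C" "``")        ; left curly double quote
      (string/replace "\u201D" "''")        ; right curly double quote
      (string/replace "\u2018" "`")         ; left curly single quote
      (string/replace "\u2019" "'")         ; right curly single quote / apostrophe
      ;; A straight double quote at the start or after whitespace opens a quote.
      (string/replace #"(^|\s)\"" "$1``")
      ;; Any straight double quote still left is treated as closing.
      (string/replace "\"" "''")))

(latex-quotes "He said \"hi\" and left")
;; => "He said ``hi'' and left"
```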
Overall, this turned out to be a really appreciated gift.