pdf-stamper0.2.14-SNAPSHOTCombine template descriptions and template PDFs with data to produce PDFs. dependencies
| (this space intentionally left almost blank) | |||||||||||||||
PDF creation from templatespdf-stamper lets you build complete PDF documents without worrying about building the layout in code. Those who have tried will know that it is by no means a simple task getting the layout just right, and building a layout that can adapt to changing requirements can get frustrating in the long run. With pdf-stamper the layout is decoupled from the code extracting and manipulating the data. This leads to a simpler process for building PDF documents from your data: Data placement is controlled by template description datastructures, and data is written to PDF pages defining the layout. | ||||||||||||||||
(ns pdf-stamper (:require [pdf-stamper.context :as context] [pdf-stamper.text :as text] [pdf-stamper.text.parsed :as parsed-text] [pdf-stamper.images :as images] [potemkin]) (:import [org.apache.pdfbox.pdmodel PDDocument] [org.apache.pdfbox.pdmodel.edit PDPageContentStream])) | ||||||||||||||||
Templatestemplate descriptions are regular Clojure maps with the three keys:
The HolesHoles are what make a template description: They define where on the page the various pieces of data are put, and how. | ||||||||||||||||
There are a number of hole types built in to pdf-stamper, but new hole types can be added by implementing this multimethod. If a hole type should be able to overflow, the return value from a call to | (defmulti fill-hole (fn [document c-stream hole location-data context] (:type hole))) | |||||||||||||||
All holes have these fields in common:
Coordinates and widths/heights are always in PDF points (1/72 inch). Note: The PDF coordinate system starts from the bottom left, and increasing y-values move the cursor up. Thus, all
| ||||||||||||||||
When filling the holes on a page we have to take into account that Clojure sequences are
lazy by default; i.e. we cannot expect the side-effects of stamping to the PDF page to have
happened just by applying the Note: Holes where the page does not contain data will be skipped. | (defn- fill-holes [document c-stream holes page-data context] (doall (into {} (map (fn [hole] (when-let [location-data (get-in page-data [:locations (:name hole)])] (fill-hole document c-stream hole location-data context))) (sort-by :priority holes))))) | |||||||||||||||
The types supported out of the box are:
For specifics on the hole types supported out of the box, see the documentation for their respective namespaces. | ||||||||||||||||
(defmethod fill-hole :image [document c-stream hole location-data context] (let [data (merge hole location-data)] (images/fill-image document c-stream data context))) | ||||||||||||||||
(defmethod fill-hole :text-parsed [document c-stream hole location-data context] (let [data (update-in (merge hole location-data) [:contents :text] #(if (string? %) (parsed-text/get-paragraph-nodes %) %))] (text/fill-text-parsed document c-stream data context))) | ||||||||||||||||
(defmethod fill-hole :text [document c-stream hole location-data context] (let [data (merge hole location-data)] (text/fill-text document c-stream data context))) | ||||||||||||||||
The ContextThe context is the datastructure that contains additional data needed by pdf-stamper. For now that is fonts and templates (both descriptions and files).
This namespace contains referrals to the three important user-facing functions from the context namespace, namely | ||||||||||||||||
(potemkin/import-vars [pdf-stamper.context add-template add-font base-context]) | ||||||||||||||||
Filling pagespdf-stamper exists to fill data onto pages while following a pre-defined layout. This is where the magic happens. | ||||||||||||||||
Trying to stamp a page that requests a template not in the context
is an error. This is function is used to give a clear name to the
precondition of | (defn- page-template-exists? [page-data context] (get-in context [:templates (:template page-data)])) | |||||||||||||||
Every single page is passed through this function, which extracts the relevant template and description for the page data, adds it to the document being built, and delegates the actual work to the hole- filling functions defined above. The template to use is extracted from the page data. Using this the available holes, template PDF page, and template to use with overflowing holes (if any) are extracted from the context. Any overflowing holes are handled by calling recursively with the overflow. All other holes are copied as-is to the new page, to make repeating holes possible. Future: It would probably be wise to find a better way than a direct recursive call to handle overflows. Otherwise handling large bodies of text could become a problem. | (defn- fill-page [document page-data context] (assert (page-template-exists? page-data context) (str "No template " (:template page-data) " for page.")) (let [template (:template page-data) template-overflow (context/get-template-overflow template context) template-holes (context/get-template-holes template context) template-doc (context/get-template-document template context) template-page (-> template-doc (.getDocumentCatalog) (.getAllPages) (.get 0)) template-c-stream (PDPageContentStream. document template-page true false)] (.addPage document template-page) (let [overflows (fill-holes document template-c-stream (sort-by :priority template-holes) page-data context) overflow-page-data {:template template-overflow :locations (when (seq overflows) (merge (:locations page-data) overflows))}] (.close template-c-stream) (if (and (seq (:locations overflow-page-data)) (:template overflow-page-data)) (conj (fill-page document overflow-page-data context) template-doc) [template-doc])))) | |||||||||||||||
When the context is populated with fonts and templates, this is the function to call. The data passed in as the first argument is a description of each individual page, i.e. a seq of maps containing the keys:
| (defn fill-pages [pages context] (let [output (java.io.ByteArrayOutputStream.)] (with-open [document (PDDocument.)] (let [context-with-embedded-fonts (reduce (fn [context [font style]] (context/embed-font document font style context)) context (:fonts-to-embed context)) open-documents (doall (map #(fill-page document % context-with-embedded-fonts) pages))] (.save document output) (doseq [doc (flatten open-documents)] (.close doc)))) output)) | |||||||||||||||
This concludes the discussion of the primary interface to pdf-stamper. Following are the namespace documentations for the functionality that is not directly user-facing. | ||||||||||||||||
The state of pdf-stamper is encapsulated in a datastructure called the context. This structure contains fonts and templates, and is partly constructed by users of pdf-stamper by adding to a base context. This base context contains the fonts included in PDFBox standard, and nothing else. The functions in this namespace all exist to modify or query the context datastructure. Functions relevant
to client code is exported in the | ||||||||||||||||
(ns pdf-stamper.context (:require [clojure.edn :as edn] [clojure.string :as string] [clojure.java.io :as io] [pdf-stamper.schemas :as schemas]) (:import [org.apache.pdfbox.pdmodel PDDocument] [org.apache.pdfbox.pdmodel.font PDFont PDType1Font PDTrueTypeFont])) | ||||||||||||||||
Templates | ||||||||||||||||
There are no standard templates in pdf-stamper. | (def base-templates {}) | |||||||||||||||
Adding templates to the context is achieved using this function. When adding a template two things are needed: The template description, i.e. what goes where, and a locator for the PDF page to use with the template description. The template locator can be either a URL or a string. | (defn add-template ^{:pre [(some? template-uri)]} [template-def template-uri context] (when-let [schema-check (schemas/validation-errors template-def)] (throw (ex-info (str schema-check " | IN: " template-def) schema-check))) (-> context (assoc-in [:templates (:name template-def)] template-def) (assoc-in [:templates (:name template-def) :uri] template-uri))) | |||||||||||||||
The template file is loaded lazily, i.e. it is not until a page actually requests to be written using the added template that it is read to memory. | (defn get-template-document [template context] (let [file-uri (get-in context [:templates template :uri])] (assert file-uri (str "file-uri is nil for template " template)) (PDDocument/load file-uri))) | |||||||||||||||
Any template consists of a number of holes specifying the size and shape of data when stamped onto the template PDF page. | (defn get-template-holes [template context] (get-in context [:templates template :holes])) | |||||||||||||||
Templates can specify an overflow template, a template that will be used for any data that did not fit in the holes on the original template's page. | (defn get-template-overflow [template context] (get-in context [:templates template :overflow])) | |||||||||||||||
FontsFonts in PDF follow the typographical conventions. Important font concepts for this project are:
These are illustrated below: When writing text the cursor origin is placed on the baseline. | ||||||||||||||||
pdf-templates uses PDFBox under the hood, and because of that includes all the standard PDF fonts defined by PDFBox. | (def base-fonts {:times {#{:regular} PDType1Font/TIMES_ROMAN #{:bold} PDType1Font/TIMES_BOLD #{:italic} PDType1Font/TIMES_ITALIC #{:bold :italic} PDType1Font/TIMES_BOLD_ITALIC} :helvetica {#{:regular} PDType1Font/HELVETICA #{:bold} PDType1Font/HELVETICA_BOLD #{:italic} PDType1Font/HELVETICA_OBLIQUE #{:bold :italic} PDType1Font/HELVETICA_BOLD_OBLIQUE} :courier {#{:regular} PDType1Font/COURIER #{:bold} PDType1Font/COURIER_BOLD #{:italic} PDType1Font/COURIER_OBLIQUE #{:bold :italic} PDType1Font/COURIER_BOLD_OBLIQUE} :symbol {#{:regular} PDType1Font/SYMBOL} :zapf-dingbats {#{:regular} PDType1Font/ZAPF_DINGBATS}}) | |||||||||||||||
If any templates have need of fonts that are not part of the standard
PDF font library, they can be added by providing a font descriptor,
the font name and the font style. As an example, had the Times New Roman
bold font not been present already, here is how one would add it:
Notice how the style is a set of keywords. This is to support the combined font styles like bold AND italic, without requiring an arbitrary ordering on the individual parts of the style. In the example above the font descriptor was provided as a string representing
a file name, but it could just as well have been a In PDF, non-standard fonts should be embedded in the document that uses them. Adding a font like above does not automatically embed it to a document, since the context does not have knowledge of documents. Instead, the context is updated with a seq of [font style] pairs that need to be embedded when a new document is created. Note: Only TTF fonts are supported. | (defn add-font [desc font style context] (-> context (assoc-in [:fonts (keyword font) style :desc] desc) (update-in [:fonts-to-embed] #((fnil conj []) % [(keyword font) style])))) | |||||||||||||||
On creation of a new document all fonts in the seq of fonts to embed should be embedded. If for some reason a font is found in the seq of fonts to embed but does not contain a descriptor, nothing happens and the context is returned unmodified. In practice this situation is highly unlikely, and the check is primarily in place to prevent unanticipated crashes (in case code external to pdf-stamper modified the context). The font descriptor is coerced to an input stream and loaded into the document, after which it is automatically closed. | (defn embed-font [doc font style context] (if-let [font-desc (get-in context [:fonts font style :desc])] (assoc-in context [:fonts font style] (with-open [font (io/input-stream font-desc)] (PDTrueTypeFont/loadTTF doc font))) context)) | |||||||||||||||
When a font has been added to the context and embedded in a document, it can be queried by providing the font name and style. It is guaranteed that a font is always found. Thus, if no font with the given name is registered the default font (Times New Roman) is used with the supplied style. If again no font is found, the default font and style are used (Times New Roman Regular). | (defn get-font [font-name style context] {:post [(instance? PDFont %)]} (get-in context [:fonts font-name style] (get-in context [:fonts :times style] (get-in context [:fonts :times #{:regular}])))) | |||||||||||||||
Font utilitiesThe following utility functions rely on PDFBox' built-in font inspection methods. In PDFBox the font widths and heights are returned in a size that is multiplied by 1000 (presumably because of rounding, but I may be wrong), which explains the, otherwise seemingly arbitrary, divisions by 1000. | ||||||||||||||||
Computing line lengths of unknown strings requires knowledge of the average width of a font, given style and size. | (defn get-average-font-width [font-name style size context] (let [font (get-font font-name style context)] (* (/ (.. font (getAverageFontWidth)) 1000) size))) | |||||||||||||||
With complete knowledge of the string it is possible to get the exact width of the string. | (defn get-font-string-width [font-name style size string context] (let [font (get-font font-name style context)] (* (/ (.. font (getStringWidth string)) 1000) size))) | |||||||||||||||
(defn get-font-descent [font-name style size context] (let [font (get-font font-name style context) font-descriptor (.. font (getFontDescriptor)) descent (.. font-descriptor (getDescent))] (* (/ (Math/abs descent) 1000) size))) | ||||||||||||||||
(defn get-font-ascent [font-name style size context] (let [font (get-font font-name style context) font-descriptor (.. font (getFontDescriptor)) ascent (.. font-descriptor (getAscent))] (* (/ ascent 1000) size))) | ||||||||||||||||
By adding the absolute value of the font's descent to the font's ascent, we get the actual height of the font. We have to use the absolute value of the descent since it might be a negative value (it probably is, at least for FreeType fonts). | (defn get-font-height [font-name style size context] (let [font (get-font font-name style context) font-descriptor (.. font (getFontDescriptor)) ascent (.. font-descriptor (getAscent)) descent (.. font-descriptor (getDescent))] (* (/ (+ ascent (Math/abs descent)) 1000) size))) | |||||||||||||||
The leading is the extra spacing from baseline to baseline, used for multi-line text. | (defn get-font-leading [font-name style size context] (let [font (get-font font-name style context) font-descriptor (.. font (getFontDescriptor)) leading (.. font-descriptor (getLeading))] (* (/ leading 1000) size))) | |||||||||||||||
Base context | ||||||||||||||||
The base context is a combination of the base fonts with the base templates, and simply provides a good starting point for adding custom fonts and own templates. | (def base-context {:templates base-templates :fonts base-fonts}) | |||||||||||||||
Image holesHoles where | ||||||||||||||||
(ns pdf-stamper.images (:import [org.apache.pdfbox.pdmodel PDDocument] [org.apache.pdfbox.pdmodel.edit PDPageContentStream] [org.apache.pdfbox.pdmodel.graphics.xobject PDJpeg PDXObjectImage PDPixelMap])) | ||||||||||||||||
To calculate the new dimensions for scaled images, the image is first scaled such that the height fits into the available bounds. If the image width is still larger than it should be, it means that the scaling factor for the width is larger than for the height, and we use that to compute a new height. The arguments are passed as a map to provide some context to the four numbers, as it is otherwise too easy to mix up the parameters when applying this function. Future: Going by the description above it should be possible to refactor this to compute both scaling factors up front, and simply use the largest. | (defn- scale-dimensions [{:keys [b-width b-height i-width i-height]}] (let [height-factor (/ b-height i-height) new-width (* i-width height-factor)] (if (> new-width b-width) (let [width-factor (/ b-width new-width) new-height (* b-height width-factor)] [b-width new-height]) [new-width b-height]))) | |||||||||||||||
Stamping an image onto the PDF's content stream while still
preserving aspect ratio potentially requires moving the image's
origin. | (defn- draw-image-preserve-aspect [c-stream image data] (let [{:keys [x y width height]} data awt-image (.. image (getRGBImage)) img-height (.. awt-image (getHeight)) img-width (.. awt-image (getWidth)) [scaled-width scaled-height] (scale-dimensions {:b-width width :b-height height :i-width img-width :i-height img-height}) new-x (+ x (Math/abs (/ (- width scaled-width) 2))) new-y (+ y (Math/abs (/ (- height scaled-height) 2)))] (.. c-stream (drawXObject image new-x new-y scaled-width scaled-height)))) | |||||||||||||||
Stamping an image onto the PDF's content stream without
preserving aspect ratio is much simpler: PDFBox resizes the
image to fill the entire box specified by | (defn- draw-image [c-stream image data] (let [{:keys [x y width height]} data] (.. c-stream (drawXObject image x y width height)))) | |||||||||||||||
When stamping an image, the image is always shrunk to fit the
dimensions of the hole. The value of the It is possible to specify the quality of the stamped image by
setting the Note: Using | (defn fill-image [document c-stream data context] (let [aspect-ratio (get data :aspect :preserve) image-quality (get data :quality 0.75) image (PDJpeg. document (get-in data [:contents :image]) image-quality)] (assert image "Image must be present in hole contents.") (condp = aspect-ratio :preserve (draw-image-preserve-aspect c-stream image data) :fit (draw-image c-stream image data)))) | |||||||||||||||
User input to pdf-stamper is validated using the schema library from Prismatic. | ||||||||||||||||
(ns pdf-stamper.schemas (:require [schema.core :as s])) | ||||||||||||||||
(def BaseHole {:height s/Num :width s/Num :x s/Num :y s/Num :name s/Keyword :priority s/Int}) | ||||||||||||||||
(def ImageHole (merge BaseHole {:type (s/enum :image) (s/optional-key :quality) s/Num (s/optional-key :aspect) (s/enum :preserve :fit)})) | ||||||||||||||||
(def ParagraphFormat {:font s/Keyword :style #{s/Keyword} :size s/Int :color [(s/one s/Int "R") (s/one s/Int "G") (s/one s/Int "B")] :spacing {:paragraph {:above s/Num :below s/Num} :line {:above s/Num :below s/Num}} :indent {:all s/Num}}) | ||||||||||||||||
(def BulletParagraphFormat (merge ParagraphFormat {(s/optional-key :bullet-char) s/Str})) | ||||||||||||||||
(def TextHole (merge BaseHole {:format ParagraphFormat :type (s/enum :text) :align {:horizontal (s/enum :center :left :right) :vertical (s/enum :center :top :bottom)}})) | ||||||||||||||||
(def TextParsedHole (merge BaseHole {:type (s/enum :text-parsed) :format {:paragraph ParagraphFormat :head-1 ParagraphFormat :head-2 ParagraphFormat :head-3 ParagraphFormat :bullet BulletParagraphFormat :number BulletParagraphFormat}})) | ||||||||||||||||
(def Hole (s/conditional #(= :image (:type %)) ImageHole #(= :text (:type %)) TextHole #(= :text-parsed (:type %)) TextParsedHole 'has-valid-type-key)) | ||||||||||||||||
(def hole-checker (s/checker Hole)) | ||||||||||||||||
Return v if v is a valid hole, false otherwise. If error-fn is supplied, calls that function with the error message. The return value of error-fn is discarded. | (defn valid-hole? ([v] (not (hole-checker v))) ([v error-fn] {:pre [(fn? error-fn)]} (if-let [err (hole-checker v)] (do (error-fn (merge v (if (map? err) err {:error err}))) false) true))) | |||||||||||||||
(def Template {:name s/Keyword (s/optional-key :overflow) s/Keyword :holes [Hole]}) | ||||||||||||||||
(defn validation-errors [template] (s/check Template template)) | ||||||||||||||||
In some situations, templates in pdf-stamper can become difficult to maintain. One such situation can occur when you have a number of template parts that combine with each other to form the final templates. If the template parts form semantic "layers", and each part of a layer needs to be combined with all parts of the following layer, we get an exponential explosion in the number of templates. Since there is a direction in the way parts are combined the final templates can be described by a number of trees, where each path from a leaf to the root describes one template. To avoid having to write an exponential number of template descriptions by hand, this namespace provides utilities that allow you to specify how the semantic layers relate to each other. | ||||||||||||||||
(ns pdf-stamper.template-utils (:require [pdf-stamper.schemas :as schemas] [clojure.zip :as zip])) | ||||||||||||||||
The zipperSince we are looking at trees, we use First, we define what a zipper of template parts is. | ||||||||||||||||
Create a zipper of parts. This is basically a tree where nodes carry values. Every node is potentially a branch, i.e. leaf nodes are just nodes without children. | (defn- parts-zip [root] (zip/zipper (constantly true) ::children (fn [node children] (assoc node ::children children)) root)) | |||||||||||||||
SubtreesSince we cannot rely on a regular depth-first traversal of the trees when inserting new parts, we define ways to travel around subtrees. We always add to leaves, and the final templates are constructed from the leaf paths, so an easy way to access leaves is needed. | ||||||||||||||||
Go to the left-most leaf in a given subtree. | (defn- to-leaf [tree] (loop [loc tree] (if (zip/down loc) (recur (zip/down loc)) loc))) | |||||||||||||||
Since leaves can be spread over several subtrees, and | ||||||||||||||||
Returns the loc of the next sutree to insert parts in, or nil if there is none. | (defn- next-subtree [loc] (loop [parent loc] (if (zip/right parent) (zip/right parent) (when (zip/prev parent) (recur (zip/up parent)))))) | |||||||||||||||
Add the parts to all leaves of tree. | (defn- add-to-all-leaves [tree part & parts] ((comp parts-zip zip/root) (loop [leaf (to-leaf tree)] (let [new-node (reduce (fn [node p] (zip/append-child node p)) (zip/append-child leaf part) parts)] (if (zip/right new-node) (recur (zip/right new-node)) (if-let [next-subtree (next-subtree new-node)] (recur (to-leaf next-subtree)) new-node)))))) | |||||||||||||||
PartsParts are separated into variadic and non-variadic parts. Non-variadic parts are just regular maps and will be merged into the final template as-is. | ||||||||||||||||
(defn- add-non-variadic-part [trees part] (if (seq trees) (map (fn [tree] (add-to-all-leaves tree {::value part})) trees) (conj trees (parts-zip {::value part})))) | ||||||||||||||||
Variadic parts are parts that in the final template can be one of several values. The structure of a variadic part is as follows: ```clojure {:pdf-stamper/name "part" :pdf-stamper/optional? truthy :pdf-stamper/variants [{::variant-name "flower" ::variant-part {...}} {::variant-name "roots" ::variant-part {...}]} ``` Before a variadic part is inserted into the trees, some metadata is added to it. This metadata allows the construction of the final template name from the leaf-root path. | ||||||||||||||||
(defn- variadic-part? [part] (contains? part ::name)) | ||||||||||||||||
Construct the final variadic parts, adding metadata fields for template construction. | (defn- variadic-parts [part-name variants optional?] (let [parts (map (fn [variant] {::value (with-meta (::variant-part variant) {::name part-name ::part-name (::variant-name variant)})}) variants)] (if optional? (conj parts {::value (with-meta {} {::name part-name ::part-name ""})}) parts))) | |||||||||||||||
Make a variadic part a child to all leaves in all trees. Creates a new tree if there are none. | (defn- add-variadic-part [trees part] (if (seq trees) (map (fn [tree] (apply add-to-all-leaves tree (variadic-parts (::name part) (::variants part) (::optional? part)))) trees) (apply conj trees (map parts-zip (variadic-parts (::name part) (::variants part) (::optional? part)))))) | |||||||||||||||
(comment (add-variadic-part [] {::name "foo" ::optional? false ::variants [{::variant-name "a" ::variant-part {:a 1}} {::variant-name "b" ::variant-part {:b 1}}]}) (add-variadic-part (add-non-variadic-part [] {:nv 1}) {::name "foo" ::optional? false ::variants [{::variant-name "a" ::variant-part {:a 1}} {::variant-name "b" ::variant-part {:b 1}}]}) (add-variadic-part (add-non-variadic-part [] {:nv 1}) {::name "foo" ::optional? true ::variants [{::variant-name "a" ::variant-part {:a 1}} {::variant-name "b" ::variant-part {:b 1}}]}) (add-variadic-part (add-variadic-part (add-non-variadic-part [] {:nv 1}) {::name "foo" ::optional? true ::variants [{::variant-name "a" ::variant-part {:a 1}} {::variant-name "b" ::variant-part {:b 1}}]}) {::name "bar" ::optional? false ::variants [{::variant-name "d" ::variant-part {:d 1}} {::variant-name "e" ::variant-part {:e 1}}]})) | ||||||||||||||||
(defn- add-part [trees part] (if (variadic-part? part) (add-variadic-part trees part) (add-non-variadic-part trees part))) | ||||||||||||||||
PathsThe paths from leaf to root describe the final templates by merging the value at each node. Values closer to the leaves overwrite values closer to the root in case of conflicts. | ||||||||||||||||
Construct a vector of root-leaf paths for tree. The paths contain only the node values. | (defn tree-paths [tree] (let [root->leafs (loop [paths [] leaf (to-leaf tree)] (let [leaf-path (mapv ::value (zip/path leaf)) leaf-value (::value (zip/node leaf)) full-path (conj leaf-path leaf-value)] (if (zip/right leaf) (recur (conj paths full-path) (zip/right leaf)) (if-let [next-subtree (next-subtree leaf)] (recur (conj paths full-path) (to-leaf next-subtree)) (conj paths full-path)))))] root->leafs)) | |||||||||||||||
Construct a seq of all root-leaf paths from all trees. | (defn- all-paths [trees] (mapcat tree-paths trees)) | |||||||||||||||
Building templatesThe templates have a naming scheme with holes, which let variadic parts update influence the final template name. | ||||||||||||||||
(defn- replace-holes [name-with-holes hole-name part-name] (if (and hole-name part-name) (clojure.string/replace name-with-holes (re-pattern (str "\\$" hole-name "\\$")) part-name) name-with-holes)) | ||||||||||||||||
Merges a seq of hole bases into a single seq of holes. A [hole base] is a vector of maps. The result is a single vector of maps. | (defn- merge-hole-bases [hole-bases & {:keys [?merge-fn ?validation-error-fn]}] (let [valid-hole? (if ?validation-error-fn #(schemas/valid-hole? % ?validation-error-fn) schemas/valid-hole?)] (into [] (filter valid-hole? (map (partial apply (or ?merge-fn merge)) (vals (group-by :name (flatten hole-bases)))))))) | |||||||||||||||
Build a template from a naming scheme and a leaf-root path. Takes an optional merge function used when merging two templates. | (defn path-to-template [naming-scheme path & {:keys [?merge-fn ?validation-error-fn]}] (let [unmerged-holes (reduce (fn [template template-part] (let [metadata (meta template-part) part-name (::part-name metadata) scheme-value (::name metadata)] (-> template (update-in [:holes] conj template-part) (update-in [:name] replace-holes scheme-value part-name)))) {:holes [] :name naming-scheme} path) validation-error-fn (when ?validation-error-fn (partial ?validation-error-fn (:name unmerged-holes)))] (-> unmerged-holes (update-in [:holes] #(merge-hole-bases % :?merge-fn ?merge-fn :?validation-error-fn validation-error-fn)) (update-in [:name] keyword)))) | |||||||||||||||
(defn parts->trees [parts] (reduce (fn [trees part] (add-part trees part)) [] parts)) | ||||||||||||||||
Naming scheme is a keyword with "holes" defined by $hole-name$. Example naming scheme: :rhubarb$part$ Values inbetween $'s are matched to the :name of individual parts and replaced as needed. Example with the above naming scheme: parts = [ would yield templates with the names: [:rhubarbflower :rhubarbroots] And the appropriate template parts merged together in the order they are
specified in the parts vector. Returns a vector of ?merge-fn is a function that can merge two maps. The default is to use ?validation-error-fn is a function that will be called once for every hole that is discarded during template construction (due to validation errors). It receives a template name and a validation error. | (defn make-templates [naming-scheme parts & {:keys [?merge-fn ?validation-error-fn] :as opts}] (let [naming-scheme-replacement-map (into {} (map (comp vec reverse) (re-seq #"\$([^\$]+)\$" naming-scheme)))] (map (fn [path] (path-to-template naming-scheme path :?merge-fn ?merge-fn :?validation-error-fn ?validation-error-fn)) (mapcat tree-paths (parts->trees parts))))) | |||||||||||||||
Text holesText holes contain extra formatting information compared to e.g. image holes. The amount of extra formatting information required is a matter of the type of text hole. The requirements are described below. A text box can be visualised as such: There is no text:
| ||||||||||||||||
(ns pdf-stamper.text (:require [pdf-stamper.text.parsed :as parsed-text] [pdf-stamper.text.pdfbox :as pdf])) | ||||||||||||||||
Parsed text holes Holes where
| (defn fill-text-parsed [document c-stream data context] (let [formatting (merge (:format data) (select-keys data [:width :height])) [paragraphs overflow] (parsed-text/paragraphs-overflowing (get-in data [:contents :text]) formatting context)] (-> c-stream (pdf/begin-text-block) (pdf/set-text-position (:x data) (+ (:y data) (:height formatting))) (pdf/write-paragraphs formatting paragraphs context) (pdf/end-text-block)) (parsed-text/handle-overflow overflow (:name data)))) | |||||||||||||||
Text holes Holes where type is To keep the promise that there is not text outside the specified box, pdf-stamper automatically resizes lines that are too long. | (defn fill-text [document c-stream data context] (let [formatting (merge (:format data) (select-keys data [:align :width :height])) text {:contents (get-in data [:contents :text]) :format formatting}] (-> c-stream (pdf/begin-text-block) (pdf/set-text-position (:x data) (:y data)) (pdf/write-unparsed-line text context) (pdf/end-text-block)) nil)) | |||||||||||||||
Documentation intended for developers of pdf-stamper Filling parsed text holes involves a lot of extra processing of the page contents. The input data is required to be a top-level XML-tag containing the actual paragraphs as children. Each paragraph can contain any number of arbitrarily nested character-level tags that must be parsed and represented correctly. | ||||||||||||||||
(ns pdf-stamper.text.parsed (:require [pdf-stamper.context :as context] [clojure.string :as string] [clojure.zip :as zip] [clojure.data.xml :as xml])) | ||||||||||||||||
Internal representationWhile input to the parsed text holes is XML, internally it is converted to a different representation that better supports splitting the contents of tags over multiple lines. The general structure of the internal representation of an XML string is as a seq of the representations of paragraph-level tags.
Paragraph-level tags are represented as maps with the keys | ||||||||||||||||
Multiple types can be converted into the internal representation. | (defprotocol PInternalRep (represent [this])) | |||||||||||||||
Character-level tags | ||||||||||||||||
Converting character-level tags to the internal representation involves handling nested character-level tags. Remember that the contents of a tag is always a
seq. This function handles exactly on piece of the
content seq, such that In the case of nested tags the new style is If | (defn- inline-tag-one [contents style] (if-let [content-style (:style contents)] {:style (conj (if (= content-style [:regular]) [] content-style) style) :contents (:contents contents)} {:style [style] :contents contents})) | |||||||||||||||
To handle the general case where | (defn- inline-tag-multi [contents-s parent-style] (reduce (fn [acc contents] (conj acc (inline-tag-one contents parent-style))) [] contents-s)) | |||||||||||||||
When building the internal representation of a character-level
tag, it is necessary to take into account the entire content
vector. Regardless of whether there are one or many character-
level tags in the content vector, the resulting internal
representation is always a vector of the same size as the
| (defn- inline-tag [style] (fn [content-vec] (let [[f-contents & r-contents] (flatten content-vec) contents (inline-tag-one f-contents style)] (if (seq r-contents) (into [contents] (inline-tag-multi r-contents style)) [contents])))) | |||||||||||||||
Only two types of character-level tags are supported:
However, these can be used for multiple actual XML character-level tags. | ||||||||||||||||
(def em-tag ^:private (inline-tag :em)) (def strong-tag ^:private (inline-tag :strong)) | ||||||||||||||||
Paragraph-level tags | ||||||||||||||||
Converting a paragraph-level tag to the internal representation
is a matter of attaching the element type and flattening the
| (defn- paragraph-tag [elem-type] (fn [content-vec] {:elem elem-type :content (flatten content-vec)})) | |||||||||||||||
Six types of paragraph-level tags are supported:
As with the character-level tags they could in principle be used for multiple actual XML paragraph-level tags each; in practice that is not the case. | ||||||||||||||||
(def ul-li-tag ^:private (paragraph-tag :bullet)) (def ol-li-tag ^:private (paragraph-tag :number)) (def p-tag ^:private (paragraph-tag :paragraph)) (def h1-tag ^:private (paragraph-tag :head-1)) (def h2-tag ^:private (paragraph-tag :head-2)) (def h3-tag ^:private (paragraph-tag :head-3)) | ||||||||||||||||
Putting it all togetherTexts are parsed using clojure.data.xml, so the main part of converting to the internal representation is being able to convert the XML datastructure.
Every Supported XML tags are:
The semantics of each tag follow HTML, e.g. Future: Ordered lists, i.e. | ||||||||||||||||
(extend-protocol PInternalRep java.lang.String (represent [s] {:style [:regular] :contents (string/trim s)}) clojure.data.xml.Element (represent [e] (condp = (:tag e) :b (strong-tag (represent (:content e))) :strong (strong-tag (represent (:content e))) :i (em-tag (represent (:content e))) :em (em-tag (represent (:content e))) :ul (map ul-li-tag (represent (:content e))) :ol (map ol-li-tag (represent (:content e))) :p (p-tag (represent (:content e))) :h1 (h1-tag (represent (:content e))) :h2 (h2-tag (represent (:content e))) :h3 (h3-tag (represent (:content e))) (represent (:content e)))) clojure.lang.LazySeq (represent [s] (map represent s))) | ||||||||||||||||
Because the internal representations can be arbitrarily nested it is necessary to flatten the seq before proceeding. | (def flat-represent ^:private (comp flatten represent)) | |||||||||||||||
(def ^:private paragraph-tags #{:h1 :h2 :h3 :p :ul :ol}) | ||||||||||||||||
Getting the internal representation of the supported paragraph tags in an XML string is simple: Parse it using clojure.data.xml, filter on the supported tag names, and flatten the internal representation. | (defn get-paragraph-nodes [xml-string] (flat-represent (filter #(some #{(:tag %)} paragraph-tags) (zip/children (zip/xml-zip (xml/parse-str xml-string :supporting-external-entities true)))))) | |||||||||||||||
FormattingThe conversion of styles to actual formatting instructions used by the code that does the stamping of text to the PDF. Every internal representation of a character-level tag needs to know which font style it corresponds to. | (def ^:private strong-format {:style #{:bold}}) (def ^:private em-format {:style #{:italic}}) | |||||||||||||||
The font style of a formatting map is updated
according to the styles present in the Example: If | (defn- style->format [regular-format style-vec] (let [formatting (reduce (fn [format style] (merge-with #((fnil into #{}) %1 %2) format (condp = style :regular regular-format :strong strong-format :em em-format {}))) {:style #{}} style-vec)] (merge regular-format formatting))) | |||||||||||||||
Adding font style information to a line means adding font style information to each part of that line. It is expected that a line is split into parts that each have the same styling. | (defn- line-style->format [line formatting] (map (fn [line-part] (assoc line-part :format (style->format formatting (:style line-part)))) line)) | |||||||||||||||
Line breaking & unbreakingNote the use of the word line-part in the previous section: All words in a line-path have the same styling. This is not necessarily true for full lines. | ||||||||||||||||
Distribute the styles of a line on it's individual words, and return a seq of words. This allows lines to be broken and reassembled as required by the line- breaking algorithm. Splitting lines | (defn- line->words [line] (let [style (:style line)] (when-let [line-contents (:contents line)] (map (fn [word] {:style style :contents word}) (string/split (:contents line) #" "))))) | |||||||||||||||
(defn- paragraph->words [paragraph] (mapcat line->words (:content paragraph))) | ||||||||||||||||
Reassemble a line-part from a seq of words by making the style of the first word the style for the entire line-part Assumption: All words in Collecting (un-splitting) lines | (defn- words->line [words] (let [[w & ws] words style (:style w)] {:style style :contents (->> words (map :contents) (string/join " "))})) | |||||||||||||||
Reassemble an entire line of words into line-parts by merging groups of words with the same style (in order of appearance in the line). | (defn- collect-line [line] (map words->line (partition-by :style line))) | |||||||||||||||
(defn- collect-paragraph [paragraph] (map collect-line paragraph)) | ||||||||||||||||
Line-breaking algorithm | ||||||||||||||||
Break a paragraph into lines of | (defn- break-paragraph [paragraph formatting max-width context] (let [{:keys [font style size]} formatting space-width (context/get-font-string-width font style size " " context) [_ last-line lines] (reduce (fn [acc w] (let [{:keys [contents]} w [current-length current-line lines] acc word-length (+ (context/get-font-string-width font style size contents context) space-width) new-length (+ current-length word-length)] (if (<= new-length max-width) [new-length (conj current-line w) lines] [word-length [w] (if (seq current-line) (conj lines current-line) lines)]))) [0 [] []] (paragraph->words paragraph))] (collect-paragraph (conj lines last-line)))) | |||||||||||||||
Reassemble a paragraph from a seq of lines by setting a particular element type for the paragraph (i.e. you have to know in advance which kind of paragraph you are reassembling). | (defn- unbreak-paragraph [formatting lines] (merge (select-keys formatting [:broken :elem]) {:content (mapcat collect-line (partition-by :style (apply concat lines)))})) | |||||||||||||||
Text overflow | ||||||||||||||||
Future: This is one of the places that has to be extended to support indenting only the first line of a paragraph. | (defn- line-length [formatting context] (let [{:keys [font style width bullet-char elem size]} formatting font-width (context/get-average-font-width font style size context) indent-width (get-in formatting [:indent :all]) bullet (str (or bullet-char (char 149))) bullet-length (context/get-font-string-width font style size bullet context) ] (- width indent-width (if (= elem :bullet) (* bullet-length 3) 0)))) | |||||||||||||||
(defn- line-height [formatting context] (let [{:keys [font style size]} formatting font-height (context/get-font-height font style size context) font-leading (context/get-font-leading font style size context)] (+ font-height font-leading (get-in formatting [:spacing :line :below]) (get-in formatting [:spacing :line :above])))) | ||||||||||||||||
When stamping paragraphs of text to a hole it is not certain that enough space is available. To figure out which paragraphs did not fit in the hole it is necessary to know:
| (defn paragraphs-overflowing [paragraphs formatting context] (rest (reduce (fn [[size-left paragraphs overflow] paragraph] (let [actual-formatting (merge (select-keys formatting [:width]) (select-keys paragraph [:elem :broken]) (get formatting (:elem paragraph))) line-chars (line-length actual-formatting context) paragraph-lines (break-paragraph paragraph actual-formatting line-chars context) paragraph-line-height (line-height actual-formatting context) ;; 1 number-of-lines (Math/floor (/ size-left paragraph-line-height)) ;; 2 [p o] (split-at number-of-lines paragraph-lines) ;; 3 paragraph (map #(line-style->format % actual-formatting) p)] (if (seq o) [0 (if (seq paragraph) (conj paragraphs [actual-formatting paragraph]) paragraphs) (conj overflow [(assoc actual-formatting :broken (and (not-empty p) (not-empty o))) o])] ;; 4 [(- size-left (* paragraph-line-height (count paragraph)) (get-in actual-formatting [:spacing :paragraph :above]) (get-in actual-formatting [:spacing :paragraph :below])) (conj paragraphs [actual-formatting paragraph]) overflow]))) [(:height formatting) [] []] paragraphs))) | |||||||||||||||
Any overflow is reassembled to a map that tells the stamping algorithm where to write on an eventual new page. | (defn handle-overflow [overflow hole] (when (seq overflow) {hole {:contents {:text (map (fn [[formatting paragraph]] (unbreak-paragraph formatting paragraph)) overflow)}}})) | |||||||||||||||
Documentation intended for developers of pdf-stamper The functions in this namespace are wrappers around PDFBox functionality, and as such every single function is impure (mutating a content stream). | ||||||||||||||||
(ns pdf-stamper.text.pdfbox (:require [pdf-stamper.context :as context])) | ||||||||||||||||
(defn- move-text-position-up [c-stream amount] (.. c-stream (moveTextPositionByAmount 0 amount)) c-stream) | ||||||||||||||||
(defn- move-text-position-down [c-stream amount] (.. c-stream (moveTextPositionByAmount 0 (- amount))) c-stream) | ||||||||||||||||
(defn- move-text-position-right [c-stream amount] (.. c-stream (moveTextPositionByAmount amount 0)) c-stream) | ||||||||||||||||
(defn- move-text-position-left [c-stream amount] (.. c-stream (moveTextPositionByAmount (- amount) 0)) c-stream) | ||||||||||||||||
(defn- new-line-by-font [c-stream font size style context] (let [font-height (context/get-font-height font style size context) font-leading (context/get-font-leading font style size context)] (move-text-position-down c-stream (+ font-height font-leading)))) | ||||||||||||||||
(defn- set-font [c-stream font size style context] (let [font-obj (context/get-font font style context)] (.. c-stream (setFont font-obj size)) c-stream)) | ||||||||||||||||
(defn- set-color [c-stream color] (let [[r g b] color] (doto c-stream (.setStrokingColor r g b) (.setNonStrokingColor r g b)))) | ||||||||||||||||
(defn- draw-string [c-stream string] (.. c-stream (drawString string)) c-stream) | ||||||||||||||||
(defn- add-padding-horizontal [c-stream line-length formatting] (let [h-align (get-in formatting [:align :horizontal])] (condp = h-align :center (move-text-position-right c-stream (/ (- (:width formatting) line-length) 2)) :left c-stream :right (move-text-position-right c-stream (- (:width formatting) line-length))))) | ||||||||||||||||
(defn- add-padding-vertical [c-stream line-height formatting context] (let [v-align (get-in formatting [:align :vertical]) {:keys [font style size] :as font} formatting] (condp = v-align :center (move-text-position-up c-stream (+ (/ (- (:height formatting) line-height) 2) (context/get-font-descent font style size context))) :top (move-text-position-up c-stream (- (:height formatting) (context/get-font-ascent font style size context))) :bottom (move-text-position-up c-stream (context/get-font-descent font style size context))))) | ||||||||||||||||
(defn- write-linepart [c-stream linepart context] (let [{:keys [font size style color]} (:format linepart)] (-> c-stream (set-font font size style context) (set-color color) (draw-string (:contents linepart))))) | ||||||||||||||||
(defn- write-line [c-stream line context] (doseq [linepart (map #(update-in % [:contents] (fn [s] (str " " s))) line)] (write-linepart c-stream linepart context)) c-stream) | ||||||||||||||||
(defn- write-default-paragraph [c-stream formatting paragraph context] (let [{:keys [font size style]} formatting] (doseq [line paragraph] (-> c-stream (new-line-by-font font size style context) (move-text-position-down (get-in formatting [:spacing :line :above])) (write-line line context) (move-text-position-down (get-in formatting [:spacing :line :below])))))) | ||||||||||||||||
(defn- write-bullet-paragraph [c-stream formatting paragraph context] (let [{:keys [font style size color bullet-char broken]} formatting bullet (str (or bullet-char (char 149))) bullet-length (context/get-font-string-width font style size bullet context)] (-> c-stream (set-font font size style context) (set-color color) (new-line-by-font font size style context) (move-text-position-down (get-in formatting [:spacing :line :above])) (#(if-not broken (do (draw-string % bullet) (move-text-position-right % (* bullet-length 2))) (move-text-position-right % (* bullet-length 2)))) (write-line (first paragraph) context)) (doseq [line (rest paragraph)] (-> c-stream (move-text-position-down (get-in formatting [:spacing :line :above])) (new-line-by-font font size style context) (write-line line context) (move-text-position-down (get-in formatting [:spacing :line :below])))) (move-text-position-left c-stream (* bullet-length 2)) c-stream)) | ||||||||||||||||
(defn- write-paragraph-internal [c-stream formatting paragraph context] (let [paragraph-type (:elem formatting)] (cond (= paragraph-type :paragraph) (write-default-paragraph c-stream formatting paragraph context) (= paragraph-type :bullet) (write-bullet-paragraph c-stream formatting paragraph context) (= paragraph-type :number) (write-bullet-paragraph c-stream formatting paragraph context) :default (write-default-paragraph c-stream formatting paragraph context)) c-stream)) | ||||||||||||||||
(defn- write-paragraph [c-stream formatting paragraph context] (-> c-stream (move-text-position-right (get-in formatting [:indent :all])) (move-text-position-down (get-in formatting [:spacing :paragraph :above])) (write-paragraph-internal formatting paragraph context) (move-text-position-down (get-in formatting [:spacing :paragraph :below])) (move-text-position-left (get-in formatting [:indent :all])))) | ||||||||||||||||
(defn begin-text-block [c-stream] (.. c-stream (beginText)) c-stream) | ||||||||||||||||
(defn end-text-block [c-stream] (.. c-stream (endText)) c-stream) | ||||||||||||||||
(defn set-text-position [c-stream x y] (.. c-stream (setTextMatrix 1 0 0 1 x y)) c-stream) | ||||||||||||||||
(defn write-paragraphs [c-stream formatting paragraphs context] (doseq [[p-format paragraph] paragraphs] (write-paragraph c-stream p-format paragraph context)) c-stream) | ||||||||||||||||
(defn- optimal-font-size [font style start-size string max-width context] (let [string-width (context/get-font-string-width font style start-size string context)] (if (<= string-width max-width) start-size (letfn [(decrement-font-size [[current-size current-width]] [(dec current-size) (context/get-font-string-width font style (dec current-size) string context)])] (let [[best-font-size _] (first (drop-while #(> (second %) max-width) (iterate decrement-font-size [start-size string-width])))] best-font-size))))) | ||||||||||||||||
(defn write-unparsed-line [c-stream line context] (let [{:keys [align width height font size style color] :as formatting} (:format line) optimal-font-size (optimal-font-size font style size (:contents line) width context) line-length (context/get-font-string-width font style optimal-font-size (:contents line) context) line-height (context/get-font-height font style optimal-font-size context)] (-> c-stream (add-padding-horizontal line-length formatting) (add-padding-vertical line-height formatting context) (set-font font optimal-font-size style context) (set-color color) (draw-string (:contents line))))) | ||||||||||||||||