http://bjoernstechblog.rueffer.info/posts/lisp/latex/bash/pdf/clojure/2011/01/13/Changing-page-numbers-in-PDF-files/
last updated on 25 May 2018

13 January 2011

Changing page numbers in PDF files

Changing the page numbering of existing PDF files with the usual Unix command-line tools is seemingly not so straightforward. Reading the manual, I thought that pdftk should be up to the job, but I had no success making it work on the PDF file in question. As the issue of renumbering pages in PDF files had already come up a few times for me, I decided to invest a little more labour and tackle the problem. I have found two working approaches.

Approach one: pdfoffset, a command-line tool written in Clojure, running on the JVM

Clojure is a modern Lisp dialect whose programs run on the Java Virtual Machine. I had several reasons for choosing it: I like Lisp and wanted to try out something useful in my favorite language; anything running on the JVM should be highly portable and hence even more useful than a program designed for a more specific hardware platform; and there exists an extremely powerful PDF library for Java, called iText. In fact, pdftk itself is based on iText.

As it turns out, the current version of iText (5.0.5 as of this writing) is licensed under a variant of the well-known GNU General Public License (GPL), a free software license. Clojure on the other hand is licensed under the Eclipse Public License, another free software license. As an ordinary user you might think, hooray, how wonderful, I might as well combine these superbly crafted pieces of software. But beware, the licenses are incompatible. More specifically, the GPL does not allow linking with software under an incompatible license, and the Eclipse Public License is one of those. So I could not create a piece of software that combines Clojure and iText 5.0.5. In fact, I could not even write any piece of software in Clojure and release it under the GPL. How silly.

The resolution I found was to use an older version of iText, version 2.1.5, which happens to be (the relevant parts at least) licensed under the GNU Library General Public License (LGPL), which does allow linking. One might argue that for a library this would be a much more natural choice of license than the more restrictive GPL. Well, there’s nothing I can do about that.

So what I finally came up with is a wrapper written in Clojure that links to the relevant classes in the iText library to do what I want, which is to renumber pages in PDF files. You can obtain a binary copy from here if you want. Note, however, that due to the massive library involved, the binary is almost 6 megabytes in size, although the source for what I wrote is only about 150 lines, including a lot of documentation (yes, Clojure is quite concise!). See for yourself, the relevant piece of code is here.

Approach two: A shell script based on pdfLaTeX and the hyperref and pdfpages packages

The size of the iText solution is not the only drawback of approach one: I also found that my little program wouldn’t work with some PDF files, cause unknown. And in fact, Peter Dower mentioned to me that it should be simple to achieve the same functionality as pdfoffset with LaTeX and its friends (who happen to be my friends, too).

The script shown further below does exactly this. For anyone who has a current LaTeX system on their machine, this might be the preferable solution. The only difference is that this solution can only change the offset between logical and physical page numbers, and the resulting page numbers will be arabic numbers, which may also turn out negative. pdfoffset, on the other hand, can also number pages with roman numerals and letters (i.e., the Latin alphabet). But this comes at the price of a large overhead, and it does not seem to work reliably.
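To make the offset concrete: the LaTeX template in the script sets the page counter to the negated offset and then steps it twice, so the first physical page should start at 2 minus the offset. Here is a minimal sketch of that arithmetic, assuming this reading of the template is right. The function name logical_page is made up for illustration and is not part of the script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: computes the
# logical page number that the template should assign to a physical page.
# The template starts the page counter at -offset and steps it twice,
# so physical page 1 gets the number 2 - offset.
logical_page() {   # arguments: <offset> <physical page>
    local offset=$1 physical=$2
    echo $(( physical - offset + 1 ))
}

logical_page 3 1   # first physical page with offset 3: prints -1
logical_page 3 3   # the physical page that becomes logical page 1: prints 1
```

In other words, the offset is the physical page that ends up as logical page 1, and everything before it gets a number of zero or below.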

I do not know whether roman numerals and letters can be achieved with the LaTeX approach as well, but I would be interested to learn whether it can be done.

So below you find the code for a shell-script that does page renumbering based on pdfLaTeX and the hyperref and pdfpages packages:

#!/bin/bash

# Copyright (C) 2011 by Bjoern Rueffer

# a little script based on pdflatex and the pdfpages and hyperref
# packages to redefine the logical first page in a PDF document

template (){   # arguments: <offset> <input filename>
cat <<EOF
\documentclass{scrartcl}%
\usepackage{pdfpages,hyperref}%
\begin{document}%
\endlinechar=-1%
%\read16 to \theoffset%
%\read16 to \filename%
%\setcounter{page}{-\theoffset}%
\setcounter{page}{-$1}%
\stepcounter{page}%
\stepcounter{page}%
%\includepdf[pages=-]{\filename}%
\includepdf[pages=-]{$2}%
\end{document}
EOF
}

if [ $# -ne 3 ]; then
	cat<<EOF
Usage: `basename $0` <offset> <infile.pdf> <outfile.pdf>
Filenames should not contain spaces.
EOF
else
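	# tempfile comes from Debian's debianutils; other systems may need mktemp
	tempfile=`tempfile --suffix=.tex`
	# double quotes, so that the arguments are actually expanded
	template "$1" "$2" > "$tempfile"
	pdflatex -output-directory "`dirname $tempfile`" "$tempfile"
	cp "${tempfile%%tex}pdf" "${3%%.pdf}.pdf"
	rm "${tempfile%%.tex}"*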
fi
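
The tempfile command used above is a Debian utility and may be missing on other systems. As a rough sketch of an alternative, and not something I have run through pdflatex, the wrapper could be written into a scratch directory created with mktemp -d. The function name write_wrapper and the file name wrapper.tex are assumptions made up for this illustration, and the template is shortened to its active lines.

```shell
#!/bin/bash
# Sketch only: creates a private scratch directory with mktemp -d and
# writes the LaTeX wrapper there. pdflatex is not invoked here.
write_wrapper() {   # arguments: <offset> <input filename>
    local workdir
    workdir=$(mktemp -d) || return 1
    cat > "$workdir/wrapper.tex" <<EOF
\documentclass{scrartcl}
\usepackage{pdfpages,hyperref}
\begin{document}
\setcounter{page}{-$1}
\stepcounter{page}
\stepcounter{page}
\includepdf[pages=-]{$2}
\end{document}
EOF
    echo "$workdir/wrapper.tex"   # report where the wrapper was written
}

texfile=$(write_wrapper 3 input.pdf)
cat "$texfile"                   # show the generated LaTeX source
rm -r "$(dirname "$texfile")"    # remove the scratch directory again
```

Keeping everything in one scratch directory means the cleanup is a single rm -r instead of a wildcard over the temporary file's name.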
Björn Rüffer — Copyright © 2009–2018 — bjoern.rueffer.info