http://bjoernstechblog.rueffer.info/posts/git/pdf/2011/10/12/Watermarking-PDF-files-with-GIT-commit-id/
last updated on 25 May 2018

12 October 2011

Watermarking PDF files with GIT commit id

Do you have a pile of old printouts of PDF documents on your desk that correspond to different versions of the same file, but you can’t see which printout corresponds to which version? And the file in question sits in a git repository?

Well, your prayers shall be heard. Here’s the answer to that mess: Just watermark each PDF with the git commit id before you print it. You can find out the current git commit id with git describe --always.
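As a quick illustration, here is a minimal sketch of fetching the abbreviated commit id from the shell. It uses git log --pretty=%h -1, the same call the script further down relies on. The function name current_commit_id and the "unknown" fallback are assumptions made for this example only.

```shell
# Print the abbreviated hash of the commit currently checked out.
# Outside a git repository (or in one without commits) git fails,
# and we print the literal string "unknown" instead. That fallback
# is an assumption of this sketch, not something the script does.
current_commit_id() {
    git log --pretty=%h -1 2>/dev/null || echo "unknown"
}

current_commit_id
```

Note that git describe --always prints a tag-based name such as v1.0-3-gabc1234 when an annotated tag is reachable, whereas git log --pretty=%h always prints just the abbreviated hash.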

How do you get that onto your PDF? Either use a pen, or use pdflatex with the pdfpages and tikz packages. For those of us who don’t like to use pens, the script below automates that task. Assuming you have named the script commitid and put it somewhere in your PATH, just say

commitid filename.pdf

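and a watermarked copy of the file will be created, with the current git commit id stamped on every page and appended to the filename. (With the -p option the original file is overwritten instead.)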
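To make the naming concrete, here is a small sketch of how the output filename is derived. It reuses the two sed expressions from the script below; the function name stamped_name and the sample ids are made up for this example.

```shell
# Derive the output filename for a given input file and short commit id.
# $1: input filename, $2: 7-character abbreviated commit id.
stamped_name() {
    local file=$1 id=$2
    local base
    # Drop any "_<7 hex digits>" tag left over from an earlier run.
    base=$(printf '%s\n' "$file" | sed -e 's/\(_[0-9a-f]\{7\}\)*//g')
    # Insert "_<id>" in front of the three-letter extension.
    printf '%s\n' "$base" | sed -e 's/\(.*\)\.\(...\)$/\1_'"$id"'.\2/'
}

stamped_name paper.pdf abc1234          # prints paper_abc1234.pdf
stamped_name paper_abc1234.pdf def5678  # prints paper_def5678.pdf
```

Because the old tag is stripped first, re-stamping an already stamped file replaces the id rather than piling up suffixes. The second expression assumes a three-letter extension, as the script does.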

Enjoy!

#!/bin/bash
# Time-stamp: <2013-03-01 18:33:01 rueffer>
# Copyright (C) 2011-2013 by Bjoern Rueffer


function usage {
    cat <<EOF
Usage:  `basename $0` [-p] <files> 
        `basename $0` --clean

In the first form this command will take files located in a git
repository and watermark them with the current commit id.  

In the default behavior (without -p option), the original files will
remain untouched, and copies with the commit id appended to the
filenames will be created.  If the -p option is given, the original
files will be modified instead, i.e., filenames will be preserved.

For pdf files the watermark will be placed on top of each PDF page.
For tex, bbl, and bib files it will be added to the head of the file
as a comment line. In this case any existing such watermarks will be
replaced.  Other file types are currently not supported.

In the second form (--clean) this command will interactively delete
all files that seem to have been generated by this program.

Copyright (C) Bjoern Rueffer 2011-2013
EOF
}

function parse_git_branch {
  git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/\1/'
}

function watermark () {
    # Accept only names that end in one of the supported extensions.
    if ! echo "$1" | grep -Eqi '\.(pdf|tex|bbl|bib)$'; then 
	cat <<EOF
Error: The file $1 appears not to be PDF, TEX, BBL, or BIB format. 
I can only process these file types.
EOF
	exit 1
    fi

    COMMITID=`git log --pretty=%h -1`
    cp "$1" /tmp/pdf_to_stamp.pdf
    FILENAME=$(echo "$1" | sed -e 's/\(_[0-9a-f]\{7\}\)*//g')
    OUTFILE=$(echo "$FILENAME" | sed -e 's/\(.*\)\.\(...\)$/\1_'$COMMITID'.\2/')
#    COMMITID=`git log --pretty="%h %ai" -1`
    
    echo -n Processing "$1" with git commit ID $COMMITID, producing $OUTFILE ...

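    # Newer git versions report a detached HEAD as "(HEAD detached at ...)"
    # rather than "(no branch)", so match either form.
    if parse_git_branch | grep -Eqi '\((no branch|HEAD detached|detached from)'; then 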
	BRANCHNAME=""
    else
	BRANCHNAME="$(parse_git_branch)"/
    fi

    if (echo "$1" | grep -i 'PDF$' >/dev/null 2>&1); then 
	COMMITID=`git log --pretty='%h %ai}\hbox{'"$BRANCHNAME"'%f' -1` 
    else
	COMMITID=`git log --pretty='%h %ai '"$BRANCHNAME"'%f' -1`
    fi

    if (echo "$1" | grep -i 'PDF$' >/dev/null 2>&1); then 
	cat >/tmp/commitid.tex <<EOF
\documentclass{scrlttr2}
\usepackage[english]{babel}
\usepackage{pdfpages,pgf,tikz}
\pagestyle{empty}
\begin{document}
\includepdf[fitpaper,pages=-,%
picturecommand={},%
pagecommand={
  \begin{tikzpicture}[remember picture, overlay]
    \node [xshift=0.5cm,yshift=-0.5cm,below right, fill=yellow!50, rounded corners, opacity=.8] at (current page.north west) {
      \vbox{\hbox{commit id: $COMMITID%
	\qquad p.\thepage}}
    };
  \end{tikzpicture}%
}%
]{/tmp/pdf_to_stamp.pdf}
\end{document}
EOF
	# \parbox{10cm}{commit id: $COMMITID \qquad p.\thepage}
	(cd /tmp; pdflatex commitid && pdflatex commitid) >/dev/null 2>&1
	cp /tmp/commitid.pdf "$OUTFILE"
	rm /tmp/commitid*
    elif echo "$1" | grep -Eqi '\.(tex|bbl|bib)$'; then
	cp "$1" /tmp/commitid.txt
	if (head -n 2 /tmp/commitid.txt | tail -n 1 | grep "GIT COMMIT ID" >/dev/null 2>&1); then
	    sed -ne '4,$p' /tmp/commitid.txt > /tmp/commitid.txt2
	    mv /tmp/commitid.txt2 /tmp/commitid.txt
	fi
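	# Pass the commit info as an argument so that any % in it is not
	# interpreted as a printf format directive.
	(printf '%%\n%% GIT COMMIT ID %s\n%%\n' "$COMMITID"; cat /tmp/commitid.txt) > "$OUTFILE"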
	rm /tmp/commitid*
    fi
    echo \ done.
}



# preserve filenames? default to no
unset preserve


# main()
if [ "$1" = '--help' -o $# -eq 0 ]; then
    usage
    exit 0
elif ! git describe --always >/dev/null 2>&1; then
    cat <<EOF
Error: This command must be invoked in a git repository!
(Otherwise I cannot know which commit ID to use.)
---
EOF
    usage
    exit 1
else while [ $# -gt 0 ]; do
    if [ "$1" = '-p' ]; then
	preserve=t
    elif [ "$1" = '--clean' ]; then
	echo Erasing all tagged files...
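	# Let find hand the names to rm directly, so that filenames
	# containing spaces survive and nothing runs when no file matches.
	find . -type f -name '*_[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f].*' -exec rm -i {} +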
	exit 0
    elif [ -f "$1" ]; then
	watermark "$1"
	if [ -n  "$preserve" ]; then 
	    mv "$OUTFILE" "$1"
	fi
    else
	echo "Skipping $1, it does not seem to be a valid file. Invoke \"$(basename "$0") --help\" for usage information."
    fi
    shift
    done
fi
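
For tex, bbl, and bib files the script above prepends a three-line comment header and replaces one that is already there. Here is a standalone sketch of that step, handy for trying the logic out without touching a repository. The function name stamp_tex is made up for this example, and unlike the script it writes to standard output and uses no temporary files.

```shell
# Write the contents of file $1 to stdout, preceded by a three-line
# watermark comment that carries the text $2. If the file already
# starts with such a header, the old header is dropped first.
stamp_tex() {
    local infile=$1 mark=$2
    printf '%%\n%% GIT COMMIT ID %s\n%%\n' "$mark"
    if sed -n '2p' "$infile" | grep -q 'GIT COMMIT ID'; then
        # Existing header occupies lines 1-3; keep everything after it.
        sed -n '4,$p' "$infile"
    else
        cat "$infile"
    fi
}
```

Usage might look like stamp_tex paper.tex abc1234 > paper_abc1234.tex. Running it again on the result swaps the header instead of stacking a second one on top.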

Comments (through old commenting system)

Björn, 18 October 2011:  I've enhanced the features even more: now multiple files can be watermarked and also tex and bbl files are supported. Here's a gist.

Björn Rüffer — Copyright © 2009–2018 — bjoern.rueffer.info