Batch Appending a Single PDF to Multiple PDFs

Recently, I ran into the problem of having to add a page at the end of multiple PDFs. A couple of years ago, I’d done some work with GhostScript to merge a bunch of PDFs, so I thought I’d start there.

Use Case

I have a bunch of PDFs, plus another PDF with a single page in it. I need to add that single-page PDF to the end of every other PDF. Of course, I wanted to do this in an automated way since I have quite a number of PDFs.

Basically, I had to do this in order to add an informational (terms of use) page to all the PDFs I have before uploading them to a repository.

Using GhostScript to Merge PDFs Together

The command to merge two PDFs in GhostScript is fairly simple. If you use the -o switch, it’s even simpler, since -o sets the output file and also implies -dNOPAUSE and -dBATCH:

[code]
gs -sDEVICE=pdfwrite -o output.pdf file.pdf lastpage.pdf
[/code]

A Quick Note on Setup

If you’ve never set up GhostScript on Windows before, you need to add its folder to the Path environment variable after installation so that Windows recognizes the gs command when you call it.

  1. Open control panel → system → advanced system settings → environment variables
  2. Under system variables → Select ‘Path’, then click on ‘Edit…’
  3. Paste the path to the folder where GhostScript is installed, e.g. C:\Program Files\gs\gs9.15\bin
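
If you want to confirm the path change took effect, here is a small Python sketch that looks for a GhostScript executable on the Path. The list of names is my assumption about what a typical install provides (gswin64c, gswin32c, or gs), and find_on_path is just a helper name I made up for this example, so adjust both for your setup.

```python
import shutil

# Executable names a GhostScript install commonly provides (an assumption;
# adjust this list to match your own installation).
GS_CANDIDATES = ["gswin64c", "gswin32c", "gs"]


def find_on_path(candidates):
    """Return (name, full_path) for the first candidate found on PATH, else None."""
    for name in candidates:
        full_path = shutil.which(name)
        if full_path:
            return name, full_path
    return None


if __name__ == "__main__":
    found = find_on_path(GS_CANDIDATES)
    if found:
        print(f"Found {found[0]} at {found[1]}")
    else:
        print("GhostScript was not found on the Path; check the environment variable.")
```

Remember to open a new command line window after editing the environment variable, since windows that were already open keep the old Path.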

Automating with a Batch Script

I will preface this section with the fact that I am using a Windows command line, so these commands will need to be modified slightly for a Mac or Linux machine.

For those who are not familiar with .cmd files: instead of opening a command line window, navigating to the folder, and then running a command, you can open Notepad, type in the command, and save the file with a .cmd extension. To run it, just put it in the folder you want it to run in and double-click it (just like opening a program).

[code]
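FOR %%G IN (*.pdf) DO IF /I NOT "%%G"=="lastpage.pdf" gswin64c -sDEVICE=pdfwrite -sOutputFile="output\%%G" -dNOPAUSE -dBATCH "%%G" lastpage.pdf
[/code]

Note: If you’re running this directly in a command line window instead of from a .cmd file, variables take a single %, so use %G. The quotes around "%%G" in the comparison keep file names with spaces from breaking the IF, and /I makes the comparison case-insensitive.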

The code is a basic for loop that goes through every .pdf file in the current folder except lastpage.pdf. For each of those files, it runs GhostScript. (The executable name depends on your OS: gswin64c on 64-bit Windows, gswin32c on 32-bit Windows, and gs on Mac or Linux. Just check the documentation.) GS uses the pdfwrite device to merge the file with lastpage.pdf and writes the result, under the original file name, to a subfolder called “output”, which has to exist already. The -dNOPAUSE and -dBATCH switches keep GS from pausing after each page and make it exit when it’s done instead of waiting at its prompt.
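
If batch syntax isn’t your thing, or you’re on a Mac or Linux machine, the same loop can be sketched in Python. This is only a rough equivalent, not a tested replacement for the .cmd file: the function names are made up for this example, gs_exe defaults to gswin64c as an assumption about your install, and dry_run defaults to True so the script only builds the commands without running anything.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf, last_page, out_dir, gs_exe="gswin64c"):
    """Build the GhostScript argument list that merges pdf with last_page."""
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={Path(out_dir) / Path(pdf).name}",
        "-dNOPAUSE",
        "-dBATCH",
        str(pdf),
        str(last_page),
    ]


def pdfs_to_process(folder, last_page_name="lastpage.pdf"):
    """List every PDF in folder except the page being appended (case-insensitive)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.name.lower() != last_page_name.lower()
    )


def append_to_all(folder=".", last_page_name="lastpage.pdf",
                  gs_exe="gswin64c", dry_run=True):
    """Build (and optionally run) one merge command per PDF in folder."""
    folder = Path(folder)
    out_dir = folder / "output"
    out_dir.mkdir(exist_ok=True)  # unlike the batch file, create the subfolder if missing
    commands = [
        build_gs_command(pdf, folder / last_page_name, out_dir, gs_exe)
        for pdf in pdfs_to_process(folder, last_page_name)
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing merge
    return commands


if __name__ == "__main__":
    for cmd in append_to_all("."):
        print(" ".join(cmd))
```

Passing the arguments as a list means file names with spaces need no extra quoting. Once the printed commands look right, calling append_to_all(".", dry_run=False) would actually run them.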

The Problem with Using GhostScript

We now have a simple piece of code to fix our problem. Great, right?

Well, I thought so, except GhostScript doesn’t copy pages across as they are. It walks through every page of each PDF, interprets it, and writes it out again into a brand-new PDF. This results in font substitution and other errors at times.

GhostScript is meant to interpret every file you give it, so it won’t simply append a PDF file. That’s not what it’s for.

For a proper explanation, check out this Stack Overflow explanation of why GS does this.

The upside is that GhostScript can manipulate the contents of a PDF if you ever need it to, such as deliberately substituting fonts.

Using PDFtk to Append PDFs

In my searches, I had encountered PDFtk, so I wasn’t entirely surprised when two people answered my question about getting around my GS problem by suggesting it.

[code]
FOR %%G IN (*.pdf) DO IF /I NOT "%%G"=="lastpage.pdf" pdftk "%%G" lastpage.pdf cat output "output\%%G"
[/code]

This is basically the same loop, except it now calls pdftk to merge the PDFs. The pdftk command takes this format:

pdftk inputfile1 inputfile2 cat output outputfile

The ‘cat’ is short for ‘concatenate’, or join. PDFtk joins the files in the order presented, so inputfile1 comes first and inputfile2 is appended after it, and the result is written out under the outputfile name.
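
To make the argument order concrete, here is a minimal Python sketch that assembles a pdftk cat command from a list of inputs. The helper name pdftk_cat_command is made up for illustration, and the check that the output is not one of the inputs reflects my understanding that pdftk will not write over a file it is reading, which is also a good reason to send results to a separate output folder.

```python
from pathlib import Path


def pdftk_cat_command(inputs, output, pdftk_exe="pdftk"):
    """Build: pdftk inputfile1 inputfile2 ... cat output outputfile."""
    inputs = [str(p) for p in inputs]
    if not inputs:
        raise ValueError("at least one input PDF is required")
    # Assumption: pdftk cannot use an input file as its output file,
    # so reject that case before anything is run.
    if any(Path(p).resolve() == Path(output).resolve() for p in inputs):
        raise ValueError("the output file must not be one of the input files")
    # Inputs are joined in the order given, so the appended page goes last.
    return [pdftk_exe, *inputs, "cat", "output", str(output)]


if __name__ == "__main__":
    cmd = pdftk_cat_command(["report.pdf", "lastpage.pdf"], "output/report.pdf")
    print(" ".join(cmd))
```

The list this returns can be handed to subprocess.run(cmd, check=True) to run the merge for real.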

Take Away

PDFtk definitely seems to be the tool to use if you want to manipulate PDFs by file or by page. GhostScript is great if you want it to interpret any part of the data, especially for converting between PDF and the other file types it supports.

As a side thought, a lot of people I know think this sort of thing is hard because they don’t believe they can understand and use the command line, but I’m not super familiar with it either. Anyone who knows me will know that I tend to avoid it if I can, but this sort of thing saves so much time with only minimal knowledge of how it works.

Reference