Accessible Format Production Part 3: Making Accessible PDF

Once again, there are numerous programs that can edit PDFs. Unfortunately, I have yet to find a free (or very cheap) one that lets you edit even the basic features I talk about below. I would love to hear if anyone has recommendations.

That means I will discuss what needs to be done, but not how to do it in any specific program (each program should have its own documentation on how).

Overview

I’m going to split the accessibility features into two parts: basic and full. Making a PDF fully accessible can be time-consuming, so the basic features are the ones that help everyone using the PDF and generally take only a small amount of time.

As a reminder, this blog post is part of a series that assumes PDF is the first format you are creating. If you are converting an existing digital document to PDF, first make sure the document itself is accessible (which I will cover in a later post). Converting it should then result in a (mostly) accessible PDF.

Basic PDF Accessibility

At the very minimum, your PDF should be text accessible (meaning it has been OCR’ed), which I covered in the previous post. With only a little extra time, you should also add or check the following:

Page Numbers

Make sure the page numbers match the original material.

In many cases, the front matter or rear matter may not have page numbers. Typically, for front matter, use lowercase Roman numerals starting from the first page. For rear matter, you can let the Arabic numbering continue to the end.
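To make the numbering scheme concrete, here is a minimal sketch that works out which label each physical page should show. It assumes one block of front matter, followed by a body that starts at 1 and continues through the rear matter. The function names are hypothetical, and your PDF editor will have its own page label or page numbering setting that you fill in by hand.

```python
def to_roman_lower(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
              (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
              (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in values:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_labels(total_pages: int, front_matter_pages: int) -> list[str]:
    """Label front matter i, ii, iii..., then number everything after it 1, 2, 3... to the end."""
    labels = []
    for index in range(total_pages):
        if index < front_matter_pages:
            labels.append(to_roman_lower(index + 1))
        else:
            # The body (and the rear matter after it) continues in Arabic numerals.
            labels.append(str(index - front_matter_pages + 1))
    return labels
```

For example, a seven-page file with three pages of front matter would be labelled i, ii, iii, 1, 2, 3, 4.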

Tagging Headings

Tags mark up structured blocks of content (similar to HTML tags). While it is better to have all of the content tagged properly, if nothing else, the PDF should have its headings tagged.

Headings can have different levels, H1 to H6. What level a heading should be really depends on how the book is divided. The easiest way to decide is to use the table of contents (if available): top-level sections (the lines all the way to the left) are level 1, lines indented once are level 2, lines indented twice are level 3, and so on.
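As a rough illustration of the table-of-contents rule, this sketch maps each line's indentation to a heading level. It assumes that every level of indentation in the typed-up table of contents uses the same number of spaces (indent_width, a hypothetical parameter). It also caps anything deeper at H6, since that is the lowest heading level.

```python
def heading_levels(toc_lines: list[str], indent_width: int = 4) -> list[tuple[str, str]]:
    """Map each table-of-contents line to a heading tag based on its indentation.

    Assumes each level of indentation is exactly indent_width spaces;
    anything nested deeper than level 6 is capped at H6.
    """
    result = []
    for line in toc_lines:
        if not line.strip():
            continue  # skip blank lines
        leading = len(line) - len(line.lstrip(" "))
        level = min(leading // indent_width + 1, 6)
        result.append((f"H{level}", line.strip()))
    return result
```

With "Part One" flush left, "Chapter 1" indented once and "Section 1.1" indented twice, you get H1, H2 and H3 respectively.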

Some programs will tag your PDF automatically (either as part of sophisticated scanning software or PDF editing software), but you should still check that the heading levels are correct.
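One quick sanity check on automatic tagging is to look for headings that jump more than one level deeper than the heading before them (for example, an H1 followed directly by an H3). The sketch below assumes you have already listed the heading levels in reading order. It only catches this one kind of mistake. It cannot tell you whether a heading should have been H2 or H3 in the first place, so you still need to compare against the table of contents.

```python
def skipped_levels(levels: list[int]) -> list[int]:
    """Return the positions of headings that jump more than one level deeper
    than the previous heading (e.g. H1 followed directly by H3)."""
    problems = []
    previous = 0  # the document start counts as level 0, so the first heading should be H1
    for position, level in enumerate(levels):
        if level > previous + 1:
            problems.append(position)
        previous = level
    return problems
```

For the levels 1, 2, 3, 2, 4, only the last heading (position 4) is flagged.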

Bookmarks

One of the main reasons you want the headings tagged properly is so that bookmarks can be created automatically. Your PDF editor should let you create bookmarks from structural elements.
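Conceptually, generating bookmarks from tagged headings means nesting a flat list of headings into a tree, where each heading becomes a child of the nearest shallower heading before it. Here is a minimal sketch of that idea. The Bookmark class and build_bookmarks function are hypothetical stand-ins for what your PDF editor does internally.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    title: str
    level: int
    children: list["Bookmark"] = field(default_factory=list)


def build_bookmarks(headings: list[tuple[int, str]]) -> list[Bookmark]:
    """Nest a flat list of (level, title) headings into a bookmark tree.

    Assumes heading levels are 1 or greater.
    """
    root = Bookmark(title="", level=0)
    stack = [root]  # path from the root down to the most recent bookmark
    for level, title in headings:
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        bookmark = Bookmark(title=title, level=level)
        stack[-1].children.append(bookmark)
        stack.append(bookmark)
    return root.children
```

Feeding it "Chapter 1" (level 1), "1.1 Scope" (level 2), "1.2 Method" (level 2) and "Chapter 2" (level 1) produces two top-level bookmarks, with both sections nested under Chapter 1.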

Do take a look at the documentation for your PDF editing program before you start. If tagging the headings and then generating bookmarks takes a lot more time than creating the bookmarks by hand, simply create them manually. This is not ideal, but bookmarks are useful to everyone, whereas the tagged structure is less so.

Full(er) PDF Accessibility

Making a PDF fully accessible can be very time-consuming. It’s a matter of finding a balance between what users need or want and the cost. If a PDF needs to be fully accessible, you might even consider converting it to an editable document, editing it, and then exporting it back to PDF.

Language

Make sure the PDF specifies the correct language.

Any passages in a different language should have their own language specified.
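It helps to picture how these two settings interact. In PDF, the document language is normally set once for the whole file (the Lang entry in the document catalog), and individual tagged elements can override it for their own content. The sketch below is a simplified model of that inheritance, not a PDF parser. The Element class and passages_by_language function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """A simplified stand-in for a tagged structure element."""
    text: str = ""
    lang: Optional[str] = None  # only set when it differs from the surrounding text
    children: list["Element"] = field(default_factory=list)


def passages_by_language(element: Element, inherited: str) -> list[tuple[str, str]]:
    """Pair every piece of text with the language that applies to it.

    An element without its own lang uses the language of its parent,
    starting from the document-level language passed in as inherited.
    """
    lang = element.lang or inherited
    pairs = [(lang, element.text)] if element.text else []
    for child in element.children:
        pairs.extend(passages_by_language(child, lang))
    return pairs
```

In an English document with a Latin quotation, only the quotation needs its own language. Everything else picks up the document language.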

Tagging

Beyond headings, a PDF should have properly tagged:

paragraphs
tables
lists
figures (and captions)
links
formulae
form fields (and labels)
background elements (which in some cases means they are not tagged at all)

All of your elements, except background elements, should be within the tagged structure. If an element is left out, its content will most likely not be read by text-to-speech software or screen readers.
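The rule above boils down to this: every piece of content should either be in the tag structure or be deliberately marked as background (PDF calls these "artifacts"). Here is a minimal sketch of that check over a hypothetical list of content items. A real accessibility checker would read this information from the PDF itself.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    description: str
    tagged: bool    # is it inside the tag (structure) tree?
    artifact: bool  # is it deliberately marked as a background element?


def missing_from_structure(items: list[ContentItem]) -> list[str]:
    """List content that is neither tagged nor marked as background,
    i.e. content a screen reader would most likely skip."""
    return [item.description for item in items
            if not item.tagged and not item.artifact]
```

Suppose a page has a tagged heading, a running header marked as background, and an untagged sidebar paragraph. Only the sidebar paragraph is reported.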

Image Descriptions

Images need text descriptions, frequently referred to as “alt text”. Image descriptions should follow web content conventions for when and how images are described.

Typically, the image properties include a text box that you can fill in.
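When checking a long document, it can help to list every figure that still has no usable description. This sketch assumes you have already collected each figure's alt text into a dictionary keyed by a hypothetical figure ID, with None where no alt text was set at all.

```python
from typing import Optional


def figures_missing_alt(figures: dict[str, Optional[str]]) -> list[str]:
    """Return the IDs of figures whose alt text is missing or only whitespace.

    Purely decorative images are usually better marked as background
    elements than left here with empty alt text.
    """
    return [figure_id for figure_id, alt in figures.items()
            if alt is None or not alt.strip()]
```

A figure with a real description passes. Figures with no alt text, an empty string, or only spaces are all flagged.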

Reading Order

Obviously, reading order is important, and it is not uncommon for the default structure to differ from what was intended (again, this normally depends on how sophisticated your scanning software is).

The best example is text in multiple columns. Basic OCR software will only recognize the text, not the reading order, so if you were then to convert the PDF to text, you would end up with the first line of the first column followed by the first line of the second column, and so on.

Make sure that the content is read in the correct order, and that images are read either where they are referred to or, more typically, between paragraphs.
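The two-column problem is easy to see in a toy model. This sketch compares reading straight across the page with reading each column top to bottom. It assumes each line of text has already been assigned a column and a row. Real software has to work that out from the text's position on the page, and the Line class here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    column: int  # 0 = left column, 1 = right column, ...
    row: int     # vertical position on the page, top = 0


def naive_order(lines: list[Line]) -> list[str]:
    """Read straight across the page, row by row (the jumbled result)."""
    return [line.text for line in sorted(lines, key=lambda line: (line.row, line.column))]


def column_order(lines: list[Line]) -> list[str]:
    """Read each column from top to bottom before moving to the next column."""
    return [line.text for line in sorted(lines, key=lambda line: (line.column, line.row))]
```

Take a left column with lines A1 and A2 and a right column with lines B1 and B2. Reading across gives A1, B1, A2, B2, while reading by column gives the intended A1, A2, B1, B2.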

For forms, there is also the tab order, which specifies the order in which a reader tabs through the form fields. Since the focus of these posts is reading material, I’ll just leave that as a simple reminder.

And More?

I believe I have covered everything that is required, at least by the Web Content Accessibility Guidelines (WCAG) 2.0. There are definitely a few more things to consider if you are dealing with PDF forms, such as marking required fields. However, as I mentioned before, the focus here is reading material, so I have not covered every aspect of PDF forms.

Some programs will also run accessibility checks for you, which may include something I have not mentioned here (if you come across one, I would like to hear about it).

Take Away

PDF is great at preserving the original look of the material you are scanning. However, making a PDF accessible can be very time-consuming.

In my experience, most people who need documents to be fully accessible work better with electronic text documents (such as RTF or DOC files) than with PDF. Those who prefer PDF, even when using text-to-speech software, tend to be visual readers who don’t need more than the basic accessibility features.

As a result, most accessible format producers will create PDFs with basic accessibility features by default and do more only on request.

For most people, it is more important that the documents you create are accessible, and that you convert them into accessible PDFs. This topic and more are coming soon in future posts in this series.