We frequently get this question or a variant of it. I’m honestly surprised by how many people work in the field of accessibility but don’t know the basics of PDF or how to check a PDF for accessibility.
PDF is Image Based
That’s not technically correct, but I have encountered many people who think that text in a PDF is always an image rather than real text. A PDF may be an export of a digital document or a scan of a printout. It may be impossible to tell which just by looking, but there are quick ways to test.
Quick Test: Is it Text?
To do this test, open the PDF in a PDF reader (not in a browser such as Firefox or Chrome). Your cursor will likely change as well, but the real test is whether you can select some text.
If you can select individual characters rather than blocks, then it’s accessible at a minimal level.
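If you have many files to check, the same idea can be roughly automated. The sketch below assumes you have already extracted the text of each page into a string with whatever tool you prefer. The function names and the `min_words` threshold are my own illustrative choices, not part of any PDF tool. It only flags pages where almost no words came out, which usually means the page is an image without OCR.

```python
import re


def has_usable_text(page_text: str, min_words: int = 5) -> bool:
    """Return True if the text contains at least `min_words` word-like tokens."""
    # A "word" here is a run of two or more letters (any script).
    words = re.findall(r"[^\W\d_]{2,}", page_text)
    return len(words) >= min_words


def pages_without_text(pages: list[str], min_words: int = 5) -> list[int]:
    """Return 1-based numbers of pages whose extracted text looks empty."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if not has_usable_text(text, min_words)
    ]


if __name__ == "__main__":
    # Hypothetical extracted text: page 2 is blank, page 3 has only a page number.
    sample = [
        "some early game-changing wins. And to do that they'd need some",
        "",
        "  \n 12 ",
    ]
    print(pages_without_text(sample))  # [2, 3]
```

A page that passes this check is only accessible at the same minimal level as the manual test. It says nothing about reading order.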
Quick Test #2: Is it Text That Makes Sense?
Say that your file seems like a low-quality scan, or when you’re selecting the text, it looks a little off. In my example here, some characters in the vertical text are selected along with the rest of the text. That seems a bit odd.
I started the selection from the top of the page because I want to know what order a text-to-speech (TTS) tool would read the text in, as well as what text it would read. By copying and pasting the text into a text editor (e.g. Notepad), you can see what the computer sees. For the above example, this is the text I get:
some early game-changing wins. And to do that they’d need some
of that fearlessness that she proudly advocates for with her bracelet.
… She would tell them,
I can immediately see that the vertical text is causing problems, where a TTS would read the vertical text horizontally as part of the regular text and in the incorrect order.
Fixing the Issue
Each file is a little different. Adobe Acrobat Pro has a number of tools that let you create and verify the accessibility of a PDF. For the example here, I might try to fix the OCR or use the TouchUp Reading Order tool to tell it to ignore the vertical text by marking it as part of the background.
On a side note, you can force Acrobat to redo the OCR if you save the file as a high-quality TIFF, then reopen it in Acrobat.
Editing the reading order for every single page would take a long time. In this case, since the vertical text is simply the book and chapter title and is not important to have on every page, the easiest solution is to crop it out.
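Before cropping or marking anything as background, it can help to confirm which text really does repeat on every page. This is a minimal sketch, assuming the text of each page has already been extracted into a string. The function name and the `min_share` threshold are hypothetical. Vertical text from a poor scan may come out as fragments rather than a clean line, so treat the result as a hint only.

```python
from collections import Counter


def repeated_lines(pages: list[str], min_share: float = 0.8) -> list[str]:
    """Return lines that occur on at least `min_share` of the pages."""
    counts: Counter[str] = Counter()
    for text in pages:
        # Count each distinct line once per page, ignoring surrounding whitespace.
        lines = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(lines)
    threshold = min_share * len(pages)
    return sorted(line for line, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical pages where a title line was picked up on every page.
    sample = [
        "BOOK TITLE\nsome early game-changing wins.",
        "BOOK TITLE\nShe would tell them,",
        "BOOK TITLE\nanother page of body text",
    ]
    print(repeated_lines(sample))  # ['BOOK TITLE']
```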
Levels of Accessibility in PDF
As I mentioned, having readable text in a PDF is only a minimal level for accessibility purposes. To make an accessible PDF, it should have:
- correct language specified
- alt text for images and interactive objects, and hide purely decorative images
- everything in the correct reading order
- headings, links and tables with proper markup
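For a very rough first pass over a folder of files, you can look for a few of the PDF keys that go with these features: `/Lang` (document language), `/StructTreeRoot` (a tag tree exists) and `/Marked true` (the file claims to be tagged). The sketch below is a crude byte scan, not a real PDF parser, and the function and dictionary names are my own. Many PDFs store these objects in compressed streams, so a missing key proves nothing, and a found key does not mean the tags are any good. It does not replace a proper accessibility check.

```python
import re

# Raw byte patterns for keys that usually accompany an accessible PDF.
HINT_PATTERNS = {
    "language": rb"/Lang(?![A-Za-z])",
    "tag_tree": rb"/StructTreeRoot(?![A-Za-z])",
    "marked": rb"/Marked\s+true",
}


def accessibility_hints(pdf_bytes: bytes) -> dict[str, bool]:
    """Report which hint keys appear uncompressed in the raw file bytes."""
    return {
        name: re.search(pattern, pdf_bytes) is not None
        for name, pattern in HINT_PATTERNS.items()
    }


if __name__ == "__main__":
    # A hand-written fragment standing in for the bytes of a real file,
    # which you would normally read with open(path, "rb").read().
    fragment = (
        b"<< /Type /Catalog /Lang (en-US) "
        b"/MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
    )
    print(accessibility_hints(fragment))
```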
To make the PDF even more accessible, provide:
- correct page numbering, i.e. numbering that matches the document, e.g. viii for the 8th page of a preface even if this is page 9 of the file
- bookmarks to navigate between sections e.g. chapters
- running headers & footers
- language specifications for specific passages as needed
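The page numbering item is easier to see with a small worked example. The sketch below maps a page's position in the file to the label a reader expects, given a list of numbering ranges. The range format `(first_file_page, style, first_number)` is my own simplification, loosely modelled on PDF page labels, with `"r"` for lowercase Roman numerals and `"D"` for decimal. The layout in the usage example (a cover, a preface starting on file page 2, the main text starting on file page 20) is hypothetical.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    table = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in table:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def page_label(file_page: int, ranges: list[tuple[int, str, int]]) -> str:
    """Return the printed label for a 1-based file page.

    `ranges` must be sorted by first file page. Pages before the first
    range get an empty label (for example, an unnumbered cover).
    """
    label = ""
    for first_file_page, style, first_number in ranges:
        if file_page < first_file_page:
            break
        number = first_number + (file_page - first_file_page)
        label = to_roman(number) if style == "r" else str(number)
    return label


if __name__ == "__main__":
    ranges = [(2, "r", 1), (20, "D", 1)]
    print(page_label(9, ranges))   # viii
    print(page_label(20, ranges))  # 1
```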
There are others, including some specific to forms, and techniques that provide even further accessibility are described in the W3C’s techniques for making PDFs accessible.
What We Use
If you’re dealing with a large volume of text, you can buy OCR software that will not only do a better job of OCR, but will also divide the page into blocks properly (e.g. it handles multiple columns, and it would recognize the vertical text as a separate block in the example above). We use OmniPage Ultimate, which works really well. I’m not sure about cheaper options other than the non-Ultimate version.
Even “regular” users will appreciate many of the features I’ve listed, especially the most basic readable text, bookmarks, and correct page numbering. Improve the accessibility of your documents for everyone!