Accessible Format Production Part 4: Converting PDF & EPUB to E-Text

This is going to be a fairly short post. I would have combined it with another part, but the parts before and after are long enough to need their own posts, so here we are.

Software To Use

Scanning software will frequently let you save in various formats. If yours automatically tags headings and other structure for you, it should do a good job of converting to doc or RTF with that formatting intact.

Most PDF editors will also convert to doc or RTF. In Acrobat, it’s as simple as choosing the format you want from the export options.

What I will cover is calibre, mostly because it’s free and because it also converts EPUB files. If you can get an ebook file (rather than having to scan from print), it will often be in EPUB format.

Using Calibre

Conversion in calibre is quite simple. After installation:

  1. Load the file in calibre.
  2. Select the file and click on “Convert books”.
  3. Choose the format you want to convert to.
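If you would rather script this than click through the interface, calibre also installs a command-line tool called ebook-convert, which takes an input file and an output file and picks the formats from the file extensions. Below is a minimal sketch in Python, assuming ebook-convert is on your PATH (on some systems you may need to add calibre’s install folder yourself). The function names are my own, made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format):
    """Return the ebook-convert argument list for one file.

    The output file sits next to the source and differs only in its
    extension, which is how ebook-convert decides the output format.
    """
    source = Path(source)
    target = source.with_suffix("." + target_format.lower().lstrip("."))
    return ["ebook-convert", str(source), str(target)]


def convert(source, target_format="rtf"):
    """Run the conversion and return the path of the output file."""
    # Assumption: calibre is installed and ebook-convert is on PATH.
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert was not found on PATH")
    command = build_convert_command(source, target_format)
    subprocess.run(command, check=True)  # raises if calibre reports an error
    return Path(command[2])
```

Calling convert("book.epub", "rtf") should leave a book.rtf beside the original, the same result as the three steps above.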

Calibre can convert directly to RTF, but I find the spacing is sometimes odd (lots of extra tabs and spaces). I frequently prefer to convert to HTMLZ first and then open the HTML file in my document editor (just remember to break the links to images).
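An HTMLZ file is a zip archive holding the HTML along with its images and other supporting files, so it has to be unpacked before a document editor can open the HTML. Here is a rough sketch of that step using Python’s standard zipfile module. The function name is hypothetical, and rather than assume what the HTML file is called inside the archive, it simply reports every HTML file it finds.

```python
import zipfile
from pathlib import Path


def unpack_htmlz(htmlz_path, out_dir):
    """Extract an HTMLZ archive and return the HTML files found inside it."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(htmlz_path) as archive:
        # Images are extracted too, so the HTML still finds them.
        archive.extractall(out_dir)
        names = archive.namelist()
    return [
        out_dir / name
        for name in names
        if name.lower().endswith((".html", ".htm"))
    ]
```

Open the returned HTML file in your document editor from inside the extracted folder, so the images load before you break the links.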

One of the main reasons I like to use calibre to convert files is that you can do it in a batch using the bulk convert option.
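The same kind of batch job can be approximated outside the calibre window by looping over a folder and calling calibre’s ebook-convert command-line tool once per file. This is only a sketch under a few assumptions: ebook-convert is on your PATH, every PDF and EPUB in the folder should be converted, and the output goes next to each source file. The names plan_batch and convert_batch are mine, not calibre’s.

```python
import subprocess
from pathlib import Path

# File types to pick up from the folder (an assumption for this sketch).
SOURCE_SUFFIXES = {".pdf", ".epub"}


def plan_batch(folder, target_format="rtf"):
    """List (source, target) pairs for every PDF or EPUB in a folder."""
    pairs = []
    for source in sorted(Path(folder).iterdir()):
        if source.is_file() and source.suffix.lower() in SOURCE_SUFFIXES:
            pairs.append((source, source.with_suffix("." + target_format)))
    return pairs


def convert_batch(folder, target_format="rtf"):
    """Convert each file in turn and return the sources that failed."""
    failed = []
    for source, target in plan_batch(folder, target_format):
        # One ebook-convert call per file; a failure does not stop the rest.
        result = subprocess.run(["ebook-convert", str(source), str(target)])
        if result.returncode != 0:
            failed.append(source)
    return failed
```

Running plan_batch on its own first is a cheap way to check which files would be touched before anything is converted.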

The major downside I have found is that images converted from PDF sometimes come out as black-and-white images with inverted colours. I haven’t been able to figure out what triggers this.

Other Options

Update (Jan 27, 2016): I found a great, free online tool called SmallPDF. The technology that they use is the same as what’s used in Adobe Acrobat. If you follow their links, you can also buy a desktop version (Windows only though) by the company that created it.

Search the web and you will find a whole list of free converters, both online and for installation. Many claim to be free but are just trials. I have yet to try them, but a couple look promising at least:

  • Nemo PDF to Word Converter (Windows)
  • Zamzar PDF to RTF (Online)
  • Nitro (Online)

Last Note

Okay, this post almost seemed silly to write, since there are lots of options out there, but hopefully in the future I will update the post with some notes on how well some of the other converters work.