This is a place to document my process for making existing PDFs accessible. This is an evolving document because I keep learning new things about PDF accessibility. If you are an expert in the field and I’ve given incorrect information, please post a comment so I can continue to learn and fix my mistakes so others are not misled.
While there is information on the Internet concerning PDF accessibility, it is scattered across many different places. There doesn’t seem to be one definitive place to find what I need, so I’m trying to gather it here for me and for those who work with me on making PDFs accessible.
Use the right side navigation to read the documentation. If you go in order from top to bottom, you’ll get all of the information. You can also pick and choose what you want to read, but if you’ve never worked with tags in a PDF file, I suggest you start at the top.
One of my work goals is to update this blog for the newer versions of Acrobat Professional. I’d like to skip 10, but I know that lots of folks out there have 10 and not 11. So be patient while I work on this.
With great interest, I’ve been reading a discussion taking place on the WebAIM email list about PDF and accessibility. It started innocently when a list member asked if there was a checklist for accessible PDFs. Another list member replied that he’d just found one and listed the URL for it. The first list member replied that the PDF was inaccessible to his screen reader. The second list member replied that he assumed that a PDF about accessibility would be accessible and noted that the document had no tags.
By this point I’d downloaded the file and opened it in Acrobat Professional. I looked at the tags panel and saw tags there. I was confused as to why the other member saw no tags. Just as I was about to reply that it was, indeed, tagged, someone else responded that the document had tags.
Then Andrew Kirkpatrick, Adobe’s senior project manager for accessibility, replied that the document was not tagged and that Acrobat and Reader would add temporary tags to documents. This was new to me, so I asked Andrew to elaborate, which he did:
When a user running an assistive technology tool opens a PDF file that lacks tags, Reader will tag it automatically (there are user preferences to allow/disallow/prompt for this). These tags are temporary in that when the file is closed they go away. The autotagging process is the same that occurs when you open a PDF file without tags in Acrobat and select the “add tags” feature, except that the add tags feature is designed to add them permanently since the Acrobat author is able to edit the file.
The heuristic for adding the tags in both cases is not able to add equivalents for images or determine a heading order with complete certainty. It does a good job with the reading order semantics in most cases, but if you have a PDF file with a complex table or columns of text with text boxes interspersed throughout the text I wouldn’t be surprised to see tagging issues.
This all comes back to the source file and the data within it. At this point authors have a variety of options for authoring tagged PDF files that are semantically correct, so repair of tagging for PDF files should be less necessary or unnecessary for newly authored PDF files.
While I didn’t distrust Andrew’s authoritative knowledge on the subject, I was still confused: I was not using assistive technology, and when I closed and reopened the file, the tags were still there. Today, however, I looked at the file again and saw that the properties dialog box says it is not tagged.
I wondered what would happen if I did an accessibility check on the document. I know that when you do an accessibility check on PDFs that are not tagged, the summary tells you they are not tagged and gives hints on how to remedy that. So I did an accessibility check and while the summary reported problems, it made no mention of the document not being tagged.
I then removed the [non-existent?] tags, saved the file, and did the accessibility check again. This time the summary said that the document was not tagged.
So, even though this is pretty minor as accessibility issues go, I’m still confused. Usually when I open a file that has no tags I see no tags in the tags panel and the properties dialog box agrees. I’m not sure what was up with this particular file that it showed tags but the properties dialog box said it was not tagged, but I suspect it has something to do with the fact that it was created with an older version of Acrobat (7, according to the properties dialog box).
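One possible explanation, offered as an assumption rather than something verified on this file: the PDF specification records tagging in two separate places in the document catalog. The `/StructTreeRoot` entry holds the tag tree itself, while `/MarkInfo` carries a `/Marked` flag declaring that the file follows the Tagged PDF conventions. A file could plausibly have the first without the second, which would match a tags panel that shows tags and a properties dialog that says "not tagged". The sketch below illustrates the four combinations. The function name `tagging_status` and the plain dict standing in for the catalog are hypothetical, and a real check would need a PDF library to read the catalog from the file.

```python
def tagging_status(catalog):
    """Classify a PDF by two entries in its document catalog.

    `catalog` is a plain dict standing in for the catalog dictionary,
    for example {"/StructTreeRoot": {...}, "/MarkInfo": {"/Marked": True}}.
    Reading a real catalog from a file is outside this sketch.
    """
    # The structure tree is where the tags themselves live.
    has_tree = catalog.get("/StructTreeRoot") is not None

    # /MarkInfo is optional; treat a missing dictionary or flag as False.
    mark_info = catalog.get("/MarkInfo") or {}
    marked = bool(mark_info.get("/Marked", False))

    if has_tree and marked:
        return "tagged"
    if has_tree:
        return "tags present, but not marked as tagged"
    if marked:
        return "marked as tagged, but no tags"
    return "untagged"


if __name__ == "__main__":
    # A catalog with a tag tree but no /Marked flag (the suspected case here).
    print(tagging_status({"/StructTreeRoot": {"/K": []}}))
```

If this guess is right, the file described above would fall into the second category, and removing the tags would move it to "untagged", which is when the accessibility check started reporting it as such.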
Until I read an answer to a question on the WebAIM email list concerning page numbers in PDF files, I always marked page numbers as artifacts. After thinking about the answer (yes, tag the page number as text and have it read first) and deciding it made sense, I began doing just that and told the folks working with me to do the same.
Then one of the folks working with me questioned this practice because he thought that the user might get confused when a page number was read in the middle of a paragraph that spanned two pages. I suggested that instead of automatically having the page number read first, it might be better to figure out where it best fit in the context, such as after the paragraph ended. However, I wanted to ask around and see how others handled page numbers in PDF files.
I asked this question on Twitter and got this answer:
I do not tag page numbers. That doesn’t mean it’s right. My logic is that page numbers within the context of the document is out of context.
The question I ask myself is where in the context of the TAGS would a page number make sense?
Again I thought about it and decided that it made sense.
So now I no longer tag page numbers as text: besides the logical arguments against it, tagging them sometimes takes a long time.