Tagged or not tagged
With great interest, I’ve been reading a discussion taking place on the WebAIM email list about PDF and accessibility. It started innocently when a list member asked if there was a checklist for accessible PDFs. Another list member replied that he’d just found one and listed the URL for it. The first list member replied that the PDF was inaccessible to his screen reader. The second list member replied that he assumed that a PDF about accessibility would be accessible and noted that the document had no tags.
By this point I’d downloaded the file and opened it in Acrobat Professional. I looked at the tags panel and saw tags there. I was confused as to why the other member saw no tags. Just as I was about to reply that it was, indeed, tagged, someone else responded that the document had tags.
Then Andrew Kirkpatrick, Adobe’s senior project manager for accessibility, replied that the document was not tagged and that Acrobat and Reader would add temporary tags to documents. This was new to me, so I asked Andrew to elaborate, which he did:
When a user running an assistive technology tool opens a PDF file that lacks tags, Reader will tag it automatically (there are user preferences to allow/disallow/prompt for this). These tags are temporary in that when the file is closed they go away. The autotagging process is the same that occurs when you open a PDF file without tags in Acrobat and select the “add tags” feature, except that the add tags feature is designed to add them permanently since the Acrobat author is able to edit the file.
The heuristic for adding the tags in both cases is not able to add equivalents for images or determine a heading order with complete certainty. It does a good job with the reading order semantics in most cases, but if you have a PDF file with a complex table or columns of text with text boxes interspersed throughout the text I wouldn’t be surprised to see tagging issues.
This all comes back to the source file and the data within it. At this point authors have a variety of options for authoring tagged PDF files that are semantically correct, so repair of tagging for PDF files should be less necessary or unnecessary for newly authored PDF files.
While I didn’t distrust Andrew’s authoritative knowledge on the subject, I was still confused: I was not using assistive technology, and when I closed and reopened the file, the tags were still there. Today, however, I looked at the file again and saw that the properties box says it is not tagged.
I wondered what would happen if I ran an accessibility check on the document. I know that when you run an accessibility check on a PDF that is not tagged, the summary says so and gives hints on how to remedy that. So I ran the check, and while the summary reported problems, it made no mention of the document not being tagged.
I then removed the [non-existent?] tags, saved the file, and ran the accessibility check again. This time the summary said that the document was not tagged.
So, even though this is pretty minor as accessibility issues go, I’m still confused. Usually when I open a file that has no tags, I see no tags in the tags panel and the properties dialog box agrees. I’m not sure why this particular file showed tags in the tags panel while the properties dialog box said it was not tagged, but I suspect it has something to do with the fact that it was created with an older version of Acrobat (7, according to the properties dialog box).
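One hypothesis that might explain a mismatch like this, offered only as a guess and not confirmed by anything in the discussion: in the PDF format, a document's catalog can carry a structure tree (the /StructTreeRoot entry) and, separately, a /MarkInfo dictionary whose /Marked flag declares the file to be Tagged PDF. If a tags panel displays the structure tree while a properties dialog reports the flag (an assumption about how the two views work), a file with a tree but no flag would show tags and still be reported as not tagged. The sketch below models the catalog as a plain Python dict; the function name `tagging_status` and the dict representation are hypothetical, and a real check would first need a PDF library to load the catalog from the file.

```python
def tagging_status(catalog):
    """Classify a PDF document catalog, given here as a plain dict.

    Assumption: `catalog` mirrors the catalog dictionary, with "/MarkInfo"
    mapping to a dict and "/StructTreeRoot" present when the file has a
    structure tree. This is an illustrative model, not a PDF parser.
    """
    mark_info = catalog.get("/MarkInfo") or {}
    # The flag counts only when it is explicitly the boolean True.
    marked = mark_info.get("/Marked") is True
    has_tree = catalog.get("/StructTreeRoot") is not None

    if marked and has_tree:
        return "tagged"
    if has_tree:
        return "structure tree present, but not marked as tagged"
    if marked:
        return "marked as tagged, but no structure tree"
    return "untagged"


# Hypothetical catalog resembling the puzzling case: tags exist, flag missing.
puzzling = {"/StructTreeRoot": {"/Type": "/StructTreeRoot"}}
print(tagging_status(puzzling))
# prints: structure tree present, but not marked as tagged
```

Under the same assumption, deleting the tags would remove the structure tree and leave a plainly untagged file, at which point both views would agree.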