Ian Mulvany

March 19, 2025

alt text alternatives

Alt Text - What is to be done? What should be done?

A number of accessibility requirements are coming into law over the next year that are going to pose some unique challenges for scholarly publishers, chief among them the European Union's European Accessibility Act (EAA) and the Americans with Disabilities Act (ADA) in the US. While there are many accessibility requirements, the most interesting one for scholarly publishers is the requirement to provide alt-text for images. Many of the other requirements - target size, keyboard navigation, colour contrast, screen reader compatibility - are fairly straightforward from a web development point of view because they are straight-up functional, and if you have a sane front end infrastructure then upgrading to them, while not trivial, does not pose any actual difficulty.

Alt text, on the other hand, is a whole other level of complexity. If you are a trade publisher or a newspaper, you choose which images you publish. If you are a scholarly publisher, the images arrive in bulk from authors, with little control on the publisher's side. And you are getting a lot of images. Figuring out how to describe them is a huge task.

I think we have the following options.

Do nothing.

This has the advantage of being easy, but the disadvantage of being non-compliant. I don't think this is a viable option.

Use the image caption or image label.

Looking at Springer Nature and Elsevier, I can see that this is the current approach. It adds nothing to the usability of the alt text, as the caption and image label are already in the full text, but is it compliant with the legislation? I don't know.
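Mechanically this is cheap to do: if the full text is in JATS XML, the caption already sits right next to the graphic, so you can copy it into the alt attribute at render time. A minimal sketch, assuming simple <fig>/<caption>/<graphic> markup; real-world JATS and the xlink handling will be messier than this:

```python
# Sketch: reuse the JATS figure caption as alt text when rendering HTML.
# Assumes <fig><caption/><graphic/></fig> markup with the xlink namespace
# declared; real JATS varies by publisher and workflow.
import html
from xml.etree import ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def caption_as_alt(jats_xml: str) -> list[str]:
    """Return <img> tags whose alt text is simply the figure's caption."""
    root = ET.fromstring(jats_xml)
    imgs = []
    for fig in root.iter("fig"):
        caption_el = fig.find("caption")
        caption = ""
        if caption_el is not None:
            # flatten and normalise whitespace in the caption text
            caption = " ".join("".join(caption_el.itertext()).split())
        graphic = fig.find("graphic")
        src = graphic.get(XLINK_HREF, "") if graphic is not None else ""
        imgs.append(f'<img src="{html.escape(src)}" alt="{html.escape(caption)}">')
    return imgs
```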

This has the advantage of being fairly easy, but the potential disadvantage of being perhaps just a bit shit, and of perhaps falling on the wrong side of the legislation, as it is clearly on the wrong side of its intent.

Get the authors to do it.

We could hoist another barrier for authors to climb over and require them to add descriptive text to their images. My understanding is that Springer Nature may be developing such a step, though it may just ask the author to proof already-generated descriptions.

This has the advantage of creating metadata that is genuinely more valuable than just plonking in the figure caption, but I suspect that authors are going to half-arse this job at some scaling point, and we are adding another step for authors when they have enough barriers already.

Use an image recognition model that is out of the box today.

This has the advantage of being something we can just get started with right now, and in just the last month a number of models have been released with improved capabilities for describing images. I'm sure one could find a model with the right cost and performance profile, but the distinct disadvantage is that these models are not trained specifically on scientific content, so a fair percentage of descriptions are going to contain errors.
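As a sketch of what "out of the box" means in practice, here is roughly what the per-figure call looks like, assuming access to a general-purpose vision model through the OpenAI Python client. The model name and prompt are placeholders, not recommendations; any vision-capable model with a similar API would do:

```python
# Sketch of the "out of the box" option: send each figure to a general-purpose
# vision model and take whatever description it returns as draft alt text.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Write concise alt text (one or two sentences) for this scientific figure. "
    "Describe what the figure shows, not how it looks decoratively."
)

def draft_alt_text(image_path: str, model: str = "gpt-4o-mini") -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

# usage: print(draft_alt_text("figure1.png"))
```

Looping this over a corpus of figures is the easy part; the hard part is deciding what error rate you can live with.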

Fine tune a model today and use that.

One could overcome the previous limitation by fine-tuning a model on one's own corpus. This has the advantage that it will produce significantly fewer errors, and it is technically feasible to do. That said, I don't think there are many publishers out there with the scale or capacity to do this kind of fine-tuning, so we would need to look to solution providers to do it for us. There is also still going to be some level of error, and how one balances this against requirements for human review will come down to weighing cost, risk, and the value of having reasonable alt text for the majority of your images.
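For a sense of what the fine-tuning route involves at the data end, here is an illustrative sketch of the preparation step only: pairing figures with their captions and with editor-approved descriptions into training records. The CSV columns and the JSONL record shape are assumptions for illustration, not any particular provider's required schema, and the actual fine-tuning job would run on whichever model or vendor platform is chosen:

```python
# Illustrative data-preparation step for fine-tuning on a publisher's own corpus.
# Assumes a manifest CSV with columns: image_path, caption, approved_alt_text.
import base64
import csv
import json

def build_training_records(manifest_csv: str, out_jsonl: str) -> None:
    with open(manifest_csv, newline="") as src, open(out_jsonl, "w") as dst:
        for row in csv.DictReader(src):
            with open(row["image_path"], "rb") as f:
                b64 = base64.b64encode(f.read()).decode("ascii")
            record = {
                "image_base64": b64,
                "caption": row["caption"],                    # context the model sees
                "target_alt_text": row["approved_alt_text"],  # what it learns to produce
            }
            dst.write(json.dumps(record) + "\n")
```

The scarce ingredient here is not the code but the corpus of human-approved descriptions to train against, which is exactly why this looks like a job for solution providers rather than individual publishers.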

Wait and use a model in the future.

Finally, we could just wait 12 months (if extensions for compliance are allowed) and bet that the next generation of models will simply be better, with error rates so low that we don't need to faff around with fine-tuning or human review.

What’s missing in this picture?

I've not included the option of human labelling of images. If this legislation had come in a few years ago that would have been the only option available, but it would never have been cost effective. The fact that we now have a viable path with language models is astonishing. We could make this process so much more efficient than it could ever have been before.

But there is something better than optimising a process, and that is just not doing the process at all.

The one thing I have not included here is the perspective of the users. I don't know how visually impaired researchers want this information, or how they are currently managing to do research in spite of the lack of alt-text. Given the widespread availability of AI and personal tools, should we assume that those researchers who need detailed descriptions for their own particular type of work will end up using bespoke tools that give them much more of what they need than we could deliver by designing for everyone? In that world, perhaps signposting in the HTML where the image is, and isolating each image in its own well-constructed web component, is the best experience we can provide to a highly digitally literate consumer. What I want to know is what this user base really needs, before embarking down the road of optimising a process that could perhaps be thought about differently.


About Ian Mulvany

Hi, I'm Ian - I work on academic publishing systems. You can find out more about me at mulvany.net. I'm always interested in engaging with folk on these topics, if you have made your way here don't hesitate to reach out if there is anything you want to share, discuss, or ask for help with!