For many people who are Deaf or hard of hearing, captions on video and film are a necessity. By providing a synchronized text version of narration and dialogue, plus non-speech information like lyrics, sound effects, or off-screen sounds, captions allow viewers with hearing disabilities to fully understand and engage with what they’re watching. And if you’ve ever watched TV with captions in a busy airport or crowded bar, you know that captions are a prime example of how accessibility makes life better for everyone.
Recently, some video production tools and online platforms, including YouTube and Facebook, have started using speech recognition to automatically generate captions. Creating captions manually is expensive and time-consuming, so on the surface, auto-generation seems like an easy win — a way to instantly make millions more videos accessible to people with hearing disabilities.
But if you spend some time watching videos with auto-generated captions, you’ll quickly see that speech recognition technology for captioning still has a long way to go. Misheard words, punctuation errors, and grammatical issues are rampant, resulting in captions that are often unreadable.
We looked at a few examples of these on our game show “Who Wants to Be an Accessibility Champion?” which brought together accessibility advocates to help us showcase accessibility tools, including AI-generated alt text and auto-generated captions.
Auto-caption examples
Here’s an example from Harry Potter and the Sorcerer’s Stone.
The auto-generated caption for this scene reads “mental that one’s good Helena.”
And here’s what’s actually being said: “Mental that one, I’m telling ya.”
As you can imagine, it’s incredibly frustrating to watch something while having to constantly decipher the captions along the way. Dialogue and narration move quickly, so before a viewer has puzzled through one caption, the next one may already be up on screen. If viewers are frustrated enough, they’ll simply stop watching.
In the case of news or how-to videos, auto-captions may serve up incorrect information or instructions – which can actually be dangerous. W3C (the World Wide Web Consortium), the organization behind the WCAG, gives the example of a cooking video where the spoken audio was “Broil on high for 4 to 5 minutes. You should not preheat the oven.” The automatic caption said something completely different: “Broil on high for 45 minutes. You should know to preheat the oven.” It’s an inaccuracy that could lead to a fire — or at the very least, dinner that’s burned to a crisp.
Does this mean that you should never use automatic captions? Not necessarily. They can still be a useful tool — but they should be just one step in the process. Before you launch a video, carefully review automatic captions for mechanics (spelling, grammar, punctuation), meaning, and accuracy, and make any needed corrections.
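If your team handles a large volume of video, a simple script can run a first pass before the human review, flagging caption cues that deserve a closer look. Here’s a minimal sketch assuming captions in the common SubRip (.srt) format; the 42-character line limit and the end-punctuation check are illustrative heuristics, not official rules, and no script replaces a person reading the captions against the audio.

```python
import re

# Illustrative sample in SubRip (.srt) format, using the W3C's
# "45 minutes" auto-caption error as the cue text.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,000
Broil on high for 45 minutes

2
00:00:04,500 --> 00:00:07,000
You should know to preheat the oven.
"""

# Matches one cue: number, start/end timestamps, then the text block.
CUE_RE = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\n|\Z)",
    re.S,
)

def review_captions(srt_text, max_chars=42):
    """Flag cues worth a human look: overlong lines or no ending punctuation.

    The thresholds here are assumptions for illustration, not a standard.
    """
    flags = []
    for num, _start, _end, text in CUE_RE.findall(srt_text):
        for line in text.strip().splitlines():
            if len(line) > max_chars:
                flags.append((num, "line too long"))
            if line and line[-1] not in ".?!,\"'":
                flags.append((num, "no ending punctuation"))
    return flags

# Cue 1 gets flagged for missing punctuation -- a nudge for a human
# reviewer, who would then catch the "45 minutes" error by ear.
for cue, issue in review_captions(SAMPLE_SRT):
    print(f"Cue {cue}: {issue}")
```

A script like this can only surface mechanical red flags; the dangerous “4 to 5 minutes” versus “45 minutes” mistake is grammatically clean text that only a person comparing captions to the spoken audio will catch.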
Remember: when it comes to making digital experiences more accessible, doing it poorly can be worse than not doing it at all. For the sake of your brand, your bottom line, and, of course, the people who will interact with your content, it’s worth making the investment to get it right.
Find out how Perkins Access’s tailored, roles-based training programs can help your team build expertise on captions and other aspects of accessible digital design.