I came across this video at the top of CNN last night and thought it was a pretty sweet story. A group of gamers carefully cataloged every sound and every move needed to beat a game – which seems to be Legend of Zelda: Ocarina of Time – and gave the info to a blind player they’d met on an online discussion forum so that he could beat the game. Check it out:

Blindness and video games is actually an issue that gets talked about fairly often, whether in the context of a master’s thesis about how to make more accessible video games or blind players discussing the ways that they navigate World of Warcraft.

One interesting element of the Zelda case above is how rich the audio is in Zelda, with very specific cues for specific attacks and in-game events, like opening a treasure chest –

Cues like these are undoubtedly important for people with vision impairments, and I know that I rely on them to keep me on track and entertained in my own gameplay. These sounds, then, seem like an important element of universal design in games, as they may provide helpful information for large numbers of players. Yet, overreliance on audio information can also be a problem, as deaf players may find themselves excluded from Warcraft raids in which players are all using headsets and voice chat instead of text chat.

The mismatches in audio and visual needs only highlight the continued need for improvement in text-to-speech and speech-to-text technology. These technologies are getting a lot of attention this week, with Roger Ebert debuting his text-to-speech voice (compiled from old video clips of Ebert’s actual voice) on Oprah and YouTube announcing the full roll-out of its autocaptioning service, which I blogged about during its initial stages last fall.

But, I think the human, community element of this particular story is also fascinating – I don’t know if perfect code-driven accessibility will ever be possible without some degree of human interpretation of language and meaning, and I like seeing instances in which people can pool their resources to make a more accessible world (at least for this one Ocarina player). Plus, the fact that this occurred in a gamer community around Zelda is a fun connection to my partner, whose dissertation was partially about the activities on Zelda forums, and who sent me the video in the first place!

[Garland-Thomson, a white woman with short white hair and dark glasses, speaking.]

Just a quick post today – I’m dying to read Rosemarie Garland-Thomson’s book, Staring: How We Look, but couldn’t find a way to work it into the prelims reading lists that dominate my free time right now. Until I get around to reading it, though, this video is a nice introduction.* It would be great for teaching, but is also just a well-done blurb on the issue.

*Unfortunately, it is not part of YouTube’s attempted captioning program, and doesn’t have any captions of its own.

Yesterday, Google announced that it would be deploying several new options for increasing the number and quality of closed-captioned videos on the site. The New York Times reported on this as a first step to making videos available to deaf and hearing-impaired audiences, but it seems clear that there are a lot of potential beneficiaries – foreign language audiences (captions can be translated to 51 languages), those of us who can’t turn on the speakers at work, and anyone who wants to search the verbal content of a video.

So, how are they doing it? First, speech-to-text technology currently used by Google Voice is being applied to a small number of videos on the site (largely educational content) to produce captions automatically.

“Because the tools are not perfect, we want to make sure that we get feedback from the video owners and the viewers before we roll it out for the whole world,” Mr. Harrenstien said. “Sometimes the auto-captions are good. Sometimes they are not great, but they are better than nothing if you are hearing-impaired or don’t know the language.”

Presumably, if this works, speech-to-text will be rolled out more broadly. For now, you can take a look at how this works below. To see the captions, Google/YouTube explains – “Click on the menu button at the bottom right of the video player, then click CC and the arrow to its left, then click the new “Transcribe Audio” button.” I’ve picked a clip of PBS’s upcoming series This Emotional Life, focusing on Asperger’s.

Obviously, it’s not perfect – “Asperger’s syndrome” is transcribed as “Mister Gerson” – so I hope that speech-to-text improves before this initial stage is extended to other videos. This, however, leads to the second option that Google/YouTube have made available, which is to provide your own captions for videos you upload.

Now, after you upload a video, you can also upload a text file – YouTube will combine the video and the text to create captions. Through “auto-timing,” YouTube will match a transcript (a file with only verbal content) to the video using speech recognition, or will match a caption file (which includes time codes for the text to appear) to the video. The help file on this seems fairly clear, and also includes tips like including bracketed information about non-verbal sounds [whistling], or using >> to indicate changing speakers in the captions.

I gave it a try – not the easiest experience. They weren’t kidding when they said that clear speech works best, as my transcript file (no time code) could not be matched and displayed as captions. People singing to cats didn’t translate well. Thus, to get a captioned video, I had to try the old-fashioned way, creating a .sub file with time codes. This quickly got me in a bit over my head – while I could do it, given the time, there’s a reason most people don’t caption their YouTube videos. It’s time-intensive, there’s a learning curve involved, and the results may not seem important enough to justify the work.
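
For a sense of what a hand-timed caption file involves, here is a minimal Python sketch that turns a list of timed lines into caption text. It is an illustration only, with a few assumptions: it writes the widely used SubRip (.srt) layout rather than the .sub format mentioned above, the function names `format_timestamp` and `build_srt` are made up for this example, and the cue times and text are invented. The bracketed sound and the >> speaker markers follow the tips from the help file.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS,mmm (SubRip-style time code)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues):
    """Build caption text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} ends before it starts")
        time_line = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_line}\n{text}\n")
    # A blank line separates one cue from the next.
    return "\n".join(blocks)


# Hypothetical cues: times and wording are invented for illustration.
cues = [
    (0.0, 2.5, "[whistling]"),
    (2.5, 6.0, ">> Here, kitty."),
    (6.0, 9.25, ">> [singing] Who's a good cat?"),
]

srt_text = build_srt(cues)
print(srt_text)
```

The formatting is the easy part. The time-consuming step is still finding the start and end time for every line by ear, which is exactly what auto-timing is meant to take off the uploader's hands.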

This, of course, is exactly why forays into speech-to-text and auto-timing are so exciting. If captions could be created automatically, or from a simple text file, captions on user-created video would certainly become more common and make the world more accessible. While the tools as they are today aren’t anywhere near perfect, it’s certainly a first step in creating automatic accessibility features for participatory media.

As someone who studies accessibility and internet media, I’m constantly torn between getting excited about social/participatory media and being disappointed in their access options. This WordPress blog I’m using is notoriously terrible in its implementation of image alt text, for instance. Blogging has given so many people an outlet to write and connect, but if they want to make a blog accessible, it takes additional research and effort. Attempts to build accessibility features in automatically are, in my opinion, game-changers when they’re done well. I’ll withhold judgment on this YouTube move for now – it has potential – but I’ll be watching to see whether it develops.