Visually-impaired accessibility is fundamentally broken. Here's what we can do about it.
There are a lot of privileges most of us probably take for granted. Not everyone is gifted with the ability to do basic things like talk, walk, see, and hear. Those of us (like myself) who can do all of these things don’t really think about them much. Those of us who can’t, have to think about it a lot because our world is largely not designed for them. Modern-day things are designed for a fully-functional human being, and then have stuff tacked onto them to make them easier to use. Not easy, just “not quite totally impossible.”
Issues of accessibility plague much of modern-day society, but I want to focus on one pain-point in particular. Visually-impaired accessibility.
Now I’m not blind, so I am not qualified to say how exactly a blind person would use their computer. But I have briefly tried using a computer with my monitor turned off to test visually-impaired accessibility, so I know a bit about how it works. The basic idea seems to be that you launch a screen reader using a keyboard shortcut. That screen reader proceeds to try to describe various GUI elements to you at a rapid speed, from which you have to divine the right combination of Tab, Space, Enter, and arrow keys to get to the various parts of the application you want to use. Using these arcane sequences of keys, you can make the program do… something. Hopefully it’s what you wanted, but based on reports I’ve seen from blind users, oftentimes the computer reacts to your commands in highly unexpected ways.
The first thing here that jumps out to most people is probably the fact that using a computer blind is like trying to use magic. That’s a problem, but that’s not what I’m focusing on. I'm focusing on two words in particular.
Screen. Reader.
Wha…?
I want you to stop and take a moment to imagine the following scenario. You want to go to a concert, but can’t, so you sent your friend to the concert in your place and ask them to record it for you. They do so, and come back with a video of the whole thing. They’ve transcribed every word of each song, and made music sheets detailing every note and chord the concert played. There’s even some cool colors and visualizer stuff that conveys the feeling and tempo of each song. They then proceed to lay all this glorious work out in front of you, claiming it conveys everything about the concert perfectly. Confronted with this onslaught of visual data, what’s the first thing you’re going to ask?
“Didn’t you record the audio?”
Of course that’s the first thing you’re going to ask, because it’s a concert for crying out loud, 90% of the point of it is the audio. I can listen to a simple, relatively low-quality recording of a concert’s audio and be satisfied. I get to hear the emotion, the artistry, everything. I don’t need a single pixel of images to let me experience it in a satisfactory way. On the other hand, I don’t care how detailed your video analysis of the audio is - if it doesn’t include the audio, I’m going to be upset. Potentially very upset.
Now let’s go back to the topic at hand, visually-impaired accessibility. What does a screen reader do? It takes a user interface, one designed for seeing users, and tries to describe it as best it can to the user via audio. You then have to use keyboard shortcuts to navigate the UI, which the screen reader continues to describe bits of as you move around. For someone who’s looking at the app, this is all fine and dandy. For someone who can kinda see, maybe it’s sufficient. But for someone who’s blind or severely visually impaired, this is difficult to use if you’re lucky. Chances are you’re not going to be lucky and the app your working with might as well not exist.
Why is this so hard? Why have decades of computer development not led to breakthroughs in accessibility for blind people? Because we’re doing the whole thing wrong! We’re taking a user interface designed specifically and explicitly for seeing users, and trying to convey it over audio! It’s as ridiculous as trying to convey a concert over video. A user who’s listening to their computer shouldn’t need to know how an app is visually laid out in order to figure out whether they need to press up arrow, right arrow, or Tab to get to their desired UI element. They shouldn’t have to think in terms of buttons and check boxes. These are inherently visual user interface elements. Forcing a blind person to use these is tantamount to torture.
On top of all of this, half the time screen readers don’t even work! People who design software are usually able to see. You just don’t think about how to make software usable for blind people when you can see. It’s not something that easily crosses your mind. But try turning your screen off and navigating your system with a screen reader, and suddenly you’ll understand what’s lacking about the accessibility features. I tried doing this once, and I went and turned the screen back on after about five minutes of futile keyboard bashing. I can’t imagine the frustration I would have experienced if I had literally no other option than to work with a screen reader. Add on top of that the possibility that the user of your app has never even seen a GUI element in their lives before because they can’t see at all, and now you have essentially a language barrier in the way too.
So what’s the solution to this? Better screen reader compatibility might be helpful, but I don’t think that’s ultimately the correct solution here. I think we need to collectively recognize that blind people shouldn’t have to work with graphical user interfaces, and design something totally new.
One of the advantages of Linux is that it’s just a bunch of components that work together to provide a coherent and usable set of features for working on your computer. You aren’t locked into using a UI that you don’t like - just use or create some other UI. All current desktop environments are based around a screen that the user can see, but there’s no rules that say it has to be that way. Imagine if instead, your computer just talked to you, telling you what app you were using, what keys to press to accomplish certain actions, etc. In response, you talked back to it using the keyboard or voice recognition. There would be no buttons, check boxes, menus, or other graphical elements - instead you’d have actions, options, feature lists, and other conceptual elements that can be conveyed over audio. Switching between UI elements with the keyboard would be intuitive, predictable, and simple, since the app would be designed from step one to work that way. Such an audio-centric user interface would be easy for a blind or vision-impaired person to use. If well-designed, it could even be pleasant. A seeing person might have a learning curve to get past, but it would be usable enough for them too. Taking things a step further, support for Braille displays would be very handy, though as I have never used one I don’t know how hard that would be to implement.
A lot of work would be needed in order to get to the point of having a full desktop environment that worked this way. We’d need toolkits for creating apps with intuitive, uniform user interface controls. Many sounds would be needed to create a rich sound scheme for conveying events and application status to the user. Just like how graphical apps need a display server, we’d also need an audio user interface server that would tie all the apps together, letting users multitask without their apps talking over each other or otherwise interfering. We’d need plenty of apps that would actually be designed to work in an audio-only environment. A text editor, terminal, and web browser are the first things that spring to mind, but email, chat, and file management applications would also be very important. There might even be an actually good use for AI here, in the form of an image “viewer” that could describe an image to the user. And of course, we’d need an actually good text-to-speech engine (Piper seems particularly promising here).
This is a pretty rough overview of how I imagine we could make the world better for visually impaired computer users. Much remains to be designed and thought about, but I think this would work well. Who knows, maybe Linux could end up being easier for blind users to use than Windows is!
Interested in helping make this happen? Head over to the Aurora Aural User Interface project on GitHub, and offer ideas!