Sunday
2 Apr 2006
Down With Audio Interfaces
I often get asked about the future of interfaces. “Wouldn’t it be great,” people say, “if we could just talk to our computers like in Star Trek? Aren’t voice recognition and talking computers the interface of the future?” A lot of people seem to think that every interface problem can be solved with voice. But I have a one-word answer: Voicemail.
Everyone hates voicemail and voicemail systems. And with good reason. These days voicemail is getting pretty “smart”: you can now say “Yes” and “No” instead of pressing 1 or 2 in response to questions (unless you have an accent, in which case don’t bother). You can even say a person’s name to be connected to their extension. But technical problems aside, these are patches on a fatally flawed medium.
Audio interfaces will always lack something that visual interfaces possess effortlessly: the ability to jump around at will. If you don’t care about the information in a paragraph, you skip to the next one. You don’t have to inform the piece of paper you are reading that you want to navigate, you just do it. You look here, then there. You scan. You find what’s interesting. Visual interfaces excel because they let you throw away the unneeded information-chaff and focus on what you want to know. There is no analog in the audible world. When you’re listening, it takes substantially longer to know where you are and what you’re listening to. When using an audio interface, you are forced to be linear: a word follows the word that came before it and precedes the word after it. There is no way to get to the last word without hearing the two words before it. There’s no getting around it. It sucks.
An example: imagine you are using a conventional voicemail implementation on a computer with a standard display. There would be a button for skipping, a button for replaying, a button for saving, a button for deleting, and a button for hearing the time and date the message was left. In short, all of the normal voicemail actions. If the designer were ambitious, they could even have included a widget to let you scrub through the message. When you want to delete a message, you’d roam the interface with your eyes, reading each button label in a fraction of a second, find the delete button, and click it. Simple and quick. The important point is that you are effortlessly flitting your eyes past all of the information you don’t want, to find the information you do want. In fact, a very common phone system, the cell phone, has a display too, and if its voicemail interface had some hint of humanity to it, it would at least show a visual menu telling you which button performs which action.
But instead, you have to wait for the voicemail system to tell you what number to press to delete the message. Yet before it tells you how to delete the message, it will first tell you how to replay the message, skip the message, move to the previous message, save the message for later, and perhaps force you to listen to an advertisement from your service provider. And because it’s audio and linear, you can’t skip any of it. The more complex voicemail gets, the longer you’ll have to wait. With an audio interface, you have no way of moving past the information you don’t want.
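To put rough numbers on that wait, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the menu order, the option names, and the seconds each prompt takes are made up, not measured from any real voicemail system. It only shows the shape of the problem: in a spoken menu, you pay for every prompt that comes before the one you want.

```python
# Hypothetical spoken menu: (option name, seconds the prompt takes to read aloud).
# The names and durations are invented for illustration only.
MENU = [
    ("replay", 3.0),
    ("skip", 3.0),
    ("previous", 3.5),
    ("save", 3.0),
    ("delete", 3.0),
]


def audio_wait_seconds(menu, wanted):
    """Return the seconds of prompts you sit through until `wanted` has been read out."""
    elapsed = 0.0
    for name, duration in menu:
        # Audio is linear: every earlier prompt must play in full.
        elapsed += duration
        if name == wanted:
            return elapsed
    raise ValueError(f"{wanted!r} is not in the menu")


# Adding one more prompt at the front (say, a 10-second ad) delays everything after it.
MENU_WITH_AD = [("advertisement", 10.0)] + MENU

print(audio_wait_seconds(MENU, "delete"))          # 15.5
print(audio_wait_seconds(MENU_WITH_AD, "delete"))  # 25.5
```

On a screen, by contrast, the cost of finding “delete” barely changes when a sixth button is added, because your eyes can skip the five you don’t care about.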
The Candorville cartoon shown at the top of this post is funny because it illustrates a lose-lose situation in voicemail: if you have instructions read to you before every message, listening to your voicemail takes forever; if you don’t have the instructions read at all, you’ll never know what to do. It’s a Catch-22. And that’s the crux of the problem: there is no good way of providing instructions in a purely audio interface. Sometimes a balance can be struck, but it will always be the best of a bad set of solutions. It will never actually be good.
I know that I’ve made egregious mistakes because I didn’t want to wait for the instructions and I thought I remembered which button to push. But do I press 7 to save a message and 9 to delete it? Or is it 9 to save and 7 to delete? Naturally, I remembered incorrectly. The moral of the story is that voice-based interfaces can cost you a date.

