
A common misconception among novice Voice User Interface (VUI) designers and developers is the belief that designing a VUI consists of taking a Graphical User Interface (GUI) and "simplifying it" for use over the phone. After all, while only a small minority of people can claim some talent as graphic artists, the vast majority of us can safely claim to be competent talkers, or at least competent enough to design a simple interaction between a human being and a dumb computer.
This misconception overlooks three basic realities: (1) people speak much faster than they type, (2) they listen much more slowly than they read, and (3) they talk much more quickly than they listen. The conclusion is that while designing a VUI may seem, at a gut level, easier than designing a GUI, the opposite is in fact the case: VUI design is a lot harder than GUI design.
Time linearity
Unlike graphical interfaces, voice interfaces are linearly coupled with time. When you are reading text on a web page, for instance, you can easily skip ahead with your eyes to the section that you are interested in. This is not the case with a voice interface, where you must patiently listen to one word before you can hear the one that follows it.
Avoid long prompts: unnecessarily long prompts will quickly tax the user's patience. Long prompts explaining how the system works, for instance, may be necessary with a novice user, but they should not be forced upon an expert user. Differentiate between novice and expert users at the outset of a call, and use short, to-the-point prompts with the experts.
Use short menus: the length of an alphabetically sorted drop-down menu on a web page is a non-issue. The length of a menu in a voice interface, on the other hand, should not exceed five or six options.
Put important information first: don't annoy the caller by having them wait through unnecessary noise for the information they need. Give them what they want up front.
Allow interruptions: the ability to interrupt is usually a must-have when dealing with non-novice users. People who know what they want to do, what to say, and how to say it don't want to wait for the system to finish talking before they give their response. In Site Builder, under the "Advanced Options" tab of Message and Question Pages, turn on the "Barge-in" radio button.
Offer shortcuts for the user who knows what to do: another must for non-novice users is a set of application-level shortcuts that cut through menus and get the user to what they want to do or where they want to be in a menu structure. In Site Builder, you can achieve this by using Site Commands.
Allow pauses: an enormous advantage that a graphical interface has over a voice interface is the ability to easily pause and pick up where you left off. We do this without even thinking about it when we are reading a piece of text. During interactions where the caller may need to pause and do something, make sure that you offer that option to them. For instance, if the caller needs to take down a long series of numbers (say, a confirmation code), ask them to get paper and a pencil and to say "continue" when they are ready. In Site Builder, you can easily achieve this by using a message page that listens for the word "continue" and, upon hearing it, gives the caller the information that they need.
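The "Use short menus" guideline above can be sketched in code. The following is a minimal illustration in plain Python, not Site Builder functionality: the function names `paginate_menu` and `menu_prompt`, the cap of five options, and the "more options" entry are all assumptions made for the example. It splits a long option list into spoken menus that never exceed the cap.

```python
from typing import List

MAX_OPTIONS = 5  # assumed cap, following the "five or six options" guideline


def paginate_menu(options: List[str], max_options: int = MAX_OPTIONS) -> List[List[str]]:
    """Split a long option list into pages of at most max_options entries.

    Every page except the last ends with a hypothetical "more options"
    entry that would lead the caller to the next page.
    """
    if max_options < 2:
        raise ValueError("max_options must be at least 2")
    pages = []
    remaining = list(options)
    while len(remaining) > max_options:
        # Reserve the last slot on this page for "more options".
        pages.append(remaining[: max_options - 1] + ["more options"])
        remaining = remaining[max_options - 1:]
    pages.append(remaining)
    return pages


def menu_prompt(page: List[str]) -> str:
    """Build a short spoken prompt listing the choices on one page."""
    if not page:
        return ""
    if len(page) == 1:
        return f"You can say {page[0]}."
    if len(page) == 2:
        return f"You can say {page[0]} or {page[1]}."
    return "You can say " + ", ".join(page[:-1]) + f", or {page[-1]}."
```

For example, seven options become one menu of four choices plus "more options" and a second menu of three choices, so the caller never has to hold more than five items in mind at once.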
Uni-directionality
Compounding the linearity of speech is its unidirectional character. Just as time is a one-way street, speech is a one-way medium. When you hear something, you can't easily go back and listen to it again. Contrast this with reading a piece of text, where you can readily scan a couple of paragraphs or even pages and then go back and re-read the text.
Offer to repeat: one obvious way to alleviate this limitation is to offer the ability to repeat information. Use the built-in "Repeat" functionality, available as a checkable option (defaults to checked) in the "System Commands" page in Site Builder. Of course, make sure that the user knows they can have information repeated by telling them so at the beginning of the call and whenever important information is given out to them.
Offer help: crucial information such as instructions given at the start of the interaction should be available for the user to tap into at any point in the exchange. Offer instruction on how to access help at the beginning of the call and at moments where the user seems at a loss over what to do (e.g., after a no-input or a no-match).
Offer summaries: in interactions where information is being gathered from the caller or given out to them in a step-wise fashion, a powerful technique to overcome the uni-directionality of voice interfaces is to offer callers the ability to ask for a summary of the information collected so far. In Site Builder, use a message page with variables embedded in the prompts and the "Go back" option for "Actions."
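The "Offer summaries" guideline can be illustrated with a small sketch. This is plain Python rather than Site Builder configuration, and the function name `summary_prompt`, the field names, and the wording of the prompt are hypothetical. It plays the role of a message page whose prompt embeds the variables collected so far.

```python
from typing import Dict


def summary_prompt(collected: Dict[str, str]) -> str:
    """Build a spoken summary of the information gathered so far.

    `collected` maps a field name to the value the caller gave, in the
    order the values were collected (dicts preserve insertion order).
    """
    if not collected:
        return "I don't have any information from you yet."
    parts = [f"{name}: {value}" for name, value in collected.items()]
    return "So far I have " + "; ".join(parts) + "."


if __name__ == "__main__":
    # Hypothetical data from a flight-booking dialog.
    print(summary_prompt({"departure city": "Boston", "travel date": "May 3"}))
```

After playing the summary, the application would return the caller to the step they were on, which is the role the "Go back" action plays in the Site Builder setup described above.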
Invisibility
Perhaps the most frustrating thing about using a voice interface is the feeling of not knowing precisely where you are in the interaction and what exactly the system expects you to do next. A well-designed web site will show visitors where in the menu tree they are, but even if it doesn't, a web page usually has enough visual cues to tip the user off as to where they are in the site (the URL being one simple indicator). Not so with a voice interface, where the user can quickly feel lost because of a lack of mental markers positioning them precisely in the exchange with the system.
Mark the exchange: just as a well-designed web page will indicate where in the web site a user is, a good voice interface will tell the caller where in the menu tree they are positioned. Usually, a word or two will suffice: "main menu" for the highest-level menu, "here are your flights" before announcing a list of flight numbers, etc.
Trace the path: in applications where the menu structure is deep and wide, callers can very easily become confused about where they are in the interaction, even when you mark the individual menu levels. In such situations, you can pair each Voice Page that handles an interaction with callers with a "position page" that traces, starting from the main menu, the position of the caller within the menu tree. "Restaurants, Chinese, Zip code," for instance, would succinctly help the caller understand that they chose "Restaurants," then "Chinese," and are now being prompted for a zip code to locate nearby Chinese restaurants. You can achieve path tracing by using a Message Page with a prompt describing the path and the "Go back" option for "Actions."
Use earcons: an "earcon" or "auditory icon" is the voice equivalent of a graphical interface's icon. An icon is a small graphic that means something specific in the context of the interaction: for instance, an arrow pointing to the right may mean go to the next page, and one pointing to the left may mean go back to the previous page. Earcons can be very useful in positioning the caller within a menu structure or in announcing the type of action that is about to take place. The sound of a keyboard clicking could be used to indicate to the caller that the system is busy doing something (while dead silence may be taken by the caller as a sign that the system has crashed or the call has ended).
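The "Trace the path" guideline can also be sketched in code. The class below is a minimal illustration in plain Python, not part of Site Builder; the name `MenuPosition` and its methods are hypothetical. It keeps track of the caller's choices from the main menu down and produces the short prompt that a "position page" would speak.

```python
from typing import List


class MenuPosition:
    """Tracks where the caller is in the menu tree."""

    def __init__(self, root: str = "Main menu") -> None:
        self._path: List[str] = [root]

    def descend(self, choice: str) -> None:
        """Record that the caller moved one level down by picking `choice`."""
        self._path.append(choice)

    def go_back(self) -> None:
        """Move one level up; at the root this does nothing."""
        if len(self._path) > 1:
            self._path.pop()

    def position_prompt(self) -> str:
        """Return the spoken trace of the path, omitting the root.

        At the top level the root name itself is returned.
        """
        if len(self._path) == 1:
            return self._path[0]
        return ", ".join(self._path[1:])


if __name__ == "__main__":
    position = MenuPosition()
    for step in ["Restaurants", "Chinese", "Zip code"]:
        position.descend(step)
    print(position.position_prompt())
```

Run with the restaurant example from the text, the sketch produces the trace "Restaurants, Chinese, Zip code". Keeping the trace to one or two words per level follows the earlier advice to avoid long prompts.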
Perhaps the one fundamental advantage that GUIs have over VUIs is the feeling of control that a graphical user has over both the medium and the interaction. It takes a very bad GUI to render the user helpless and throw them into a state of confusion. On the other hand, because a VUI is time-linear, uni-directional, and invisible, the system has to stumble only once in the interaction for the user to be thrown into a state of hopeless perplexity. Keeping in mind the key differences between designing a GUI and designing a VUI should help the alert VUI designer avoid the costly mistake of smuggling GUI assumptions into VUI design.
