My experiences with voice-based interfaces have always been pretty
caustic. Often you have a voice command line, where the user can
speak commands that the computer understands (let's ignore
imperfections in recognition for now) and the computer performs them:
"Play Metallica", "Skip", "Play Genre Jazz", and so on.
Every kind of voice-controlled system I've dealt with has boiled
down to some kind of flat list of context-sensitive keywords. If you
say these words here, this will happen; if you say those, that will.
In the thread "Any data on users making use of Help?" I mentioned
treating these systems like text adventures and RPGs. And I think
there is more we can learn from this that goes far beyond simple
As consoles have replaced the PC for most RPGs, and tight plots have
replaced the more freeform text-driven commands of the past, I think
there are a variety of interesting techniques used in these games to
make interaction with fictional people easier. Ones we can lift.
Often our current systems have a lack of memory. They attempt to
tell the user all that they can do right away, and every time they
interact with the system. Recently, in the game Left 4 Dead, Valve
has added a teaching mechanic to the game that alerts the player when
something new or perhaps not completely learned comes up. And it goes
beyond a list of tips that pop up a few times. The game tracks, for
example, how often the player crouches and how quickly they crouch
when they reach an obstacle they can crawl under. It also watches the
player in combat and looks to see if they crouch to let the other
players behind them have clear shots. When the game feels the user
isn't using this core mechanic enough, it lets the player know they
can do it with a tip that doesn't break the flow of the game.
We can also do this. Perhaps you have a voice-activated music system.
You could watch the user and note that they never rate songs they are
listening to. Perhaps ratings are one of the primary ways you pick
the songs they are most likely to want to hear at any given moment.
One way to alert them might be to have the system say, "Remember,
say [Add to Favorites] to let me know what you like!"
The user may not have ever known they could do that, or they may have
just forgotten. But either way, now they know.
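
To make the idea concrete, here is a rough Python sketch of that kind
of usage tracking. Everything in it is made up for illustration: the
Feature and HintTracker names, the priority numbers, and the
thresholds (min_uses, max_hints) are assumptions, not anything taken
from Left 4 Dead or a real voice system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Feature:
    """One spoken command the system can teach (hypothetical structure)."""
    name: str             # the spoken command, e.g. "Add to Favorites"
    hint: str             # what the system says to teach it
    priority: int         # lower number = more important feature
    uses: int = 0         # how often the user has spoken the command
    hints_given: int = 0  # how often we have already reminded them


@dataclass
class HintTracker:
    """Remembers what the user has done and what we have already told them."""
    features: dict = field(default_factory=dict)
    min_uses: int = 1   # assumed threshold: fewer uses means "not learned yet"
    max_hints: int = 3  # assumed cap, so the system eventually stops nagging

    def add(self, feature: Feature) -> None:
        self.features[feature.name] = feature

    def record_use(self, name: str) -> None:
        self.features[name].uses += 1

    def next_hint(self) -> Optional[str]:
        """Return the hint for the most important unlearned feature, if any."""
        candidates = [
            f for f in self.features.values()
            if f.uses < self.min_uses and f.hints_given < self.max_hints
        ]
        if not candidates:
            return None
        # Most important first; among equals, the one hinted least often.
        best = min(candidates, key=lambda f: (f.priority, f.hints_given))
        best.hints_given += 1
        return best.hint


if __name__ == "__main__":
    tracker = HintTracker()
    tracker.add(Feature("Skip", 'Say "Skip" to move to the next song.', 0))
    tracker.add(Feature(
        "Add to Favorites",
        'Remember, say "Add to Favorites" to let me know what you like!',
        1,
    ))
    tracker.record_use("Skip")  # the user already knows this one
    print(tracker.next_hint())  # so the favorites reminder is offered
```

The system would call next_hint() at a natural pause, such as between
songs, so the tip doesn't break the flow. Capping the reminders is a
guess at how to avoid repeating the same tip forever.
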
Along with this, we can track what users have recently heard, and how
much help we are giving them. We can prioritize our commands and alert
the user about the most important features first. Such as "Skip" or
Something else that was very common in old text adventures was
keeping a large list of equivalent words. In Zork, you have to face a
giant cyclops. To defeat him, you must scare him away. And to do so
requires saying a certain hero's name. Problem? He has two acceptable
names: Ulysses and Odysseus. No problem, either one works.
Yes, Yeah, Yup, Sure, Affirmative, Certainly, and so on are all
acceptable alternatives for each other. And they should all be
acceptable to your software. But this rule goes far beyond yes and
no. It should apply to anything and everything. Skip, pass, next.
Stop, halt, pause, break, hold on, wait. Everything should have as
many alternatives as possible.
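
A table of equivalent words could be sketched like this in Python. The
CommandVocabulary class and its methods are hypothetical names for
illustration; a real recognizer would feed its text output into
something along these lines. As one possible design choice, the sketch
refuses to register a phrase that already belongs to another command,
so two functions can never compete for the same word.

```python
class AmbiguousPhrase(ValueError):
    """Raised when one phrase would map to two different commands."""


def normalize(phrase: str) -> str:
    # Lowercase and collapse whitespace so "Hold  On" matches "hold on".
    return " ".join(phrase.lower().split())


class CommandVocabulary:
    """Maps many spoken alternatives onto one canonical command."""

    def __init__(self) -> None:
        self._phrase_to_command = {}

    def register(self, command: str, *alternatives: str) -> None:
        keys = [normalize(p) for p in (command, *alternatives)]
        # Check every phrase first so a conflict leaves the table unchanged.
        for key in keys:
            owner = self._phrase_to_command.get(key)
            if owner is not None and owner != command:
                raise AmbiguousPhrase(
                    f"{key!r} already means {owner!r}, not {command!r}"
                )
        for key in keys:
            self._phrase_to_command[key] = command

    def resolve(self, utterance: str):
        """Return the canonical command, or None if nothing matches."""
        return self._phrase_to_command.get(normalize(utterance))


if __name__ == "__main__":
    vocab = CommandVocabulary()
    vocab.register("yes", "yeah", "yup", "sure", "affirmative", "certainly")
    vocab.register("skip", "pass", "next")
    vocab.register("stop", "halt", "pause", "break", "hold on", "wait")
    print(vocab.resolve("Hold on"))
    print(vocab.resolve("Yup"))
```

An exact-match lookup like this is the simplest thing that could work.
It does not handle the alternatives appearing inside a longer sentence,
which would need real parsing.
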
Along with this, avoid mapping words that mean almost the same thing
to different functions. I know that "Forward" might be shorter than
"Fast Forward", but "Forward" is ambiguous; it could also mean
"Skip". On the other hand, "Seek" is viable for "Fast Forward".
You have to be careful. If the user is going to remember this long
term, it has to make sense, and there can't be confusion about what
will do what. If this means limiting the functionality of your
What thoughts do the rest of you have about this? Clearly this is
only the tip of the iceberg. And for call-taking systems that are
likely to only be used once or twice, it isn't very helpful. What
ways might we make the user's life easier with those? Or with any
voice entry system?