Soliciting opinions on voice recognition software for general computer interfaces

22 Dec 2008 - 9:42am
5 years ago
13 replies
839 reads
DrWex
2006

I'm interested in finding out if anyone has recent data or evaluations
of voice recognition software for general interaction purposes with
desktop/laptop computers (not specifically voice UIs to stand-alone
apps and not voice-only UI such as telephony).

I'm interested in data on both absolute recognition success and
anything on general user satisfaction with using the software. The
obvious application here is for people who cannot use keyboard/mouse
either due to disability or because hands are occupied with other
things.

Pointers and feedback greatly appreciated. Thanks,
--Alan

Comments

25 Dec 2008 - 10:55am
DampeS8N
2008

The general consensus among IxDs is that voice command is a terrible
control idiom. It is incomplete, lacks detail and requires extensive
verbosity to outline a clear goal.

However, engineers and people who don't think hard about how voice
command will actually work seem to think it is the next best thing.

Even if the computer were made less literal, even if it inferred like
a real person, there will still be a level of, "Oh, I'll just do it
myself!"

This is because that already exists with people. I hear it at work
all the time. Either one party can't articulate his thoughts, or the
other party can't understand a proper articulation.

This won't get any better with computers; it can only get worse.

Walking into a room and saying "Lights" or "Lights On" seems
romantic and sci-fi. But in reality, a well-placed light switch is
better. And as much as you pretend it is lazy to speak rather than
perform an action, speaking takes more cognitive effort.

This is why the clapper was successful. It could have been geared to,
and I believe you can activate them with, any loud noise. That
includes yelling "Lights".

However, there are situations where being able to talk to a computer
is good. Telephony is an example where good voice recognition will
make automated phone systems easier to deal with. So long as they
stop pretending to be people. Let them sound robotic. We expect it.
Hell, we WANT it. And it is a bit creepy when they don't sound
robotic.

Digital companions, and video games, are situations where being able
to understand language will provide more immersion and enjoyment.

And there are situations where talk is the best option. A robot maid
would be best controlled by voice. This is already how maids are
controlled. And "clean the bathroom, and use the productA not
productB." is all the control that is really needed.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=36596

26 Dec 2008 - 11:28am
mark ahlenius
2008

Hey,

nice comments and I generally agree. I speak as someone who has
worked a number of years in this space (VUI design, IVRs, etc.).

Speech recognition can be useful, but only when applied to the right
application space and for the right reasons. Take for example Goog-411,
1-800-555-1212, or TellMe - each of these is a highly useful
application - but they are phone based. Speech rec is far from perfect
- it's only probabilistic. If there is a more surefire way of entering
the data, then by all means use it. Another of the problems is that
speech recognition applications are often designed by engineers who
don't think like the end user (as described in Alan Cooper's "The
Inmates Are Running the Asylum"). I know; I was guilty of that myself
until I started to see the problems.

For anyone really interested in this topic, I would recommend Bruce
Balentine's new book "It's Better to Be a Good Machine Than a Bad
Person" (ICMI Press). It's a fun read, and it pokes fun at the
over-application of ASR (Automatic Speech Recognition), which happens
all the time.

One parting comment is that I think the jury may still be out on
multimodal speech systems. Combinations of speech and GUI can be
useful, especially when the user can switch back and forth between
modalities. If it's too noisy they can easily use the keyboard,
touchscreen, etc., or vice versa.

best,

'mark

26 Dec 2008 - 3:41pm
Chauncey Wilson
2007

I've been using Dragon NaturallySpeaking since version 1 or so as an
adjunct to document creation (mostly with Word, PPT, and email) and the last
2 versions are finally useful. I've taught several people with various
repetitive strain injuries how to use the tool and use it often myself. I'm
often surprised at what the software will now recognize out of the box, but
as others have noted, making it really useful requires training and some
dedication, more for the commands than the actual voice input. If you put
the time in to train the system, take advantage of features like the tool
that searches your documents for words in your domain that are uncommon in
general speech, buy a really good microphone (not the cheap one in the
box), and have a really powerful computer, you can have quite a useful
system.

Chauncey

26 Dec 2008 - 5:16pm
Mark Young
2008

> The general consensus among IxDs is that
> voice command is a terrible control idiom.

I have not noticed any consensus. I think most designers haven't
thought much about it or worked with it. There is probably a
consensus around "it doesn't work as well as it should". There are
a lot of technical and interface problems left to solve but the idiom
offers a lot of promise.

27 Dec 2008 - 9:18am
Julie Strothman
2008

I work at a college for students with learning disabilities and many
of our students successfully use Dragon NaturallySpeaking in
conjunction with a variety of apps. Getting fluent with the commands
and training the system are certainly hurdles that require persistent
work, and not all of the students who are interested in using the
software become fluent to the point where it's a satisfying and
efficient experience. For others, it's made the difference in their
ability. Foreign accents and speech impairments cause far more
significant obstacles. I'm interested in usability evaluations with
Dragon as well as testing for effects on writing success. Many
students self-report such improvement.

27 Dec 2008 - 12:07pm
DampeS8N
2008

It sounds, based on both testimonies in this thread, as if Dragon
NaturallySpeaking is the very opposite of a well-designed interface
and is very much a dancing bear.

The people using it seem to require it, and they suffer through the
steep learning curves and training time because they have to.

It is like the command line. Steep learning curve, remembering
commands, and in the end you are rewarded with new functionality you
didn't have before. And most people won't bother, even when they
know it is there and what it could do for them.

No, a 'command based' voice system will NEVER be widely used,
because it is functionally no different than the command line, and as
is self-evident, computers didn't take off until the GUI came to
abolish most hidden commands.

Sure, a few geeks will revel in their voice command lines at home. I
can even see a voice entry transcription system for Linux. But the
vast majority of users won't be able to handle anything that
requires a lot of command remembering.

An audio-only, command-based system wouldn't work.

Add in a screen, and now the primary nicety of the system is gone.
That is, the ability to not be near a screen.

Audio-only voice command will have to be conversational. It won't be
capable of generating a document. Typing is the sensible way to generate
the written word. While dictation is a big movie cliche, the number of
people who chose to dictate after they got their own computer is
microscopic.

Why would a computer do a better job than a person in that regard?

So the only people left using dictation software are people who
can't type for some reason, be it injury or handicap. And so a more
appropriate solution for them is software with choices on screen that
can be chosen with the voice. Which, unless I am misreading what people
are saying, could be what Dragon does. Should be at least.

But there is nothing efficient about it.

And so, back to the question that started all this: voice command
isn't very useful for general actions. And it never will be.

At least not until the computer itself is so advanced that it can
hold a conversation. So if I say "I'm in the mood for some
music... uuuuuummm.. How about some techno and some gangster rap
mixed up randomly. You know, the stuff I normally listen to." I get
something close to what I meant.

There isn't a voice system on the market that could parse that.
Hell, there isn't software that could handle it in text.

But all of you know I mean the same as: "Play genres: Gangster Rap
and Techno from my Most Played List."

Now, saying either of these things would be much faster than getting
up and going to a computer and loading Winamp and telling it to
filter my Most Played list down to just those genres, then drag it
into a playlist and then hit play and make sure random is on.

But I'd have to read a book to know I can do that with the voice
commands.

Most people would just say "Play music" and it would be all their
library on random. Which is what everyone I knew in college who
wasn't a computer major did with Winamp. Occasionally they would
select a song as a starting point. And often they would skip. So what
you'd see is "Play", "Skip", and "Play - song title" being about
the only commands people would learn.

You might be able to cram channel changing ("Turn on Comedy Central")
and lights ("Lights off") in there.

Who is going to pay to have the microphones installed in each room,
and the speakers just to be able to do that?

And if I am already sitting at the computer, clicking skip doesn't
require me to put on a headset.

27 Dec 2008 - 12:27pm
Mark Young
2008

> It is like the command line.

Have you noticed that the command line is coming back into style?
Take a look at how people use the search box and how it is evolving.
Try out Google Voice Search on a handset and imagine what more it
will be able to do in 5 years.

What do you think will happen as our PCs become more integrated with
our homes? How will you perform input when you're working in your
kitchen? Even if we have GUIs that follow us around we'll need help
for hands-free situations - voice input is the best option in that
case.

How will visually-impaired people use computers?

27 Dec 2008 - 11:03pm
Krystal Higgins
2008

The biggest reason my company will not use voice recognition is
because it reduces confidentiality of information, passwords, NDA
items, etc. Unlike email, where you can restrict receipt to a select
list of people, sound will travel. And it's unreasonable to give us
separate offices or require us to move to a meeting room for every
message or task. Of course, for more public environments--or offices
where there is already a telephone-oriented conversation
structure--this may not be an issue.

28 Dec 2008 - 6:33am
Chauncey Wilson
2007

The command line can be very effective and efficient for some activities.
There was a lot of research in the 1980s about how to develop an effective
and usable command line and some research showing that a well-designed
command line can be as usable as a menu or direct manipulation system for
some tasks.

Mark notes that command lines are coming back into style and in fact, many
of our systems are hybrid interfaces with a combination of:

Direct manipulation interfaces
Menu interfaces
Command line interfaces
Form user interfaces
Voice user interfaces
......

Chauncey

28 Dec 2008 - 11:50am
mark ahlenius
2008

Right,

It's certainly not a panacea; provide it and use it where it makes
sense. Phone-based systems are one example - not always necessary but
sometimes quite useful.

'mark

29 Dec 2008 - 2:50pm
Phillip Hunter
2006

This discussion has jumped around all over the place, but as a designer working in the speech reco field for a long while, I'll throw my few cents in.

Speech recognition is a mechanism, just like the mouse, gestures, typing, etc. It has shortcomings, as all do, yet they are significantly more frustrating because most of us have an innate comprehension of spoken language. The mouse and screen, the joystick, the touchpad, the command line all have to be learned almost from scratch. But with speech interfaces we have to, for most interactions, significantly alter something we already know in order to make use of them. And that's true whether we are talking about command-and-control, over-the-phone, or desktop reco.

However, knowing that changes the discussion back to what it should be. What is the right approach for the people who want to use something? Some speech reco engines are very good and will do very well supporting good interaction/interface designs. Very good speech reco engines will never overcome poor designs. Nuance/Dragon does some things very well out-of-the-box. It does other things well with a dedicated user willing to learn and practice, a la the command line. Other parts are not that successful, period. Isn't that true for almost all interaction mechanisms?

ph

29 Dec 2008 - 8:19pm
DampeS8N
2008

Perhaps. But it is folly to talk about voice-based interfaces without
diving into the future, where advanced AI will enable them to be
conversational rather than command-based.

An example of this might be a system administrator's tool which
communicates with the sys admins by voice.

Repetitive but not identical actions are tough to model, and sys
admins often spend a lot of their time at the command line.

So such a system could be like a junior sys admin sitting at the next
cube. Bark a few orders, get a few updates, be shown a few processes.

This would be a very good use of voice-based interaction. But it is
very far off, and could easily be supplanted before then.

27 Dec 2008 - 9:27pm
Petra Liverani
2008

If this is helpful, Vista includes voice recognition, which seems to
be not as good as Dragon, but it's free.

http://labnol.blogspot.com/2006/08/dragon-naturallyspeaking-9-vs-windows.html
