Jeff Garbers

Most of us old-timers probably expected voice I/O to be a common part of personal computing by now. But here we are in 2008, and I don't see even early signs of voice emerging into the mainstream. Products like Naturally Speaking have some popularity, but my sense is that they're used far more for dictation than any sort of command and response interface. Both Mac OS X and Windows Vista have built-in speech recognition capability, but does anybody use them (or even know they're there)?

So my question for the group is: why? Is it due to technical shortcomings, like recognition accuracy and dealing with background noise? Are there social issues, like not wanting to be overheard or feeling silly talking to a machine?

Or is it that splicing a voice-based UI into current graphical interfaces just doesn't give a satisfactory user experience?

This, to me, is the most intriguing possibility. Voice command today reminds me of the earliest versions of mice for PCs, which generated arrow keystrokes as you moved them around; although they were ostensibly compatible with the existing applications, they just didn't work well enough to justify using them. Could it be that an effective voice-based UI requires a more basic integration into the OS and applications? Perhaps we need an OS-defined structure for a spoken command syntax and vocabulary rather than just expecting users to speak menu items?

Why aren't we talking to our computers yet? Should we be?

Christine Boese

David Pogue at NYTimes has been out front about how in love with voice systems he is. You can search his past columns.

Chris

On Tue, May 13, 2008 at 2:36 PM, Jeff Garbers jgarbers at xltsoftware.com wrote:

Most of us old-timers probably expected voice I/O to be a common part of personal computing by now. But here we are in 2008, and I don't see even early signs of voice emerging into the mainstream. Products like Naturally Speaking have some popularity, but my sense is that they're used far more for dictation than any sort of command and response interface. Both Mac OS X and Windows Vista have built-in speech recognition capability, but does anybody use them (or even know they're there)? So my question for the group is: why? Is it due to technical [trim]

Jeffrey D. Gimzek

Imagine 4 people in a small office all talking to their computers every 2 seconds to say "new window....scroll down... stop...up... select file...."

I think it is mostly social, although everyone I know who has tried voice command has given it up, even when trying it home alone in a quiet house, so the tech isn't there either.

plus, talking is WAY slower than your hands.

On May 13, 2008, at 11:36 AM, Jeff Garbers wrote:

Most of us old-timers probably expected voice I/O to be a common part of personal computing by now. But here we are in 2008, and I don't see even early signs of voice emerging into the mainstream. Products like Naturally Speaking have some popularity, but my sense is that they're used far more for dictation than any sort of command and response interface. Both Mac OS X and Windows Vista have built-in speech recognition capability, but does anybody use them (or even [trim]

David Malouf

I think I would only be happy with one if it worked as well, and was as kooky, as those in Iron Man. There are 2 clear examples of this:

1) Jarvis the incredible AI. Very very natural speech in both directions.

2) But even his robotic arms responded to incredibly natural and often colloquial speech as well.

The issue is mode changing. Going into that un-natural mode is very disconcerting.

I also think you have a lot of good points. But I really think the technology isn't there yet. I recently demoed a new Ford Sync system (co-done w/ Microsoft) and while it was novel, with good surprises, I think as a total UX it was quite, well, sub-par.

In the end I don't think people "trust" these systems enough b/c the ones we are forced through have such a negative experience (even if they are pretty darn functional). Meaning that the total experience design is flawed, so even if the technical side works correctly, our total experience emotionally is tied to a very negative response.

- dave

Kristopher Kinlen

I am currently dealing with the same questions / problems. I work in the clinical space where the user's hands are often gloved up and covered in fluids. Interacting with software via a touchscreen or hardware device presents sterility issues so voice is the natural solution. As simple an answer as that seems, to date, few people in the industry actually use the voice solutions that are available.

It seems to be creeping in... Sync in cars is becoming more common, and the touch-tone menus on the other end of many 1-800 numbers are being replaced by voice.

I had the same sort of thoughts...

Gretchen Anderson

plus, talking is WAY slower than your hands.

You bet. At least for some things.

We just did a related project and looked at voice, and one thing that came up is that Star Trek really set an expectation that's hard to deliver on. The whole "computer: [insert your open ended, humanly voiced question/command here]" thing isn't quite ready for prime time. Plus, people have a hard time remembering the voice commands, whereas a GUI can give you prompts.

Another reason is that many platforms make invoking voice command hard. You often have to go somewhere/do something special and then start talking. I subscribed to Jott, thinking it would be my new fave way to set reminders for myself. But in reality I don't remember to make a special phone call to set a reminder, I go to my calendar on my phone. "Input where you output!"

Tim Ostler

1. Above all it is social. Working amongst fellow workers all talking to their computers would be like working in a call centre - only without the scope for eavesdropping on something interesting

2. It creates more cognitive load for both human and computer:

  • for the human, to verbalise what you want something on screen to do and then say it, then confirm that it has worked;
  • for the computer, to interpret the sound it detects and convert that into interface instructions

3. I am not surprised that voice recognition is more widely used for dictation than for commands, as that is a situation where it can offer real productivity benefits. Even here, some people just prefer to express themselves with a keyboard; personally I never got used to using a dictaphone or dictating to a secretary (remember them?).

    Tim Ostler
    London

    Christine Boese

    It just struck me, I wonder how much of the resistance to it is probably because of "Open the pod bay door, HAL."

    Another freaky thing hit me the other day, very disconcerting. I listen to public radio constantly at home, every morning. I imagine public radio has many reasons to want to cut costs, but unlike NOAA (the automated weather repeater you get on your weather radio as you drive through thunderstorms and tornadoes cross-country... sometimes I just listen so I can feel like Stephen Hawking is riding with me in the car... if I could just get it to talk about string theory or something fun), public radio would have REALISTIC sounding automated voice announcers, wouldn't they?

    I really don't think NPR is running segues and other bits from automated voice generators, but the trick of my ear is that I sometimes HEAR it that way. Maybe it is in the nature of the digital signal, I don't know, but either the fake voices being created now are being modeled on the inflections of NPR announcers (segue announcers, not story readers, who are clearly real people), or something about the transmission of those announcer voices is making them sound synthesized.

    I definitely have a few HAL moments while listening some mornings, that's for sure. Except it is usually that woman's synthesized voice, more like the 411 numbers. Calm and NPR-sounding women. I'm sure they test out great for delivering info in a style to keep us calm while we are being kept on hold.

    Chris

    On Tue, May 13, 2008 at 3:43 PM, Jeffrey D. Gimzek listserv at jdgimzek.com wrote:

    Imagine 4 people in a small office all talking to their computers every 2 seconds to say "new window....scroll down... stop...up... select file...." I think it is mostly social, although everyone i know that has tried voice command has given it up, even when trying home alone in the quiet house, so the tech isnt there either. plus, talking is WAY slower than your hands. On May 13, 2008, at 11:36 AM, Jeff Garbers wrote: Most of us old-timers probably expected voice I/O to be a common part of personal computing by now. But here we [trim]

    Scott McDaniel

    I think it'd be fair to say that voice controls would largely need to be an enhancement to screen/key/mouse
    driven input for all the reasons mentioned before. I fear, too, that many of the approaches to voice UI are following
    the past 20 years of visual UI design, based on products out there, instead of starting from the ground up with
    "What would someone want a voice UI to do?"

    At least if voice command phone systems and the navigation system on my Prius are any indication, anyway : )

    Scott

    -- 'Life' plus 'significance' = magic. ~ Grant Morrison

    Peyush Agarwal

    I think the problems w/ voice-based UIs are/would be:

    1. Technical - dealing w/ accents, sound levels, ambient noise etc.

    2. The computer would need to understand what we 'mean' as opposed to visual UI where we click what the computer has to offer

    3. Humans work better by recognition rather than recall. Visual UIs aid recognition, while voice UI basically requires good recall. You'd have to remember the exact command that'd generate the desired response or else you're back to # 2.

    4. This is one of the biggest drawbacks of voice based interaction with a computer - it is essentially serial, as opposed to visual UI which is parallel. This is one of the reasons why I think the iPhone's visual vmail was such a hit. In this respect, the computer would really need to get to the level of a human-human interaction - just "knowing" when to interrupt and when to get interrupted in order to carry a serial interaction with almost parallel efficiency.

    5. Probably I'm just used to the keyboard/mouse, but I think talking to the computer would be tiring, unless of course you're doing Star Trek - volume, tone, tenor, clarity, noise no bar - and maybe it'll be workable enough...

    -Peyush

    Will Parker

    On May 13, 2008, at 12:55 PM, Kristopher Kinlen wrote:

    I am currently dealing with the same questions / problems. I work in the clinical space where the user's hands are often gloved up and covered in fluids. Interacting with software via a touchscreen or hardware device presents sterility issues so voice is the natural solution. As simple an answer as that seems, to date, few people in the industry actually use the voice solutions that are available.

    I can think of several reasons why voice commands in a surgical environment would be problematical.

    The best reported reliability I've seen for a simple voice command system was around 98%, and frankly, I didn't believe that number when I saw it. Most trials involving voice to text systems report about 95% reliability, and those usually involved a period of training the software to recognize individual users' utterances.

    Is ~95% reliability sufficient in the operating room? That depends, I suppose, on which functions could not be performed more reliably by the operating room staff without adding to the overall cognitive load for any one of the staff.
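A rough way to put numbers on that question is to see how per-command reliability compounds over a session. The sketch below is only an illustration: it assumes each recognition succeeds or fails independently of the others, which real systems may well violate, and the function names are made up for this example. The 95% and 98% figures are the ones quoted above.

```python
def prob_all_recognized(accuracy: float, n_commands: int) -> float:
    """Chance that every one of n_commands is recognized correctly.

    Assumes each recognition is an independent event (an assumption,
    not something the reported trials establish).
    """
    return accuracy ** n_commands


def expected_errors(accuracy: float, n_commands: int) -> float:
    """Average number of misrecognized commands out of n_commands."""
    return n_commands * (1.0 - accuracy)


if __name__ == "__main__":
    # Compare the two reliability figures over short and long sessions.
    for accuracy in (0.95, 0.98):
        for n in (10, 50):
            print(
                f"accuracy={accuracy:.2f} commands={n} "
                f"all-correct={prob_all_recognized(accuracy, n):.3f} "
                f"expected-errors={expected_errors(accuracy, n):.2f}"
            )
```

Under those assumptions, at 95% per command fewer than one in ten 50-command sessions would pass without a single misrecognition, which is the kind of figure the operating-room question turns on.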

    And if we're talking about introducing slightly-unreliable functionality into a risk-sensitive, cognitive-load-sensitive process, I have to ask what actual improvements in surgical practice (other than reducing the cost of staffing an operating room) would come from voice command systems?

    -Will

    Will Parker
    wparker at channelingdesign.com

    Loredana Crisan

    This is an interesting topic.
    I'm currently working on a Voice UI for a consumer product application.

    It seems to me that while voice I/O promises to deliver an enhanced experience, the technology does not and cannot yet live up to its promise.
    Aside from the social awkwardness of talking to your computer in an office full of people, here's what makes matters even worse:

    1. Recognizers usually tend to misrecognize short words that would feel intuitive to the user, such as "back" and "next" and "stop". What you are left with as a designer is "Go back, Play next, Stop now" - words that consumers would never think to say, and that frankly irritate them.

    2. Let's assume though that they do make the effort to learn the keywords, and are alone (or ignore the folks at the office). They open their mouth wide and say "Plaaay Neeeext." only to be faced with their worst fear: "I'm sorry, I couldn't understand that."

    We humans rely heavily on being able to communicate. Our survival as a species depends on it, and our success is a direct result of the ability we have to understand each other.
    We are hard-wired to be really upset when we cannot make ourselves understood. At the gut-level, miscommunication is a threat.

    The application I'm working on gives users the option to interact either via keypad input or voice input. Only about 30% choose voice. It's convenient when they're driving, when they absolutely need to focus their eyes on something else.

    But in truth, with the current technology, there seem to be only a few circumstances in which the advantage of using voice to communicate with a machine is greater than its drawbacks.

    Loredana

    Will Parker

    On May 13, 2008, at 1:25 PM, Peyush Agarwal wrote:

    3. Humans work better by recognition rather than recall. Visual UI's aid recognition, while voice UI basically requires good recall. You'd have to remember the exact command that'd generate desirable response or else you're back to # 2. 4. This is one of the biggest drawbacks of voice based interaction with a computer - it is essentially serial, as opposed to visual UI which is parallel. This is one of the reasons why I think the iPhone's visual vmail was such a hit. In this respect, the computer would really need to get to the level of a [trim]

    "Almost parallel efficiency" is indeed the key victory condition for voice UI.

    Even at 99.99% voice recognition reliability (plus the absurd 100% natural language parsing reliability we see in the movies), every command interaction that involves a non-trivial, unrecoverable change in state is going to require a confirmation phase: "I think you said 'Go Left'. Is that correct?"

    One-way auditory signals are a great thing, even under high-stress conditions. Two-way auditory communication requires a mix of trust and half-duplex hand-shake negotiation, and that last bit is the deal-breaker for unreliable computer voice recognition.
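The confirmation phase described above can be sketched in a few lines. This is a minimal illustration, not any real voice framework: the `Command` type, the `unrecoverable` flag, and the `confirm` callback are all hypothetical names, and the callback stands in for whatever spoken yes/no exchange a real system would run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Command:
    name: str
    unrecoverable: bool  # hypothetical flag: True if the action cannot be undone


def handle(command: Command, confirm: Callable[[str], bool]) -> str:
    """Execute a recognized command, asking first if it cannot be undone.

    `confirm` receives the prompt text and returns True for "yes".
    """
    if command.unrecoverable:
        prompt = f"I think you said '{command.name}'. Is that correct?"
        if not confirm(prompt):
            return "cancelled"
    return f"executed {command.name}"


if __name__ == "__main__":
    # A reversible command runs immediately; an unrecoverable one asks first.
    print(handle(Command("Scroll Down", False), confirm=lambda prompt: True))
    print(handle(Command("Go Left", True), confirm=lambda prompt: False))
```

Every extra round trip through `confirm` is the half-duplex handshake cost: the interaction stays safe but moves further from "almost parallel efficiency".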

    -Will

    Will Parker
    wparker at channelingdesign.com

    Victoria Stanbach

    I used to work for a start-up called AgileTV. We developed a very robust speech to TV control interface. The company is now called Promptu. Check them out: www.promptu.com

  • Speech recognition is very advanced today. You can have anyone speak a number of specific words into a microphone and the computer adapts to your speech. Promptu's technology of speech input → server → response is very fast.
  • In many user tests the system was found to be very interesting and useful to some, mainly the elderly and disabled, but when we ran a regional test with a local cable company, typical users found that it was just as complicated to learn the new speech interface as it was to navigate the on-screen guides.

    David Malouf

    Where I work @ Motorola Enterprise Mobility our partners create a lot of voice activation systems. The main application is in item picking (think a warehouse setting) or other finite tasking system.

    These systems

    1. learn @ the individual level

    2. usually include ear piece & boom mic.

    3. have short & broad menuing systems that are filtered by role & individual.

    4. the worker & the system both go through training

    This is fairly successful, but also pretty far from mainstream.

    - dave

    Jeffrey D. Gimzek

    On May 13, 2008, at 1:36 PM, Will Parker wrote:

    On May 13, 2008, at 12:55 PM, Kristopher Kinlen wrote: I am currently dealing with the same questions / problems. I work in the clinical space where the user's hands are often gloved up and covered in fluids. Interacting with software via a touchscreen or hardware device presents sterility issues so voice is the natural solution. As simple an answer as that seems, to date, few people in the industry actually use the voice solutions that are available. I can think of several reasons why voice commands in a surgical environment would be problematical. The best reported reliability [trim]

    you can push the malpractice suits over to the computer company?

    - -

    Jeffrey D. Gimzek | Senior User Experience Designer

    http://www.glassdoor.com

    Brandon E.B. Ward

    A friend of mine had voice-recognition software that locked his computer. To unlock the machine he just talked into the mic. (think Sneakers "My voice is my passport." )

    But if people in the office were being noisy, or the air-conditioner kicked on, or someone was walking by, or he had a cold - he couldn't get into the system because the audio the machine was receiving differed too much from what it initially recorded when he set up his password.

    B

    keyur sorathia

    Hi all,

    I work with IIT (Indian Institute of Technology) Mumbai, India, as an interaction designer. Currently we are doing a research project called "galla - a low cost retail management system". We are designing hardware and software for small grocery shopkeepers for better customer management, item management and vendor management.

    In one of our explorations, we tried a context-based speech recognition system, which works pretty well. We designed a UI particularly for this application. While making a bill for particular items in a grocery shop, these context-based words help make bills faster. In India, as grocery shops are noisy, this system is facing some accuracy problems because of background noise. The system is designed so that there is no need to train it; one can start operating it directly. Currently the system works only for English words, and we are also trying it out with regional languages. But for sure, as it is a context-based voice recognition system, it works much better than a normal speech recognition system.
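One way to picture why a context-based vocabulary helps: if the recognizer only has to choose among the handful of words that make sense on a billing screen, even a noisy guess can be snapped to the nearest valid item. The sketch below is not the galla system; the item list and function name are invented for illustration, and it works on already-transcribed text using Python's standard `difflib` rather than on audio.

```python
import difflib
from typing import List, Optional

# Hypothetical billing-screen vocabulary; a real shop would load its own item list.
BILLING_VOCABULARY: List[str] = ["rice", "sugar", "salt", "flour", "tea", "milk"]


def match_in_context(
    heard: str,
    vocabulary: List[str] = BILLING_VOCABULARY,
    cutoff: float = 0.6,
) -> Optional[str]:
    """Snap a possibly misheard word to the closest word in the current context.

    Returns None when nothing in the vocabulary is similar enough, so the
    caller can ask the user to repeat instead of guessing.
    """
    candidates = difflib.get_close_matches(
        heard.lower().strip(), vocabulary, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(match_in_context("suger"))  # a near miss for a known item
    print(match_in_context("xyzzy"))  # nothing similar in this context
```

The smaller the vocabulary for the current screen, the fewer plausible confusions there are, which fits the observation that the context-based system outperforms general recognition.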

    We are still trying to find a good solution for reducing the background noise and making this system more effective.

    Cheers!!!

    On Wed, May 14, 2008 at 3:38 AM, Victoria Stanbach vic at victoriastanbach.com wrote:

    I used to work for a start-up called AgileTV. We developed a very robust speech to TV control interface. The company is now called Promptu. Check them out: www.promptu.com - Speech recognition is very advanced today. You can have anyone speak a number of specific words into a microphone and the computer adapts to your speech. Promptu's technology of speech inputserverresponse is very fast. - In many user tests the system was found to be very interesting and useful to some - mainly elderly and disabled, but we ran a regional test with a local cable company, [trim]

    -- Keyur Sorathia
    Interaction Designer,
    Media Lab Asia,
    IIT Mumbai.
    mobile : +91 98198 15448

    email : keyurbsorathia at gmail.com

    Anders Ljung

    Jeff, I think all of the reasons you mentioned apply. Speech recognition and synthesis, in my opinion, add very little in the keyboard/mouse/screen paradigm we are currently in.

    Studying this also reveals how much information there is in the way we say things, human to human, which is very tricky for computers to analyze. A "Hmm" can mean so many things depending on timing, intonation etc. Gabriel Skantze at KTH did a pretty nice system for "pedestrian navigation" which tries to overcome this.

    http://www.speech.kth.se/~gabriel/software.html

    Kris Kinlen

    The focus is more on the "cover my ass" side of things rather than actually using software to help perform procedures. There are all kinds of information that have to be documented and charted during/after a procedure is performed, and many doctors are looking to improve productivity and profitability, so they look to software.

    You are right on with the reliability though... I personally am not comfortable with a 95% reliable surgeon hehe.

    Regards,
    Kristopher Kinlen
    x63331

    Original Message
    From: Will Parker [mailto:wparker at channelingdesign.com]
    Sent: Tuesday, May 13, 2008 4:37 PM
    To: Kris Kinlen
    Cc: discuss at ixda.org
    Subject: Re: [IxDA Discuss] Why isn't voice-based UI mainstream?

    And if we're talking about introducing slightly-unreliable functionality into a risk-sensitive, cognitive-load-sensitive process, I have to ask what actual improvements in surgical practice (other than reducing the cost of staffing an operating room) would come from voice command systems?

    -Will

    Will Parker
    wparker at channelingdesign.com

    Will Parker

    On May 14, 2008, at 7:28 AM, Kris Kinlen wrote:

    The focus is more on the "cover my ass" side of things rather than actually using software to help perform procedures. There are all kinds of information that have to be documented and charted during/after a procedure is performed, and many doctors are looking to improve productivity and profitability, so they look to software.

    Can you give an example or two of the type of procedure documentation required?

    I'm wondering why a non-interactive audiovisual record wouldn't fill the bill. (Like the no-doubt-fascinating-to-surgical-interns knee reconstruction videos that keep popping up on the University of Washington cable channel).

    Why impose the additional workload of managing the data collection system on the most critical personnel in the process?

    It's quite cheap (financially and technically) to add massive-but-dumb data collection functionality to an already-wired venue like the modern surgical theater. Grab the entire event as fine-grained raw data and emulate Google to find the interesting bits. Or let your pet intern do that for you. (Oh ... wait ... that last bit isn't monetizable. Forget I said that.)

    -Will

    Will Parker
    wparker at channelingdesign.com

    Kevin Doyle

    Jeffrey is right: the workplace is what's keeping voice UI from becoming commonplace. It's where most computers are used; imagine how noisy a cube farm of just 20 people talking to their computers would get.

    I've read about some great HCI coming to the home — you'll be able to start your dishwasher, check what's in your fridge while at the grocery store (or order from home using your fridge) and turn on the AC/heat very soon. I could see how the inside of a home could be controlled by voice once things get that wired... but until then, I don't see much voice happening.

    Scott Berkun

    I'm sure I'll be forever labeled as the curmudgeonly luddite on the list, but I really do not want to ever debug or reboot my refrigerator, even if that means I'll always have to make shopping lists the old fashioned way: the upside of automation is totally outweighed for me by the likelihood of adding more fragility. Frankly, in 2008 it's still pretty damn hard to find a thermostat that doesn't totally suck to use - my faith in the ease of use of web programmable kitchen appliances is comically low.

    More in-line with this thread: why do we assume homes have less background noise than offices? If the TV or radio is on doesn't that create nearly as many problems?

    -Scott

    Scott Berkun
    www.scottberkun.com

    From: "Kevin Doyle" kbdoyle at gmail.com I've read about some great HCI coming to the home — you'll be able to start your dishwasher, check what's in your fridge while at the grocery store (or order from home using your fridge) and turn on the AC/heat very soon. I could see how the inside of a home could be controlled by voice once things get that wired... but until then, I don't see much voice happening.

    Loredana Crisan

    I was reading an interesting book - "It's Better to Be a Good Machine Than a Bad Person" - in which it was described how controlling your home appliances by voice can prove to be ... um, challenging. Background noise is an issue - let's say you're watching a movie and the main character shouts "turn that off!" At the same time your dishwasher stops. Grr!
    High error rates in this type of application are common.

    But I believe that the main problem with voice is still social/psychological. How do you talk to a machine?
    I've looked at a bunch of Sync videos on YouTube - people are obviously feeling uneasy talking to their car. I'd love to read about the psychology of IVR...

    How do you folks feel when you have to use an interactive voice response system?

    On May 14, 2008, at 12:27 PM, Scott Berkun wrote:

    I'm sure I'll be forever labeled as the curmudgeonly luddite on the list, but I really do not want to ever debug or reboot my refrigerator, even if that means I'll always have to make shopping lists the old fashioned way: the upside of automation is totally outweighed for me by the likelihood of adding more fragility. Frankly, in 2008 it's still pretty damn hard to find a thermostat that doesn't totally suck to use - my faith in the ease of use of web programmable kitchen appliances is comically low. More in-line with this thread: why do we [trim]

    Jeff Garbers

    On May 14, 2008, at 3:34 PM, Loredana Crisan wrote: How do you folks feel when you have to use an interactive voice response system?

    Anxious, because I don't trust them; given the choice to "press or say your account number" I always use the keypad, figuring DTMF is a lot less ambiguous than English.

    Irritated, because they often ask questions as if they understand natural language, but they don't. "What can I help you with today?" is a pretty generic prompt, and I have very low confidence that anything good will happen if I go into my 30-second description of why my check got credited to the wrong account, etc. etc. Best thing that can happen is to have it say "Okay, let me get you a representative to help you with that problem."

    Maybe we have the same sort of "uncanny valley" phenomenon with IVR as we do with CG human characters in movies... perhaps it's better not to try to simulate human behavior, since you lead people to focus on the differences and not the similarities.

    Brandon E.B. Ward

    Anxious, because I don't trust them;

    I remember my mom telling me about the first electronic calculators they had way back in the day. They'd just switched from mechanical adding machines to electronic calculators. She said they were great - small, light, fast - but they couldn't be trusted 100% of the time. Sometimes the answer they gave was wrong. So after initially doing everything quickly with the calculator (can't remember what - some data entry/books/accounting type stuff) they'd do it all over again in their heads, by hand, or using the old mechanical system to verify the answer. It didn't take long before they didn't have to do this anymore, but she recalled being distrustful of the newfangled technology.

    I'm guessing that Voice-Rec. has a similar hurdle to jump - but it probably will someday.

    B

    Jackie O'Hare

    On May 14, 2008, at 3:34 PM, Loredana Crisan wrote: How do you folks feel when you have to use an interactive voice response system?

    "Anxious, because I don't trust them; given the choice to "press or say your account number" I always use the keypad, figuring DTMF is a lot less ambiguous than English.

    Irritated, because they often ask questions as if they understand natural language, but they don't. "What can I help you with today?"..."

    ...

    I find interactive voice response difficult at best, but frequently infuriating. As many people have been indicating, error rates are high, and what you intuitively think you need to say is not necessarily the command that the response system requires in order to perform that action. My own experiences with interactive voice response have generally ended with me trying to circumvent the system by pressing the * key repeatedly, which does usually boot you out of the system and land you on the phone with a real live human.

    Unfortunately, when someone is already mad about something it's not a really great time to engage them in a challenging user environment.

    This doesn't really apply to more neutral situations - but how long does it take someone to become infuriated if the voice command to execute simple tasks repeatedly malfunctions?

    Jeff Seager

    I work with a number of people with disabilities who actually use Dragon Dictate and Dragon Naturally Speaking, and some of you who have some experience with this will understand that the technology is imperfect at best.

    If you've never seen this in action, you should know that the software must be trained by each user. It's a rather painstaking process, but it's worth it for people who have physical impairments that limit their options.

    Even after customizing it for your voice, dictating letters with this software is an exercise in extreme patience. I resist entering one co-worker's office when she's drafting a letter or e-mail or some other document, because she has to say "go to sleep" before we can carry on a discussion. And even though the software has "learned" her accent and vocal inflections, she's constantly having to back it up to correct the spelling of an uncommon word or name. I would call her an expert user, and she still says, "Dragon Dictate sucks!"

    Having seen quite a few people struggle with this, I think current voice recognition software is sufficient for very discrete purposes where you have a limited command set. A cell phone or address book will have a finite number of stored contact names, for example. I don't know how this would work for a TV or car stereo, as it seems to me the sound of the device itself would interfere with the command reception.

    For people who need it to get any work done on a computer, it's worth the hassle. But probably not for anyone else. I believe that anyone who can improve this will reap some very good karma.

    Joe Pemberton

    Great discussion...

    We did some UI (voice and graphical) design for Promptu, a voice search company focused on set top TV and mobile device voice search. The key learnings were mostly around the awkwardness of dealing with a hybrid UI — one where you're providing voice input and receiving visual output. In a graphical UI we might be taking for granted the cues we're giving to users all the time — hover states, loading indicators, etc. You have to compensate harder in voice UI I think.

    This hybrid approach is actually an asset, but just requires new thinking. The alternative (voice input, aural output) is more like what we experience with telephone banking. Ugh.

    The other learning was that people don't know what to say and the consequences of errors are high. Users were afraid to make mistakes because returning search results and dealing with mistakes was time consuming and burdensome.

    Promptu actually has some great "did you mean" functionality akin to Google's and just as intuitive. Further, because the search was within categories (e.g. movie titles or album names) the accuracy was excellent. When input is open ended, as with dictation, it's dismal by comparison.

    Lastly, despite mobile handsets being used with voice all the time, people still feel awkward talking into a visual UI. Users didn't know how to hold it and would talk at the screen, not the microphone, even though they use their mobile for voice more than they do for mobile data.

    I think we'll get there with voice making incremental inroads into UI where it makes sense — cars for one and where the scope of spoken input is well defined as with categorized search.

    Pankaj Chawla

    I think the real reason is how you will want to interact with the system. For example, if you have to check the balance in your account maintained in a spreadsheet, what will be your sequence of voice commands?

    1. Start

    2. Programs

    3. Windows Explorer

    4. My Documents

    5. Open Account.xls

    6. Select Column D

    7. Sum to Cell D6

    8. Speak D6

    9. Close

    10. Exit

    or

    1. Hey computer can you tell me the balance in my account.

    If it's the first case, we are about 40% of the way there, but if it's the second scenario, we haven't even started yet. To me it's a question of mental model vs. implementation model, with an undefined designer model as of now. I am not sure we will be able to reach the current mental model anytime soon, so it's imperative that designers first bring forward a designer model that can bridge the gap between the mental and implementation models within the limitations of currently available technology and business needs.
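One very modest reading of a "designer model" that bridges the two scenarios is a macro layer: the user says something close to the second form, and the system expands it into the first form's step-by-step commands. The sketch below assumes plain keyword spotting on already-transcribed text, and the macro table and function name are hypothetical. It is nowhere near natural language understanding; it only shows the shape of the bridge.

```python
from typing import Dict, List

# Hypothetical macro table: one spoken keyword expands to the low-level
# command sequence listed in the first scenario above.
MACROS: Dict[str, List[str]] = {
    "balance": [
        "Start",
        "Programs",
        "Windows Explorer",
        "My Documents",
        "Open Account.xls",
        "Select Column D",
        "Sum to Cell D6",
        "Speak D6",
        "Close",
        "Exit",
    ],
}


def expand(utterance: str) -> List[str]:
    """Return the low-level steps for the first macro keyword found, else []."""
    words = utterance.lower().replace(".", " ").replace(",", " ").split()
    for keyword, steps in MACROS.items():
        if keyword in words:
            return list(steps)  # copy so callers cannot mutate the table
    return []


if __name__ == "__main__":
    for step in expand("Hey computer can you tell me the balance in my account."):
        print(step)
```

The user still has to know that "balance" is a word the system listens for, so this trades ten recalled commands for one; it does not close the gap to the mental model.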

    My 2 cents.

    Cheers
    Pankaj
    http://13degree.wordpress.com
    Do your dreams!

    Michael Micheletti

    On Tue, May 13, 2008 at 11:36 AM, Jeff Garbers jgarbers at xltsoftware.com wrote:

    Why aren't we talking to our computers yet? Should we be?

    Or cars. I've been thinking about BMW's iDrive while following this thread. This is a screen-based control system that also has a voice control interface. I remember reading comments elsewhere from a BMW salesperson who said that he sat new owners down in their cars and helped them train the air conditioner. That way they could issue commands vocally without taking their eyes off the road.

    The idea of using voice recognition as a navigation layer superimposed on other controls is interesting, but I'll admit I'm glad my older 3-series has pushbuttons.

    Michael Micheletti

    Victor Lombardi

    On Thu, May 15, 2008 at 11:42 AM, Michael Micheletti michael.micheletti at gmail.com wrote:
    On Tue, May 13, 2008 at 11:36 AM, Jeff Garbers jgarbers at xltsoftware.com wrote: Why aren't we talking to our computers yet? Should we be?

    Apple includes basic speech recognition on Macs:
    http://www.apple.com/accessibility/physical/

    Now that I'm a parent, talking with other parents, I found a common use case for speech recognition: many hours of feeding and soothing a baby requiring both hands. The baby is happy and occupied, but caregivers would like to do something too. I've talked to parents who started reading coil-bound books just because they lay flat without constant handling. So a hands-free software UI for caregivers could be a big hit, especially if it was optimized for reading email, web browsing, and writing rather than general purpose control.

    Sachendra Yadav

    I think Voice UI will not become the primary mode of interaction in the near future, for obvious reasons. It'll be used mostly when visual UI is difficult to use, e.g. while driving a car, taking care of babies, performing surgical procedures, etc.

    Sachendra Yadav
    http://sachendra.wordpress.com
